Extending MARIADB with user-defined functions Andrew Hutchings & Sylvain Arbaudie MariaDB Corporation
About Andrew (LinuxJedi) ● Andrew Hutchings, aka “LinuxJedi” ● Lead Software Engineer for MariaDB’s ColumnStore ● Previous worked for: ○ NGINX - Senior Developer Advocate / Technical Product Manager ○ HP - Principal Software Engineer (HP Cloud / ATG) ○ SkySQL - Senior Sustaining Engineer ○ Rackspace - Senior Software Engineer ○ Sun/Oracle - MySQL Senior Support Engineer ● Co-author of MySQL 5.1 Plugin Development ● IRC/Twitter: LinuxJedi ● EMail: linuxjedi@mariadb.com
About Sylvain ● Sylvain Arbaudie ● Principal consultant, senior trainer for MariaDB EMEA ● Previous worked for: ○ Airial conseil - Oracle Application performance consultant ○ Karavel - MySQL/MariaDB/Oracle DBA ● EMail: sylvain.arbaudie@mariadb.com
What is a UDF? ● A plugin written in C ● Provides a new function call for SQL queries, for example: SELECT my_function(b) FROM t1 WHERE a = 1000; ● Come in regular function or aggregate function form ● Very simple API
History of UDFs ● Appeared in MySQL 3.21.24 ○ Roughly 21 years ago ○ Older that InnoDB or even transactions in MySQL ● External contribution by Alexis Mikhailov ● Aggregate functions came soon after (by version 3.23) ● A precursor to MySQL and MariaDB plugin APIs ● Not much has changed since
Pros and Cons ● Very easy to develop with a little C knowledge ● Very rapid execution time Pro ● If a UDF crashes it takes the whole MariaDB server out with it ● Usually needs re-compiling with every MariaDB point release Con
Installing & Using UDFs
Installing a UDF CREATE [OR REPLACE] [AGGREGATE] FUNCTION [IF NOT EXISTS] function_name RETURNS {STRING|INTEGER|REAL|DECIMAL} SONAME shared_library_name
Viewing Installed Plugins MariaDB [test]> select * from mysql.func; +-----------------------------+-----+------------------+-----------+ | name | ret | dl | type | +-----------------------------+-----+------------------+-----------+ | calgetstats | 0 | libcalmysql.so | function | | calsettrace | 2 | libcalmysql.so | function | | calsetparms | 0 | libcalmysql.so | function | | calflushcache | 2 | libcalmysql.so | function | | calgettrace | 0 | libcalmysql.so | function | | calgetversion | 0 | libcalmysql.so | function | | calonlinealter | 2 | libcalmysql.so | function | | calviewtablelock | 0 | libcalmysql.so | function | | calcleartablelock | 0 | libcalmysql.so | function | | caldisablepartitions | 0 | libcalmysql.so | function | ...
Calling Functions MariaDB [test]> select my_example(my_ints) from my_tableG +---------------------+ | my_example(my_ints) | +---------------------+ | 99 | | 27 | +---------------------+ 2 rows in set (0.00 sec)
Calling Functions MariaDB [test]> select my_aggregate_example(my_ints) from my_tableG +-------------------------------+ | my_aggregate_example(my_ints) | +-------------------------------+ | 126 | +-------------------------------+ 1 row in set (0.00 sec)
Defining a UDF
API Calls ● name_init() - initialize at start of query ● name_deinit() - clean up at end of query ● name() - called on every row (or every group for aggregate) ● name_add() - called on every row of a group (Aggregate) ● name_clear() - called before the first row of a group (Aggregate) ● name_remove() - removes a row from a group (Aggregate Window Functions, 10.4 onwards)
Execution Flow Chart Start name_init() More rows ? name() name_deinit() End Yes No Normal Start name_init() More groups ? name_clear() name_deinit() End Yes No Aggregate More rows ? name_add() name() Yes No
Init Call my_bool name_init(UDF_INIT *initid, UDF_ARGS *args, char *message) ● initid - Supplies MariaDB with the return metadata ● args - MariaDB provides the argument metadata ● message - A pointer to add an error message, max MYSQL_ERRMSG_SIZE bytes ● return - 0 for success, 1 for error
Deinit Call void name_deinit(UDF_INIT *initid) ● initid - The metadata defined in the init call (contains an arbitrary pointer which deinit can free)
Function Call char *name(UDF_INIT *initid, UDF_ARGS *args, char *result, unsigned long *length, char *is_null, char *error) long long name(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error) double name(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error) ● initid - The init-time metadata object ● args - The input argument ● result - An optional pre-allocated 768 byte buffer to use ● length - The length of the result set ● is_null - Set to 1 if the result is NULL ● error - Set to 1 if an error occurred ● return - The return value for this row
Add / Remove Call void name_add(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error) void name_remove(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error) ● initid - The init-time metadata object ● args - The input argument ● is_null - Set to 1 if the result is NULL ● error - Set to 1 if an error occurred
Clear Call void name_clear(UDF_INIT *initid, char *is_null, char *error) ● initid - The init-time metadata object ● is_null - Set to 1 if the result is NULL ● error - Set to 1 if an error occurred
Further Reading ● Contains chapters on how to write UDF plugins as well as other plugins for MySQL and MariaDB. ● Both the authors work for MariaDB and are here this week. Come find us for more information.
Deploying a UDF live
Need Match hotel names in arabic and asian area from different sources like : Riyad Marrakesch, Riad Marrakech and/or Ryad Marakesh
Stored procedure Stored procedure works great but was too slow on MariaDB 10.0.12 : ~1 second/call
Prerequisites GCC MariaDB-devel packages Basic C knowledge
Finding the right tool : UDF Since performances are not good enough, what other tool do we have to extend MariaDB’s functionalities ? Answer : User-Defined Functions Example code on my personal github ad hoc repo : https://github.com/SylvainA77/levenshtein-udf
1st step : Source code adaptation Function headers : my_bool levenshteinratio_init(UDF_INIT *initid, UDF_ARGS *args, char *message); void levenshteinratio_deinit(UDF_INIT *initid); double levenshteinratio(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error);
1st step : Source code adaptation Functions code : my_bool levenshteinratio_init(UDF_INIT *initid, UDF_ARGS *args, char *message) { if ((args->arg_count != 2) || (args->arg_type[0] != STRING_RESULT || args->arg_type[1] != STRING_RESULT)) { strcpy(message, "Function requires 2 arguments, (string, string)"); return 1; }
1st step : Source code adaptation Functions code : //matrix for levenshtein calculations of size n+1 x m+1 (+1 for base values) int *d = (int *) malloc(sizeof(int) * (args->lengths[0] + 1) * (args->lengths[1] + 1)); if (d == NULL) { strcpy(message, "Failed to allocate memory"); return 1; }
1st step : Source code adaptation Functions code : initid->ptr = (char*) d; initid->max_length = LEVENSHTEIN_MAX; initid->maybe_null = 0; //doesn't return null return 0; }
1st step : Source code adaptation Functions code : void levenshteinUDF_deinit(UDF_INIT *initid) { if (initid->ptr != NULL) { free(initid->ptr); } }
2nd step : Compiling gcc -o /lib/levenshtein.so levenshtein.c `mysql_config --cflags` -shared -fPIC Then you have to link the library into MariaDB plugin directory : ln -s /lib/levenshtein.so /usr/lib64/mysql/plugin/ chmod 777 /lib/levenshtein.so
3rd step : Creating the function CREATE FUNCTION levesnhteinratio RETURNS REAL SONAME ‘levenshtein.so’;
THANK YOU!

Extending MariaDB with user-defined functions

  • 1.
    Extending MARIADB with user-defined functions AndrewHutchings & Sylvain Arbaudie MariaDB Corporation
  • 2.
    About Andrew (LinuxJedi) ●Andrew Hutchings, aka “LinuxJedi” ● Lead Software Engineer for MariaDB’s ColumnStore ● Previous worked for: ○ NGINX - Senior Developer Advocate / Technical Product Manager ○ HP - Principal Software Engineer (HP Cloud / ATG) ○ SkySQL - Senior Sustaining Engineer ○ Rackspace - Senior Software Engineer ○ Sun/Oracle - MySQL Senior Support Engineer ● Co-author of MySQL 5.1 Plugin Development ● IRC/Twitter: LinuxJedi ● EMail: linuxjedi@mariadb.com
  • 3.
    About Sylvain ● SylvainArbaudie ● Principal consultant, senior trainer for MariaDB EMEA ● Previous worked for: ○ Airial conseil - Oracle Application performance consultant ○ Karavel - MySQL/MariaDB/Oracle DBA ● EMail: sylvain.arbaudie@mariadb.com
  • 4.
    What is aUDF? ● A plugin written in C ● Provides a new function call for SQL queries, for example: SELECT my_function(b) FROM t1 WHERE a = 1000; ● Come in regular function or aggregate function form ● Very simple API
  • 5.
    History of UDFs ●Appeared in MySQL 3.21.24 ○ Roughly 21 years ago ○ Older that InnoDB or even transactions in MySQL ● External contribution by Alexis Mikhailov ● Aggregate functions came soon after (by version 3.23) ● A precursor to MySQL and MariaDB plugin APIs ● Not much has changed since
  • 6.
    Pros and Cons ●Very easy to develop with a little C knowledge ● Very rapid execution time Pro ● If a UDF crashes it takes the whole MariaDB server out with it ● Usually needs re-compiling with every MariaDB point release Con
  • 7.
  • 8.
    Installing a UDF CREATE[OR REPLACE] [AGGREGATE] FUNCTION [IF NOT EXISTS] function_name RETURNS {STRING|INTEGER|REAL|DECIMAL} SONAME shared_library_name
  • 9.
    Viewing Installed Plugins MariaDB[test]> select * from mysql.func; +-----------------------------+-----+------------------+-----------+ | name | ret | dl | type | +-----------------------------+-----+------------------+-----------+ | calgetstats | 0 | libcalmysql.so | function | | calsettrace | 2 | libcalmysql.so | function | | calsetparms | 0 | libcalmysql.so | function | | calflushcache | 2 | libcalmysql.so | function | | calgettrace | 0 | libcalmysql.so | function | | calgetversion | 0 | libcalmysql.so | function | | calonlinealter | 2 | libcalmysql.so | function | | calviewtablelock | 0 | libcalmysql.so | function | | calcleartablelock | 0 | libcalmysql.so | function | | caldisablepartitions | 0 | libcalmysql.so | function | ...
  • 10.
    Calling Functions MariaDB [test]>select my_example(my_ints) from my_tableG +---------------------+ | my_example(my_ints) | +---------------------+ | 99 | | 27 | +---------------------+ 2 rows in set (0.00 sec)
  • 11.
    Calling Functions MariaDB [test]>select my_aggregate_example(my_ints) from my_tableG +-------------------------------+ | my_aggregate_example(my_ints) | +-------------------------------+ | 126 | +-------------------------------+ 1 row in set (0.00 sec)
  • 12.
  • 13.
    API Calls ● name_init()- initialize at start of query ● name_deinit() - clean up at end of query ● name() - called on every row (or every group for aggregate) ● name_add() - called on every row of a group (Aggregate) ● name_clear() - called before the first row of a group (Aggregate) ● name_remove() - removes a row from a group (Aggregate Window Functions, 10.4 onwards)
  • 14.
  • 15.
    Init Call my_bool name_init(UDF_INIT*initid, UDF_ARGS *args, char *message) ● initid - Supplies MariaDB with the return metadata ● args - MariaDB provides the argument metadata ● message - A pointer to add an error message, max MYSQL_ERRMSG_SIZE bytes ● return - 0 for success, 1 for error
  • 16.
    Deinit Call void name_deinit(UDF_INIT*initid) ● initid - The metadata defined in the init call (contains an arbitrary pointer which deinit can free)
  • 17.
    Function Call char *name(UDF_INIT*initid, UDF_ARGS *args, char *result, unsigned long *length, char *is_null, char *error) long long name(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error) double name(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error) ● initid - The init-time metadata object ● args - The input argument ● result - An optional pre-allocated 768 byte buffer to use ● length - The length of the result set ● is_null - Set to 1 if the result is NULL ● error - Set to 1 if an error occurred ● return - The return value for this row
  • 18.
    Add / RemoveCall void name_add(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error) void name_remove(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error) ● initid - The init-time metadata object ● args - The input argument ● is_null - Set to 1 if the result is NULL ● error - Set to 1 if an error occurred
  • 19.
    Clear Call void name_clear(UDF_INIT*initid, char *is_null, char *error) ● initid - The init-time metadata object ● is_null - Set to 1 if the result is NULL ● error - Set to 1 if an error occurred
  • 20.
    Further Reading ● Containschapters on how to write UDF plugins as well as other plugins for MySQL and MariaDB. ● Both the authors work for MariaDB and are here this week. Come find us for more information.
  • 21.
  • 22.
    Need Match hotel namesin arabic and asian area from different sources like : Riyad Marrakesch, Riad Marrakech and/or Ryad Marakesh
  • 23.
    Stored procedure Stored procedureworks great but was too slow on MariaDB 10.0.12 : ~1 second/call
  • 24.
  • 25.
    Finding the righttool : UDF Since performances are not good enough, what other tool do we have to extend MariaDB’s functionalities ? Answer : User-Defined Functions Example code on my personal github ad hoc repo : https://github.com/SylvainA77/levenshtein-udf
  • 26.
    1st step :Source code adaptation Function headers : my_bool levenshteinratio_init(UDF_INIT *initid, UDF_ARGS *args, char *message); void levenshteinratio_deinit(UDF_INIT *initid); double levenshteinratio(UDF_INIT *initid, UDF_ARGS *args, char *is_null, char *error);
  • 27.
    1st step :Source code adaptation Functions code : my_bool levenshteinratio_init(UDF_INIT *initid, UDF_ARGS *args, char *message) { if ((args->arg_count != 2) || (args->arg_type[0] != STRING_RESULT || args->arg_type[1] != STRING_RESULT)) { strcpy(message, "Function requires 2 arguments, (string, string)"); return 1; }
  • 28.
    1st step :Source code adaptation Functions code : //matrix for levenshtein calculations of size n+1 x m+1 (+1 for base values) int *d = (int *) malloc(sizeof(int) * (args->lengths[0] + 1) * (args->lengths[1] + 1)); if (d == NULL) { strcpy(message, "Failed to allocate memory"); return 1; }
  • 29.
    1st step :Source code adaptation Functions code : initid->ptr = (char*) d; initid->max_length = LEVENSHTEIN_MAX; initid->maybe_null = 0; //doesn't return null return 0; }
  • 30.
    1st step :Source code adaptation Functions code : void levenshteinUDF_deinit(UDF_INIT *initid) { if (initid->ptr != NULL) { free(initid->ptr); } }
  • 31.
    2nd step :Compiling gcc -o /lib/levenshtein.so levenshtein.c `mysql_config --cflags` -shared -fPIC Then you have to link the library into MariaDB plugin directory : ln -s /lib/levenshtein.so /usr/lib64/mysql/plugin/ chmod 777 /lib/levenshtein.so
  • 32.
    3rd step :Creating the function CREATE FUNCTION levesnhteinratio RETURNS REAL SONAME ‘levenshtein.so’;
  • 33.