Writing MySQL User-Defined Functions
Welcome! ● @rolandbouman ● roland.bouman@gmail.com ● http://rpbouman.blogspot.com/ ● Ex-MySQL AB, Ex-Sun Microsystems ● Currently at http://www.pentaho.com/
Different kinds of MySQL Functions ● Built-in functions (and operators) – Native code part of the server binaries ● SQL Stored functions – SQL/PSM code stored in the database CREATE FUNCTION lightspeed() RETURNS INT UNSIGNED BEGIN RETURN 299792458; END; ● User-defined functions (UDF's) – Native code stored in an external binary library CREATE FUNCTION lightspeed RETURNS INTEGER SONAME 'lightspeed.so';
MySQL Function Kinds Comparison Built-in UDF Stored routine language C/C++ C/C++* SQL/PSM execution natively compiled natively compiled interpreted performance very good good not that good user-defined no yes yes installation NA CREATE stmt CREATE stmt deployment NA binary library schema object namespace global global schema grant-able no no yes SQL capabilities no no* yes Data types internal internal and C/C++ SQL data types Charset support yes* no yes argumentlist flexible* flexible fixed number & type aggregates yes* yes no
Quick and dirty example (1/2) ● Write C/C++ code: char lightspeed_init(){ return 0; } long long lightspeed(){ return 299792458; } ● Compile to a shared library: shell$ gcc -shared -o lightspeed.so lightspeed.c ● Move library to plugin dir: mysql> SHOW VARIABLES LIKE 'plugin_dir'; +---------------+------------------------------+ | Variable_name | Value | +---------------+------------------------------+ | plugin_dir | /home/mysql/mysql/lib/plugin | +---------------+------------------------------+ shell$ mv lightspeed.so /home/mysql/mysql/lib/plugin
Quick and dirty example (2/2) ● Register the function in MySQL mysql> CREATE FUNCTION lightspeed -> RETURNS INTEGER SONAME 'lightspeed.so'; ● Verify: mysql> SELECT * FROM mysql.func; +------------+-----+---------------+----------+ | name | ret | dl | type | +------------+-----+---------------+----------+ | lightspeed | 2 | lightspeed.so | function | +------------+-----+---------------+----------+ ● Run: mysql> SELECT lightspeed(); +--------- ----+ | lightspeed() | +--------------+ | 299792458 | +--------------+
UDF implementation functions ● General UDF Implementation functions – Init function – Row-level function – De-init function xxx_init() row? xxx() xxx_deinit()
UDF implementation functions ● Aggregate UDF Implementation functions – clear function – add function xxx_init() xxx() start group? xxx_clear() xxx_add() end group? xxx_deinit()
The init function ● Signature: my_bool xxx_init( UDF_INIT *initid, UDF_ARGS *args, char *message ); ● Set default properties – of the return value – of the arguments ● Validate arguments ● Allocate resources ● Signal an error – Return 1 to signal an error – Copy error message to *message
The de-init function ● Signature: void xxx_deinit(UDF_INIT *initid); ● De-allocate any resources
The row-level function ● Returns the value to the caller – INTEGER type (SQL: bigint): long long xxx( – UDF_INIT *initid, – UDF_ARGS *args, my_bool *is_null, – my_bool *error ); – – REAL type (SQL: double): double xxx( UDF_INIT *initid, UDF_ARGS *args, my_bool *is_null, my_bool *error );
The row-level function ● Returns the value to the caller – STRING type (SQL: varchar, text): char *xxx( – UDF_INIT *initid, – UDF_ARGS *args, char* result, – unsigned long length, my_bool *is_null, – my_bool *error ); – – DECIMAL type (SQL: decimal): ● Just like STRING type
Type systems SQL data type RETURNS Item_result C type Character strings (CHAR, VARCHAR) STRING STRING_RESULT char* Text (TEXT, LONGTEXT, MEDIUMTEXT, TINYTEXT) Binary strings (BINARY, VARBINARY) Blob (BLOB, LONGBLOB, MEDIUMBLOB, TINYBLOB) Temporal (DATE, TIME, DATETIME, TIMESTAMP) Structured (ENUM, SET) Floating point numbers (FLOAT, DOUBLE) REAL REAL_RESULT double Ints (BIGINT, MEDIUMINT, SMALLINT, TINYINT) INTEGER INT_RESULT long long NA NA ROW_RESULT NA Fixed point numbers (DECIMAL) DECIMAL DECIMAL_RESULT char *
Data Structures ● #include "mysql.h" – include dir beneath MySQL home dir – Actually defined in mysql_com.h ● UDF_INIT *initid ● UDF_ARGS *args
Data Structures: UDF_INIT ● Passed to all UDF implementation functions typedef struct st_udf_init { my_bool maybe_null; /* 1 if function can return NULL */ unsigned int decimals; /* for real functions */ unsigned long max_length; /* For string functions */ char *ptr; /* free pointer for function data */ my_bool const_item; /* 1 if always returns the same value*/ void *extension; } UDF_INIT; ● Conveys mainly info about return value ● char *ptr explicitly meant to hold state: – Resource allocation ● Can always be read ● Can be written in xxx_init()
Writing to UDF_INIT (1/3) ● 1st attempt at writing a REAL function: char planck_init(){ return 0; } double planck(){ return 6.62606957e-34; } mysql> CREATE FUNCTION planck -> RETURNS REAL SONAME 'planck.so'; mysql> SELECT planck(); +----------+ | planck() | +----------+ | 0. | +----------+ ● But is planck() really 0?
Writing to UDF_INIT (2/3) ● 2nd attempt at writing a REAL function: #include "mysql.h" my_bool planck_init( UDF_INIT *initid, UDF_ARGS *args, char *msg ){ initid->decimals = NOT_FIXED_DEC; return 0; } double planck(){return 6.62606957e-34;} mysql> SELECT planck(); +----------------+ | planck() | +----------------+ | 6.62606957e-34 | +----------------+
Writing to UDF_INIT (3/3) ● More return value properties: #include "mysql.h" my_bool planck_init( UDF_INIT *initid, UDF_ARGS *args, char *msg ){ initid->decimals = NOT_FIXED_DEC; initid->maybe_null = 0; initid->const_item = 1; return 0; } double planck(){return 6.62606957e-34;}
Data Structures: UDF_ARGS ● Passed to init and row-level function typedef struct st_udf_args { unsigned int arg_count; /* Number of arguments */ enum Item_result *arg_type; /* Pointer to item_results */ char **args; /* Pointer to value */ unsigned long *lengths; /* Length of string arguments */ char *maybe_null; /* 1 for all maybe_null args */ char **attributes; /* Pointer to attribute name */ unsigned long *attribute_lengths;/* Length of attribute */ void *extension; } UDF_ARGS; ● Conveys information about function arguments ● Can always be read ● Can be written in xxx_init()
Using UDF_ARGS (1/2) ● A hello world UDF (init function): #include "string.h" #include "stdio.h" #include "mysql.h" my_bool hello_world_init( UDF_INIT *initid, UDF_ARGS *args, char *msg ){ if (args->arg_count != 1) { memcpy(msg, "Missing message argument.", 26); return 1; } if (args->arg_type[0] != STRING_RESULT) { args->arg_type = STRING_RESULT; } return 0; }
Using UDF_ARGS (2/2) ● A hello world UDF (row-level function): char *hello_world( UDF_INIT *initid, UDF_ARGS *args, char *result, unsigned long *length, my_bool *is_null, my_bool *error ){ char *arg_val = args->args[0]; char *arg_name = args->attributes[0]; char first = arg_name[0]; my_bool is_lit = first == ''' || first == '"'; my_bool is_same = !strcmp(arg_value, arg_name); char *greeting = is_literal||is_same?"Hello":arg_name; *length = args->lengths[0] + strlen(greeting) + 3; if (*length > 255) { *is_null = 1; return NULL; } sprintf(result, "%s %s!", greeting, arg_value); return result; }
The UDF Repository ● http://www.mysqludf.org/ ● Some highlights – lib_mysqludf_myxml – lib_mysqludf_sys – lib_mysqludf_stem – lib_mysqludf_preg ● For UDF authors: – http://www.mysqludf.org/lib_mysqludf_udf/
My Latest Baby ● https://github.com/rpbouman/mysqlv8udfs mysql> SELECT js(' var y = parseInt(arguments[0].substr(0,4), 10), a = y % 19, b = Math.floor(y / 100), c = y % 100, d = Math.floor(b / 4), e = b % 4, f = Math.floor((b + 8) / 25), g = Math.floor((b - f + 1) / 3), h = (19 * a + b - d - g + 15) % 30, i = Math.floor(c / 4), k = c % 4, L = (32 + 2 * e + 2 * i - h - k) % 7, m = Math.floor((a + 11 * h + 22 * L) / 451), n = h + L - 7 * m + 114; y + "-" + (Math.floor(n/31)) + "-" + ((n%31)+1);'), NOW());
Recommended Reading ● http://rpbouman.blogspot.nl/search?q=UDF

Writing MySQL UDFs

  • 1.
  • 2.
    Welcome! ● @rolandbouman ● roland.bouman@gmail.com ● http://rpbouman.blogspot.com/ ● Ex-MySQL AB, Ex-Sun Microsystems ● Currently at http://www.pentaho.com/
  • 3.
    Different kinds ofMySQL Functions ● Built-in functions (and operators) – Native code part of the server binaries ● SQL Stored functions – SQL/PSM code stored in the database CREATE FUNCTION lightspeed() RETURNS INT UNSIGNED BEGIN RETURN 299792458; END; ● User-defined functions (UDF's) – Native code stored in an external binary library CREATE FUNCTION lightspeed RETURNS INTEGER SONAME 'lightspeed.so';
  • 4.
    MySQL Function KindsComparison Built-in UDF Stored routine language C/C++ C/C++* SQL/PSM execution natively compiled natively compiled interpreted performance very good good not that good user-defined no yes yes installation NA CREATE stmt CREATE stmt deployment NA binary library schema object namespace global global schema grant-able no no yes SQL capabilities no no* yes Data types internal internal and C/C++ SQL data types Charset support yes* no yes argumentlist flexible* flexible fixed number & type aggregates yes* yes no
  • 5.
    Quick and dirtyexample (1/2) ● Write C/C++ code: char lightspeed_init(){ return 0; } long long lightspeed(){ return 299792458; } ● Compile to a shared library: shell$ gcc -shared -o lightspeed.so lightspeed.c ● Move library to plugin dir: mysql> SHOW VARIABLES LIKE 'plugin_dir'; +---------------+------------------------------+ | Variable_name | Value | +---------------+------------------------------+ | plugin_dir | /home/mysql/mysql/lib/plugin | +---------------+------------------------------+ shell$ mv lightspeed.so /home/mysql/mysql/lib/plugin
  • 6.
    Quick and dirtyexample (2/2) ● Register the function in MySQL mysql> CREATE FUNCTION lightspeed -> RETURNS INTEGER SONAME 'lightspeed.so'; ● Verify: mysql> SELECT * FROM mysql.func; +------------+-----+---------------+----------+ | name | ret | dl | type | +------------+-----+---------------+----------+ | lightspeed | 2 | lightspeed.so | function | +------------+-----+---------------+----------+ ● Run: mysql> SELECT lightspeed(); +--------- ----+ | lightspeed() | +--------------+ | 299792458 | +--------------+
  • 7.
    UDF implementation functions ● General UDF Implementation functions – Init function – Row-level function – De-init function xxx_init() row? xxx() xxx_deinit()
  • 8.
    UDF implementation functions ● Aggregate UDF Implementation functions – clear function – add function xxx_init() xxx() start group? xxx_clear() xxx_add() end group? xxx_deinit()
  • 9.
    The init function ● Signature: my_bool xxx_init( UDF_INIT *initid, UDF_ARGS *args, char *message ); ● Set default properties – of the return value – of the arguments ● Validate arguments ● Allocate resources ● Signal an error – Return 1 to signal an error – Copy error message to *message
  • 10.
    The de-init function ● Signature: void xxx_deinit(UDF_INIT *initid); ● De-allocate any resources
  • 11.
    The row-level function ● Returns the value to the caller – INTEGER type (SQL: bigint): long long xxx( – UDF_INIT *initid, – UDF_ARGS *args, my_bool *is_null, – my_bool *error ); – – REAL type (SQL: double): double xxx( UDF_INIT *initid, UDF_ARGS *args, my_bool *is_null, my_bool *error );
  • 12.
    The row-level function ● Returns the value to the caller – STRING type (SQL: varchar, text): char *xxx( – UDF_INIT *initid, – UDF_ARGS *args, char* result, – unsigned long length, my_bool *is_null, – my_bool *error ); – – DECIMAL type (SQL: decimal): ● Just like STRING type
  • 13.
    Type systems SQL datatype RETURNS Item_result C type Character strings (CHAR, VARCHAR) STRING STRING_RESULT char* Text (TEXT, LONGTEXT, MEDIUMTEXT, TINYTEXT) Binary strings (BINARY, VARBINARY) Blob (BLOB, LONGBLOB, MEDIUMBLOB, TINYBLOB) Temporal (DATE, TIME, DATETIME, TIMESTAMP) Structured (ENUM, SET) Floating point numbers (FLOAT, DOUBLE) REAL REAL_RESULT double Ints (BIGINT, MEDIUMINT, SMALLINT, TINYINT) INTEGER INT_RESULT long long NA NA ROW_RESULT NA Fixed point numbers (DECIMAL) DECIMAL DECIMAL_RESULT char *
  • 14.
    Data Structures ● #include "mysql.h" – include dir beneath MySQL home dir – Actually defined in mysql_com.h ● UDF_INIT *initid ● UDF_ARGS *args
  • 15.
    Data Structures: UDF_INIT ● Passed to all UDF implementation functions typedef struct st_udf_init { my_bool maybe_null; /* 1 if function can return NULL */ unsigned int decimals; /* for real functions */ unsigned long max_length; /* For string functions */ char *ptr; /* free pointer for function data */ my_bool const_item; /* 1 if always returns the same value*/ void *extension; } UDF_INIT; ● Conveys mainly info about return value ● char *ptr explicitly meant to hold state: – Resource allocation ● Can always be read ● Can be written in xxx_init()
  • 16.
    Writing to UDF_INIT(1/3) ● 1st attempt at writing a REAL function: char planck_init(){ return 0; } double planck(){ return 6.62606957e-34; } mysql> CREATE FUNCTION planck -> RETURNS REAL SONAME 'planck.so'; mysql> SELECT planck(); +----------+ | planck() | +----------+ | 0. | +----------+ ● But is planck() really 0?
  • 17.
    Writing to UDF_INIT(2/3) ● 2nd attempt at writing a REAL function: #include "mysql.h" my_bool planck_init( UDF_INIT *initid, UDF_ARGS *args, char *msg ){ initid->decimals = NOT_FIXED_DEC; return 0; } double planck(){return 6.62606957e-34;} mysql> SELECT planck(); +----------------+ | planck() | +----------------+ | 6.62606957e-34 | +----------------+
  • 18.
    Writing to UDF_INIT(3/3) ● More return value properties: #include "mysql.h" my_bool planck_init( UDF_INIT *initid, UDF_ARGS *args, char *msg ){ initid->decimals = NOT_FIXED_DEC; initid->maybe_null = 0; initid->const_item = 1; return 0; } double planck(){return 6.62606957e-34;}
  • 19.
    Data Structures: UDF_ARGS ● Passed to init and row-level function typedef struct st_udf_args { unsigned int arg_count; /* Number of arguments */ enum Item_result *arg_type; /* Pointer to item_results */ char **args; /* Pointer to value */ unsigned long *lengths; /* Length of string arguments */ char *maybe_null; /* 1 for all maybe_null args */ char **attributes; /* Pointer to attribute name */ unsigned long *attribute_lengths;/* Length of attribute */ void *extension; } UDF_ARGS; ● Conveys information about function arguments ● Can always be read ● Can be written in xxx_init()
  • 20.
    Using UDF_ARGS (1/2) ● A hello world UDF (init function): #include "string.h" #include "stdio.h" #include "mysql.h" my_bool hello_world_init( UDF_INIT *initid, UDF_ARGS *args, char *msg ){ if (args->arg_count != 1) { memcpy(msg, "Missing message argument.", 26); return 1; } if (args->arg_type[0] != STRING_RESULT) { args->arg_type = STRING_RESULT; } return 0; }
  • 21.
    Using UDF_ARGS (2/2) ● A hello world UDF (row-level function): char *hello_world( UDF_INIT *initid, UDF_ARGS *args, char *result, unsigned long *length, my_bool *is_null, my_bool *error ){ char *arg_val = args->args[0]; char *arg_name = args->attributes[0]; char first = arg_name[0]; my_bool is_lit = first == ''' || first == '"'; my_bool is_same = !strcmp(arg_value, arg_name); char *greeting = is_literal||is_same?"Hello":arg_name; *length = args->lengths[0] + strlen(greeting) + 3; if (*length > 255) { *is_null = 1; return NULL; } sprintf(result, "%s %s!", greeting, arg_value); return result; }
  • 22.
    The UDF Repository ● http://www.mysqludf.org/ ● Some highlights – lib_mysqludf_myxml – lib_mysqludf_sys – lib_mysqludf_stem – lib_mysqludf_preg ● For UDF authors: – http://www.mysqludf.org/lib_mysqludf_udf/
  • 23.
    My Latest Baby ● https://github.com/rpbouman/mysqlv8udfs mysql> SELECT js(' var y = parseInt(arguments[0].substr(0,4), 10), a = y % 19, b = Math.floor(y / 100), c = y % 100, d = Math.floor(b / 4), e = b % 4, f = Math.floor((b + 8) / 25), g = Math.floor((b - f + 1) / 3), h = (19 * a + b - d - g + 15) % 30, i = Math.floor(c / 4), k = c % 4, L = (32 + 2 * e + 2 * i - h - k) % 7, m = Math.floor((a + 11 * h + 22 * L) / 451), n = h + L - 7 * m + 114; y + "-" + (Math.floor(n/31)) + "-" + ((n%31)+1);'), NOW());
  • 24.
    Recommended Reading ● http://rpbouman.blogspot.nl/search?q=UDF