MySQL Full-Text Search Tutorial — by royalwzy
Personal Profile More than 5 years experience in database use and management Proficient in synchronizing data between heterogeneous database Skilled in designing and optimizing database Install and configure MySQL cluster and high-availability cluster environment Synchronize and merge data from Oracle to MySQL mysqlexp mysqlclone Oracle 10g/11g OCM MySQL 5.6 Database Administrator MCITP/MCTS DBA(MSSQL2008) RHCE Java Programmer Oracle ACEA, Oracle Young Expert ACOUG/SHOUG/OCMU core member ITPUB Moderator and Community Expert Senior lecturer in "ENMO" and “UPLOOKING" Active in OTN & ITPUB and other forums My blog:royalwzy.com
Table of Contents ❖ MySQL Full-Text Search Introduction ❖ Three Types of Full-Text Searches ❖ MySQL Full-Text Stopwords ❖ Fine-Tuning MySQL Full-Text Search ❖ MySQL Full-Text Restrictions
MySQL Full-Text Search Introduction ❖ Exciting feature ❖ Switch tables to InnoDB
Syntax of Full-Text Searches MATCH (col1,col2,...) AGAINST (expr [search_modifier]) search_modifier: { IN NATURAL LANGUAGE MODE | IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION | IN BOOLEAN MODE | WITH QUERY EXPANSION }
Three Types of Full-Text Searches ❖ Natural Language Full-Text Searches ❖ Boolean Full-Text Searches ❖ Full-Text Searches with Query Expansion
Natural Language Full-Text Searches ❖ By default or with the IN NATURAL LANGUAGE MODE modifier ❖ MATCH() returns a relevance value ❖ AGAINST takes a search string and an optional modifier to search for
Example of Natural Language Full-Text Searches
What does Last Slide Show? ❖ By default, performed in case- insensitive fashion(latin1->latin1_bin) ❖ The rows returned are automatically sorted with the highest relevance first ❖ Relevance is computed based on ❖ The number of words in the row ❖ The number of unique words in that row ❖ The total number of words in the collection ❖ The number of documents (rows) that contain a particular word
Boolean Full-Text Searches ❖ What if you want to retrieve records that include word "YourSQL" and exclude word "MySQL"? SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('+YourSQL -MySQL' IN BOOLEAN MODE);
Operators in Boolean Full-Text Search ❖ (no operator) ❖ + ❖ - ❖ ~ ❖ > < ❖ ( ) ❖ * ❖ “ ❖ @distance
Examples Using Boolean Full-Text Operators ❖ 'apple banana' ❖ '+apple +juice' ❖ '+apple macintosh' ❖ '+apple -macintosh' ❖ '+apple ~macintosh' ❖ '+apple +(>plus <minus)' ❖ 'apple*' ❖ '"some words"' ❖ '"word1 word2 word3" @8'
Quizzes ❖ What is the relevancy ranking of the following three search strings in row "The Macintosh is a series of personal computers (PCs) designed, developed, and marketed by Apple Inc." ❖ Which search string will return more results? A:'+apple macintosh’ B:'+apple -macintosh’ C:'+apple ~macintosh’ A:'+word +the*’ B:'+word +the'
How Relevancy Ranking is Calculated ❖ InnoDB full-text search's algorithms are based on BM25 and TF-IDF ranking algorithms ❖ TF-IDF(Term Frequency-Inverse Document Frequency) ❖ May differ from MyISAM relevancy rankings
How Relevancy Ranking is Calculated ❖ ${TF} ❖ ${IDF} = log10( ${total_records} / $ {matching_records} ) ❖ total_records is the number of records in the collection ❖ matching_records is the number of records that the search term appears in ❖ ${Ranking} = ${TF} * ${IDF} * ${IDF}
Example of Relevancy Ranking for a Single Word Search
Example of Relevancy Ranking for a Single Word Search ❖ total_records:8 ❖ matching_records:3 ❖ ${IDF}=log10(8/3) ❖ ${TF}=6 ❖ ${Ranking}=6*log10(8/3)*log10(8/3)
Example of Relevancy Ranking for a Multiple Word Search ❖ The relevancy ranking value is a sum of the relevancy ranking value for each word ❖ ${rank} = ${TF} * ${IDF} * ${IDF} + ${TF} * ${IDF} * ${IDF}
Example of Relevancy Ranking for a Multiple Word Search ❖ total_records:8 ❖ matching_records:6 for "mysql" and 2 for "tutorial" ❖ mysql.${IDF}=log10(8/6),tutorial.${IDF} =log10(8/2) ❖ mysql.${TF}=1,tutorial.${TF}=2 ❖ ${Ranking}=(1*log10(8/6)*log10(8/6)) +(2*log10(8/2)*log10(8/2))
Full-Text Searches with Query Expansion ❖ Full-text search supports query expansion ❖ It's generally useful when a search phrase is too short ❖ Relying on implied knowledge that the full- text search engine lacks ❖ Enabled by adding WITH QUERY EXPANSION or IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION ❖ For example, a user searching for “database” may really mean that “MySQL”, “Oracle”, “DB2”, and “RDBMS”
What's is Implied Knowledge?
Example of Query Expansion
MySQL Full-Text Stopwords ❖ Stopwords for InnoDB Search Indexes ❖ INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD table ❖ innodb_ft_server_stopword_table/innodb_ft_user_stopword_table ❖ Stopwords for MyISAM Search Indexes ❖ storage/myisam/ft_static.c ❖ ft_stopword_file
Fine-Tuning MySQL Full-Text Search ❖ Configuring Minimum and Maximum Word Length ❖ innodb_ft_min_token_size/innodb_ft_max_token_size ❖ ft_min_word_len/ft_max_word_lens ❖ Optimizing InnoDB Full-Text Indexes ❖ set GLOBAL innodb_optimize_fulltext_only=ON; ❖ OPTIMIZE TABLE table_name; ❖ Configuring the Natural Language Search Threshold ❖ storage/myisam/ftdefs.h ❖ #define GWS_IN_USE GWS_PROB -> #define GWS_IN_USE GWS_FREQ
Case Study
Case Study
MySQL Full-Text Monitor ❖ Tables in INFORMATION_SCHEMA database ❖ Variable:innodb_ft_aux_table
MySQL Full-Text Restrictions ❖ Full-text searches are supported for InnoDB and MyISAM tables only ❖ Full-text searches are not supported for partitioned tables ❖ Full-text indexes can be created only for CHAR, VARCHAR, or TEXT columns ❖ Ideographic languages limitations
MySQL Full-Text Restrictions ❖ All columns in a FULLTEXT index must use the same character set and collation. ❖ The MATCH() column list must match exactly the column definited in FULLTEXT index ❖ The argument to AGAINST() must be a string value ❖ Index hints are more limited for FULLTEXT searches than Non-FULLTEXT
Best Practices ❖ Drop FULLTEXT index before load large data sets ❖ InnoDB Full-Text Plugin ❖ InnoDB N-gram parser ❖ MeCab Parser “Do NOT alter the MySQL sources unless you know what you are doing!!!”
Variables for MyISAM ❖ ft_boolean_syntax ❖ ft_max_word_len ❖ ft_min_word_len ❖ ft_query_expansion_limit ❖ ft_stopword_file
Variables for InnoDB ❖ innodb_ft_aux_table ❖ innodb_ft_cache_size ❖ innodb_ft_enable_diag_print ❖ innodb_ft_enable_stopword ❖ innodb_ft_max_token_size ❖ innodb_ft_min_token_size
Variables for InnoDB ❖ innodb_ft_num_word_optimize ❖ innodb_ft_result_cache_limit ❖ innodb_ft_server_stopword_table ❖ innodb_ft_sort_pll_degree ❖ innodb_ft_total_cache_size ❖ innodb_ft_user_stopword_table ❖ innodb_optimize_fulltext_only
Q & A

MySQL Fulltext Search Tutorial

  • 1.
    MySQL Full-Text SearchTutorial — by royalwzy
  • 2.
    Personal Profile More than5 years experience in database use and management Proficient in synchronizing data between heterogeneous database Skilled in designing and optimizing database Install and configure MySQL cluster and high-availability cluster environment Synchronize and merge data from Oracle to MySQL mysqlexp mysqlclone Oracle 10g/11g OCM MySQL 5.6 Database Administrator MCITP/MCTS DBA(MSSQL2008) RHCE Java Programmer Oracle ACEA, Oracle Young Expert ACOUG/SHOUG/OCMU core member ITPUB Moderator and Community Expert Senior lecturer in "ENMO" and “UPLOOKING" Active in OTN & ITPUB and other forums My blog:royalwzy.com
  • 3.
    Table of Contents ❖MySQL Full-Text Search Introduction ❖ Three Types of Full-Text Searches ❖ MySQL Full-Text Stopwords ❖ Fine-Tuning MySQL Full-Text Search ❖ MySQL Full-Text Restrictions
  • 4.
    MySQL Full-Text SearchIntroduction ❖ Exciting feature ❖ Switch tables to InnoDB
  • 5.
    Syntax of Full-TextSearches MATCH (col1,col2,...) AGAINST (expr [search_modifier]) search_modifier: { IN NATURAL LANGUAGE MODE | IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION | IN BOOLEAN MODE | WITH QUERY EXPANSION }
  • 6.
    Three Types ofFull-Text Searches ❖ Natural Language Full-Text Searches ❖ Boolean Full-Text Searches ❖ Full-Text Searches with Query Expansion
  • 7.
    Natural Language Full-TextSearches ❖ By default or with the IN NATURAL LANGUAGE MODE modifier ❖ MATCH() returns a relevance value ❖ AGAINST takes a search string and an optional modifier to search for
  • 8.
    Example of NaturalLanguage Full-Text Searches
  • 9.
    What does LastSlide Show? ❖ By default, performed in case- insensitive fashion(latin1->latin1_bin) ❖ The rows returned are automatically sorted with the highest relevance first ❖ Relevance is computed based on ❖ The number of words in the row ❖ The number of unique words in that row ❖ The total number of words in the collection ❖ The number of documents (rows) that contain a particular word
  • 10.
    Boolean Full-Text Searches ❖What if you want to retrieve records that include word "YourSQL" and exclude word "MySQL"? SELECT * FROM articles WHERE MATCH (title,body) AGAINST ('+YourSQL -MySQL' IN BOOLEAN MODE);
  • 11.
    Operators in BooleanFull-Text Search ❖ (no operator) ❖ + ❖ - ❖ ~ ❖ > < ❖ ( ) ❖ * ❖ “ ❖ @distance
  • 12.
    Examples Using BooleanFull-Text Operators ❖ 'apple banana' ❖ '+apple +juice' ❖ '+apple macintosh' ❖ '+apple -macintosh' ❖ '+apple ~macintosh' ❖ '+apple +(>plus <minus)' ❖ 'apple*' ❖ '"some words"' ❖ '"word1 word2 word3" @8'
  • 13.
    Quizzes ❖ What isthe relevancy ranking of the following three search strings in row "The Macintosh is a series of personal computers (PCs) designed, developed, and marketed by Apple Inc." ❖ Which search string will return more results? A:'+apple macintosh’ B:'+apple -macintosh’ C:'+apple ~macintosh’ A:'+word +the*’ B:'+word +the'
  • 14.
    How Relevancy Rankingis Calculated ❖ InnoDB full-text search's algorithms are based on BM25 and TF-IDF ranking algorithms ❖ TF-IDF(Term Frequency-Inverse Document Frequency) ❖ May differ from MyISAM relevancy rankings
  • 15.
    How Relevancy Rankingis Calculated ❖ ${TF} ❖ ${IDF} = log10( ${total_records} / $ {matching_records} ) ❖ total_records is the number of records in the collection ❖ matching_records is the number of records that the search term appears in ❖ ${Ranking} = ${TF} * ${IDF} * ${IDF}
  • 16.
    Example of RelevancyRanking for a Single Word Search
  • 17.
    Example of RelevancyRanking for a Single Word Search ❖ total_records:8 ❖ matching_records:3 ❖ ${IDF}=log10(8/3) ❖ ${TF}=6 ❖ ${Ranking}=6*log10(8/3)*log10(8/3)
  • 18.
    Example of RelevancyRanking for a Multiple Word Search ❖ The relevancy ranking value is a sum of the relevancy ranking value for each word ❖ ${rank} = ${TF} * ${IDF} * ${IDF} + ${TF} * ${IDF} * ${IDF}
  • 19.
    Example of RelevancyRanking for a Multiple Word Search ❖ total_records:8 ❖ matching_records:6 for "mysql" and 2 for "tutorial" ❖ mysql.${IDF}=log10(8/6),tutorial.${IDF} =log10(8/2) ❖ mysql.${TF}=1,tutorial.${TF}=2 ❖ ${Ranking}=(1*log10(8/6)*log10(8/6)) +(2*log10(8/2)*log10(8/2))
  • 20.
    Full-Text Searches withQuery Expansion ❖ Full-text search supports query expansion ❖ It's generally useful when a search phrase is too short ❖ Relying on implied knowledge that the full- text search engine lacks ❖ Enabled by adding WITH QUERY EXPANSION or IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION ❖ For example, a user searching for “database” may really mean that “MySQL”, “Oracle”, “DB2”, and “RDBMS”
  • 21.
  • 22.
  • 23.
    MySQL Full-Text Stopwords ❖Stopwords for InnoDB Search Indexes ❖ INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD table ❖ innodb_ft_server_stopword_table/innodb_ft_user_stopword_table ❖ Stopwords for MyISAM Search Indexes ❖ storage/myisam/ft_static.c ❖ ft_stopword_file
  • 24.
    Fine-Tuning MySQL Full-TextSearch ❖ Configuring Minimum and Maximum Word Length ❖ innodb_ft_min_token_size/innodb_ft_max_token_size ❖ ft_min_word_len/ft_max_word_lens ❖ Optimizing InnoDB Full-Text Indexes ❖ set GLOBAL innodb_optimize_fulltext_only=ON; ❖ OPTIMIZE TABLE table_name; ❖ Configuring the Natural Language Search Threshold ❖ storage/myisam/ftdefs.h ❖ #define GWS_IN_USE GWS_PROB -> #define GWS_IN_USE GWS_FREQ
  • 25.
  • 26.
  • 27.
    MySQL Full-Text Monitor ❖Tables in INFORMATION_SCHEMA database ❖ Variable:innodb_ft_aux_table
  • 28.
    MySQL Full-Text Restrictions ❖Full-text searches are supported for InnoDB and MyISAM tables only ❖ Full-text searches are not supported for partitioned tables ❖ Full-text indexes can be created only for CHAR, VARCHAR, or TEXT columns ❖ Ideographic languages limitations
  • 29.
    MySQL Full-Text Restrictions ❖All columns in a FULLTEXT index must use the same character set and collation. ❖ The MATCH() column list must match exactly the column definited in FULLTEXT index ❖ The argument to AGAINST() must be a string value ❖ Index hints are more limited for FULLTEXT searches than Non-FULLTEXT
  • 30.
    Best Practices ❖ DropFULLTEXT index before load large data sets ❖ InnoDB Full-Text Plugin ❖ InnoDB N-gram parser ❖ MeCab Parser “Do NOT alter the MySQL sources unless you know what you are doing!!!”
  • 31.
    Variables for MyISAM ❖ft_boolean_syntax ❖ ft_max_word_len ❖ ft_min_word_len ❖ ft_query_expansion_limit ❖ ft_stopword_file
  • 32.
    Variables for InnoDB ❖innodb_ft_aux_table ❖ innodb_ft_cache_size ❖ innodb_ft_enable_diag_print ❖ innodb_ft_enable_stopword ❖ innodb_ft_max_token_size ❖ innodb_ft_min_token_size
  • 33.
    Variables for InnoDB ❖innodb_ft_num_word_optimize ❖ innodb_ft_result_cache_limit ❖ innodb_ft_server_stopword_table ❖ innodb_ft_sort_pll_degree ❖ innodb_ft_total_cache_size ❖ innodb_ft_user_stopword_table ❖ innodb_optimize_fulltext_only
  • 34.