What Your Database Query is Really doing
Your code sends off that Structured Query Language (SQL) to the database server and (hopefully) your get some data back. Back what is happening behind the curtain? If you are a novice or rock star developer, you may not know realize when you send off your query to a MySQL server. This session covers how your query (and maybe hundred of others) are collected at the server level, how the optimizer uses statistics plus other clues to build a query plan, how the cost based optimizer really figures out the best way to get the data, that 'best' may not be best, and how your data is return. You will learn how the query optimizer is like the global positioning system computer in your car with old maps, how to track queries, how to peek at someone else's query as it runs, and more. This session start with the assumption that the database server is a mystery to you and builds until you have a high level understanding of how queries are collected, processed, and returned. Your will gain valuable insights on how you can turn you queries, let the database do the heavy lifting, and become a more proficient user of a database 2
Hello!I am Dave Stokes MySQL Community Manager david.stokes@oracle.com @Stoker elephantanddolphin.blooger.com Slideshare.net/davidmstokes 3
21 Years Old MySQL has been part of Oracle’s family of databases for six years. MySQL 8 MySQl 5.7 is the current release but the next version will be MySQL 8. Big feature is real time data dictionary Group Replication Active master-master replication. JSON A new native JSON datatype to store documents in a column of a table Document Store Programmers not know SQL but need a database? X Devapi allows them to use RDMS from language of choice Encryption Use Oracle Key Vault to encrypt your data at rest. 4
1. So your write a query Then what?You have produced a statement in SQL for processing by a database 5
Structured Query Language is a special-purpose programming language designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system. -- Wikipedia 6
× Efficient storage of data, minimal duplication × Been around for decades × Based on set theory and relational theory Need normalized data where components are divided up into logical components -> Requires data to be architected and data integrity rules are enforced 7
Big Roughly 2% of developers receive any formal training in SQL, relational theory, etc. And they wonder why their queries perform poorly. 8
Goal: Using the MySQL World Database find the name of all cities and their corresponding country name Query: SELECT City.Name, Country.Name FROM City JOIN Country ON (County.Code = City.CountryCode); 9
Query: SELECT City.Name, Country.Name FROM City JOIN Country ON (County.Code = City.CountryCode); Previously someone has split the data for cities and countries into two tables. And they established a link between the two tables use three character country codes to link the two together. 10
A fair amount of developers see the database server as an arcane, messy, dank factory spewing smoke and mess. And they may be right. 11
$query = “SELECT City.Name, Country.Name FROM City JOIN Country On (County.Code = City.CountryCode”; $result = mysqli_query($link,$query); 12
Can you talk to server? 13
14
One: Is your system allowed to connect to server? Two: Are you using a valid account? Are there limits to this account? Three: Do you have proper permission to access the data you seek? 15
Login Okay System Okay Permissions Good 16
One: Is your system allowed to connect to server? Two: Are you using a valid account? Are there limits to this account? Three: Do you have proper permission to access the data you seek? 17
IS the Syntax correct?Checking the basics Build query planFind the best way to assemble the data Parse the queryFind the pieces needed 18
Cost ModelHow do we return the needed data the least expensive way? 19
Reading from disk is slow, 100,000 times slower than reading from memory. Note: Many vendors are looking at changing the cost model to accommodate new technologies, mixes of hardware technologies, and latencies. See mysql.server_cost and mysql.engine_cost tables for details 20
Where is cost determined? 21
The optimizer knows from stores statistics just what is has had to do on similar queries in the past and guesstimates What it will take to get this query done. 22
{ "query_block": { "select_id": 1, "cost_info": { "query_cost": "5132.14" }, "nested_loop": [ { "table": { "table_name": "Country", "access_type": "ALL", "possible_keys": [ "PRIMARY" ], "rows_examined_per_scan": 239, "rows_produced_per_join": 239, "filtered": "100.00", "cost_info": { "read_cost": "6.00", "eval_cost": "47.80", "prefix_cost": "53.80", "data_read_per_join": "61K" }, "used_columns": [ "Code", "Name" ] } }, { "table": { "table_name": "City", "access_type": "ref", "possible_keys": [ "CountryCode" ], "key": "CountryCode", "used_key_parts": [ "CountryCode" ], "key_length": "3", "ref": [ "world.Country.Code" ], "rows_examined_per_scan": 17, "rows_produced_per_join": 4231, "filtered": "100.00", "cost_info": { "read_cost": "4231.95", "eval_cost": "846.39", "prefix_cost": "5132.14", "data_read_per_join": "727K" }, "used_columns": [ "Name", "CountryCode" ] } } ] } } 23
"nested_loop": [ { "table": { "table_name": "Country", "access_type": "ALL", "possible_keys": [ "PRIMARY" ], "rows_examined_per_scan": 239, "rows_produced_per_join": 239, "filtered": "100.00", "cost_info": { "read_cost": "6.00", "eval_cost": "47.80", "prefix_cost": "53.80", "data_read_per_join": "61K" }, "used_columns": [ "Code", "Name" ] Nested loop join for Country Table We DO have an index for the JOIN :-) Statistics 24
Every new column sought adds a factorial of complexity to getting the data. Adding a third column would take our simple query from two to six possibilities!!!!!! 25
Some Databases allow locking query plans so once you have it optimized you can get consistent results. MySQL does not do this!!!! 26
27
28 MySQL Workbench (Our second most popular FREE download) has the ability to graphically display the output of the EXPLAIN command. This is the same query as the last page.
Use SHOW PROCESS_LIST(); to find a running query on the server. 29 Use EXPLAIN FOR CONNENCTION N to see output.
Indexes let you go to the exact record(s) desired. They do have some overhead and will slow down bulk inserts. Without indexes the entire table/file has to be read, AKA a FULL TABLE SCAN -- to be avoided if your goal is not to read the entire table. Compound indexes let you use multiple columns like YEAR- MONTH-DATE and allow YEAR-MONTH-DATE, YEAR- MONTH, and YEAR searches. The Optimizer looked to use indexes as much as possible! 30
Your data is then packaged up and sent to your application And that is the basics. 31
And here is the output 32
And that is how a query Is supposed to be processed 33
Common problems And how to avoid them 34
N+1 ProblemBad practice, may come from your ORM Can be hard to catchLots of fussy little queries Avoid by thinking in setsLet database do heavy lifting 35
You chain a set of queries together to answer a question - Look up employees, then the ones who live near you, and then the ones who have a parking permit so you can get a ride to work Each dive into the database has a cost. Databases can handle one BIG request better than a lot of little. So One query for someone who lives near you with parking permit. 36
foreach (sale_emp in sales_employees) $pay = $pay * .20; UPDATE employees SET pay_rate = pay_rate * .20 WHERE department = ‘sales’; Your boss asks you to give all the sales staff a 20% pay increase 37
foreach (sale_emp in sales_employees) $pay = $pay * .20; START TRANSACTION; UPDATE employees SET pay_rate = pay_rate * .20 WHERE department = ‘sales’; COMMIT; A transaction does all the work at the same time, can be rolled back!! 38
QUIZ TIME!!!!!! 39
SELECT City.Name as City, Country.name as Country FROM City JOIN ON (City.CountryCode = Country.code); SELECT City.Name as City, Country.name as Country FROM City JOIN ON (City.CountryCode = Country.code) LIMIT 5; 40
SELECT City.Name as City, Country.name as Country FROM City JOIN ON (City.CountryCode = Country.code) ORDER BY City.Population; SELECT City.Name as City, Country.name as Country FROM City JOIN ON (City.CountryCode = Country.code) GROUP BY Country.Name; 41
You can not tell without looking at the And using explain on a query. There is no Way to tell if a query is good or bad Just by looking. 42
Databases have many useful functions for maximums, minimums, averages, standard deviations, and more. No Need to crunch numbers in your app. 43
MySQL and other RDBMS offer a JSON data type. You can store a JSON document without schema. Great for JSON formatted data and those who just don’t care! This works but is not as fast as normalized data, no rigor is applied to the data, the standard is still being worked on, and this frustrates old timers, dag-nabbit!! 44
MySQL offers the X Devapi and acts as a docstore from your language of choice for CRUD (Create, Replace, Update, Delete) so you do not need knowledge of SQL. Works very well but you may need a DBA/Architect to later normalize some of the data with generated columns for performance. 45
Data The first step in getting great performance out of your database is proper data normalization. Indexes Indexes greatly speed searches but take overhead. General rule: Index primary keys and columns used on right of WHERE clause Heavy Lifting Databases are great at handling big chunks of data and can perform transactions. Try to maximize effects of each query. Disks Do not go cheap on disks. SSDs do pay for themselves and move heavy uses to separate disks/controllers Slow Query Log Turn on the slow query log and pay attention. Note: Some slow queries are there because they run a long time and just not slow Sys Schema The Sys Schema was designed to let you peer into the heart of your instances and answer questions like which indexes are not used, who is hogging I/O 46
JOIN US FOR CONTRIBUTION SPRINTS First Time Sprinter Workshop - 9:00-12:00 - Room Wicklow 2A Mentored Core Sprint - 9:00-18:00 - Wicklow Hall 2B General Sprints - 9:00 - 18:00 - Wicklow Hall 2A 47
THANKS!Any questions? You can find me at @stoker & david.stokes@oracle.com 48

What Your Database Query is Really Doing

  • 1.
  • 2.
    Your code sendsoff that Structured Query Language (SQL) to the database server and (hopefully) your get some data back. Back what is happening behind the curtain? If you are a novice or rock star developer, you may not know realize when you send off your query to a MySQL server. This session covers how your query (and maybe hundred of others) are collected at the server level, how the optimizer uses statistics plus other clues to build a query plan, how the cost based optimizer really figures out the best way to get the data, that 'best' may not be best, and how your data is return. You will learn how the query optimizer is like the global positioning system computer in your car with old maps, how to track queries, how to peek at someone else's query as it runs, and more. This session start with the assumption that the database server is a mystery to you and builds until you have a high level understanding of how queries are collected, processed, and returned. Your will gain valuable insights on how you can turn you queries, let the database do the heavy lifting, and become a more proficient user of a database 2
  • 3.
    Hello!I am DaveStokes MySQL Community Manager david.stokes@oracle.com @Stoker elephantanddolphin.blooger.com Slideshare.net/davidmstokes 3
  • 4.
    21 Years Old MySQLhas been part of Oracle’s family of databases for six years. MySQL 8 MySQl 5.7 is the current release but the next version will be MySQL 8. Big feature is real time data dictionary Group Replication Active master-master replication. JSON A new native JSON datatype to store documents in a column of a table Document Store Programmers not know SQL but need a database? X Devapi allows them to use RDMS from language of choice Encryption Use Oracle Key Vault to encrypt your data at rest. 4
  • 5.
    1. So your writea query Then what?You have produced a statement in SQL for processing by a database 5
  • 6.
    Structured Query Languageis a special-purpose programming language designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system. -- Wikipedia 6
  • 7.
    × Efficient storageof data, minimal duplication × Been around for decades × Based on set theory and relational theory Need normalized data where components are divided up into logical components -> Requires data to be architected and data integrity rules are enforced 7
  • 8.
    Big Roughly 2% ofdevelopers receive any formal training in SQL, relational theory, etc. And they wonder why their queries perform poorly. 8
  • 9.
    Goal: Using the MySQLWorld Database find the name of all cities and their corresponding country name Query: SELECT City.Name, Country.Name FROM City JOIN Country ON (County.Code = City.CountryCode); 9
  • 10.
    Query: SELECT City.Name, Country.Name FROM City JOINCountry ON (County.Code = City.CountryCode); Previously someone has split the data for cities and countries into two tables. And they established a link between the two tables use three character country codes to link the two together. 10
  • 11.
    A fair amountof developers see the database server as an arcane, messy, dank factory spewing smoke and mess. And they may be right. 11
  • 12.
    $query = “SELECTCity.Name, Country.Name FROM City JOIN Country On (County.Code = City.CountryCode”; $result = mysqli_query($link,$query); 12
  • 13.
    Can you talkto server? 13
  • 14.
  • 15.
    One: Is your system allowedto connect to server? Two: Are you using a valid account? Are there limits to this account? Three: Do you have proper permission to access the data you seek? 15
  • 16.
  • 17.
    One: Is your system allowedto connect to server? Two: Are you using a valid account? Are there limits to this account? Three: Do you have proper permission to access the data you seek? 17
  • 18.
    IS the Syntaxcorrect?Checking the basics Build query planFind the best way to assemble the data Parse the queryFind the pieces needed 18
  • 19.
    Cost ModelHow dowe return the needed data the least expensive way? 19
  • 20.
    Reading from disk isslow, 100,000 times slower than reading from memory. Note: Many vendors are looking at changing the cost model to accommodate new technologies, mixes of hardware technologies, and latencies. See mysql.server_cost and mysql.engine_cost tables for details 20
  • 21.
  • 22.
    The optimizer knowsfrom stores statistics just what is has had to do on similar queries in the past and guesstimates What it will take to get this query done. 22
  • 23.
    { "query_block": { "select_id": 1, "cost_info":{ "query_cost": "5132.14" }, "nested_loop": [ { "table": { "table_name": "Country", "access_type": "ALL", "possible_keys": [ "PRIMARY" ], "rows_examined_per_scan": 239, "rows_produced_per_join": 239, "filtered": "100.00", "cost_info": { "read_cost": "6.00", "eval_cost": "47.80", "prefix_cost": "53.80", "data_read_per_join": "61K" }, "used_columns": [ "Code", "Name" ] } }, { "table": { "table_name": "City", "access_type": "ref", "possible_keys": [ "CountryCode" ], "key": "CountryCode", "used_key_parts": [ "CountryCode" ], "key_length": "3", "ref": [ "world.Country.Code" ], "rows_examined_per_scan": 17, "rows_produced_per_join": 4231, "filtered": "100.00", "cost_info": { "read_cost": "4231.95", "eval_cost": "846.39", "prefix_cost": "5132.14", "data_read_per_join": "727K" }, "used_columns": [ "Name", "CountryCode" ] } } ] } } 23
  • 24.
    "nested_loop": [ { "table": { "table_name":"Country", "access_type": "ALL", "possible_keys": [ "PRIMARY" ], "rows_examined_per_scan": 239, "rows_produced_per_join": 239, "filtered": "100.00", "cost_info": { "read_cost": "6.00", "eval_cost": "47.80", "prefix_cost": "53.80", "data_read_per_join": "61K" }, "used_columns": [ "Code", "Name" ] Nested loop join for Country Table We DO have an index for the JOIN :-) Statistics 24
  • 25.
    Every new columnsought adds a factorial of complexity to getting the data. Adding a third column would take our simple query from two to six possibilities!!!!!! 25
  • 26.
    Some Databases allow lockingquery plans so once you have it optimized you can get consistent results. MySQL does not do this!!!! 26
  • 27.
  • 28.
    28 MySQL Workbench (Our secondmost popular FREE download) has the ability to graphically display the output of the EXPLAIN command. This is the same query as the last page.
  • 29.
    Use SHOW PROCESS_LIST(); to finda running query on the server. 29 Use EXPLAIN FOR CONNENCTION N to see output.
  • 30.
    Indexes let yougo to the exact record(s) desired. They do have some overhead and will slow down bulk inserts. Without indexes the entire table/file has to be read, AKA a FULL TABLE SCAN -- to be avoided if your goal is not to read the entire table. Compound indexes let you use multiple columns like YEAR- MONTH-DATE and allow YEAR-MONTH-DATE, YEAR- MONTH, and YEAR searches. The Optimizer looked to use indexes as much as possible! 30
  • 31.
    Your data isthen packaged up and sent to your application And that is the basics. 31
  • 32.
  • 33.
    And that ishow a query Is supposed to be processed 33
  • 34.
  • 35.
    N+1 ProblemBad practice,may come from your ORM Can be hard to catchLots of fussy little queries Avoid by thinking in setsLet database do heavy lifting 35
  • 36.
    You chain aset of queries together to answer a question - Look up employees, then the ones who live near you, and then the ones who have a parking permit so you can get a ride to work Each dive into the database has a cost. Databases can handle one BIG request better than a lot of little. So One query for someone who lives near you with parking permit. 36
  • 37.
    foreach (sale_emp in sales_employees) $pay= $pay * .20; UPDATE employees SET pay_rate = pay_rate * .20 WHERE department = ‘sales’; Your boss asks you to give all the sales staff a 20% pay increase 37
  • 38.
    foreach (sale_emp in sales_employees) $pay= $pay * .20; START TRANSACTION; UPDATE employees SET pay_rate = pay_rate * .20 WHERE department = ‘sales’; COMMIT; A transaction does all the work at the same time, can be rolled back!! 38
  • 39.
  • 40.
    SELECT City.Name asCity, Country.name as Country FROM City JOIN ON (City.CountryCode = Country.code); SELECT City.Name as City, Country.name as Country FROM City JOIN ON (City.CountryCode = Country.code) LIMIT 5; 40
  • 41.
    SELECT City.Name asCity, Country.name as Country FROM City JOIN ON (City.CountryCode = Country.code) ORDER BY City.Population; SELECT City.Name as City, Country.name as Country FROM City JOIN ON (City.CountryCode = Country.code) GROUP BY Country.Name; 41
  • 42.
    You can nottell without looking at the And using explain on a query. There is no Way to tell if a query is good or bad Just by looking. 42
  • 43.
    Databases have many usefulfunctions for maximums, minimums, averages, standard deviations, and more. No Need to crunch numbers in your app. 43
  • 44.
    MySQL and otherRDBMS offer a JSON data type. You can store a JSON document without schema. Great for JSON formatted data and those who just don’t care! This works but is not as fast as normalized data, no rigor is applied to the data, the standard is still being worked on, and this frustrates old timers, dag-nabbit!! 44
  • 45.
    MySQL offers theX Devapi and acts as a docstore from your language of choice for CRUD (Create, Replace, Update, Delete) so you do not need knowledge of SQL. Works very well but you may need a DBA/Architect to later normalize some of the data with generated columns for performance. 45
  • 46.
    Data The first stepin getting great performance out of your database is proper data normalization. Indexes Indexes greatly speed searches but take overhead. General rule: Index primary keys and columns used on right of WHERE clause Heavy Lifting Databases are great at handling big chunks of data and can perform transactions. Try to maximize effects of each query. Disks Do not go cheap on disks. SSDs do pay for themselves and move heavy uses to separate disks/controllers Slow Query Log Turn on the slow query log and pay attention. Note: Some slow queries are there because they run a long time and just not slow Sys Schema The Sys Schema was designed to let you peer into the heart of your instances and answer questions like which indexes are not used, who is hogging I/O 46
  • 47.
    JOIN US FOR CONTRIBUTIONSPRINTS First Time Sprinter Workshop - 9:00-12:00 - Room Wicklow 2A Mentored Core Sprint - 9:00-18:00 - Wicklow Hall 2B General Sprints - 9:00 - 18:00 - Wicklow Hall 2A 47
  • 48.
    THANKS!Any questions? You canfind me at @stoker & david.stokes@oracle.com 48