MariaDB / MySQL: Developing Scalable Applications
$ whoami Hi, I’m Federico Razzoli, founder of Vettabase Database consultant, open source supporter, long time MariaDB and MySQL user vettabase.com federico.razzoli@vettabase.com
Agenda
● What is High Availability ● Read-Write splitting ● Transactions and locks in a HA environment ● Deadlocks and how to avoid them ● Primary keys and HA All from a perspective of a Backend Developer Agenda
High Availability
High Availability for databases means that: ● The same database exists on multiple servers ● And any of them can process SQL queries High Availability
This is essential because: ● If a server crashes there is no outage ● The workload can be distributed over multiple servers High Availability
This is essential because: ● If a server crashes there is no outage ● The workload can be distributed over multiple servers This is database scalability High Availability
Asynchronous Replication
Galera Cluster
● RDS replication is asynchronous ● RDS multi-AZ is synchronous ● Ec2 multi-AZ is synchronous ● Aurora has synchronous storage, asynchronous cache High Availability
Read-Write splitting
● Writes must be directed to the master ● Reads, in general, should be directed to a replica. Problems: ○ Some reads can tolerate some lag, others can't ○ Some reads need to "see" a new row immediately after writing it Read/Write Splitting
For reads, ask yourself two questions: ● Can some lag (perhaps < 1 second) be tolerated? ● If so, how much lag can you tolerate? Only developers can take these decisions, and the answer depends on the use case. Examples: ● Ecommerce: read the list of buyable products. No lag tolerated. ● Show how many users are online now. A some minutes lag is ok. ● Show sales tends from 2020 to today. Lag doesn't matter. Read/Write Splitting
Generally you have different hostnames or credentials to send different types of SQL queries: ● Writes and regular reads: a load balancer will split them ● OPTIONAL: Reads that can tolerate a lag ● OPTIONAL: Reads that can tolerate a longer lag Read/Write Splitting
In the simplest case, the load balancer will send: ● INSERT, UPDATE, DELETE to a Master ● SELECT queries to a Replica Read/Write Splitting
Some queries need be sent to the same server and, because the next query might be a write, they're sent to the master: ● Transactions ● Stored procedures / functions ● Queries after CREATE TEMPORARY TABLE ● Queries after a user-variable is created To scale better, try to avoid these things when not really needed Read/Write Splitting
Some balancers (ProxySQL) allow to add a rule to force the query to a Master or a Replica if they contain a comment: /*READ*/ CALL my_procedure(); /*WRITE*/ SELECT x, y, z FROM tbl; Read/Write Splitting
Transactions
● Use transactions to make sure that a series of queries is sent to the master in the same connection: START TRANSACTION; INSERT … ; SELECT … ; COMMIT; ● Or to the same replica, in the same connection: START TRANSACTION READ ONLY; SELECT … ; SELECT … ; COMMIT; But sometimes, you can do even better: use one SQL statement instead of several of them. Transactions
● Insert row if it doesn't exist: INSERT IGNORE … ; if (mysql_affected_rows() > 0) … ● Delete row if it doesn't exist: DELETE IGNORE … ; if (mysql_affected_rows() > 0) … ● Insert or update: INSERT (email, phone) VALUES ('abc@hey.com', '12345') ON DUPLICATE KEY UPDATE email = 'abc@hey.com', phone = '12345'; Transactions
● Insert + select (many ORMs do this): INSERT INTO user ... RETURNING id, email; if (mysql_affected_rows() > 0) … ● Delete + select: DELETE FROM user WHERE id IN (1, 2, 3) RETURNING id, email; if (mysql_affected_rows() > 0) … Transactions
● Insert multiple rows with 1 INSERT INSERT INTO tbl (a, b, c) VALUES (10, 20, 30), (11, 22, 33), … ; ● Delete or update from multiple tables at once: DELETE t1, t2 FROM t1 INNER JOIN t2 WHERE t1.id = t2.id AND t1.x = 1; Transactions
● If you don't use multi-statement transactions, set autocommit=1 ● Some statements (especially SET) can confuse the balancer, or even be rejected. If it happens, you can put them, and the following queries, into stored procedures Transactions
Remember to keep transactions short (when possible): ● In terms of rows read ● In terms of time elapsed Though, a mass insert/update/delete is faster if it happens in one transaction Transactions
Locks
● Data in a transactional database are versioned ● They are updated asynchronously in the data files ● Transaction logs contain the information to update data or to restore reasonably old versions (ROLLBACK) Locks
● Writes acquire locks on modified rows ● Reads acquire a shared lock and a view on data Locks
To reduce deadlocks: ● Reduce foreign keys ● Keep transactions short ● Transactions that read the same table should do it always in the same order ● Use isolation levels wisely Extreme measure to make deadlocks impossible: ● Set innodb_lock_wait_timeout=0 Locks
A transaction uses a transaction isolation level: ● REPEATABLE READ All queries in the transaction use the same view, acquired at transaction start ● READ COMMITTED Each query acquires a new data view ● READ UNCOMMITTED Similar to READ COMMITTED, but queries can see uncommitted data Locks
Example: SET TRANSACTION ISOLATION LEVEL READ COMMITTED; START TRANSACTION; … COMMIT; Locks
Primary keys
AUTO_INCREMENT primary keys are common but have some drawbacks: ● Shouldn't be used to sort chronologically (failover) ● Numbers have a limit, even with BIGINT UNSIGNED the limit can be reached ● An AUTO_INCREMENT lock governs the primary keys ● Easy to guess Locks
● Use UUID instead: CREATE OR REPLACE TABLE employee ( uuid UUID DEFAULT UUID() PRIMARY KEY, ... ); Locks
It’s question time!
Questions?

MariaDB/MySQL_: Developing Scalable Applications

  • 1.
    MariaDB / MySQL: DevelopingScalable Applications
  • 2.
    $ whoami Hi, I’mFederico Razzoli, founder of Vettabase Database consultant, open source supporter, long time MariaDB and MySQL user vettabase.com federico.razzoli@vettabase.com
  • 3.
  • 4.
    ● What isHigh Availability ● Read-Write splitting ● Transactions and locks in a HA environment ● Deadlocks and how to avoid them ● Primary keys and HA All from a perspective of a Backend Developer Agenda
  • 5.
  • 6.
    High Availability fordatabases means that: ● The same database exists on multiple servers ● And any of them can process SQL queries High Availability
  • 7.
    This is essentialbecause: ● If a server crashes there is no outage ● The workload can be distributed over multiple servers High Availability
  • 8.
    This is essentialbecause: ● If a server crashes there is no outage ● The workload can be distributed over multiple servers This is database scalability High Availability
  • 9.
  • 10.
  • 11.
    ● RDS replicationis asynchronous ● RDS multi-AZ is synchronous ● Ec2 multi-AZ is synchronous ● Aurora has synchronous storage, asynchronous cache High Availability
  • 12.
  • 13.
    ● Writes mustbe directed to the master ● Reads, in general, should be directed to a replica. Problems: ○ Some reads can tolerate some lag, others can't ○ Some reads need to "see" a new row immediately after writing it Read/Write Splitting
  • 14.
    For reads, askyourself two questions: ● Can some lag (perhaps < 1 second) be tolerated? ● If so, how much lag can you tolerate? Only developers can take these decisions, and the answer depends on the use case. Examples: ● Ecommerce: read the list of buyable products. No lag tolerated. ● Show how many users are online now. A some minutes lag is ok. ● Show sales tends from 2020 to today. Lag doesn't matter. Read/Write Splitting
  • 15.
    Generally you havedifferent hostnames or credentials to send different types of SQL queries: ● Writes and regular reads: a load balancer will split them ● OPTIONAL: Reads that can tolerate a lag ● OPTIONAL: Reads that can tolerate a longer lag Read/Write Splitting
  • 16.
    In the simplestcase, the load balancer will send: ● INSERT, UPDATE, DELETE to a Master ● SELECT queries to a Replica Read/Write Splitting
  • 17.
    Some queries needbe sent to the same server and, because the next query might be a write, they're sent to the master: ● Transactions ● Stored procedures / functions ● Queries after CREATE TEMPORARY TABLE ● Queries after a user-variable is created To scale better, try to avoid these things when not really needed Read/Write Splitting
  • 18.
    Some balancers (ProxySQL)allow to add a rule to force the query to a Master or a Replica if they contain a comment: /*READ*/ CALL my_procedure(); /*WRITE*/ SELECT x, y, z FROM tbl; Read/Write Splitting
  • 19.
  • 20.
    ● Use transactionsto make sure that a series of queries is sent to the master in the same connection: START TRANSACTION; INSERT … ; SELECT … ; COMMIT; ● Or to the same replica, in the same connection: START TRANSACTION READ ONLY; SELECT … ; SELECT … ; COMMIT; But sometimes, you can do even better: use one SQL statement instead of several of them. Transactions
  • 21.
    ● Insert rowif it doesn't exist: INSERT IGNORE … ; if (mysql_affected_rows() > 0) … ● Delete row if it doesn't exist: DELETE IGNORE … ; if (mysql_affected_rows() > 0) … ● Insert or update: INSERT (email, phone) VALUES ('abc@hey.com', '12345') ON DUPLICATE KEY UPDATE email = 'abc@hey.com', phone = '12345'; Transactions
  • 22.
    ● Insert +select (many ORMs do this): INSERT INTO user ... RETURNING id, email; if (mysql_affected_rows() > 0) … ● Delete + select: DELETE FROM user WHERE id IN (1, 2, 3) RETURNING id, email; if (mysql_affected_rows() > 0) … Transactions
  • 23.
    ● Insert multiplerows with 1 INSERT INSERT INTO tbl (a, b, c) VALUES (10, 20, 30), (11, 22, 33), … ; ● Delete or update from multiple tables at once: DELETE t1, t2 FROM t1 INNER JOIN t2 WHERE t1.id = t2.id AND t1.x = 1; Transactions
  • 24.
    ● If youdon't use multi-statement transactions, set autocommit=1 ● Some statements (especially SET) can confuse the balancer, or even be rejected. If it happens, you can put them, and the following queries, into stored procedures Transactions
  • 25.
    Remember to keeptransactions short (when possible): ● In terms of rows read ● In terms of time elapsed Though, a mass insert/update/delete is faster if it happens in one transaction Transactions
  • 26.
  • 27.
    ● Data ina transactional database are versioned ● They are updated asynchronously in the data files ● Transaction logs contain the information to update data or to restore reasonably old versions (ROLLBACK) Locks
  • 28.
    ● Writes acquirelocks on modified rows ● Reads acquire a shared lock and a view on data Locks
  • 29.
    To reduce deadlocks: ●Reduce foreign keys ● Keep transactions short ● Transactions that read the same table should do it always in the same order ● Use isolation levels wisely Extreme measure to make deadlocks impossible: ● Set innodb_lock_wait_timeout=0 Locks
  • 30.
    A transaction usesa transaction isolation level: ● REPEATABLE READ All queries in the transaction use the same view, acquired at transaction start ● READ COMMITTED Each query acquires a new data view ● READ UNCOMMITTED Similar to READ COMMITTED, but queries can see uncommitted data Locks
  • 31.
    Example: SET TRANSACTION ISOLATIONLEVEL READ COMMITTED; START TRANSACTION; … COMMIT; Locks
  • 32.
  • 33.
    AUTO_INCREMENT primary keysare common but have some drawbacks: ● Shouldn't be used to sort chronologically (failover) ● Numbers have a limit, even with BIGINT UNSIGNED the limit can be reached ● An AUTO_INCREMENT lock governs the primary keys ● Easy to guess Locks
  • 34.
    ● Use UUIDinstead: CREATE OR REPLACE TABLE employee ( uuid UUID DEFAULT UUID() PRIMARY KEY, ... ); Locks
  • 35.
  • 36.