
I'm having an intermittent issue connecting to a database slave running InnoDB: every so often, establishing a connection takes longer than 2 seconds. These servers are hosted on Amazon EC2.

The app server is PHP 5.2/Apache running on Ubuntu. The DB slave is running Percona's XtraDB 5.1 on Ubuntu 9.10, with an EBS RAID array for data storage.

We already use skip-name-resolve and bind-address = 0.0.0.0.

This is a stub of the PHP code that's failing:

    $tmp = mysqli_init();
    $start_time = microtime(true);

    // Give up on the connection attempt after 2 seconds
    $tmp->options(MYSQLI_OPT_CONNECT_TIMEOUT, 2);

    $tmp->real_connect($DB_SERVERS[$server]['server'],
                       $DB_SERVERS[$server]['username'],
                       $DB_SERVERS[$server]['password'],
                       $DB_SERVERS[$server]['schema'],
                       $DB_SERVERS[$server]['port']);

    if (mysqli_connect_errno()) {
        // Report how long the failed attempt took
        $timer = microtime(true) - $start_time;
        mail($errors_to, 'DB connection error', $timer);
    }

There's more than 300 MB of memory available on the DB server for new connections, and the server is nowhere near its connection limit (60 of 1,200 allowed). Load on both servers is < 2 on 4-core m1.xlarge instances.
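For reference, a minimal sketch of how those figures can be spot-checked from a shell (the credentials below are placeholders):

    # Connection headroom and load spot checks (user/password are placeholders)
    mysql -u appuser -papppassword -e "SHOW GLOBAL STATUS LIKE 'Threads_connected'"
    mysql -u appuser -papppassword -e "SHOW GLOBAL STATUS LIKE 'Max_used_connections'"
    mysql -u appuser -papppassword -e "SHOW VARIABLES LIKE 'max_connections'"
    free -m     # memory headroom on the DB server
    uptime      # load average on the 4-core instances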

Some highlights from the MySQL config:

    max_connections = 1200
    thread_stack = 512K
    thread_cache_size = 1024
    thread_concurrency = 16
    innodb-file-per-table
    innodb_additional_mem_pool_size = 16M
    innodb_buffer_pool_size = 13G

Any help on tracing the source of the slowdown is appreciated.

[EDIT] I have been updating the network-related sysctl values, but they don't seem to be fixing the problem. I made the following adjustments on both the database and application servers:

    net.ipv4.tcp_window_scaling = 1
    net.ipv4.tcp_sack = 0
    net.ipv4.tcp_timestamps = 0
    net.ipv4.tcp_fin_timeout = 20
    net.ipv4.tcp_keepalive_time = 180
    net.ipv4.tcp_max_syn_backlog = 1280
    net.ipv4.tcp_synack_retries = 1
    net.core.rmem_max = 16777216
    net.core.wmem_max = 16777216
    net.ipv4.tcp_rmem = 4096 87380 16777216
    net.ipv4.tcp_wmem = 4096 87380 16777216
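(For completeness, a sketch of how a value like this is applied and verified on Ubuntu; the variable shown is just one example from the list:)

    # Set a value immediately and read it back
    sysctl -w net.ipv4.tcp_max_syn_backlog=1280
    sysctl net.ipv4.tcp_max_syn_backlog
    # Persist by adding the settings to /etc/sysctl.conf, then reload:
    sysctl -p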

[EDIT] Per jaimieb's suggestion, I added some tracing and captured the following data using time. This server handles about 51 queries/second at this time of day. The connection error was raised once (at 13:06:36) during the 3-minute window outlined below. Since there was 1 failure and roughly 9,200 successful connections, I don't think this is going to produce anything meaningful in terms of reporting; a variant that only logs slow connections is sketched after the results below.

Script:

    date >> /root/database_server.txt
    (time mysql -h database_Server -D schema_name -u appuser -papppassword -e '') > /dev/null 2>> /root/database_server.txt

Results:

    === Application Server 1 ===
    Mon Feb 22 13:05:01 EST 2010   real 0m0.008s   user 0m0.001s   sys 0m0.000s
    Mon Feb 22 13:06:01 EST 2010   real 0m0.007s   user 0m0.002s   sys 0m0.000s
    Mon Feb 22 13:07:01 EST 2010   real 0m0.008s   user 0m0.000s   sys 0m0.001s

    === Application Server 2 ===
    Mon Feb 22 13:05:01 EST 2010   real 0m0.009s   user 0m0.000s   sys 0m0.002s
    Mon Feb 22 13:06:01 EST 2010   real 0m0.009s   user 0m0.001s   sys 0m0.003s
    Mon Feb 22 13:07:01 EST 2010   real 0m0.008s   user 0m0.000s   sys 0m0.001s

    === Database Server ===
    Mon Feb 22 13:05:01 EST 2010   real 0m0.016s   user 0m0.000s   sys 0m0.010s
    Mon Feb 22 13:06:01 EST 2010   real 0m0.006s   user 0m0.010s   sys 0m0.000s
    Mon Feb 22 13:07:01 EST 2010   real 0m0.016s   user 0m0.000s   sys 0m0.010s
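Given that signal-to-noise problem, a variant that only records slow or failed attempts might be more useful. A rough sketch (the host, credentials, and 500 ms threshold are placeholders):

    #!/bin/bash
    # Log only connection attempts that are slow or fail outright,
    # so rare outliers aren't buried under thousands of normal timings.
    HOST=database_server
    THRESHOLD_MS=500
    while true; do
        start=$(date +%s%N)
        mysql -h "$HOST" -D schema_name -u appuser -papppassword -e '' >/dev/null 2>&1
        rc=$?
        elapsed_ms=$(( ( $(date +%s%N) - start ) / 1000000 ))
        if [ "$rc" -ne 0 ] || [ "$elapsed_ms" -gt "$THRESHOLD_MS" ]; then
            echo "$(date '+%F %T') connect ${elapsed_ms}ms (exit $rc)" >> /root/slow_connects.txt
        fi
        sleep 1
    done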

[EDIT] Per a suggestion received on a LinkedIn question, I tried setting the back_log value higher. We had been running the default value (50) and increased it to 150. We also raised the kernel value /proc/sys/net/core/somaxconn (the maximum listen backlog for a socket) from the default of 128 to 256 on both the application and database servers. We did see some elevation in processor utilization as a result, but still received connection timeouts.
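A quick sanity check that the new values actually took effect (a sketch; the credentials are placeholders):

    # Confirm the running values after the changes
    mysql -u appuser -papppassword -e "SHOW VARIABLES LIKE 'back_log'"
    cat /proc/sys/net/core/somaxconn
    # Listen-queue overflows reported here would point back at back_log/somaxconn:
    netstat -s | grep -i listen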

  • For what it's worth, I'm exploring connectivity of other services (memcached etc) to see if this is isolated to PHP->MySQL or if there is an issue with TCP/IP connectivity in general. Commented Feb 23, 2010 at 14:43
  • Shouldn't it be innodb_file_per_table instead of innodb-file-per-table? Commented Apr 3, 2018 at 3:16

3 Answers


How well does it work if you eliminate PHP from the equation? Use the CLI mysql client to connect to the server. Try it from both the db server itself and from the app server:

time mysql -h localhost -D dbname -u username -ppassword -e '' 
  • I've added this to one app server and to one of the DB servers I've seen this on and will see if I can catch what's happening at this tier. Thanks for the suggestion. Commented Feb 22, 2010 at 16:28

Check your DNS servers; MySQL might be trying to resolve the reverse DNS of the connecting host. Also make sure /etc/hosts is sane and contains "127.0.0.1 localhost".
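A few quick checks along those lines (a sketch; the IP address is a placeholder for the connecting app server, and Ubuntu's default my.cnf path is assumed):

    grep localhost /etc/hosts                      # expect a "127.0.0.1 localhost" entry
    dig +short -x 10.0.0.5                         # reverse lookup of the connecting host's IP
    grep -i 'skip.name.resolve' /etc/mysql/my.cnf  # reverse DNS is skipped entirely if this is set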


This may not even be close, but could you be waiting for flush to disk? Maybe timing out?

Bear in mind that with this setting you could lose up to about a second of transactions in a crash.

innodb_flush_log_at_trx_commit = 0 (Default is 1)

This will cause InnoDB to write and flush the log buffer only once per second. See: http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit
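To check the current value and try the change without a restart (a sketch; the variable is dynamic, and the credentials are placeholders):

    mysql -u root -p -e "SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit'"
    # Change it at runtime to see whether log flushing is the bottleneck:
    mysql -u root -p -e "SET GLOBAL innodb_flush_log_at_trx_commit = 2"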

  • Thanks for the suggestion; we're already running with innodb_flush_log_at_trx_commit = 0 :( Commented Feb 23, 2010 at 14:42
  • I would suggest innodb_flush_log_at_trx_commit = 2. It's similar for performance, but slightly more durable. Commented Nov 14, 2010 at 2:56
