After days of debugging and tweaking settings, I'm exhausted and unable to find a solution. Kindly guide.

I've the following server on DigitalOcean:

    64GB Memory
    8 Core processor
    200GB SSD drive

I'm running a single WordPress site on it. The site gets high traffic (2000 to 3000 concurrent real-time users), and I'm sure that my bad settings are costing me traffic and keeping pages from being served to users. I expect 5000+ real-time users, but the number always stays around 2000.

I constantly get OOM errors, which cause mysql or php5-fpm to be killed and the site to go down. If I tweak php-fpm and nginx, I get 502 and 503 errors, or I get 'upstream timed out (110: Connection timed out)' or 'FastCGI sent in stderr: PHP message: PHP Fatal error: Maximum execution time of 30 seconds exceeded'.

Now I've tweaked the settings so that I don't get any errors, but the traffic has dropped to around 1500 concurrent users and refuses to go up. So I'm sure something is wrong in my settings.

/etc/php5/fpm/pool.d/www.conf settings:

    pm = dynamic
    pm.max_children = 500
    pm.start_servers = 150
    pm.min_spare_servers = 100
    pm.max_spare_servers = 200
    pm.max_requests = 5000

FastCGI settings: /etc/nginx/conf.d/default.conf

    location ~ \.php$ {
        try_files $uri =404;

        # proxy buffers - no 502 errors!
        proxy_buffer_size 128k;
        proxy_buffers 4 256k;
        proxy_busy_buffers_size 256k;

        fastcgi_buffers 256 16k;
        fastcgi_buffer_size 128k;
        fastcgi_max_temp_file_size 0;
        fastcgi_intercept_errors on;
        fastcgi_keep_conn off;

        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/dev/shm/php-fpm-www.sock;
    }

APC setting: /etc/php5/fpm/php.ini

    [apc]
    apc.write_lock = 1
    apc.slam_defense = 0
    apc.shm_size = "1024M"

I've noticed that php5-fpm processes take a lot of memory. For example, when I calculate the average memory per process with

    ps --no-headers -o "rss,cmd" -C php5-fpm | awk '{ sum+=$1 } END { printf ("%d%s\n", sum/NR/1024,"M") }'

I get 238M at a concurrent traffic of 1100.
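
As a rough sanity check of these figures, here is a minimal Python sketch that compares the worst-case php5-fpm footprint (pm.max_children = 500 at an average of 238M each) against the 64GB of RAM. It assumes every child really reaches the measured average, and it ignores mysql, nginx and the OS. RSS may also count shared pages (such as the APC segment) once per process, so the true figure is probably lower. The function name is only illustrative.

```python
def fpm_memory_budget(max_children, avg_rss_mb, total_ram_mb):
    """Return (worst-case MB used by all children, children that fit in RAM)."""
    # Worst case: every allowed child is alive and at the average RSS.
    worst_case_mb = max_children * avg_rss_mb
    # Upper bound on children if php-fpm had the whole machine to itself.
    affordable_children = total_ram_mb // avg_rss_mb
    return worst_case_mb, affordable_children


worst_case_mb, affordable = fpm_memory_budget(500, 238, 64 * 1024)
print(worst_case_mb, "MB worst case;", affordable, "children fit in RAM")
# prints: 119000 MB worst case; 275 children fit in RAM
```

Under these assumptions, 500 children could ask for roughly 116GB, well beyond the 64GB available, which would be consistent with the OOM kills described above.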

Please guide me on where my config is incorrect, because I'm 100% sure my traffic is choking.


Additional info

Nginx config: /etc/nginx/nginx.conf

    worker_processes 12;
    worker_rlimit_nofile 20000;

    events {
        worker_connections 3000;
        use epoll;
        multi_accept on;
    }

But I've noticed that ulimit -n on the server shows only 1024. Is this related to my issue?
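
Note that ulimit -n reports the limit of the shell it is run in, which is not necessarily the limit of an already running nginx or php5-fpm process. On Linux, the limits actually in effect for a process are listed in /proc/<pid>/limits. The following is a minimal Python sketch, assuming Linux; the function names are hypothetical, and the PID has to be supplied by whoever runs it.

```python
def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    Values are returned as strings because a limit may be 'unlimited'.
    Returns None if the row is not present.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # Row layout: Max open files <soft> <hard> files
            return fields[3], fields[4]
    return None


def read_open_file_limits(pid):
    """Read and parse the limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as fh:
        return parse_max_open_files(fh.read())
```

Calling read_open_file_limits with the PID of an nginx worker would show whether worker_rlimit_nofile 20000 has taken effect for that worker, independent of what the login shell reports.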

  • Is your WordPress up to date? Have you checked all your plugins and their memory usage? Commented Sep 13, 2016 at 19:01
  • What fraction of the traffic is being served directly from the Varnish cache? What processes are using CPU? Unless your users are all logged in, you should be able to get 95+% cache hits, massively reducing the resources required. I do this with the Nginx page cache. Google "Nginx microcaching" if you're worried about pages that change frequently. You should also be using a CDN (e.g. CloudFlare) for a website this busy. Check out this tutorial, parts 4 and 6. photographerstechsupport.com/tutorials/… Commented Sep 13, 2016 at 19:08
  • @TeroKilkanen Yes, WP is latest. I use just the bare minimum plugins, no shady plugins. All of them are from top developers. Commented Sep 14, 2016 at 3:38
  • @Tim Here are my average hit rates from varnishstat: 0.9168 0.8600 0.8600. Only the authors (maximum 15 users) are logged in. I have not used nginx microcaching; I will check it out. I use the CloudFlare business plan and it has saved my a** more times than I can count. It's good. Are my settings correct? Do you want to have a look at the Varnish VCL file? Commented Sep 14, 2016 at 3:43
  • The main questions are what processes are using CPU (i.e. show us top / htop), and why it is hitting PHP and the database so often. Look at the caching headers coming out of WordPress; they're often wrong and will destroy cache stats, but you can rewrite them in Nginx. I have no knowledge of Varnish, I've always used Nginx page caching, but if you're serving 91% cached pages that's great. If you do caching right you could even have CloudFlare serve the pages, with a TTL of as little as 2 mins or as much as 12 hours. Commented Sep 14, 2016 at 4:18
