Commit a78cf3c

doc: added a new section "Check List for Issues".
1 parent 3051f57 commit a78cf3c

1 file changed, +13 -0 lines changed

README.markdown

Lines changed: 13 additions & 0 deletions
@@ -29,6 +29,7 @@ Table of Contents
* [Load Balancing and Failover](#load-balancing-and-failover)
* [Debugging](#debugging)
* [Automatic Error Logging](#automatic-error-logging)
* [Check List for Issues](#check-list-for-issues)
* [Limitations](#limitations)
* [Installation](#installation)
* [TODO](#todo)
@@ -565,6 +566,18 @@ handling in your own Lua code, then you are recommended to disable this automati

[Back to TOC](#table-of-contents)

Check List for Issues
=====================

1. Ensure you configure the connection pool size properly in the [set_keepalive](#set_keepalive) method. Basically, if your NGINX handles `n` concurrent requests and has `m` workers, then the connection pool size should be configured as `n/m`. For example, if your NGINX usually handles 1000 concurrent requests and you have 10 NGINX workers, then the connection pool size should be 100.
2. Ensure the backlog setting on the Redis side is large enough. For Redis 2.8+, you can directly tune the `tcp-backlog` parameter in the `redis.conf` file (and also tune the kernel parameter `SOMAXCONN` accordingly, at least on Linux).
3. Ensure you are not using too short a timeout setting in the [set_timeout](#set_timeout) method. If you have to, try redoing the operation upon timeout and turning off [automatic error logging](#automatic-error-logging) (because you are already doing proper error handling in your own Lua code).
4. If your NGINX worker processes' CPU usage is very high under load, then the NGINX event loop might be blocked too much by CPU-bound computation. Try sampling a [C-land on-CPU Flame Graph](https://github.com/agentzh/nginx-systemtap-toolkit#sample-bt) and a [Lua-land on-CPU Flame Graph](https://github.com/agentzh/stapxx#ngx-lj-lua-stacks) for a typical NGINX worker process. You can optimize the CPU-bound code paths according to these Flame Graphs.
5. If your NGINX worker processes' CPU usage is very low under load, then the NGINX event loop might be blocked by some blocking system calls (like file IO system calls). You can confirm the issue by running the [epoll-loop-blocking-distr](https://github.com/agentzh/stapxx#epoll-loop-blocking-distr) tool against a typical NGINX worker process. If that is indeed the case, then you can further sample a [C-land off-CPU Flame Graph](https://github.com/agentzh/nginx-systemtap-toolkit#sample-bt-off-cpu) for an NGINX worker process to analyze the actual blockers.
6. If your `redis-server` process is running near 100% CPU usage, then you should consider scaling your Redis backend across multiple nodes, or use the [C-land on-CPU Flame Graph tool](https://github.com/agentzh/nginx-systemtap-toolkit#sample-bt) to analyze the internal bottlenecks within the Redis server process.
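
The `n/m` rule from item 1 can be sketched as a small helper. This is only an illustration: the names `pool_size` and `release` are hypothetical, rounding up when `n` is not a multiple of `m` is an assumption not stated above, and the 10000 ms idle timeout is an arbitrary example value.

```lua
-- Hypothetical helper: per-worker pool size for n concurrent requests
-- spread over m NGINX workers (the n/m rule from item 1).
local function pool_size(concurrent_requests, workers)
    assert(workers > 0, "worker count must be positive")
    -- Assumption: round up so the per-worker pools together cover all
    -- n requests, and never go below one connection.
    return math.max(1, math.ceil(concurrent_requests / workers))
end

-- Hypothetical wrapper: `red` is expected to be a connected
-- lua-resty-redis object inside an OpenResty request handler.
local function release(red, concurrent_requests, workers)
    -- set_keepalive(max_idle_timeout_in_ms, pool_size)
    return red:set_keepalive(10000, pool_size(concurrent_requests, workers))
end

-- The worked example from item 1: 1000 requests, 10 workers.
print(pool_size(1000, 10))
```

Inside OpenResty the worker count would normally not be hard-coded; it could be passed in from configuration or read at runtime.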

[Back to TOC](#table-of-contents)

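As a rough sketch of the "redo the operation upon timeout" advice in item 3 of the check list, the helper below retries an operation only when it fails with the error string `"timeout"`. The function name `retry_on_timeout`, the attempt limit, and the simulated operation are all hypothetical; in real code the callback would wrap your own `set_timeout`, `connect`, and command calls, and it assumes the usual `nil, err` failure convention.

```lua
-- Hypothetical helper: run `op` up to `max_attempts` times, retrying only
-- when it fails with the error string "timeout".
-- `op` must return a truthy result on success, or nil plus an error string.
local function retry_on_timeout(op, max_attempts)
    local res, err
    for attempt = 1, max_attempts do
        res, err = op()
        if res or err ~= "timeout" then
            -- Success, or a non-timeout failure that should not be retried.
            return res, err, attempt
        end
    end
    return nil, err, max_attempts
end

-- Standalone demonstration with a simulated operation that times out twice
-- before succeeding. In OpenResty, the callback would instead create a
-- redis object, call set_timeout and connect, and issue the command.
local calls = 0
local res, err, attempts = retry_on_timeout(function()
    calls = calls + 1
    if calls < 3 then
        return nil, "timeout"
    end
    return "value"
end, 3)

print(res, err, attempts)
```

Retrying is only safe for operations that can be repeated without side effects, so a caller would typically limit this to reads or idempotent writes.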
Limitations
===========
