4

I am currently getting high CPU on a server that is just running a couple of sites with very low traffic. One of the sites is in still development going live soon. However, this site is very very slow...When browsing through its pages I can see that the CPU goes from 30% to 100% for httpd (see top output below).

I have tuned httpd & MySQL, Apache Solr, Tomcat for high performance, and I am using APC.

Not sure what to do from here or how to find the culprit as I have a bunch of messages on the httpd log and have been chasing dead ends for some time...any help is greatly appreciated.

Server: AuthenticAMD, Quad-Core AMD Opteron(tm) Processor 2352, RAM 16GB

Linux 2.6.27 64-bit, Centos 5.5

Plesk 9.5.4, MySQL 5.1.48, PHP 5.2.17

Apache/2.2.3 (CentOS) DAV/2 mod_jk/1.2.15 mod_ssl/2.2.3 OpenSSL/0.9.8e-fips-rhel5 PHP/5.2.17 mod_perl/2.0.4 Perl/v5.8.8

Tomcat6-6.0.29-1.jpp5, Tomcat-native-1.1.20-1.el5, Apache Solr

top

17595 apache 20 0 1825m 507m 10m R 100.4 3.2 0:17.50 httpd 17596 apache 20 0 1565m 247m 9936 R 83.1 1.5 0:10.86 httpd 17598 apache 20 0 1430m 110m 6472 S 54.5 0.7 0:08.66 httpd 17599 apache 20 0 1438m 124m 12m S 37.2 0.8 0:11.20 httpd 16197 mysql 20 0 13.0g 2.0g 5440 S 9.6 12.6 297:12.79 mysqld 17617 root 20 0 12748 1172 812 R 0.7 0.0 0:00.88 top 8169 tomcat 20 0 4613m 268m 6056 S 0.3 1.7 6:40.56 java 

httpd error_log

[debug] prefork.c(991): AcceptMutex: sysvsem (default: sysvsem) [info] mod_fcgid: Process manager 17593 started [debug] proxy_util.c(1854): proxy: grabbed scoreboard slot 0 in child 17594 for worker proxy:reverse [debug] proxy_util.c(1967): proxy: initialized single connection worker 0 in child 17594 for (*) [debug] proxy_util.c(1854): proxy: grabbed scoreboard slot 0 in child 17595 for worker proxy:reverse [debug] proxy_util.c(1873): proxy: worker proxy:reverse already initialized [notice] child pid 22782 exit signal Segmentation fault (11) [error] (43)Identifier removed: apr_global_mutex_lock(jk_log_lock) failed [debug] util_ldap.c(2021): LDAP merging Shared Cache conf: shm=0x7fd29a5478c0 rmm=0x7fd29a547918 for VHOST: example.com [info] APR LDAP: Built with OpenLDAP LDAP SDK [info] LDAP: SSL support available [info] Init: Seeding PRNG with 256 bytes of entropy [info] Init: Generating temporary RSA private keys (512/1024 bits) [info] Init: Generating temporary DH parameters (512/1024 bits) [debug] ssl_scache_shmcb.c(374): shmcb_init allocated 512000 bytes of shared memory [debug] ssl_scache_shmcb.c(554): entered shmcb_init_memory() [debug] ssl_scache_shmcb.c(576): for 512000 bytes, recommending 4265 indexes [debug] ssl_scache_shmcb.c(619): shmcb_init_memory choices follow [debug] ssl_scache_shmcb.c(621): division_mask = 0x1F [debug] ssl_scache_shmcb.c(623): division_offset = 96 [debug] ssl_scache_shmcb.c(625): division_size = 15997 [debug] ssl_scache_shmcb.c(627): queue_size = 2136 [debug] ssl_scache_shmcb.c(629): index_num = 133 [debug] ssl_scache_shmcb.c(631): index_offset = 8 [debug] ssl_scache_shmcb.c(633): index_size = 16 [debug] ssl_scache_shmcb.c(635): cache_data_offset = 8 [debug] ssl_scache_shmcb.c(637): cache_data_size = 13853 [debug] ssl_scache_shmcb.c(650): leaving shmcb_init_memory() 

4 Answers 4

1

Try adding %P (and %D) to your log files - then you should be able to correlate what you see in 'top' with your acess log.

0

[notice] child pid 22782 exit signal Segmentation fault (11)

Something is definitely wrong here, you should add ulimit -c unlimited to the beginning of /etc/init.d/httpd to get a coredump next time it fails with segfault. Probably mod_jk is a root of the problem since there is an error related to mod_jk in the log.

2
  • Hello Alex, I've added the ulimit and will report back as soon as I have a coredum for the segfault. Thanks! Commented Feb 4, 2011 at 7:51
  • I've been monitoring it since yesterday and I am not getting the segmentation fault error or the mod_jk errors. What I am getting this though codedebug] util_ldap.c(2021): LDAP merging Shared Cache conf:code and also codessl_scache_shmcb.c code Commented Feb 6, 2011 at 11:57
0

I see mod_perl in the list. Is this site an application written in PERL? If so, then poorly written PERL code will be at the root of the issue.

Same comment applies to PHP. PHP applications are not known for performance and CMS applications have a reputation as resource hogs. If you are a hosting provider it would be best to either ban this CMS package or charge a higher rate to cover the extra resources.

But, if you are running this CMS for your own use, since it is opensource, you should post another question on StackOverflow, naming the package and asking how to track down and fix the poorly written code.

2
  • Hello Michael, the application is an open source CMS written in PHP and I am not aware of any custom PERL code used in this application. Commented Feb 4, 2011 at 7:50
  • KHWeb: CMS applications are pretty heavy by nature. What CMS you have installed and do you have any custom modules installed to it? For example, a 3rd party RSS subscriber statistics module in Drupal did a SELECT COUNT(*) SQL query during every page load and that made everything sloooowww ... in the same manner some component in your CMS can make it very taxing to CPU. If everything else fails, profile the code with XDebug and KCachegrind or some other utils. Commented Feb 4, 2011 at 12:44
-1

I have not seen the segmentation fault error again, but I am still seeing the high CPU coming from httpd. I was able to run a strace on the httpd process with the CPU and I got following:

 # strace -c -p 28964 Process 28964 attached - interrupt to quit ^CProcess 28964 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 88.94 0.006093 0 98299 4562 lstat 3.01 0.000206 0 2740 getcwd 2.28 0.000156 0 2158 2 read 2.26 0.000155 0 541 37 open 1.68 0.000115 0 1321 1321 readlink 1.52 0.000104 0 1678 822 access 0.32 0.000022 0 502 fstat 0.00 0.000000 0 25 write 0.00 0.000000 0 507 close 0.00 0.000000 0 547 478 stat 0.00 0.000000 0 23 poll 0.00 0.000000 0 2 rt_sigaction 0.00 0.000000 0 2 rt_sigprocmask 0.00 0.000000 0 2 writev 0.00 0.000000 0 3 setitimer 0.00 0.000000 0 1 sendfile ... ------ ----------- ----------- --------- --------- ---------------- 100.00 0.006851 108381 7224 total 

The 4562 errors from lstat are the same type of errors and show up like this on the log file:

# strace -f -t -o /var/log/strace.output -p 28964 

strace.output

28964 07:10:38 lstat("/var", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 28964 07:10:38 lstat("/var/www", {st_mode=S_IFDIR|0755, st_size=94, ...}) = 0 28964 07:10:38 lstat("/var/www/vhosts", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 28964 07:10:38 lstat("/var/www/vhosts/example.com", {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0 28964 07:10:38 lstat("/var/www/vhosts/example.com/httpdocs", {st_mode=S_IFDIR|0777, st_size=4096, ...}) = 0 28964 07:10:38 lstat("/var/www/vhosts/example.com/httpdocs/sites", {st_mode=S_IFDIR|0755, st_size=30, ...}) = 0 28964 07:10:38 lstat("/var/www/vhosts/example.com/httpdocs/sites/all", {st_mode=S_IFDIR|0755, st_size=66, ...}) = 0 28964 07:10:38 lstat("/var/www/vhosts/example.com/httpdocs/sites/all/modules", {st_mode=S_IFDIR|0755, st_size=12288, ...}) = 0 28964 07:10:38 lstat("/var/www/vhosts/example.com/httpdocs/sites/all/modules/views", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 28964 07:10:38 lstat("/var/www/vhosts/example.com/httpdocs/sites/all/modules/views/includes", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 28964 07:10:38 lstat("/var/www/vhosts/example.com/httpdocs/sites/all/modules/views/includes/sites", 0x7fff1e627370) = -1 ENOENT (No such file or directory) 

The folders listed above are all in this website directory and part of the Drupal CMS. However the last one listed

/var/www/vhosts/example.com/httpdocs/sites/all/modules/views/includes/sites 

does not exists and it should really be

/var/www/vhosts/example.com/httpdocs/sites 

which does exist. Looks like lstat is trying to read a directory that does not exists....?

-1 ENOENT (No such file or directory) 

What would be the best way to troubleshoot this and find the source of this error for the missing directory?

2
  • What is the PHP log telling you, and what Drupal modules are running? Commented Feb 10, 2011 at 0:53
  • Hello Ben, The PHP log has a few errors but it looks like they don't correlate to the time I see this high cpu (I see this high cpu when I browse any page on this site). There are about 270 modules running on this site. Commented Feb 10, 2011 at 2:44

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.