0

I have an Asterisk server doing automated calls and I'm noticing an unexplained high load on it. The server is only running Asterisk; the database and other support applications run on a different machine.

What can be causing this high load?

If the load gets too high for too long (over the maximum of 8) there is a drop in calls and general unresponsiveness.

The hardware is an "Intel Xeon E3-1230 V2 @ 3.30GHz" (4 cores, 8 threads with Hyper-Threading, so 8 logical CPUs) with 16GB of RAM.

I read the other posts about similar problems so I'll post all the information requested in them.

Following are the results of the monitoring tools I have been using. The system is currently handling about 200 channels.

It doesn't scale linearly: at 400 channels the load gets to 8 and things go downhill from there.

"ps auxf" shows nothing in D state.

top - 10:50:28 up 7 days, 23:20, 3 users, load average: 2.16, 2.44, 1.93
Tasks: 341 total, 1 running, 340 sleeping, 0 stopped, 0 zombie
Cpu(s): 4.3%us, 5.9%sy, 0.0%ni, 89.4%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 16303732k total, 7179980k used, 9123752k free, 264836k buffers
Swap: 8224760k total, 0k used, 8224760k free, 5759716k cached

2512 root 20 0 4744m 173m 46m S 50.8 1.1 396:55.01 asterisk

A typical iostat -x. sda and sdb are in raid 1 and sdc is an ssd storing some sound files used very often.

avg-cpu: %user %nice %system %iowait %steal %idle
         4.30  0.00  5.94    0.38    0.00   89.38

Device: rrqm/s wrqm/s r/s   w/s   rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda     0.00   6.00   0.00  9.00  0.00   112.00 12.44    0.04     4.33  3.67  3.30
sdc     0.00   0.00   0.00  0.00  0.00   0.00   0.00     0.00     0.00  0.00  0.00
sdb     0.00   7.00   0.00  8.00  0.00   112.00 14.00    0.04     5.25  4.62  3.70
md127   0.00   0.00   0.00  15.00 0.00   112.00 7.47     0.00     0.00  0.00  0.00
dm-0    0.00   0.00   0.00  14.00 0.00   112.00 8.00     0.05     3.50  2.64  3.70
dm-1    0.00   0.00   0.00  0.00  0.00   0.00   0.00     0.00     0.00  0.00  0.00
dm-2    0.00   0.00   0.00  0.00  0.00   0.00   0.00     0.00     0.00  0.00  0.00

cat /proc/interrupts

CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
0: 127 0 0 0 0 0 0 0 IO-APIC-edge timer
1: 2 0 0 0 0 0 0 0 IO-APIC-edge i8042
3: 2 0 0 0 0 0 0 0 IO-APIC-edge
4: 2 0 0 0 0 0 0 0 IO-APIC-edge
8: 1 0 0 0 0 0 0 0 IO-APIC-edge rtc0
9: 0 0 0 0 0 0 0 0 IO-APIC-fasteoi acpi
10: 2 0 0 0 0 0 0 0 IO-APIC-edge
12: 4 0 0 0 0 0 0 0 IO-APIC-edge i8042
16: 59 0 0 0 0 0 0 0 IO-APIC-fasteoi ehci_hcd:usb1
23: 97 0 0 0 0 0 0 40 IO-APIC-fasteoi ehci_hcd:usb2
24: 2298 0 0 0 0 0 0 0 HPET_MSI-edge hpet2
25: 0 0 0 0 0 0 0 0 HPET_MSI-edge hpet3
26: 0 0 0 0 0 0 0 0 HPET_MSI-edge hpet4
27: 0 0 0 0 0 0 0 0 HPET_MSI-edge hpet5
28: 0 0 0 0 0 0 0 0 HPET_MSI-edge hpet6
29: 412442 0 0 36 1175680 0 0 208 PCI-MSI-edge ahci
30: 74 335176384 0 0 0 0 0 0 PCI-MSI-edge eth0
31: 5 10 344792 0 0 0 0 0 PCI-MSI-edge eth1-rx-0
32: 0 0 0 0 0 0 0 0 PCI-MSI-edge eth1-tx-0
33: 3 0 0 0 0 0 0 0 PCI-MSI-edge eth1
NMI: 7784 14329 4689 5198 7033 7387 6069 6332 Non-maskable interrupts
LOC: 46833697 44931615 30462128 37088906 47922986 44201942 27616867 37813275 Local timer interrupts
SPU: 0 0 0 0 0 0 0 0 Spurious interrupts
PMI: 7784 14329 4689 5198 7033 7387 6069 6332 Performance monitoring interrupts
IWI: 0 0 0 0 0 0 0 0 IRQ work interrupts
RES: 897464 372249 589429 570768 646428 605601 478042 484381 Rescheduling interrupts
CAL: 92 292 281 289 267 292 288 291 Function call interrupts
TLB: 206630086 265955778 173872460 156528749 143771724 221909392 129664286 115580760 TLB shootdowns
TRM: 0 0 0 0 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts
MCE: 0 0 0 0 0 0 0 0 Machine check exceptions
MCP: 2300 2300 2300 2300 2300 2300 2300 2300 Machine check polls
ERR: 0
MIS: 0

vmstat 1 20

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 9103556 265224 5776272 0 0 1 1 2 8 0 1 99 0 0
0 0 0 9103356 265224 5776268 0 0 0 24 101813 25156 5 9 86 0 0
1 0 0 9103224 265224 5776324 0 0 0 2 139955 25205 5 12 83 0 0
3 0 0 9103968 265224 5776280 0 0 0 54 102550 24442 5 9 85 1 0
2 0 0 9103084 265224 5776304 0 0 0 0 84384 22729 3 7 90 0 0
2 0 0 9102116 265224 5776412 0 0 0 0 84072 24705 6 8 87 0 0
1 0 0 9103432 265224 5776328 0 0 0 0 108438 24144 5 9 86 0 0
0 0 0 9102924 265224 5776340 0 0 0 2 41961 23168 3 5 92 0 0
0 0 0 9102608 265224 5776364 0 0 0 90 76298 26135 5 7 87 1 0
2 0 0 9078068 265224 5776444 0 0 0 0 83315 24891 5 8 87 0 0
0 0 0 9103344 265224 5776436 0 0 0 0 67256 26539 6 7 87 0 0
1 0 0 9094300 265224 5776444 0 0 0 0 54944 24834 3 6 91 0 0
0 0 0 9103352 265224 5776460 0 0 0 2 92988 26388 5 9 86 0 0
2 0 0 9103592 265224 5776440 0 0 0 46 76231 27186 5 7 87 1 0
1 0 0 9103744 265224 5776500 0 0 0 0 67153 26006 5 7 88 0 0
3 0 0 9103056 265224 5776520 0 0 0 76 86165 26895 5 8 87 0 0
1 0 0 9094384 265224 5776568 0 0 0 84 59498 26179 4 6 90 0 0
1 0 0 9088632 265224 5776556 0 0 0 0 103184 27236 6 9 85 0 0
1 0 0 9102532 265224 5776608 0 0 0 40 94010 27663 6 9 85 0 0
1 0 0 9091052 265224 5776648 0 0 0 0 93813 26675 9 9 82 0 0

mpstat 1

11:05:18 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
11:05:19 AM all 8.22 0.00 11.88 0.00 0.00 0.63 0.00 0.00 79.27
11:05:20 AM all 3.05 0.00 4.96 0.00 0.00 0.38 0.00 0.00 91.60
11:05:21 AM all 5.64 0.00 7.27 0.63 0.00 0.38 0.00 0.00 86.09
11:05:22 AM all 5.44 0.00 6.96 0.00 0.00 0.25 0.00 0.00 87.34
11:05:23 AM all 3.76 0.00 7.14 0.00 0.00 0.25 0.00 0.00 88.85
11:05:24 AM all 4.80 0.00 9.86 0.00 0.00 0.51 0.00 0.00 84.83
11:05:25 AM all 3.80 0.00 5.58 0.00 0.00 0.38 0.00 0.00 90.24
11:05:26 AM all 6.58 0.00 7.72 0.51 0.00 0.51 0.00 0.00 84.68
11:05:27 AM all 6.67 0.00 8.43 0.00 0.00 0.50 0.00 0.00 84.40
11:05:28 AM all 4.32 0.00 5.97 0.00 0.00 0.25 0.00 0.00 89.45
11:05:29 AM all 5.04 0.00 7.06 0.00 0.00 0.50 0.00 0.00 87.39
11:05:30 AM all 3.93 0.00 6.34 0.13 0.00 0.51 0.00 0.00 89.10
11:05:31 AM all 4.07 0.00 5.60 0.38 0.00 0.25 0.00 0.00 89.69
11:05:32 AM all 7.08 0.00 9.48 0.00 0.00 0.51 0.00 0.00 82.93
11:05:33 AM all 4.19 0.00 8.51 0.00 0.00 0.51 0.00 0.00 86.79
11:05:34 AM all 2.67 0.00 4.45 0.00 0.00 0.25 0.00 0.00 92.63
2
  • What would really be useful is if you captured the output from all those tools when the server is under high load. Right now all the output shows a box with plenty of spare resources. Commented Sep 18, 2013 at 16:52
  • The load of 2 is disproportionate considering the CPU and IO use. I hoped it would be enough. Can this load be explained by the provided info? Commented Sep 18, 2013 at 16:58

4 Answers

1

Maybe it's too late to answer this, and maybe the cause is not the same as in my case.

But I also observed this kind of high CPU load when updating from Asterisk 1.8 to Asterisk 11.5 (and from Fedora 14 to Fedora 20) while keeping my old configuration file. In asterisk.conf, the line:

;console = yes ; Run as console (same as -c at startup).

was not commented out with a semicolon. As soon as I commented the line out and restarted Asterisk, CPU usage went back to normal.
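A minimal sketch of checking for that setting, assuming the configuration lives at the usual /etc/asterisk/asterisk.conf path and that the option is written as console = yes. The helper name console_enabled and the list of accepted truthy values are illustrative assumptions, not part of Asterisk.

```python
import re

# Matches an active (not ';'-commented) "console = <truthy>" line.
# The set of truthy spellings here is an assumption for illustration.
ACTIVE_CONSOLE = re.compile(r"^\s*console\s*=\s*(yes|true|on|1)\b", re.IGNORECASE)


def console_enabled(conf_text: str) -> bool:
    """Return True if any uncommented line turns the console option on."""
    for line in conf_text.splitlines():
        # A leading ';' makes the line a comment, so the pattern will not match it.
        if ACTIVE_CONSOLE.match(line):
            return True
    return False
```

For example, console_enabled(open("/etc/asterisk/asterisk.conf").read()) would report whether the option is still active after an upgrade that kept the old file.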

0

You seem to have quite a high number of context switches and interrupts. Also, looking at the load of 2 together with the run queue, I think this could be caused by short bursts of threads that finish their work very quickly: they don't spend much time on the CPU, but they still push the load average up.

P.S. Excuse my English please.
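One way to look for such bursts is to poll the instantaneous runnable count that Linux exposes in the fourth field of /proc/loadavg (runnable/total scheduling entities). The sketch below is only an illustration: the function names, the sample count and the 10 ms interval are assumptions, and the polling process itself is included in the runnable count.

```python
import time
from pathlib import Path


def parse_loadavg(text: str) -> dict:
    """Split one /proc/loadavg line into its fields."""
    one, five, fifteen, entities, last_pid = text.split()
    runnable, total = entities.split("/")
    return {
        "load1": float(one),
        "load5": float(five),
        "load15": float(fifteen),
        "runnable": int(runnable),   # currently runnable scheduling entities
        "total": int(total),         # all scheduling entities on the system
        "last_pid": int(last_pid),
    }


def sample_runnable(samples: int = 200, interval: float = 0.01) -> list:
    """Poll the runnable count at short intervals so brief spikes become visible."""
    seen = []
    for _ in range(samples):
        seen.append(parse_loadavg(Path("/proc/loadavg").read_text())["runnable"])
        time.sleep(interval)
    return seen
```

If max(sample_runnable()) is regularly much higher than the average of the samples, that would be consistent with many threads waking at once and finishing quickly.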

2
  • Thinking about the application itself this makes a lot of sense. Would you have a suggestion that helps determine where such threads come from? I'm not sure if they are innate to Asterisk or something else Asterisk spawns that I could possibly change. Commented Sep 18, 2013 at 17:17
  • Hi Allan, I don't know Asterisk in depth but I think this behaviour could fit with it. Maybe with "strace" you could try to determine if the threads come from Asterisk or not. Commented Sep 19, 2013 at 15:27
0

You may not see everything with a plain ps auxf, because with these options ps hides the threads of multi-threaded processes. Try ps -eLo pid,stat,comm instead; it lists every thread, and some of those threads may be in the R or D state. The load average is the average number of R + D threads (or tasks, in Linux terminology); the kernel samples that count periodically (about every 5 seconds) and smooths it exponentially, so it is not a continuous measurement.
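As a rough sketch, the ps invocation above can be summarised by counting threads per state letter. The helper names count_thread_states and snapshot are made up for this example, and the code assumes the procps ps found on typical Linux systems.

```python
import subprocess
from collections import Counter


def count_thread_states(ps_output: str) -> Counter:
    """Count threads by the first letter of the STAT column (R, D, S, ...)."""
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)  # pid, stat, comm (comm may contain spaces)
        if len(fields) < 2:
            continue
        counts[fields[1][0]] += 1
    return counts


def snapshot() -> Counter:
    """Run ps with per-thread output and summarise the states it reports."""
    out = subprocess.run(
        ["ps", "-eLo", "pid,stat,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_thread_states(out)
```

Calling snapshot() repeatedly and looking at the "R" and "D" entries gives a per-thread view of what is contributing to the load at each instant.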

0

The only things Asterisk spends significant CPU time on are transcoding when codecs differ (not your case, but for example when the phones use ALAW/ULAW and the trunk is G.729), conferences that mix several audio channels into one, MusicOnHold, and so on. Try converting your sound files to ALAW/ULAW. If you record calls, record them in *LAW (onto a RAM disk) and convert them to MP3 or move them to network storage during off-hours. If you have a voice trunk other than SIP, that can also make a difference.

This CPU is far more than that number of calls needs. Check out Asterisk dimensioning and you will see that you can do 1000 simultaneous calls on even weaker hardware.
