
Since a reboot yesterday, one of our virtual servers (Debian Lenny, virtualized with Xen) is constantly running out of entropy, leading to timeouts etc. when trying to connect over SSH / TLS-enabled protocols. Is there any way to check which process(es) is(/are) eating up all the entropy?
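
For reference, the kernel's current estimate of the pool can be watched via procfs:

# entropy_avail is the kernel's pool estimate in bits (the pool maxes out at 4096 on these kernels)
watch -n1 cat /proc/sys/kernel/random/entropy_avail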

Edit:

What I tried:

  • Adding additional entropy sources: timer_entropyd, rng-tools feeding urandom back into random, pseudorandom file accesses – netted about 1 MiB of additional entropy per second, but the problems persisted (see the rngd sketch after this list)
  • Checking for unusual activity via lsof, netstat and tcpdump – nothing. No noticeable load or anything
  • Stopping daemons, restarting permanent sessions, rebooting the entire VM – no change in behaviour
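
For completeness, the rng-tools part boils down to something like this (classic rngd options; note that feeding urandom back into random only refills the pool with PRNG output, it does not add genuine entropy):

# rngd from rng-tools: read from /dev/urandom and feed the kernel pool
rngd -r /dev/urandom -o /dev/random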

What in the end worked:

  • Waiting. Since about yesterday noon there have been no connection problems anymore. Entropy is still somewhat low (128 bytes peak), but TLS/SSH sessions no longer show a noticeable delay. I'm slowly switching our clients back to TLS (all five of them!), but I don't expect the behaviour to change now. All clients are now using TLS again, no problems. Really, really strange.
  • Could you possibly be suffering from an attack? Someone repeatedly trying to connect to an SSL-enabled service and establishing a secure connection, thereby drawing entropy? But the correlation with the reboot? Coincidence? Commented Jul 12, 2012 at 5:41
  • The server is completely internal and not accessible from the outside. It is, however, a backup domain controller. The only thing I could think of was a background replication job (over encrypted connection) that ate up resources – as said, there was no suspicious activity. I'll file it under "shit happens". Commented Jul 12, 2012 at 9:03
  • Check this out – the kernel change is the reason: unix.stackexchange.com/questions/704737/… Commented Jul 19, 2022 at 16:38

4 Answers


With lsof out as a source of diagnostic utility, would setting up something using audit work? There's no way to deplete the entropy pool without opening /dev/random, so if you audit processes opening /dev/random, the culprit (or at least the set of candidates for further examination) should drop out fairly rapidly.
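
A sketch of that audit setup, assuming auditd is installed (the key name entropy-drain is arbitrary):

# log every process that opens /dev/random for reading
auditctl -w /dev/random -p r -k entropy-drain
# later, list the hits and look at the exe= / comm= fields
ausearch -k entropy-drain -i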

  • Might try that, thanks. Though are you sure that, say, the kernel's crypto system uses /dev/random directly? Commented Jul 9, 2012 at 14:54
  • I'm not aware of anything in the kernel that heavily consumes entropy on an ongoing basis. The things that use randomness that I'm aware of (TCP sequence numbers, for example) are all PRNG-driven, and the crypto APIs are more about getting access to underlying hardware than eating entropy. At the very least, if nothing's opening /dev/random, you'll have ruled out one big possibility, and can go digging into the kernel. Commented Jul 9, 2012 at 15:00

Normally, for a public-facing server needing 'enough' entropy, I would suggest something like an entropy key – a hardware (USB) device that adds random bits to the Linux entropy pool. But you don't talk to the outside world.

Virtual machines can have a problem with lack of external randomness.

Your remark about it being a backup domain controller does add a possible use of entropy: Windows domains do use random numbers in certificates.

  • Agreed, we use SafeNet HSMs to do this with great success. Commented Sep 25, 2012 at 12:22
  • Since the server is really, really old (a Pentium 4-era Xeon), and newer Xeons have a hardware RNG built in, I don't really want to spend money on that – else I'd have bought one. Commented Sep 25, 2012 at 16:12

Perhaps lsof (list open files) might help. It shows which processes currently hold which files open. In your case this only helps if you catch your process(es) in the act of draining entropy – unless they hold the handle open for longer.

$ lsof /dev/urandom
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
xfce4-ses  1787   to  15r    CHR    1,9      0t0 8199 /dev/urandom
applet.py  1907   to   9r    CHR    1,9      0t0 8199 /dev/urandom
scp-dbus-  5028   to  10r    CHR    1,9      0t0 8199 /dev/urandom
firefox    6603   to  23r    CHR    1,9      0t0 8199 /dev/urandom
thunderbi 12218   to  23r    CHR    1,9      0t0 8199 /dev/urandom

Just a sample from my workstation. But diving deeper into lsof might help.
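
If the process only opens the device briefly, a crude polling loop might still catch it (a sketch – anything shorter than the sleep interval will slip through):

# poll both devices several times a second and timestamp any hit
while true; do
    out=$(lsof /dev/random /dev/urandom 2>/dev/null)
    [ -n "$out" ] && { date; echo "$out"; }
    sleep 0.2
done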

  • Empty output for urandom, and the only programs having random open are – surprise – the entropy daemons. Commented Jul 9, 2012 at 14:07
  • Well, neither lsof nor netstat turned up anything suspicious. If anything, there's suspiciously low activity on the system. Commented Jul 9, 2012 at 14:27

If there is no better solution, you might bring in the big guns and globally wrap the open() syscall to log the processes that try to open /dev/[u]random.

Just(tm) write a lib defining open() that logs and afterwards calls the original libc open() (rough sketch below).

Hint for that: man ld.so and /etc/ld.so.preload.

We've had something similar here: https://stackoverflow.com/questions/9614184/how-to-trace-per-file-io-operations-in-linux

CAVEAT: I've never done this myself. It might break your system, since every open() will run through your lib. Possibly okay in debug environments, or if you're R.M. Stallman.
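
For the record, a rough, untested sketch of such a wrapper (the file name logopen.c, the stderr logging and the gcc line are just illustrative; open64() and openat() are not covered):

# build a tiny LD_PRELOAD library that logs open() calls on /dev/random
cat > logopen.c <<'EOF'
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* wrapped open(): log accesses to /dev/random, then call the real libc open() */
int open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...);
    mode_t mode = 0;
    va_list ap;

    if (!real_open)
        real_open = (int (*)(const char *, int, ...)) dlsym(RTLD_NEXT, "open");

    if (flags & O_CREAT) {          /* the mode argument only exists with O_CREAT */
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }

    if (path && strcmp(path, "/dev/random") == 0)
        fprintf(stderr, "open(/dev/random) by pid %d\n", (int) getpid());

    return real_open(path, flags, mode);
}
EOF
gcc -shared -fPIC -o logopen.so logopen.c -ldl
# test it on a single suspect process before even thinking about /etc/ld.so.preload
LD_PRELOAD=$PWD/logopen.so your-suspect-daemon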

  • Well, it is a production machine, so if I'm going to test this, I'll have to wait until the weekend. But thanks for the pointer. Commented Jul 10, 2012 at 15:28
