We observed that messages to syslog were taking a long time to show up. What triggered our interest was messages from the end of a previous test run showing up the next one, causing spurious failure reports!
As an experiment I pushed 300 messages at syslog fairly rapidly (over the network) and observed that the last message took about 15 seconds to work its way through. But tcpdump showed that the messages were being delivered to the server promptly. Something in the syslog daemon was slowing things up. The culprit turned out to be that syslogd is, by default, configured to do synchronous writes to its log file. So as each message came in, it was written to the file, the daemon waited for the write to complete, and then proceeded to handle the next message.
Synchronous writes make sense if you want to ensure that messages don't get lost due to power failure. When you're logging locally this might make some sort of sense. The case you want to avoid is:
application -----> syslog daemon ------> disk buffer ------> disk M6 M5 M4 M3 M1 M2 ^ | Important message that tells you what went wrong just before the machine catches on fire, now mysteriously disappeared although the application thought it had been delivered.
But for messages coming in from over the network you can get yourself into this situation:
remote app ----> network buffer ----> syslog daemon ----> disk buffer ----> disk M6 M5 M4 M3 M1 M2
So when the machine catches fire, you've still lost the message because it was sitting in memory, it was just sitting in a different piece of the machine's memory. Meanwhile you're progressing very, very slowly because you're writingly only one message at a time and waiting for it to make it to disk.
There's a reasonable solution here which is to completely drain the network buffer before doing your synchronous disk write! Then you move M6,M5,M4 to disk in a single operation instead of three operations, while you haven't compromised whatever property it was you were trying to preserve. (To cap it all off, syslog over the network is not reliable anyway, so making very sure that the message makes it to disk is useless.)
But our syslogd doesn't do that, so we may just end up switching it to do asynchronous writes.