Recently I noticed that one of our MX machines was struggling to perform as well as it had done in the past, and after a lot of debugging I determined it was IO-bound.
The ultimate cause of the slowdown was repeated disk access which didn't need to be made, and this has now been fixed.
Each message arriving at our server has a number of tests applied to it. Some of these tests are very lightweight, for example looking at the IP address which made the connection, whilst others require access to the complete incoming message body and are more resource-intensive.
The tests which require access to the complete incoming message include virus scanning and Bayesian testing.
When a test requires access to the message body, the following should happen:
- Message is spooled to a temporary file on disk.
- Message is tested.
Then the next test which requires access to the body of the message can use the file which is already present on the disk (this spooled file is removed when the remote server disconnects).
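As a rough illustration of that "spool once, reuse, remove on disconnect" pattern, here is a minimal Python sketch. The `SpooledMessage` class and the two test functions are hypothetical stand-ins, not our actual filtering code: the body is written to a temporary file only the first time a test asks for it, every later test gets the same path, and `cleanup()` is what you would call when the remote server disconnects.

```python
import os
import tempfile


class SpooledMessage:
    """Hypothetical per-connection holder: the body is written to disk at most once."""

    def __init__(self, body: bytes):
        self._body = body
        self._path = None
        self.spool_count = 0  # how many times we actually wrote to disk

    def path(self) -> str:
        # Spool lazily on the first request; later callers reuse the same file.
        if self._path is None:
            fd, self._path = tempfile.mkstemp(prefix="msg-", suffix=".eml")
            with os.fdopen(fd, "wb") as fh:
                fh.write(self._body)
            self.spool_count += 1
        return self._path

    def cleanup(self) -> None:
        # Call this when the remote server disconnects.
        if self._path is not None:
            os.unlink(self._path)
            self._path = None


def virus_scan(path: str) -> bool:
    # Placeholder: a real scanner would be handed this path.
    with open(path, "rb") as fh:
        return b"EICAR" not in fh.read()


def bayesian_test(path: str) -> bool:
    # Placeholder: a real classifier would score the file contents.
    with open(path, "rb") as fh:
        return b"viagra" not in fh.read().lower()


msg = SpooledMessage(b"Subject: hello\r\n\r\nJust checking in.\r\n")
results = [test(msg.path()) for test in (virus_scan, bayesian_test)]
print(results, msg.spool_count)  # [True, True] 1
msg.cleanup()
```

The important detail is that both tests receive the same path, so the message hits the disk once no matter how many body tests run.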
Unfortunately one of the tests I was performing was making a second copy of the message on disk, rather than using the pre-existing copy.
The extra overhead of spooling that second copy to disk, having the command read it, and then deleting both the command's output and the extra copy of the message was just enough to slow the system down.
I've now updated things so that, if the lightweight tests all pass and the message looks good, every test which requires access to the message body uses a single temporary copy on disk.
(When filtering for spam the general practice is to perform the lightweight tests first - if they flag a message as spam then the heavier tests don't even need to be executed.)
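That ordering can be sketched in a few lines of Python. Everything here is hypothetical: `blocklisted_ip` stands in for a cheap connection-level check, `heavy_body_scan` stands in for something like virus scanning or Bayesian testing, and messages are plain dicts. The point is only that the heavy tests are never reached once a cheap test has already flagged the message.

```python
from typing import Callable, List, Tuple

# Each test returns True if the message looks like spam.
Test = Callable[[dict], bool]


def blocklisted_ip(msg: dict) -> bool:
    # Cheap: only looks at connection metadata (hypothetical blocklist).
    return msg["ip"] in {"192.0.2.10"}


def heavy_body_scan(msg: dict) -> bool:
    # Expensive stand-in; counts how often it runs so we can see the savings.
    msg["heavy_runs"] = msg.get("heavy_runs", 0) + 1
    return "lottery" in msg["body"].lower()


def is_spam(msg: dict, light: List[Test], heavy: List[Test]) -> Tuple[bool, str]:
    # Run every lightweight test first; any hit ends the check early.
    for test in light:
        if test(msg):
            return True, test.__name__
    # Only messages that pass all the cheap tests reach the heavy ones.
    for test in heavy:
        if test(msg):
            return True, test.__name__
    return False, "clean"


bad = {"ip": "192.0.2.10", "body": "You won the lottery!"}
good = {"ip": "198.51.100.7", "body": "Lunch on Friday?"}
print(is_spam(bad, [blocklisted_ip], [heavy_body_scan]), bad.get("heavy_runs", 0))
# (True, 'blocklisted_ip') 0
print(is_spam(good, [blocklisted_ip], [heavy_body_scan]), good.get("heavy_runs", 0))
# (False, 'clean') 1
```

In a real filter the heavy tests would also be the only ones that ask for the spooled copy of the body, so a message rejected by the cheap tests never causes any disk access at all.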