This week I stumbled upon some behaviour I wasn't expecting from Sendmail.

First, what we already know... since version 8.12.0, Sendmail has split the MTA and MSP functions into two different processes using two different config files (sendmail.cf and submit.cf respectively). Many companies still have archaic security policies that have NOT been updated to accommodate this change, and thus they still require absolutely NO Sendmail daemons running, despite the fact that Sendmail can be configured to ONLY allow connections from the MSP on the local host.

This is perfectly understandable, and doable by stopping all Sendmail daemons and then configuring the MSP (via the submit.cf file) to pass mail directly to a smart host or a different MTA host, thus effectively running in MSP only mode.

Generally this does the trick and the security auditors get the functionality they want, at the expense of not being able to deliver ANY mail locally (not even root) or perform any aliasing (brief summary of caveats).

This week I stumbled upon a problem where running Sendmail in MSP only mode was causing emails to be queued on some hosts, but not others, despite all hosts using exactly the same submit.cf.

Well, a lot of investigating ensued and I worked out what was causing this odd behaviour...

... the system load.

Whilst running through the "/usr/lib/sendmail -v -d0-99.99" debug outputs (not for the faint hearted) I noticed the following for the hosts that succeed:

redefine(load_avg as 34)
shouldqueue: CurrentLA=34, pri=30242: false (by calculation)

I then checked the hosts that always fail and found:

redefine(load_avg as 68)
shouldqueue: CurrentLA=68, pri=30242: true (by calculation)

Essentially what is happening here is the load on the machines is above the value of QueueLA multiplied by the number of CPUs, which in turn is causing Sendmail to perform a calculation to determine if the mail should be queued or delivered immediately.

In the first case the calculation allowed immediate delivery; in the second it did not.

The thinking can be loosely equated as follows:

if [ CurrentLA > (QueueLA * CPUs) ]
then
   if [ MsgPriority > QueueFactor / (CurrentLA - (QueueLA * CPUs) + 1) ]
   then
    queue for later delivery
   else
    deliver immediately
   fi
else
   deliver immediately
fi
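
For anyone who would rather run the check than read it, here is a minimal shell sketch of the same logic. The function name should_queue and the environment variable names are my own invention, the defaults of 8 and 600000 are the ones discussed in this post, and integer division is an assumption about how Sendmail does the sum internally.

```shell
#!/bin/sh
# Sketch of the queue-or-deliver decision described above.
# QUEUE_LA and QUEUE_FACTOR are hypothetical variable names; the defaults
# are the values quoted in this post.
QUEUE_LA=${QUEUE_LA:-8}
QUEUE_FACTOR=${QUEUE_FACTOR:-600000}

# Usage: should_queue <current load average> <message priority> <CPU count>
should_queue() {
    la=$1 pri=$2 cpus=$3
    threshold=$((QUEUE_LA * cpus))
    # Only run the priority calculation once the load exceeds the threshold.
    if [ "$la" -gt "$threshold" ] &&
       [ "$pri" -gt $((QUEUE_FACTOR / (la - threshold + 1))) ]; then
        echo queue
    else
        echo deliver
    fi
}

should_queue 34 30242 2   # prints "deliver"
should_queue 68 30242 2   # prints "queue"
```

The two calls at the bottom use the load averages and priority from the debug output above, on a two CPU host.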

Using the above information, and the fact that QueueLA and QueueFactor are commented out in submit.cf by default (so the default values of 8 and 600000 respectively apply), we can perform the calculations manually ourselves.

First successful delivery:

- the initial test:

CurrentLA > QueueLA * CPUs
       34 > 8 * 2
       34 > 16

- as CurrentLA is > 16, we move onto the second calculation:

MsgPriority > QueueFactor / (CurrentLA - (QueueLA * CPUs) + 1)
      30242 > 600000 / (34 - (8 * 2) + 1)
      30242 > 600000 / (34 - 16 + 1)
      30242 > 600000 / 19
      30242 > 31579

As the RHS is greater than the LHS, immediate delivery takes place.

Now a failure:

- the initial test:

CurrentLA > QueueLA * CPUs
       68 > 8 * 2
       68 > 16

- as CurrentLA is > 16, we move onto the second calculation:

MsgPriority > QueueFactor / (CurrentLA - (QueueLA * CPUs) + 1)
      30242 > 600000 / (68 - (8 * 2) + 1)
      30242 > 600000 / (68 - 16 + 1)
      30242 > 600000 / 53
      30242 > 11321

As the RHS is NOT greater than the LHS, the message is queued for later delivery.
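
It is also handy to know where the tipping point sits for a given message. The sketch below walks the load average upwards until the formula, taken exactly as written above, says "queue". The helper name queue_cutoff is hypothetical, and integer division is again my assumption.

```shell
#!/bin/sh
# Hypothetical helper: print the lowest load average at which a message
# would be queued.
# Usage: queue_cutoff <message priority> <effective QueueLA> <QueueFactor>
queue_cutoff() {
    pri=$1 limit=$2 factor=$3
    # A priority of zero or less would never exceed the quotient.
    [ "$pri" -gt 0 ] || return 1
    la=$((limit + 1))
    # Keep raising the load while the message would still be delivered.
    while [ "$pri" -le $((factor / (la - limit + 1))) ]; do
        la=$((la + 1))
    done
    echo "$la"
}

queue_cutoff 30242 16 600000   # prints 35
```

So with an effective QueueLA of 16, the host running at a load of 34 was one step away from queueing as well.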

Oh, I forgot to mention - the CurrentLA is the average number of processes in the run queue over the last minute, which is essentially the first of the load average values shown by /usr/bin/uptime.
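
If you want to grab that figure in a script, something like the following should do. It is only a sketch: the function name one_minute_load is made up, and it assumes uptime prints "load average:" or "load averages:" followed by figures with a dot as the decimal separator, which will not hold in every locale.

```shell
#!/bin/sh
# Print the one-minute load average from a line of uptime output on stdin.
# Accepts both "load average:" and "load averages:" spellings.
one_minute_load() {
    sed -n 's/.*load averages\{0,1\}: *\([0-9.]*\).*/\1/p'
}

uptime | one_minute_load
```

Bear in mind the debug output above shows a whole number, so Sendmail is presumably working with a rounded value rather than the decimal figure uptime prints.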

You can resolve this issue by increasing the value of QueueLA. Note, when manually specifying the QueueLA, Sendmail treats the figure as an absolute value and does NOT multiply it by the number of CPUs.

You can change this value by adding the following to your submit.mc file:

define(`confQUEUE_LA', `50')dnl (I've chosen a figure at random)

... and regenerate the submit.cf as usual.
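
Once regenerated, it is worth confirming the option actually made it into the file. A generated cf file carries an active setting as a line of the form "O QueueLA=50", while the untouched default is left commented out. The helper below is a hypothetical sketch that reports which of the two you have.

```shell
#!/bin/sh
# Print the QueueLA value set in a .cf file, or "default" if the option
# is absent or commented out.
# Usage: queue_la_setting <path to cf file>
queue_la_setting() {
    value=$(sed -n 's/^O QueueLA=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1)
    echo "${value:-default}"
}
```

Call it as "queue_la_setting /etc/mail/submit.cf" (the path is an assumption, adjust it for your system).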

Please bear in mind that you'll need to weigh up the risks here: setting this value too high can result in Sendmail bogging down the host if it starts sending a lot of emails for some unknown reason.

It's generally advisable to set this value to a sane level, e.g. 50, and then run a periodic client queue checker.
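
A queue checker does not need to be clever. The sketch below counts the qf control files in the client queue directory and complains if there are too many. The function name, the QUEUE_DIR and MAX_QUEUED variables and the threshold of 10 are all my own assumptions; /var/spool/clientmqueue is the usual MSP queue location but check your own submit.cf.

```shell
#!/bin/sh
# Count queued messages by counting qf* control files in a queue directory.
# Usage: count_queued <queue directory>
count_queued() {
    find "$1" -type f -name 'qf*' 2>/dev/null | wc -l | tr -d ' '
}

# Queue path and alert threshold are assumptions; adjust for your system.
QUEUE_DIR=${QUEUE_DIR:-/var/spool/clientmqueue}
MAX_QUEUED=${MAX_QUEUED:-10}

queued=$(count_queued "$QUEUE_DIR")
if [ "$queued" -gt "$MAX_QUEUED" ]; then
    echo "WARNING: $queued messages waiting in $QUEUE_DIR"
fi
```

Run it from cron and mail or log the output however you see fit. For a quick manual look, "mailq -Ac" lists the client queue.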

FYI: This is all briefly discussed at
https://www.sendmail.org/~ca/email/doc8.12/op-sh-4.html#sh-4.4