On Tue, 2021-01-19 at 14:11 -0500, Paul Moore wrote:
On Tue, Jan 19, 2021 at 3:18 AM Burn Alting <
burn.alting@iinet.net.au
> wrote:
I tend to have a rigorous auditing posture (see the rules loaded in 
https://github.com/linux-audit/audit-userspace/issues/148
) which is not normal for most. Perhaps, Paul, you have hit the nail on the head by stating that this 'severe delay' is not that unreasonable given my rules posture and we just need to 'deal with it' in user space.
We still get the event data, I just need to adjust the user space tools to deal with this occurrence.
As for what the system is doing, in my home case it's a Centos 7 VM running a tomcat service which only gets busy every 20 minutes and the other is a HPE Z800 running Centos 8 with 4-5 VM's mostly dormant. I can put any code in these hosts to assist in 'validating'/testing the delay. Advise and I will run.

I took a (slightly) closer look at the queuing code just now and the
corner case I suspected doesn't look very promising here.  On failure
to send a record to userspace, the record is put back at the front of
the queue to preserve ordering so the kernel *should* still emit
records in order even when auditd is under pressure, dead, or in the
process of a restart.  Also, in this case the kernel kicks the auditd
connection, disconnecting auditd's netlink socket; I imagine that
would be a notable event on your systems.

However, looking at the timestamps in the audit events you posted, I
noticed something a little odd.  If I align the timestamps a bit
better let's see if it jumps out at you guys ...

type=XXX msg=audit(1609519900.159:44606): ...
type=XXX msg=audit(1609519900.161:44607): ...
type=XXX msg=audit(1609519900.163:44608): ...
type=XXX msg=audit(1609519896.829:44609): ...
type=XXX msg=audit(1609519900.170:44610): ...

Let me guess Burn, you're running NTP/PTP :)  I'm not sure how auditd
handles things like this, but it looks like there was a small negative
time correction between events which caused the odd scenario where
event N+1 actually occurred before event N according to the wall
clock.  In other words, your system is time traveling ;)
.
All systems use chrony (current NTP daemon). One is a VM (on top of KVM) and the other a bare metal deployment. Does the above explain my second data set (in the issue) from a bare metal Centos 8 host?
Perhaps Lenny's comments bare investigation. Either way, I will offer a patch to the user space code to, based on a configuration value, manage correctly such activity.

msg=audit(1609920994.481:1787844):
msg=audit(1609920994.481:1787844):
msg=audit(1609920994.481:1787844):
msg=audit(1609920994.481:1787844):
msg=audit(1609920994.481:1787845):
msg=audit(1609920994.481:1787845):
msg=audit(1609920994.481:1787845):
msg=audit(1609920994.481:1787845):
msg=audit(1609920994.481:1787845):
msg=audit(1609920994.481:1787845):
msg=audit(1609920994.481:1787845):
msg=audit(1609920994.481:1787846):
msg=audit(1609920994.481:1787846):
msg=audit(1609920994.481:1787846):
msg=audit(1609920994.481:1787846):
msg=audit(1609920994.481:1787846):
msg=audit(1609920994.482:1787847):
msg=audit(1609920994.482:1787847):
msg=audit(1609920994.482:1787847):
msg=audit(1609920994.482:1787847):
msg=audit(1609920994.482:1787847):
msg=audit(1609920994.483:1787848):
msg=audit(1609920994.483:1787848):
msg=audit(1609920994.483:1787848):
msg=audit(1609920994.483:1787848):
msg=audit(1609920994.483:1787848):
msg=audit(1609920994.484:1787849):
msg=audit(1609920994.484:1787849):
msg=audit(1609921000.636:1787850):
msg=audit(1609921000.636:1787850):
msg=audit(1609921000.636:1787850):
msg=audit(1609921008.456:1787851):
msg=audit(1609921008.456:1787851):
msg=audit(1609921008.456:1787851):
msg=audit(1609921008.456:1787851):
msg=audit(1609921008.456:1787851):
msg=audit(1609921008.456:1787851):
msg=audit(1609920994.484:1787849):
msg=audit(1609920994.484:1787849):
msg=audit(1609920994.484:1787849):
msg=audit(1609921010.837:1787852):
msg=audit(1609921010.837:1787852):
msg=audit(1609921010.837:1787852):
msg=audit(1609921010.837:1787852):
msg=audit(1609921010.837:1787853):
msg=audit(1609921010.837:1787853):
msg=audit(1609921010.837:1787853):
msg=audit(1609921010.837:1787853):
msg=audit(1609921010.837:1787853):
msg=audit(1609921010.837:1787853):
msg=audit(1609921010.837:1787853):
msg=audit(1609921010.838:1787854):
msg=audit(1609921010.838:1787854):
msg=audit(1609921010.838:1787854):
msg=audit(1609921010.838:1787854):
msg=audit(1609921010.838:1787854):
msg=audit(1609921010.839:1787855):
msg=audit(1609921010.839:1787855):
msg=audit(1609921010.839:1787855):
msg=audit(1609921010.839:1787855):
msg=audit(1609921010.839:1787855):
msg=audit(1609921010.840:1787856):
msg=audit(1609921010.840:1787856):
msg=audit(1609921010.840:1787856):
msg=audit(1609921010.840:1787856):
msg=audit(1609921010.840:1787856):
msg=audit(1609921010.841:1787857):
msg=audit(1609921010.841:1787857):
msg=audit(1609921010.841:1787857):
msg=audit(1609921010.841:1787857):
msg=audit(1609921010.841:1787857):