All,

How is the following for a way forward.

a. I will author a patch to the user space code to correctly parse this condition and submit it on the weekend. It will be via a new configuration item to auditd.conf just in case placing a fixed extended timeout (15-20 secs) affects memory usage for users of the auparse library. This solves the initial problem of ausearch/auparse failing to parse generated audit.
b. I am happy to instrument what ever is recommended on my hosts at home (vm's and bare metal) to provide more information, should we want to 'explain' the occurrence, given I see this every week or two and report back.

On Tue, 2021-01-19 at 16:51 -0500, Steve Grubb wrote:
On Tuesday, January 19, 2021 3:26:04 PM EST Paul Moore wrote:
On Tue, Jan 19, 2021 at 2:38 PM Burn Alting <
burn.alting@iinet.net.au
> 
wrote:
All systems use chrony (current NTP daemon). One is a VM (on top of KVM)
and the other a bare metal deployment. Does the above explain my second
data set (in the issue) from a bare metal Centos 8 host? Perhaps Lenny's
comments bare investigation. Either way, I will offer a patch to the
user space code to, based on a configuration value, manage correctly
such activity.
...

msg=audit(1609920994.483:1787848):
msg=audit(1609920994.483:1787848):
msg=audit(1609920994.483:1787848):
msg=audit(1609920994.483:1787848):
msg=audit(1609920994.483:1787848):
msg=audit(1609920994.484:1787849):
msg=audit(1609920994.484:1787849):
msg=audit(1609921000.636:1787850):
msg=audit(1609921000.636:1787850):
msg=audit(1609921000.636:1787850):
msg=audit(1609921008.456:1787851):
msg=audit(1609921008.456:1787851):
msg=audit(1609921008.456:1787851):
msg=audit(1609921008.456:1787851):
msg=audit(1609921008.456:1787851):
msg=audit(1609921008.456:1787851):
msg=audit(1609920994.484:1787849):
msg=audit(1609920994.484:1787849):
msg=audit(1609920994.484:1787849):
msg=audit(1609921010.837:1787852):
msg=audit(1609921010.837:1787852):
msg=audit(1609921010.837:1787852):

msg=audit(1609921010.837:1787852):
Looking at the extracted snippet above where event 1787849 is out of

order we see the following timestamps:
msg=audit(1609920994.483:1787848):
msg=audit(1609920994.484:1787849):
msg=audit(1609921000.636:1787850):
msg=audit(1609921008.456:1787851):

msg=audit(1609921010.837:1787852):

... which looks correct in as much that the time doesn't appear to go
backwards between events.  As I said before, I'm not sure how Steve's
userspace works so the time may be a red herring.

It only handles one record at a time. No chance to mix things up.

The github issue says that 30-stig.rules is being used. If the system time 
changed with chrony, I would expect syscall events with adjtimex. But the 
only ones given are execve.

-Steve

Barring some weird condition where auditd disconnects and quickly
reconnects to the kernel, and/or dies and is replaced quickly, I'm not
seeing anything obvious in the kernel which would cause this.  I'm not
saying there isn't anything there, just that it isn't obvious to me at
the moment :)