All of lore.kernel.org
 help / color / mirror / Atom feed
* Occasional delayed output of events
@ 2021-01-01 21:22 Burn Alting
  2021-01-03 15:41 ` Steve Grubb
  0 siblings, 1 reply; 32+ messages in thread
From: Burn Alting @ 2021-01-01 21:22 UTC (permalink / raw)
  To: Linux Audit


[-- Attachment #1.1: Type: text/plain, Size: 4544 bytes --]

All,

Sometimes, events recorded in /var/log/audit/audit.log appear some seconds past co-
located events which results in auparse:au_check_events() marking these events
complete before they are. An example of this can be seen below with the offending
event id 44609.
This has been plaguing me for a year or two and this morning was the first time I
still had access to the raw audit.log files (I monitor a lot of event types and the
log files roll over fairly quickly). 
The example below is from a fully patched Centos 7 but I have also seen this on a
patched Fedora 32.

Has this been seen before? Do we need to re-evaluate how auparse 'completes' an
event (ie 2 seconds is too quick).

type=SYSCALL msg=audit(1609519896.824:44603):
type=EXECVE msg=audit(1609519896.824:44603):
type=CWD msg=audit(1609519896.824:44603):
type=PATH msg=audit(1609519896.824:44603):
type=PATH msg=audit(1609519896.824:44603):
type=PROCTITLE msg=audit(1609519896.824:44603):
type=SYSCALL msg=audit(1609519896.826:44604):
type=EXECVE msg=audit(1609519896.826:44604):
type=CWD msg=audit(1609519896.826:44604):
type=PATH msg=audit(1609519896.826:44604):
type=PATH msg=audit(1609519896.826:44604):
type=PROCTITLE msg=audit(1609519896.826:44604):
type=SYSCALL msg=audit(1609519801.338:44605):
type=EXECVE msg=audit(1609519801.338:44605):
type=CWD msg=audit(1609519801.338:44605):
type=PATH msg=audit(1609519801.338:44605):
type=PATH msg=audit(1609519801.338:44605):
type=PROCTITLE msg=audit(1609519801.338:44605):
type=SYSCALL msg=audit(1609519900.159:44606):
type=EXECVE msg=audit(1609519900.159:44606):
type=CWD msg=audit(1609519900.159:44606):
type=PATH msg=audit(1609519900.159:44606):
type=PATH msg=audit(1609519900.159:44606):
type=PROCTITLE msg=audit(1609519900.159:44606):
type=SYSCALL msg=audit(1609519900.161:44607):
type=EXECVE msg=audit(1609519900.161:44607):
type=CWD msg=audit(1609519900.161:44607):
type=PATH msg=audit(1609519900.161:44607):
type=PATH msg=audit(1609519900.161:44607):
type=PROCTITLE msg=audit(1609519900.161:44607):
type=SYSCALL msg=audit(1609519896.829:44609):
type=EXECVE msg=audit(1609519896.829:44609):
type=CWD msg=audit(1609519896.829:44609):
type=SYSCALL msg=audit(1609519900.163:44608):
type=PATH msg=audit(1609519896.829:44609):
type=EXECVE msg=audit(1609519900.163:44608):
type=CWD msg=audit(1609519900.163:44608):
type=PATH msg=audit(1609519896.829:44609):
type=PROCTITLE msg=audit(1609519896.829:44609):
type=PATH msg=audit(1609519900.163:44608):
type=PATH msg=audit(1609519900.163:44608):
type=PROCTITLE msg=audit(1609519900.163:44608):
type=CRED_DISP msg=audit(1609519900.170:44610):
type=SYSCALL msg=audit(1609519900.170:44611):
type=EXECVE msg=audit(1609519900.170:44611):
type=CWD msg=audit(1609519900.170:44611):
type=PATH msg=audit(1609519900.170:44611):
type=PATH msg=audit(1609519900.170:44611):
type=PROCTITLE msg=audit(1609519900.170:44611):

which results in an ausearch -i output sequence of
----
type=PROCTITLE msg=audit(02/01/21 03:51:36.826:44604)
type=PATH msg=audit(02/01/21 03:51:36.826:44604)
type=PATH msg=audit(02/01/21 03:51:36.826:44604)
type=CWD msg=audit(02/01/21 03:51:36.826:44604)
type=EXECVE msg=audit(02/01/21 03:51:36.826:44604)
type=SYSCALL msg=audit(02/01/21 03:51:36.826:44604)
----
type=CWD msg=audit(02/01/21 03:51:36.829:44609)
type=EXECVE msg=audit(02/01/21 03:51:36.829:44609)
type=SYSCALL msg=audit(02/01/21 03:51:36.829:44609)
----
type=PROCTITLE msg=audit(02/01/21 03:51:36.829:44609)
type=PATH msg=audit(02/01/21 03:51:36.829:44609)
type=PATH msg=audit(02/01/21 03:51:36.829:44609)
----
type=PROCTITLE msg=audit(02/01/21 03:51:40.159:44606)
type=PATH msg=audit(02/01/21 03:51:40.159:44606)
type=PATH msg=audit(02/01/21 03:51:40.159:44606)
type=CWD msg=audit(02/01/21 03:51:40.159:44606)
type=EXECVE msg=audit(02/01/21 03:51:40.159:44606)
type=SYSCALL msg=audit(02/01/21 03:51:40.159:44606)
----
type=PROCTITLE msg=audit(02/01/21 03:51:40.161:44607)
type=PATH msg=audit(02/01/21 03:51:40.161:44607)
type=PATH msg=audit(02/01/21 03:51:40.161:44607)
type=CWD msg=audit(02/01/21 03:51:40.161:44607)
type=EXECVE msg=audit(02/01/21 03:51:40.161:44607)
type=SYSCALL msg=audit(02/01/21 03:51:40.161:44607)
----
type=PROCTITLE msg=audit(02/01/21 03:51:40.163:44608)
type=PATH msg=audit(02/01/21 03:51:40.163:44608)
type=PATH msg=audit(02/01/21 03:51:40.163:44608)
type=CWD msg=audit(02/01/21 03:51:40.163:44608)
type=EXECVE msg=audit(02/01/21 03:51:40.163:44608)
type=SYSCALL msg=audit(02/01/21 03:51:40.163:44608)
----
type=CRED_DISP msg=audit(02/01/21 03:51:40.170:44610)


Regards
Burn

[-- Attachment #1.2: Type: text/html, Size: 5690 bytes --]

[-- Attachment #2: Type: text/plain, Size: 102 bytes --]

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-01 21:22 Occasional delayed output of events Burn Alting
@ 2021-01-03 15:41 ` Steve Grubb
  2021-01-04  7:55   ` Burn Alting
  0 siblings, 1 reply; 32+ messages in thread
From: Steve Grubb @ 2021-01-03 15:41 UTC (permalink / raw)
  To: Linux Audit

On Friday, January 1, 2021 4:22:33 PM EST Burn Alting wrote:
> Sometimes, events recorded in /var/log/audit/audit.log appear some seconds
> past co- located events which results in auparse:au_check_events() marking
> these events complete before they are. An example of this can be seen
> below with the offending event id 44609.
>
> This has been plaguing me for a year or two and this morning was the first
> time I still had access to the raw audit.log files (I monitor a lot of
> event types and the log files roll over fairly quickly).
> The example below is from a fully patched Centos 7 but I have also seen
> this on a patched Fedora 32.
> 
> Has this been seen before? Do we need to re-evaluate how auparse
> 'completes' an event (ie 2 seconds is too quick).

I have never seen this. But on the way to disk, auditd only does light 
processing of the event.  If the format is enriched, it looks things up on a 
record by record basis. It does not collect events until they are complete - 
it dumps it to disk as soon as it can tack on the extra information.

So, the question would be, does this delay happen on the way to disk? Or is 
this an artifact of post processing the logs with an auparse based utility? 
Can this be observed repeatedly on the same raw logs? If so, then maybe 
auparse does have some issue. But if this is a post processing issue, then 
the wall clock doesn't matter because this event should have collected up 
together.

I'd say this merits some investigation.

-Steve


--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-03 15:41 ` Steve Grubb
@ 2021-01-04  7:55   ` Burn Alting
  2021-01-04 14:46     ` Steve Grubb
  0 siblings, 1 reply; 32+ messages in thread
From: Burn Alting @ 2021-01-04  7:55 UTC (permalink / raw)
  To: Steve Grubb, Linux Audit


[-- Attachment #1.1: Type: text/plain, Size: 2838 bytes --]

On Sun, 2021-01-03 at 10:41 -0500, Steve Grubb wrote:
> On Friday, January 1, 2021 4:22:33 PM EST Burn Alting wrote:
> > Sometimes, events recorded in /var/log/audit/audit.log appear some seconds
> > past co- located events which results in auparse:au_check_events() marking
> > these events complete before they are. An example of this can be seen
> > below with the offending event id 44609.
> > 
> > This has been plaguing me for a year or two and this morning was the first
> > time I still had access to the raw audit.log files (I monitor a lot of
> > event types and the log files roll over fairly quickly).
> > The example below is from a fully patched Centos 7 but I have also seen
> > this on a patched Fedora 32.
> > 
> > Has this been seen before? Do we need to re-evaluate how auparse
> > 'completes' an event (ie 2 seconds is too quick).
> 
> I have never seen this. But on the way to disk, auditd only does light 
> processing of the event.  If the format is enriched, it looks things up on a 
> record by record basis. It does not collect events until they are complete - 
> it dumps it to disk as soon as it can tack on the extra information.
> 
> So, the question would be, does this delay happen on the way to disk? Or is 
> this an artifact of post processing the logs with an auparse based utility? 
> Can this be observed repeatedly on the same raw logs? If so, then maybe 
> auparse does have some issue. But if this is a post processing issue, then 
> the wall clock doesn't matter because this event should have collected up 
> together.
> 
> I'd say this merits some investigation.

OK. I think this needs to be addressed on two fronts. There may be more.
A.  Within post processing ... a 2 second timeout is not sufficient. I would suggest we modify auparse.c:au_check_events() to
  i) perform the event type checks first, then
  ii) increase the timeout of 2 seconds to be a larger value based on empirical tests.

B. I will build a temporary auditd daemon to perform some empirical testing to see how long events can reside within the daemon. I may need some advice on this.
I assume that the code that sets the timestamp is in src/auditd.c:send_audit_event(). If so, I will see if I can put orchestration debug code in to monitor an event's
'time in daemon' until this point. I will then report on this.

I believe given that AUDIT_PROCTITLE and AUDIT_EOE is fairly widespread, then the testing switch in A. will not be a big issue (time cost wise). It will also mean that if we
over compensate the timeout that would cause additional memory cost in auparse() then this is mittigated.

With respect to 'There may be more' fronts. Are there other points in the 'audit ecosystem' that makes use of the '2 second timeout'.

I will start work on this, this coming weekend if the above makes sense.

Regards
Burn


[-- Attachment #1.2: Type: text/html, Size: 3612 bytes --]

[-- Attachment #2: Type: text/plain, Size: 102 bytes --]

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-04  7:55   ` Burn Alting
@ 2021-01-04 14:46     ` Steve Grubb
  2021-01-04 20:12       ` Burn Alting
  0 siblings, 1 reply; 32+ messages in thread
From: Steve Grubb @ 2021-01-04 14:46 UTC (permalink / raw)
  To: Linux Audit, burn

On Monday, January 4, 2021 2:55:25 AM EST Burn Alting wrote:
> On Sun, 2021-01-03 at 10:41 -0500, Steve Grubb wrote:
> > On Friday, January 1, 2021 4:22:33 PM EST Burn Alting wrote:
> > > Sometimes, events recorded in /var/log/audit/audit.log appear some
> > > seconds past co- located events which results in
> > > auparse:au_check_events() marking  these events complete before they
> > > are. An example of this can be seen below with the offending event id
> > > 44609.
> > > 
> > > This has been plaguing me for a year or two and this morning was the
> > > first time I still had access to the raw audit.log files (I monitor a lot
> > > of event types and the log files roll over fairly quickly).
> > > The example below is from a fully patched Centos 7 but I have also seen
> > > this on a patched Fedora 32.
> > > 
> > > Has this been seen before? Do we need to re-evaluate how auparse
> > > 'completes' an event (ie 2 seconds is too quick).
> > 
> > I have never seen this. But on the way to disk, auditd only does light
> > processing of the event.  If the format is enriched, it looks things up
> > on a record by record basis. It does not collect events until they are
> > complete - it dumps it to disk as soon as it can tack on the extra
> > information.
> > 
> > So, the question would be, does this delay happen on the way to disk? Or
> > is this an artifact of post processing the logs with an auparse based
> > utility? Can this be observed repeatedly on the same raw logs? If so,
> > then maybe auparse does have some issue. But if this is a post
> > processing issue, then the wall clock doesn't matter because this event
> > should have collected up together.
> > 
> > I'd say this merits some investigation.
> 
> OK. I think this needs to be addressed on two fronts. There may be more.
> A.  Within post processing ... a 2 second timeout is not sufficient. I
> would suggest we modify auparse.c:au_check_events() to i) perform the
> event type checks first, then
>   ii) increase the timeout of 2 seconds to be a larger value based on
> empirical tests.

In the post processing, there are 2 use cases. The first is events that are on 
disk. In this usage, the 2 second timeout does not come into effect because 
the events are run through probably within nanoseconds or microseconds at the 
worst. The only time it would come into effect is if the terminating record 
is missing.

The other use case is realtime processing as an audispd plugin. In this use 
case the wall clock could matter because records could potentially get lost 
due to overflows or another plugin taking too long. This is the use case where 
the wall clock matters. And again, it matters when records get lost or 
delayed in transit. As long as everything is flowing, it should not factor 
into event processing.

> B. I will build a temporary auditd daemon to perform some empirical testing
> to see how long events can reside within the daemon. I may need some
> advice on this. I assume that the code that sets the timestamp is in
> src/auditd.c:send_audit_event().

This is only for audit daemon's internal events. For all "real" events, it's 
set in the kernel.

> If so, I will see if I can put orchestration debug code in to monitor an
> event's 'time in daemon' until this point. I will then report on this.
> 
> I believe given that AUDIT_PROCTITLE and AUDIT_EOE is fairly widespread,
> then the testing switch in A. will not be a big issue (time cost wise). It
> will also mean that if we over compensate the timeout that would cause
> additional memory cost in auparse() then this is mittigated.

I'd suggest breaking up the event completion tests so that an exact 
collection termination reason code could be associated to the event.

> With respect to 'There may be more' fronts. Are there other points in the
> 'audit ecosystem' that makes use of the '2 second timeout'.

Ausearch/report has its own special copy of the event collection logic. It 
should be nearly identical to what auparse does.

> I will start work on this, this coming weekend if the above makes sense.

One other thought, the current shipping code is audit-3.0, doing a diff 
between it and audit-2.8.5 for the auparse directory does show some 
differences in event collection/grouping/next_event. A lot of the differences 
are cosmetic to fix extra whitespace or indentation. But if you skip all that, 
there are some real changes that probably were because of bug reports. For 
example,


@@ -259,15 +260,6 @@ static event_list_t *au_get_ready_event(
        if (lowest && lowest->status == EBS_COMPLETE) {
                lowest->status = EBS_EMPTY;
                au->au_ready--;
-               // Try to consolidate the array so that we iterate
-               // over a smaller portion next time
-               if (lowest == &lol->array[lol->maxi]) {
-                       au_lolnode *ptr = lowest;
-                       while (ptr->status == EBS_EMPTY && lol->maxi > 0) {
-                               lol->maxi--;
-                               ptr = &lol->array[lol->maxi];
-                       }
-               }
                return lowest->l;
        }

and

@@ -1536,6 +1550,13 @@ static int au_auparse_next_event(auparse
                aup_list_create(l);
                aup_list_set_event(l, &e);
                aup_list_append(l, au->cur_buf, au->list_idx, au-
>line_number);
+               // Eat standalone EOE - main event was already marked 
complete
+               if (l->head->type == AUDIT_EOE) {
+                       au->cur_buf = NULL;
+                       aup_list_clear(l);
+                       free(l);
+                       continue;
+               }
                if (au_lol_append(au->au_lo, l) == NULL) {
                        free((char *)e.host);
 #ifdef LOL_EVENTS_DEBUG01

I don't know if those have an effect on what you are seeing. But that is the 
only substantial changes that I can see.

-Steve


--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-04 14:46     ` Steve Grubb
@ 2021-01-04 20:12       ` Burn Alting
  2021-01-10  4:39         ` Burn Alting
  0 siblings, 1 reply; 32+ messages in thread
From: Burn Alting @ 2021-01-04 20:12 UTC (permalink / raw)
  To: Steve Grubb, Linux Audit


[-- Attachment #1.1: Type: text/plain, Size: 7556 bytes --]

On Mon, 2021-01-04 at 09:46 -0500, Steve Grubb wrote:
> On Monday, January 4, 2021 2:55:25 AM EST Burn Alting wrote:
> > On Sun, 2021-01-03 at 10:41 -0500, Steve Grubb wrote:
> > > On Friday, January 1, 2021 4:22:33 PM EST Burn Alting wrote:
> > > > Sometimes, events recorded in /var/log/audit/audit.log appear some
> > > > seconds past co- located events which results in
> > > > auparse:au_check_events() marking  these events complete before they
> > > > are. An example of this can be seen below with the offending event id
> > > > 44609.
> > > > 
> > > > This has been plaguing me for a year or two and this morning was the
> > > > first time I still had access to the raw audit.log files (I monitor a lot
> > > > of event types and the log files roll over fairly quickly).
> > > > The example below is from a fully patched Centos 7 but I have also seen
> > > > this on a patched Fedora 32.
> > > > 
> > > > Has this been seen before? Do we need to re-evaluate how auparse
> > > > 'completes' an event (ie 2 seconds is too quick).
> > > 
> > > I have never seen this. But on the way to disk, auditd only does light
> > > processing of the event.  If the format is enriched, it looks things up
> > > on a record by record basis. It does not collect events until they are
> > > complete - it dumps it to disk as soon as it can tack on the extra
> > > information.
> > > 
> > > So, the question would be, does this delay happen on the way to disk? Or
> > > is this an artifact of post processing the logs with an auparse based
> > > utility? Can this be observed repeatedly on the same raw logs? If so,
> > > then maybe auparse does have some issue. But if this is a post
> > > processing issue, then the wall clock doesn't matter because this event
> > > should have collected up together.
> > > 
> > > I'd say this merits some investigation.
> > 
> > OK. I think this needs to be addressed on two fronts. There may be more.
> > A.  Within post processing ... a 2 second timeout is not sufficient. I
> > would suggest we modify auparse.c:au_check_events() to i) perform the
> > event type checks first, then
> >   ii) increase the timeout of 2 seconds to be a larger value based on
> > empirical tests.
> 
> In the post processing, there are 2 use cases. The first is events that are on 
> disk. In this usage, the 2 second timeout does not come into effect because 
> the events are run through probably within nanoseconds or microseconds at the 
> worst. The only time it would come into effect is if the terminating record 
> is missing.

In this first case, the 2 second timeout is on the event's time, not the 'processing
time'.  See ausearch-lol.c:check_events() and auparse.c:au_get_ready_event().
And I use the checkpointing code to avoid the incomplete event issue.

In my case, I have not lost records, it's just that an event has arrived on disk
with an event time more than 2 seconds after the previously written event.
Basically,
a. The event was delayed getting to auditd and we look to the kernel for a solution.
b. The event arrived at a reasonable point in time at auditd and for some reason
auditd delayed it's printing (by the way I tend to use RAW log format, not enriched.

In either case, I believe ausearch-lol.c:check_events() and
auparse.c:au_get_ready_event() do need to be changed as we have complete events
written by auditd
which these two routines fail to process properly.

> 
> The other use case is realtime processing as an audispd plugin. In this use 
> case the wall clock could matter because records could potentially get lost 
> due to overflows or another plugin taking too long. This is the use case where 
> the wall clock matters. And again, it matters when records get lost or 
> delayed in transit. As long as everything is flowing, it should not factor 
> into event processing.
> 
> > B. I will build a temporary auditd daemon to perform some empirical testing
> > to see how long events can reside within the daemon. I may need some
> > advice on this. I assume that the code that sets the timestamp is in
> > src/auditd.c:send_audit_event().
> 
> This is only for audit daemon's internal events. For all "real" events, it's 
> set in the kernel.

If that is the case and the kernel is establishing the timestamp, then either the
kernel has delayed the events
arrival at the daemon or the daemon has delayed it's writing.

> 
> > If so, I will see if I can put orchestration debug code in to monitor an
> > event's 'time in daemon' until this point. I will then report on this.
> > 
> > I believe given that AUDIT_PROCTITLE and AUDIT_EOE is fairly widespread,
> > then the testing switch in A. will not be a big issue (time cost wise). It
> > will also mean that if we over compensate the timeout that would cause
> > additional memory cost in auparse() then this is mittigated.
> 
> I'd suggest breaking up the event completion tests so that an exact 
> collection termination reason code could be associated to the event.
> 
> > With respect to 'There may be more' fronts. Are there other points in the
> > 'audit ecosystem' that makes use of the '2 second timeout'.
> 
> Ausearch/report has its own special copy of the event collection logic. It 
> should be nearly identical to what auparse does.

They appear identical  ... ausearch-lol.c:check_events() and
auparse.c:au_get_ready_event().
> 
> > I will start work on this, this coming weekend if the above makes sense.
> 
> One other thought, the current shipping code is audit-3.0, doing a diff 
> between it and audit-2.8.5 for the auparse directory does show some 
> differences in event collection/grouping/next_event. A lot of the differences 
> are cosmetic to fix extra whitespace or indentation. But if you skip all that, 
> there are some real changes that probably were because of bug reports. For 
> example,

I will go through these, although this occurs on my Centos 7's (audit-2.8.5-4.el7.x86_64) as well as my 8's (audit-3.0-0.17.20191104git1c2f876.el8.x86_64).

> 
> @@ -259,15 +260,6 @@ static event_list_t *au_get_ready_event(
>         if (lowest && lowest->status == EBS_COMPLETE) {
>                 lowest->status = EBS_EMPTY;
>                 au->au_ready--;
> -               // Try to consolidate the array so that we iterate
> -               // over a smaller portion next time
> -               if (lowest == &lol->array[lol->maxi]) {
> -                       au_lolnode *ptr = lowest;
> -                       while (ptr->status == EBS_EMPTY && lol->maxi > 0) {
> -                               lol->maxi--;
> -                               ptr = &lol->array[lol->maxi];
> -                       }
> -               }
>                 return lowest->l;
>         }
> 
> and
> 
> @@ -1536,6 +1550,13 @@ static int au_auparse_next_event(auparse
>                 aup_list_create(l);
>                 aup_list_set_event(l, &e);
>                 aup_list_append(l, au->cur_buf, au->list_idx, au-
> > line_number);
> +               // Eat standalone EOE - main event was already marked 
> complete
> +               if (l->head->type == AUDIT_EOE) {
> +                       au->cur_buf = NULL;
> +                       aup_list_clear(l);
> +                       free(l);
> +                       continue;
> +               }
>                 if (au_lol_append(au->au_lo, l) == NULL) {
>                         free((char *)e.host);
>  #ifdef LOL_EVENTS_DEBUG01
> 
> I don't know if those have an effect on what you are seeing. But that is the 
> only substantial changes that I can see.
> 
> -Steve
> 
> 

[-- Attachment #1.2: Type: text/html, Size: 10354 bytes --]

[-- Attachment #2: Type: text/plain, Size: 102 bytes --]

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-04 20:12       ` Burn Alting
@ 2021-01-10  4:39         ` Burn Alting
  2021-01-15 22:18           ` Burn Alting
  0 siblings, 1 reply; 32+ messages in thread
From: Burn Alting @ 2021-01-10  4:39 UTC (permalink / raw)
  To: Steve Grubb, Linux Audit


[-- Attachment #1.1: Type: text/plain, Size: 8437 bytes --]

On Tue, 2021-01-05 at 07:12 +1100, Burn Alting wrote:
> On Mon, 2021-01-04 at 09:46 -0500, Steve Grubb wrote:
> > On Monday, January 4, 2021 2:55:25 AM EST Burn Alting wrote:
> > > On Sun, 2021-01-03 at 10:41 -0500, Steve Grubb wrote:
> > > > On Friday, January 1, 2021 4:22:33 PM EST Burn Alting wrote:
> > > > > Sometimes, events recorded in /var/log/audit/audit.log appear someseconds
> > > > > past co- located events which results inauparse:au_check_events()
> > > > > marking  these events complete before theyare. An example of this can be
> > > > > seen below with the offending event id44609.
> > > > > This has been plaguing me for a year or two and this morning was thefirst
> > > > > time I still had access to the raw audit.log files (I monitor a lotof
> > > > > event types and the log files roll over fairly quickly).The example below
> > > > > is from a fully patched Centos 7 but I have also seenthis on a patched
> > > > > Fedora 32.
> > > > > Has this been seen before? Do we need to re-evaluate how
> > > > > auparse'completes' an event (ie 2 seconds is too quick).
> > > > 
> > > > I have never seen this. But on the way to disk, auditd only does
> > > > lightprocessing of the event.  If the format is enriched, it looks things
> > > > upon a record by record basis. It does not collect events until they
> > > > arecomplete - it dumps it to disk as soon as it can tack on the
> > > > extrainformation.
> > > > So, the question would be, does this delay happen on the way to disk? Oris
> > > > this an artifact of post processing the logs with an auparse basedutility?
> > > > Can this be observed repeatedly on the same raw logs? If so,then maybe
> > > > auparse does have some issue. But if this is a postprocessing issue, then
> > > > the wall clock doesn't matter because this eventshould have collected up
> > > > together.
> > > > I'd say this merits some investigation.
> > > 
> > > OK. I think this needs to be addressed on two fronts. There may be
> > > more.A.  Within post processing ... a 2 second timeout is not sufficient.
> > > Iwould suggest we modify auparse.c:au_check_events() to i) perform theevent
> > > type checks first, then  ii) increase the timeout of 2 seconds to be a larger
> > > value based onempirical tests.
> > 
> > In the post processing, there are 2 use cases. The first is events that are on
> > disk. In this usage, the 2 second timeout does not come into effect because the
> > events are run through probably within nanoseconds or microseconds at the worst.
> > The only time it would come into effect is if the terminating record is missing.
> 
> In this first case, the 2 second timeout is on the event's time, not the
> 'processing time'.  See ausearch-lol.c:check_events() and
> auparse.c:au_get_ready_event().
> And I use the checkpointing code to avoid the incomplete event issue.
> 
> In my case, I have not lost records, it's just that an event has arrived on disk
> with an event time more than 2 seconds after the previously written event.
> Basically,
> a. The event was delayed getting to auditd and we look to the kernel for a
> solution.
> b. The event arrived at a reasonable point in time at auditd and for some reason
> auditd delayed it's printing (by the way I tend to use RAW log format, not
> enriched.
> 
> In either case, I believe ausearch-lol.c:check_events() and
> auparse.c:au_get_ready_event() do need to be changed as we have complete events
> written by auditd
> which these two routines fail to process properly.
> 
Changing the two second timeout in ausearch-lol.c:check_events() and, one assumes
in  auparse.c:au_get_ready_event() (but I have not tested the auparse code) fixes
the processing  of the delayed event.
Changing the value to say 10 seconds fixes my example use case, but given the kernel
or auditd could emit an event with a larger delay, should this be a configuration
item in /etc/audit/auditd.conf?

I have raised both a bugzilla report (
https://bugzilla.redhat.com/show_bug.cgi?id=1914603) and Issue (
https://github.com/linux-audit/audit-userspace/issues/148)

How do you want me to proceed ... a simple change to 10 seconds or a more versatile
configuration item in auditd.conf?
I can perform either and issue a PR if so required.
 
> > The other use case is realtime processing as an audispd plugin. In this use case
> > the wall clock could matter because records could potentially get lost due to
> > overflows or another plugin taking too long. This is the use case where the wall
> > clock matters. And again, it matters when records get lost or delayed in
> > transit. As long as everything is flowing, it should not factor into event
> > processing.
> > > B. I will build a temporary auditd daemon to perform some empirical testingto
> > > see how long events can reside within the daemon. I may need someadvice on
> > > this. I assume that the code that sets the timestamp is
> > > insrc/auditd.c:send_audit_event().
> > 
> > This is only for audit daemon's internal events. For all "real" events, it's set
> > in the kernel.
> 
> If that is the case and the kernel is establishing the timestamp, then either the
> kernel has delayed the eventsarrival at the daemon or the daemon has delayed it's
> writing.
> > > If so, I will see if I can put orchestration debug code in to monitor
> > > anevent's 'time in daemon' until this point. I will then report on this.
> > > I believe given that AUDIT_PROCTITLE and AUDIT_EOE is fairly widespread,then
> > > the testing switch in A. will not be a big issue (time cost wise). Itwill also
> > > mean that if we over compensate the timeout that would causeadditional memory
> > > cost in auparse() then this is mittigated.
> > 
> > I'd suggest breaking up the event completion tests so that an exact collection
> > termination reason code could be associated to the event.
> > > With respect to 'There may be more' fronts. Are there other points in
> > > the'audit ecosystem' that makes use of the '2 second timeout'.
> > 
> > Ausearch/report has its own special copy of the event collection logic. It
> > should be nearly identical to what auparse does.
> 
> They appear identical  ... ausearch-lol.c:check_events() and
> auparse.c:au_get_ready_event().
> > > I will start work on this, this coming weekend if the above makes sense.
> > 
> > One other thought, the current shipping code is audit-3.0, doing a diff between
> > it and audit-2.8.5 for the auparse directory does show some differences in event
> > collection/grouping/next_event. A lot of the differences are cosmetic to fix
> > extra whitespace or indentation. But if you skip all that, there are some real
> > changes that probably were because of bug reports. For example,
> 
> I will go through these, although this occurs on my Centos 7's (audit-2.8.5-
> 4.el7.x86_64) as well as my 8's (audit-3.0-0.17.20191104git1c2f876.el8.x86_64).
> > @@ -259,15 +260,6 @@ static event_list_t *au_get_ready_event(        if (lowest
> > && lowest->status == EBS_COMPLETE) {                lowest->status =
> > EBS_EMPTY;                au->au_ready--;-               // Try to consolidate
> > the array so that we iterate-               // over a smaller portion next
> > time-               if (lowest == &lol->array[lol->maxi])
> > {-                       au_lolnode *ptr = lowest;-                       while
> > (ptr->status == EBS_EMPTY && lol->maxi > 0)
> > {-                               lol->maxi--;-                               ptr
> > = &lol->array[lol
> > ->maxi];-                       }-               }                return lowest-
> > >l;        }
> > and
> > @@ -1536,6 +1550,13 @@ static int
> > au_auparse_next_event(auparse                aup_list_create(l);                
> > aup_list_set_event(l, &e);                aup_list_append(l, au->cur_buf, au-
> > >list_idx, au-
> > > line_number);
> > +               // Eat standalone EOE - main event was already marked
> > complete+               if (l->head->type == AUDIT_EOE)
> > {+                       au->cur_buf =
> > NULL;+                       aup_list_clear(l);+                       free(l);+
> >                        continue;+               }                if
> > (au_lol_append(au->au_lo, l) == NULL) {                        free((char
> > *)e.host); #ifdef LOL_EVENTS_DEBUG01
> > I don't know if those have an effect on what you are seeing. But that is the
> > only substantial changes that I can see.
> > -Steve

Burn

[-- Attachment #1.2: Type: text/html, Size: 11339 bytes --]

[-- Attachment #2: Type: text/plain, Size: 102 bytes --]

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-10  4:39         ` Burn Alting
@ 2021-01-15 22:18           ` Burn Alting
  2021-01-16  0:35             ` Richard Guy Briggs
  0 siblings, 1 reply; 32+ messages in thread
From: Burn Alting @ 2021-01-15 22:18 UTC (permalink / raw)
  To: Steve Grubb, Linux Audit


[-- Attachment #1.1: Type: text/plain, Size: 9523 bytes --]

On Sun, 2021-01-10 at 15:39 +1100, Burn Alting wrote:
> On Tue, 2021-01-05 at 07:12 +1100, Burn Alting wrote:
> > On Mon, 2021-01-04 at 09:46 -0500, Steve Grubb wrote:
> > > On Monday, January 4, 2021 2:55:25 AM EST Burn Alting wrote:
> > > > On Sun, 2021-01-03 at 10:41 -0500, Steve Grubb wrote:
> > > > > On Friday, January 1, 2021 4:22:33 PM EST Burn Alting wrote:
> > > > > > Sometimes, events recorded in /var/log/audit/audit.log appear
> > > > > > someseconds past co- located events which results
> > > > > > inauparse:au_check_events() marking  these events complete before
> > > > > > theyare. An example of this can be seen below with the offending event
> > > > > > id44609.
> > > > > > This has been plaguing me for a year or two and this morning was
> > > > > > thefirst time I still had access to the raw audit.log files (I monitor a
> > > > > > lotof event types and the log files roll over fairly quickly).The
> > > > > > example below is from a fully patched Centos 7 but I have also seenthis
> > > > > > on a patched Fedora 32.
> > > > > > Has this been seen before? Do we need to re-evaluate how
> > > > > > auparse'completes' an event (ie 2 seconds is too quick).
> > > > > 
> > > > > I have never seen this. But on the way to disk, auditd only does
> > > > > lightprocessing of the event.  If the format is enriched, it looks things
> > > > > upon a record by record basis. It does not collect events until they
> > > > > arecomplete - it dumps it to disk as soon as it can tack on the
> > > > > extrainformation.
> > > > > So, the question would be, does this delay happen on the way to disk? Oris
> > > > > this an artifact of post processing the logs with an auparse basedutility?
> > > > > Can this be observed repeatedly on the same raw logs? If so,then maybe
> > > > > auparse does have some issue. But if this is a postprocessing issue, then
> > > > > the wall clock doesn't matter because this eventshould have collected up
> > > > > together.
> > > > > I'd say this merits some investigation.
> > > > 
> > > > OK. I think this needs to be addressed on two fronts. There may be
> > > > more.A.  Within post processing ... a 2 second timeout is not sufficient.
> > > > Iwould suggest we modify auparse.c:au_check_events() to i) perform theevent
> > > > type checks first, then  ii) increase the timeout of 2 seconds to be a
> > > > larger value based onempirical tests.
> > > 
> > > In the post processing, there are 2 use cases. The first is events that are on
> > > disk. In this usage, the 2 second timeout does not come into effect because
> > > the events are run through probably within nanoseconds or microseconds at the
> > > worst. The only time it would come into effect is if the terminating record is
> > > missing.
> > 
> > In this first case, the 2 second timeout is on the event's time, not the
> > 'processing time'.  See ausearch-lol.c:check_events() and
> > auparse.c:au_get_ready_event().
> > And I use the checkpointing code to avoid the incomplete event issue.
> > 
> > In my case, I have not lost records, it's just that an event has arrived on disk
> > with an event time more than 2 seconds after the previously written event.
> > Basically,
> > a. The event was delayed getting to auditd and we look to the kernel for a
> > solution.
> > b. The event arrived at a reasonable point in time at auditd and for some reason
> > auditd delayed it's printing (by the way I tend to use RAW log format, not
> > enriched.
> > 
> > In either case, I believe ausearch-lol.c:check_events() and
> > auparse.c:au_get_ready_event() do need to be changed as we have complete events
> > written by auditd
> > which these two routines fail to process properly.
> > 
> Changing the two second timeout in ausearch-lol.c:check_events() and, one assumes
> in  auparse.c:au_get_ready_event() (but I have not tested the auparse code) fixes
> the processing  of the delayed event.
> Changing the value to say 10 seconds fixes my example use case, but given the
> kernel or auditd could emit an event with a larger delay, should this be a
> configuration item in /etc/audit/auditd.conf?
> 
> I have raised both a bugzilla report (
> https://bugzilla.redhat.com/show_bug.cgi?id=1914603) and Issue (
> https://github.com/linux-audit/audit-userspace/issues/148)
> 
> How do you want me to proceed ... a simple change to 10 seconds or a more
> versatile configuration item in auditd.conf?
> I can perform either and issue a PR if so required.

What do people think ... I would point out this currently occurs in both audit-
2.8.5-4.el7.x86_64 and audit-3.0-0.17.20191104git1c2f876.el8.x86_64 and a just
compiled checkout of the audit-userspace code audit-3.0-1.fc33.x86_64.
I have found examples (under audit-3.0-0.17.20191104git1c2f876.el8.x86_64) that I
need a 15 second delay.
I accept there may be an issue in the kernel but if it is not simple (I only see
this occasionally), then we need a fix in the userspace. Since I have found varied
times required, do I go down the path of a big value OR a configuration item in
/etc/audit/auditd.conf?
>  
> > > The other use case is realtime processing as an audispd plugin. In this use
> > > case the wall clock could matter because records could potentially get lost
> > > due to overflows or another plugin taking too long. This is the use case where
> > > the wall clock matters. And again, it matters when records get lost or delayed
> > > in transit. As long as everything is flowing, it should not factor into event
> > > processing.
> > > > B. I will build a temporary auditd daemon to perform some empirical
> > > > testingto see how long events can reside within the daemon. I may need
> > > > someadvice on this. I assume that the code that sets the timestamp is
> > > > insrc/auditd.c:send_audit_event().
> > > 
> > > This is only for audit daemon's internal events. For all "real" events, it's
> > > set in the kernel.
> > 
> > If that is the case and the kernel is establishing the timestamp, then either
> > the kernel has delayed the eventsarrival at the daemon or the daemon has delayed
> > it's writing.
> > > > If so, I will see if I can put orchestration debug code in to monitor
> > > > anevent's 'time in daemon' until this point. I will then report on this.
> > > > I believe given that AUDIT_PROCTITLE and AUDIT_EOE is fairly widespread,then
> > > > the testing switch in A. will not be a big issue (time cost wise). Itwill
> > > > also mean that if we over compensate the timeout that would causeadditional
> > > > memory cost in auparse() then this is mittigated.
> > > 
> > > I'd suggest breaking up the event completion tests so that an exact collection
> > > termination reason code could be associated to the event.
> > > > With respect to 'There may be more' fronts. Are there other points in
> > > > the'audit ecosystem' that makes use of the '2 second timeout'.
> > > 
> > > Ausearch/report has its own special copy of the event collection logic. It
> > > should be nearly identical to what auparse does.
> > 
> > They appear identical  ... ausearch-lol.c:check_events() and
> > auparse.c:au_get_ready_event().
> > > > I will start work on this, this coming weekend if the above makes sense.
> > > 
> > > One other thought, the current shipping code is audit-3.0, doing a diff
> > > between it and audit-2.8.5 for the auparse directory does show some
> > > differences in event collection/grouping/next_event. A lot of the differences
> > > are cosmetic to fix extra whitespace or indentation. But if you skip all that,
> > > there are some real changes that probably were because of bug reports. For
> > > example,
> > 
> > I will go through these, although this occurs on my Centos 7's (audit-2.8.5-
> > 4.el7.x86_64) as well as my 8's (audit-3.0-0.17.20191104git1c2f876.el8.x86_64).
> > > @@ -259,15 +260,6 @@ static event_list_t *au_get_ready_event(        if
> > > (lowest && lowest->status == EBS_COMPLETE) {                lowest->status =
> > > EBS_EMPTY;                au->au_ready--;-               // Try to consolidate
> > > the array so that we iterate-               // over a smaller portion next
> > > time-               if (lowest == &lol->array[lol->maxi])
> > > {-                       au_lolnode *ptr =
> > > lowest;-                       while (ptr->status == EBS_EMPTY && lol->maxi >
> > > 0) {-                               lol->maxi
> > > --;-                               ptr = &lol->array[lol
> > > ->maxi];-                       }-               }                return
> > > lowest->l;        }
> > > and
> > > @@ -1536,6 +1550,13 @@ static int
> > > au_auparse_next_event(auparse                aup_list_create(l);              
> > >   aup_list_set_event(l, &e);                aup_list_append(l, au->cur_buf,
> > > au->list_idx, au-
> > > > line_number);
> > > +               // Eat standalone EOE - main event was already marked
> > > complete+               if (l->head->type == AUDIT_EOE)
> > > {+                       au->cur_buf =
> > > NULL;+                       aup_list_clear(l);+                       free(l)
> > > ;+                       continue;+               }                if
> > > (au_lol_append(au->au_lo, l) == NULL) {                        free((char
> > > *)e.host); #ifdef LOL_EVENTS_DEBUG01
> > > I don't know if those have an effect on what you are seeing. But that is the
> > > only substantial changes that I can see.
> > > -Steve
> 
> Burn
> 
> --Linux-audit mailing listLinux-audit@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-audit

[-- Attachment #1.2: Type: text/html, Size: 12588 bytes --]

[-- Attachment #2: Type: text/plain, Size: 102 bytes --]

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-15 22:18           ` Burn Alting
@ 2021-01-16  0:35             ` Richard Guy Briggs
  2021-01-16  2:42               ` Burn Alting
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Guy Briggs @ 2021-01-16  0:35 UTC (permalink / raw)
  To: burn; +Cc: Linux Audit

On 2021-01-16 09:18, Burn Alting wrote:
> On Sun, 2021-01-10 at 15:39 +1100, Burn Alting wrote:
> > On Tue, 2021-01-05 at 07:12 +1100, Burn Alting wrote:
> > > On Mon, 2021-01-04 at 09:46 -0500, Steve Grubb wrote:
> > > > On Monday, January 4, 2021 2:55:25 AM EST Burn Alting wrote:
> > > > > On Sun, 2021-01-03 at 10:41 -0500, Steve Grubb wrote:
> > > > > > On Friday, January 1, 2021 4:22:33 PM EST Burn Alting wrote:
> > > > > > > Sometimes, events recorded in /var/log/audit/audit.log appear
> > > > > > > someseconds past co- located events which results
> > > > > > > inauparse:au_check_events() marking  these events complete before
> > > > > > > theyare. An example of this can be seen below with the offending event
> > > > > > > id44609.
> > > > > > > This has been plaguing me for a year or two and this morning was
> > > > > > > thefirst time I still had access to the raw audit.log files (I monitor a
> > > > > > > lotof event types and the log files roll over fairly quickly).The
> > > > > > > example below is from a fully patched Centos 7 but I have also seenthis
> > > > > > > on a patched Fedora 32.
> > > > > > > Has this been seen before? Do we need to re-evaluate how
> > > > > > > auparse'completes' an event (ie 2 seconds is too quick).
> > > > > > 
> > > > > > I have never seen this. But on the way to disk, auditd only does
> > > > > > lightprocessing of the event.  If the format is enriched, it looks things
> > > > > > upon a record by record basis. It does not collect events until they
> > > > > > arecomplete - it dumps it to disk as soon as it can tack on the
> > > > > > extrainformation.
> > > > > > So, the question would be, does this delay happen on the way to disk? Oris
> > > > > > this an artifact of post processing the logs with an auparse basedutility?
> > > > > > Can this be observed repeatedly on the same raw logs? If so,then maybe
> > > > > > auparse does have some issue. But if this is a postprocessing issue, then
> > > > > > the wall clock doesn't matter because this eventshould have collected up
> > > > > > together.
> > > > > > I'd say this merits some investigation.
> > > > > 
> > > > > OK. I think this needs to be addressed on two fronts. There may be
> > > > > more.A.  Within post processing ... a 2 second timeout is not sufficient.
> > > > > Iwould suggest we modify auparse.c:au_check_events() to i) perform theevent
> > > > > type checks first, then  ii) increase the timeout of 2 seconds to be a
> > > > > larger value based onempirical tests.
> > > > 
> > > > In the post processing, there are 2 use cases. The first is events that are on
> > > > disk. In this usage, the 2 second timeout does not come into effect because
> > > > the events are run through probably within nanoseconds or microseconds at the
> > > > worst. The only time it would come into effect is if the terminating record is
> > > > missing.
> > > 
> > > In this first case, the 2 second timeout is on the event's time, not the
> > > 'processing time'.  See ausearch-lol.c:check_events() and
> > > auparse.c:au_get_ready_event().
> > > And I use the checkpointing code to avoid the incomplete event issue.
> > > 
> > > In my case, I have not lost records, it's just that an event has arrived on disk
> > > with an event time more than 2 seconds after the previously written event.
> > > Basically,
> > > a. The event was delayed getting to auditd and we look to the kernel for a
> > > solution.
> > > b. The event arrived at a reasonable point in time at auditd and for some reason
> > > auditd delayed it's printing (by the way I tend to use RAW log format, not
> > > enriched.
> > > 
> > > In either case, I believe ausearch-lol.c:check_events() and
> > > auparse.c:au_get_ready_event() do need to be changed as we have complete events
> > > written by auditd
> > > which these two routines fail to process properly.
> > > 
> > Changing the two second timeout in ausearch-lol.c:check_events() and, one assumes
> > in  auparse.c:au_get_ready_event() (but I have not tested the auparse code) fixes
> > the processing  of the delayed event.
> > Changing the value to say 10 seconds fixes my example use case, but given the
> > kernel or auditd could emit an event with a larger delay, should this be a
> > configuration item in /etc/audit/auditd.conf?
> > 
> > I have raised both a bugzilla report (
> > https://bugzilla.redhat.com/show_bug.cgi?id=1914603) and Issue (
> > https://github.com/linux-audit/audit-userspace/issues/148)
> > 
> > How do you want me to proceed ... a simple change to 10 seconds or a more
> > versatile configuration item in auditd.conf?
> > I can perform either and issue a PR if so required.
> 
> What do people think ... I would point out this currently occurs in both audit-
> 2.8.5-4.el7.x86_64 and audit-3.0-0.17.20191104git1c2f876.el8.x86_64 and a just
> compiled checkout of the audit-userspace code audit-3.0-1.fc33.x86_64.
> I have found examples (under audit-3.0-0.17.20191104git1c2f876.el8.x86_64) that I
> need a 15 second delay.
> I accept there may be an issue in the kernel but if it is not simple (I only see
> this occasionally), then we need a fix in the userspace. Since I have found varied
> times required, do I go down the path of a big value OR a configuration item in
> /etc/audit/auditd.conf?

Or we go back to userspace code looking for the EOE record?  This
doesn't help if they arrive out of order.  Do we number the records in
the kernel?  N of M...

> > > > The other use case is realtime processing as an audispd plugin. In this use
> > > > case the wall clock could matter because records could potentially get lost
> > > > due to overflows or another plugin taking too long. This is the use case where
> > > > the wall clock matters. And again, it matters when records get lost or delayed
> > > > in transit. As long as everything is flowing, it should not factor into event
> > > > processing.
> > > > > B. I will build a temporary auditd daemon to perform some empirical
> > > > > testingto see how long events can reside within the daemon. I may need
> > > > > someadvice on this. I assume that the code that sets the timestamp is
> > > > > insrc/auditd.c:send_audit_event().
> > > > 
> > > > This is only for audit daemon's internal events. For all "real" events, it's
> > > > set in the kernel.
> > > 
> > > If that is the case and the kernel is establishing the timestamp, then either
> > > the kernel has delayed the eventsarrival at the daemon or the daemon has delayed
> > > it's writing.
> > > > > If so, I will see if I can put orchestration debug code in to monitor
> > > > > anevent's 'time in daemon' until this point. I will then report on this.
> > > > > I believe given that AUDIT_PROCTITLE and AUDIT_EOE is fairly widespread,then
> > > > > the testing switch in A. will not be a big issue (time cost wise). Itwill
> > > > > also mean that if we over compensate the timeout that would causeadditional
> > > > > memory cost in auparse() then this is mittigated.
> > > > 
> > > > I'd suggest breaking up the event completion tests so that an exact collection
> > > > termination reason code could be associated to the event.
> > > > > With respect to 'There may be more' fronts. Are there other points in
> > > > > the'audit ecosystem' that makes use of the '2 second timeout'.
> > > > 
> > > > Ausearch/report has its own special copy of the event collection logic. It
> > > > should be nearly identical to what auparse does.
> > > 
> > > They appear identical  ... ausearch-lol.c:check_events() and
> > > auparse.c:au_get_ready_event().
> > > > > I will start work on this, this coming weekend if the above makes sense.
> > > > 
> > > > One other thought, the current shipping code is audit-3.0, doing a diff
> > > > between it and audit-2.8.5 for the auparse directory does show some
> > > > differences in event collection/grouping/next_event. A lot of the differences
> > > > are cosmetic to fix extra whitespace or indentation. But if you skip all that,
> > > > there are some real changes that probably were because of bug reports. For
> > > > example,
> > > 
> > > I will go through these, although this occurs on my Centos 7's (audit-2.8.5-
> > > 4.el7.x86_64) as well as my 8's (audit-3.0-0.17.20191104git1c2f876.el8.x86_64).
> > > > @@ -259,15 +260,6 @@ static event_list_t *au_get_ready_event(        if
> > > > (lowest && lowest->status == EBS_COMPLETE) {                lowest->status =
> > > > EBS_EMPTY;                au->au_ready--;-               // Try to consolidate
> > > > the array so that we iterate-               // over a smaller portion next
> > > > time-               if (lowest == &lol->array[lol->maxi])
> > > > {-                       au_lolnode *ptr =
> > > > lowest;-                       while (ptr->status == EBS_EMPTY && lol->maxi >
> > > > 0) {-                               lol->maxi
> > > > --;-                               ptr = &lol->array[lol
> > > > ->maxi];-                       }-               }                return
> > > > lowest->l;        }
> > > > and
> > > > @@ -1536,6 +1550,13 @@ static int
> > > > au_auparse_next_event(auparse                aup_list_create(l);              
> > > >   aup_list_set_event(l, &e);                aup_list_append(l, au->cur_buf,
> > > > au->list_idx, au-
> > > > > line_number);
> > > > +               // Eat standalone EOE - main event was already marked
> > > > complete+               if (l->head->type == AUDIT_EOE)
> > > > {+                       au->cur_buf =
> > > > NULL;+                       aup_list_clear(l);+                       free(l)
> > > > ;+                       continue;+               }                if
> > > > (au_lol_append(au->au_lo, l) == NULL) {                        free((char
> > > > *)e.host); #ifdef LOL_EVENTS_DEBUG01
> > > > I don't know if those have an effect on what you are seeing. But that is the
> > > > only substantial changes that I can see.
> > > > -Steve
> > 
> > Burn

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-16  0:35             ` Richard Guy Briggs
@ 2021-01-16  2:42               ` Burn Alting
  2021-01-17 14:07                 ` Paul Moore
  0 siblings, 1 reply; 32+ messages in thread
From: Burn Alting @ 2021-01-16  2:42 UTC (permalink / raw)
  To: Richard Guy Briggs; +Cc: Linux Audit


[-- Attachment #1.1: Type: text/plain, Size: 10552 bytes --]

On Fri, 2021-01-15 at 19:35 -0500, Richard Guy Briggs wrote:
> On 2021-01-16 09:18, Burn Alting wrote:
> > On Sun, 2021-01-10 at 15:39 +1100, Burn Alting wrote:
> > > On Tue, 2021-01-05 at 07:12 +1100, Burn Alting wrote:
> > > > On Mon, 2021-01-04 at 09:46 -0500, Steve Grubb wrote:
> > > > > On Monday, January 4, 2021 2:55:25 AM EST Burn Alting wrote:
> > > > > > On Sun, 2021-01-03 at 10:41 -0500, Steve Grubb wrote:
> > > > > > > On Friday, January 1, 2021 4:22:33 PM EST Burn Alting wrote:
> > > > > > > > Sometimes, events recorded in /var/log/audit/audit.log
> > > > > > > > appearsomeseconds past co- located events which
> > > > > > > > resultsinauparse:au_check_events() marking  these events complete
> > > > > > > > beforetheyare. An example of this can be seen below with the
> > > > > > > > offending eventid44609.This has been plaguing me for a year or two
> > > > > > > > and this morning wasthefirst time I still had access to the raw
> > > > > > > > audit.log files (I monitor alotof event types and the log files roll
> > > > > > > > over fairly quickly).Theexample below is from a fully patched Centos
> > > > > > > > 7 but I have also seenthison a patched Fedora 32.Has this been seen
> > > > > > > > before? Do we need to re-evaluate howauparse'completes' an event (ie
> > > > > > > > 2 seconds is too quick).
> > > > > > > 
> > > > > > > I have never seen this. But on the way to disk, auditd only
> > > > > > > doeslightprocessing of the event.  If the format is enriched, it looks
> > > > > > > thingsupon a record by record basis. It does not collect events until
> > > > > > > theyarecomplete - it dumps it to disk as soon as it can tack on
> > > > > > > theextrainformation.So, the question would be, does this delay happen
> > > > > > > on the way to disk? Oristhis an artifact of post processing the logs
> > > > > > > with an auparse basedutility?Can this be observed repeatedly on the
> > > > > > > same raw logs? If so,then maybeauparse does have some issue. But if
> > > > > > > this is a postprocessing issue, thenthe wall clock doesn't matter
> > > > > > > because this eventshould have collected uptogether.I'd say this merits
> > > > > > > some investigation.
> > > > > > 
> > > > > > OK. I think this needs to be addressed on two fronts. There may
> > > > > > bemore.A.  Within post processing ... a 2 second timeout is not
> > > > > > sufficient.Iwould suggest we modify auparse.c:au_check_events() to i)
> > > > > > perform theeventtype checks first, then  ii) increase the timeout of 2
> > > > > > seconds to be alarger value based onempirical tests.
> > > > > 
> > > > > In the post processing, there are 2 use cases. The first is events that
> > > > > are ondisk. In this usage, the 2 second timeout does not come into effect
> > > > > becausethe events are run through probably within nanoseconds or
> > > > > microseconds at theworst. The only time it would come into effect is if
> > > > > the terminating record ismissing.
> > > > 
> > > > In this first case, the 2 second timeout is on the event's time, not
> > > > the'processing time'.  See ausearch-lol.c:check_events()
> > > > andauparse.c:au_get_ready_event().And I use the checkpointing code to avoid
> > > > the incomplete event issue.
> > > > In my case, I have not lost records, it's just that an event has arrived on
> > > > diskwith an event time more than 2 seconds after the previously written
> > > > event.Basically,a. The event was delayed getting to auditd and we look to
> > > > the kernel for asolution.b. The event arrived at a reasonable point in time
> > > > at auditd and for some reasonauditd delayed it's printing (by the way I tend
> > > > to use RAW log format, notenriched.
> > > > In either case, I believe ausearch-lol.c:check_events()
> > > > andauparse.c:au_get_ready_event() do need to be changed as we have complete
> > > > eventswritten by auditdwhich these two routines fail to process properly.
> > > Changing the two second timeout in ausearch-lol.c:check_events() and, one
> > > assumesin  auparse.c:au_get_ready_event() (but I have not tested the auparse
> > > code) fixesthe processing  of the delayed event.Changing the value to say 10
> > > seconds fixes my example use case, but given thekernel or auditd could emit an
> > > event with a larger delay, should this be aconfiguration item in
> > > /etc/audit/auditd.conf?
> > > I have raised both a bugzilla report (
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1914603) and Issue (
> > > https://github.com/linux-audit/audit-userspace/issues/148)
> > > How do you want me to proceed ... a simple change to 10 seconds or a
> > > moreversatile configuration item in auditd.conf?I can perform either and issue
> > > a PR if so required.
> > 
> > What do people think ... I would point out this currently occurs in both audit-
> > 2.8.5-4.el7.x86_64 and audit-3.0-0.17.20191104git1c2f876.el8.x86_64 and a
> > justcompiled checkout of the audit-userspace code audit-3.0-1.fc33.x86_64.I have
> > found examples (under audit-3.0-0.17.20191104git1c2f876.el8.x86_64) that Ineed a
> > 15 second delay.I accept there may be an issue in the kernel but if it is not
> > simple (I only seethis occasionally), then we need a fix in the userspace. Since
> > I have found variedtimes required, do I go down the path of a big value OR a
> > configuration item in/etc/audit/auditd.conf?
> 
> Or we go back to userspace code looking for the EOE record?  Thisdoesn't help if
> they arrive out of order.  Do we number the records inthe kernel?  N of M...

I like the N of M concept but there would be a LOT of change - especially for all
the non-kernel event sources. The EOE would be the most seamless, but at a cost.My
preference is to allow the 2 second 'timer' to be configurable.
> > > > > The other use case is realtime processing as an audispd plugin. In this
> > > > > usecase the wall clock could matter because records could potentially get
> > > > > lostdue to overflows or another plugin taking too long. This is the use
> > > > > case wherethe wall clock matters. And again, it matters when records get
> > > > > lost or delayedin transit. As long as everything is flowing, it should not
> > > > > factor into eventprocessing.
> > > > > > B. I will build a temporary auditd daemon to perform some
> > > > > > empiricaltestingto see how long events can reside within the daemon. I
> > > > > > may needsomeadvice on this. I assume that the code that sets the
> > > > > > timestamp isinsrc/auditd.c:send_audit_event().
> > > > > 
> > > > > This is only for audit daemon's internal events. For all "real" events,
> > > > > it'sset in the kernel.
> > > > 
> > > > If that is the case and the kernel is establishing the timestamp, then
> > > > eitherthe kernel has delayed the eventsarrival at the daemon or the daemon
> > > > has delayedit's writing.
> > > > > > If so, I will see if I can put orchestration debug code in to
> > > > > > monitoranevent's 'time in daemon' until this point. I will then report
> > > > > > on this.I believe given that AUDIT_PROCTITLE and AUDIT_EOE is fairly
> > > > > > widespread,thenthe testing switch in A. will not be a big issue (time
> > > > > > cost wise). Itwillalso mean that if we over compensate the timeout that
> > > > > > would causeadditionalmemory cost in auparse() then this is mittigated.
> > > > > 
> > > > > I'd suggest breaking up the event completion tests so that an exact
> > > > > collectiontermination reason code could be associated to the event.
> > > > > > With respect to 'There may be more' fronts. Are there other points
> > > > > > inthe'audit ecosystem' that makes use of the '2 second timeout'.
> > > > > 
> > > > > Ausearch/report has its own special copy of the event collection logic.
> > > > > Itshould be nearly identical to what auparse does.
> > > > 
> > > > They appear identical  ... ausearch-lol.c:check_events()
> > > > andauparse.c:au_get_ready_event().
> > > > > > I will start work on this, this coming weekend if the above makes sense.
> > > > > 
> > > > > One other thought, the current shipping code is audit-3.0, doing a
> > > > > diffbetween it and audit-2.8.5 for the auparse directory does show
> > > > > somedifferences in event collection/grouping/next_event. A lot of the
> > > > > differencesare cosmetic to fix extra whitespace or indentation. But if you
> > > > > skip all that,there are some real changes that probably were because of
> > > > > bug reports. Forexample,
> > > > 
> > > > I will go through these, although this occurs on my Centos 7's (audit-2.8.5-
> > > > 4.el7.x86_64) as well as my 8's (audit-3.0-
> > > > 0.17.20191104git1c2f876.el8.x86_64).
> > > > > @@ -259,15 +260,6 @@ static event_list_t
> > > > > *au_get_ready_event(        if(lowest && lowest->status == EBS_COMPLETE)
> > > > > {                lowest->status =EBS_EMPTY;                au->au_ready
> > > > > --;-               // Try to consolidatethe array so that we
> > > > > iterate-               // over a smaller portion
> > > > > nexttime-               if (lowest == &lol->array[lol
> > > > > ->maxi]){-                       au_lolnode *ptr
> > > > > =lowest;-                       while (ptr->status == EBS_EMPTY && lol-
> > > > > >maxi >0) {-                               lol->maxi
> > > > > --;-                               ptr = &lol->array[lol
> > > > > ->maxi];-                       }-               }                returnlo
> > > > > west->l;        }and@@ -1536,6 +1550,13 @@ static
> > > > > intau_auparse_next_event(auparse                aup_list_create(l);       
> > > > >          aup_list_set_event(l, &e);                aup_list_append(l, au-
> > > > > >cur_buf,au->list_idx, au-
> > > > > > line_number);
> > > > > +               // Eat standalone EOE - main event was already
> > > > > markedcomplete+               if (l->head->type ==
> > > > > AUDIT_EOE){+                       au->cur_buf
> > > > > =NULL;+                       aup_list_clear(l);+                       fr
> > > > > ee(l);+                       continue;+               }                if
> > > > > (au_lol_append(au->au_lo, l) == NULL)
> > > > > {                        free((char*)e.host); #ifdef LOL_EVENTS_DEBUG01I
> > > > > don't know if those have an effect on what you are seeing. But that is
> > > > > theonly substantial changes that I can see.-Steve
> > > 
> > > Burn
> 
> - RGB
> --Richard Guy Briggs <rgb@redhat.com>Sr. S/W Engineer, Kernel Security, Base
> Operating SystemsRemote, Ottawa, Red Hat CanadaIRC: rgb, SunRaycerVoice:
> +1.647.777.2635, Internal: (81) 32635

[-- Attachment #1.2: Type: text/html, Size: 13816 bytes --]

[-- Attachment #2: Type: text/plain, Size: 102 bytes --]

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-16  2:42               ` Burn Alting
@ 2021-01-17 14:07                 ` Paul Moore
  2021-01-17 21:12                   ` Steve Grubb
  0 siblings, 1 reply; 32+ messages in thread
From: Paul Moore @ 2021-01-17 14:07 UTC (permalink / raw)
  To: burn; +Cc: Richard Guy Briggs, Linux Audit

On Fri, Jan 15, 2021 at 9:43 PM Burn Alting <burn.alting@iinet.net.au> wrote:
> On Fri, 2021-01-15 at 19:35 -0500, Richard Guy Briggs wrote:
>> Or we go back to userspace code looking for the EOE record?  This
>> doesn't help if they arrive out of order.  Do we number the records in
>> the kernel?  N of M...
>
> I like the N of M concept but there would be a LOT of change - especially for all the non-kernel event sources. The EOE would be the most seamless, but at a cost.
> My preference is to allow the 2 second 'timer' to be configurable.

Agree with Burn, numbering the records coming up from the kernel is
going to be a real nightmare, and not something to consider lightly.
Especially when it sounds like we don't yet have a root cause for the
issue.

-- 
paul moore
www.paul-moore.com

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-17 14:07                 ` Paul Moore
@ 2021-01-17 21:12                   ` Steve Grubb
  2021-01-18 13:54                     ` Paul Moore
  0 siblings, 1 reply; 32+ messages in thread
From: Steve Grubb @ 2021-01-17 21:12 UTC (permalink / raw)
  To: burn, linux-audit; +Cc: Richard Guy Briggs, Linux Audit

On Sunday, January 17, 2021 9:07:08 AM EST Paul Moore wrote:
> On Fri, Jan 15, 2021 at 9:43 PM Burn Alting <burn.alting@iinet.net.au> 
wrote:
> > On Fri, 2021-01-15 at 19:35 -0500, Richard Guy Briggs wrote:
> >> Or we go back to userspace code looking for the EOE record?  This
> >> doesn't help if they arrive out of order.  Do we number the records in
> >> the kernel?  N of M...
> > 
> > I like the N of M concept but there would be a LOT of change - especially
> > for all the non-kernel event sources. The EOE would be the most
> > seamless, but at a cost. My preference is to allow the 2 second 'timer'
> > to be configurable.
>
> Agree with Burn, numbering the records coming up from the kernel is
> going to be a real nightmare, and not something to consider lightly.
> Especially when it sounds like we don't yet have a root cause for the
> issue.

A very long time ago, we had numbered records. But it was decided that 
there's no real point in it and we'd rather just save disk space.

I know that the kernel does not serialize the events headed for user space. 
But I'm curious how an event gets stuck and others can jump ahead while one 
that's already inflight can get hung for 4 seconds before it's next record 
goes out?

-Steve


--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-17 21:12                   ` Steve Grubb
@ 2021-01-18 13:54                     ` Paul Moore
  2021-01-18 14:31                       ` Steve Grubb
  0 siblings, 1 reply; 32+ messages in thread
From: Paul Moore @ 2021-01-18 13:54 UTC (permalink / raw)
  To: Steve Grubb, burn, linux-audit; +Cc: Richard Guy Briggs, Linux Audit

On January 17, 2021 4:12:44 PM Steve Grubb <sgrubb@redhat.com> wrote:

> On Sunday, January 17, 2021 9:07:08 AM EST Paul Moore wrote:
>> On Fri, Jan 15, 2021 at 9:43 PM Burn Alting <burn.alting@iinet.net.au>
> wrote:
>>> On Fri, 2021-01-15 at 19:35 -0500, Richard Guy Briggs wrote:
>>>> Or we go back to userspace code looking for the EOE record?  This
>>>> doesn't help if they arrive out of order.  Do we number the records in
>>>> the kernel?  N of M...
>>>
>>> I like the N of M concept but there would be a LOT of change - especially
>>> for all the non-kernel event sources. The EOE would be the most
>>> seamless, but at a cost. My preference is to allow the 2 second 'timer'
>>> to be configurable.
>>
>> Agree with Burn, numbering the records coming up from the kernel is
>> going to be a real nightmare, and not something to consider lightly.
>> Especially when it sounds like we don't yet have a root cause for the
>> issue.
>
> A very long time ago, we had numbered records. But it was decided that
> there's no real point in it and we'd rather just save disk space.

With the current kernel code, adding numbered records is not something to take lightly.

> I know that the kernel does not serialize the events headed for user space.
> But I'm curious how an event gets stuck and others can jump ahead while one
> that's already inflight can get hung for 4 seconds before it's next record
> goes out?

Have you determined that the problem is the kernel? Initially it was looking like it was a userspace issue, is that no longer the general thought? Also, is there a reliable reproducer yet?

--
paul moore
www.paul-moore.com




--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-18 13:54                     ` Paul Moore
@ 2021-01-18 14:31                       ` Steve Grubb
  2021-01-18 20:34                         ` Burn Alting
  2021-01-18 20:36                         ` Paul Moore
  0 siblings, 2 replies; 32+ messages in thread
From: Steve Grubb @ 2021-01-18 14:31 UTC (permalink / raw)
  To: burn, linux-audit, Paul Moore; +Cc: Richard Guy Briggs, Linux Audit

On Monday, January 18, 2021 8:54:30 AM EST Paul Moore wrote:
> >>> I like the N of M concept but there would be a LOT of change -
> >>> especially
> >>> for all the non-kernel event sources. The EOE would be the most
> >>> seamless, but at a cost. My preference is to allow the 2 second 'timer'
> >>> to be configurable.
> >> 
> >> Agree with Burn, numbering the records coming up from the kernel is
> >> going to be a real nightmare, and not something to consider lightly.
> >> Especially when it sounds like we don't yet have a root cause for the
> >> issue.
> > 
> > A very long time ago, we had numbered records. But it was decided that
> > there's no real point in it and we'd rather just save disk space.
> 
> With the current kernel code, adding numbered records is not something to
> take lightly.

That's why I'm saying we had it and it was removed. I could imagine that if 
you had auditing of the kill syscall enabled and a whole process group was 
being killed, you could have hundreds of records that need numbering. No good 
way to know in advance how many records make up the event.

> > I know that the kernel does not serialize the events headed for user
> > space. But I'm curious how an event gets stuck and others can jump ahead
> > while one that's already inflight can get hung for 4 seconds before it's
> > next record goes out?
> 
> Have you determined that the problem is the kernel? 

I assume so because the kernel adds the timestamp and choses what hits the 
socket next. Auditd does no ordering of events. It just looks up the text 
event ID, some minor translation if the enriched format is being used, and 
writes it to disk. It can handle well over 100k records per second.

> Initially it was looking like it was a userspace issue, is that no longer
> the general thought?

I don't see how user space could cause this. Even if auditd was slow, it 
shouldn't take 4 seconds to write to disk and then come back to read another 
record. And even it did, why would the newest record go out before completing 
one that's in progress? Something in the kernel chooses what's next. I 
suspect that might need looking at.

> Also, is there a reliable reproducer yet?

I don't know of one. But, I suppose we could modify ausearch to look for 
examples of this.

-Steve


--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-18 14:31                       ` Steve Grubb
@ 2021-01-18 20:34                         ` Burn Alting
  2021-01-18 20:36                         ` Paul Moore
  1 sibling, 0 replies; 32+ messages in thread
From: Burn Alting @ 2021-01-18 20:34 UTC (permalink / raw)
  To: Steve Grubb, linux-audit, Paul Moore; +Cc: Richard Guy Briggs


[-- Attachment #1.1: Type: text/plain, Size: 2873 bytes --]

On Mon, 2021-01-18 at 09:31 -0500, Steve Grubb wrote:
> On Monday, January 18, 2021 8:54:30 AM EST Paul Moore wrote:
> > > > > I like the N of M concept but there would be a LOT of change -
> > > > > especially
> > > > > for all the non-kernel event sources. The EOE would be the most
> > > > > seamless, but at a cost. My preference is to allow the 2 second 'timer'
> > > > > to be configurable.
> > > > 
> > > > Agree with Burn, numbering the records coming up from the kernel is
> > > > going to be a real nightmare, and not something to consider lightly.
> > > > Especially when it sounds like we don't yet have a root cause for the
> > > > issue.
> > > 
> > > A very long time ago, we had numbered records. But it was decided that
> > > there's no real point in it and we'd rather just save disk space.
> > 
> > With the current kernel code, adding numbered records is not something to
> > take lightly.
> 
> That's why I'm saying we had it and it was removed. I could imagine that if 
> you had auditing of the kill syscall enabled and a whole process group was 
> being killed, you could have hundreds of records that need numbering. No good 
> way to know in advance how many records make up the event.
> 
> > > I know that the kernel does not serialize the events headed for user
> > > space. But I'm curious how an event gets stuck and others can jump ahead
> > > while one that's already inflight can get hung for 4 seconds before it's
> > > next record goes out?
> > 
> > Have you determined that the problem is the kernel? 
> 
> I assume so because the kernel adds the timestamp and choses what hits the 
> socket next. Auditd does no ordering of events. It just looks up the text 
> event ID, some minor translation if the enriched format is being used, and 
> writes it to disk. It can handle well over 100k records per second.
> 
> > Initially it was looking like it was a userspace issue, is that no longer
> > the general thought?
> 
> I don't see how user space could cause this. Even if auditd was slow, it 
> shouldn't take 4 seconds to write to disk and then come back to read another 
> record. And even it did, why would the newest record go out before completing 
> one that's in progress? Something in the kernel chooses what's next. I 
> suspect that might need looking at.
> 
> > Also, is there a reliable reproducer yet?
> 
> I don't know of one. But, I suppose we could modify ausearch to look for 
> examples of this.

Happy to run this where I can. I have also added the auditd.conf and audit.rules
files to my github issue (https://github.com/linux-audit/audit-userspace/issues/148)
that makes this activity more likely to occur if that helps.

Also, to meet the issue of existing ausearch and the auparse library failing to
process audit.log files with such issues, are we happy for a configuration item in
auditd.conf?

> 
> -Steve
> 
> 

[-- Attachment #1.2: Type: text/html, Size: 4576 bytes --]

[-- Attachment #2: Type: text/plain, Size: 102 bytes --]

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-18 14:31                       ` Steve Grubb
  2021-01-18 20:34                         ` Burn Alting
@ 2021-01-18 20:36                         ` Paul Moore
  2021-01-19  8:18                           ` Burn Alting
  1 sibling, 1 reply; 32+ messages in thread
From: Paul Moore @ 2021-01-18 20:36 UTC (permalink / raw)
  To: Steve Grubb; +Cc: Richard Guy Briggs, Linux Audit

On Mon, Jan 18, 2021 at 9:31 AM Steve Grubb <sgrubb@redhat.com> wrote:
> On Monday, January 18, 2021 8:54:30 AM EST Paul Moore wrote:
> > >>> I like the N of M concept but there would be a LOT of change -
> > >>> especially
> > >>> for all the non-kernel event sources. The EOE would be the most
> > >>> seamless, but at a cost. My preference is to allow the 2 second 'timer'
> > >>> to be configurable.
> > >>
> > >> Agree with Burn, numbering the records coming up from the kernel is
> > >> going to be a real nightmare, and not something to consider lightly.
> > >> Especially when it sounds like we don't yet have a root cause for the
> > >> issue.
> > >
> > > A very long time ago, we had numbered records. But it was decided that
> > > there's no real point in it and we'd rather just save disk space.
> >
> > With the current kernel code, adding numbered records is not something to
> > take lightly.
>
> That's why I'm saying we had it and it was removed. I could imagine that if
> you had auditing of the kill syscall enabled and a whole process group was
> being killed, you could have hundreds of records that need numbering. No good
> way to know in advance how many records make up the event.

You only mentioned disk space concerns so it wasn't clear to me that
you were in agreement about this being a bad idea.  Regardless, I'm
glad to see we are on the same page about this.

> > > I know that the kernel does not serialize the events headed for user
> > > space. But I'm curious how an event gets stuck and others can jump ahead
> > > while one that's already inflight can get hung for 4 seconds before it's
> > > next record goes out?
> >
> > Have you determined that the problem is the kernel?
>
> I assume so because the kernel adds the timestamp and choses what hits the
> socket next. Auditd does no ordering of events. It just looks up the text
> event ID, some minor translation if the enriched format is being used, and
> writes it to disk. It can handle well over 100k records per second.

Feel free to insert the old joke about assumptions.

I guess I was hoping for a bit more understanding of the problem and
perhaps some actual data indicating the kernel was the source of the
problem.  Conjecture based on how things are supposed to work can be
misleading.

> > Initially it was looking like it was a userspace issue, is that no longer
> > the general thought?
>
> I don't see how user space could cause this. Even if auditd was slow, it
> shouldn't take 4 seconds to write to disk and then come back to read another
> record. And even it did, why would the newest record go out before completing
> one that's in progress? Something in the kernel chooses what's next. I
> suspect that might need looking at.

See above.

> > Also, is there a reliable reproducer yet?
>
> I don't know of one. But, I suppose we could modify ausearch to look for
> examples of this.

The kernel queuing is a rather complicated affair due to the need to
gracefully handle auditd failing, fallbacks to the console, and
multicast groups all while handling extreme pressure (e.g. auditing
*every* syscall) and not destroying the responsiveness of the system
(we actually can still make forward progress if you are auditing
*every* syscall).  With that complexity comes a number of corner
cases, and I imagine there are a few cases where the system is under
extreme pressure and/or the auditd daemon is dead and/or starved from
CPU time.  As I know Richard is reading this, to be clear I'm talking
about the hold/retry queues and the UNICAST_RETRIES case.  The severe
delays you are talking about in this thread seem severe, but perhaps
if the system is under enough pressure to cause the ordering issues in
the first place such a delay is to be expected.

Anyway, my test setup isn't likely able to reproduce such a scenario
without some significant tweaks, so perhaps those of you who have seen
this problem (Burn, and anyone else?) could shed some light into the
state of the system when the ordering problem occurred.

-- 
paul moore
www.paul-moore.com

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-18 20:36                         ` Paul Moore
@ 2021-01-19  8:18                           ` Burn Alting
  2021-01-19 15:31                             ` Lenny Bruzenak
  2021-01-19 19:11                             ` Paul Moore
  0 siblings, 2 replies; 32+ messages in thread
From: Burn Alting @ 2021-01-19  8:18 UTC (permalink / raw)
  To: Paul Moore, Steve Grubb; +Cc: Richard Guy Briggs, Linux Audit


[-- Attachment #1.1: Type: text/plain, Size: 4932 bytes --]

On Mon, 2021-01-18 at 15:36 -0500, Paul Moore wrote:
> On Mon, Jan 18, 2021 at 9:31 AM Steve Grubb <sgrubb@redhat.com> wrote:
> > On Monday, January 18, 2021 8:54:30 AM EST Paul Moore wrote:
> > > > > > I like the N of M concept but there would be a LOT of change
> > > > > > -especiallyfor all the non-kernel event sources. The EOE would be the
> > > > > > mostseamless, but at a cost. My preference is to allow the 2 second
> > > > > > 'timer'to be configurable.
> > > > > 
> > > > > Agree with Burn, numbering the records coming up from the kernel isgoing
> > > > > to be a real nightmare, and not something to consider lightly.Especially
> > > > > when it sounds like we don't yet have a root cause for theissue.
> > > > 
> > > > A very long time ago, we had numbered records. But it was decided
> > > > thatthere's no real point in it and we'd rather just save disk space.
> > > 
> > > With the current kernel code, adding numbered records is not something totake
> > > lightly.
> > 
> > That's why I'm saying we had it and it was removed. I could imagine that ifyou
> > had auditing of the kill syscall enabled and a whole process group wasbeing
> > killed, you could have hundreds of records that need numbering. No goodway to
> > know in advance how many records make up the event.
> 
> You only mentioned disk space concerns so it wasn't clear to me thatyou were in
> agreement about this being a bad idea.  Regardless, I'mglad to see we are on the
> same page about this.
> > > > I know that the kernel does not serialize the events headed for userspace.
> > > > But I'm curious how an event gets stuck and others can jump aheadwhile one
> > > > that's already inflight can get hung for 4 seconds before it'snext record
> > > > goes out?
> > > 
> > > Have you determined that the problem is the kernel?
> > 
> > I assume so because the kernel adds the timestamp and choses what hits thesocket
> > next. Auditd does no ordering of events. It just looks up the textevent ID, some
> > minor translation if the enriched format is being used, andwrites it to disk. It
> > can handle well over 100k records per second.
> 
> Feel free to insert the old joke about assumptions.
> I guess I was hoping for a bit more understanding of the problem andperhaps some
> actual data indicating the kernel was the source of theproblem.  Conjecture based
> on how things are supposed to work can bemisleading.
> > > Initially it was looking like it was a userspace issue, is that no longerthe
> > > general thought?
> > 
> > I don't see how user space could cause this. Even if auditd was slow,
> > itshouldn't take 4 seconds to write to disk and then come back to read
> > anotherrecord. And even it did, why would the newest record go out before
> > completingone that's in progress? Something in the kernel chooses what's next.
> > Isuspect that might need looking at.
> 
> See above.
> > > Also, is there a reliable reproducer yet?
> > 
> > I don't know of one. But, I suppose we could modify ausearch to look forexamples
> > of this.
> 
> The kernel queuing is a rather complicated affair due to the need togracefully
> handle auditd failing, fallbacks to the console, andmulticast groups all while
> handling extreme pressure (e.g. auditing*every* syscall) and not destroying the
> responsiveness of the system(we actually can still make forward progress if you
> are auditing*every* syscall).  With that complexity comes a number of cornercases,
> and I imagine there are a few cases where the system is underextreme pressure
> and/or the auditd daemon is dead and/or starved fromCPU time.  As I know Richard
> is reading this, to be clear I'm talkingabout the hold/retry queues and the
> UNICAST_RETRIES case.  The severedelays you are talking about in this thread seem
> severe, but perhapsif the system is under enough pressure to cause the ordering
> issues inthe first place such a delay is to be expected.
> Anyway, my test setup isn't likely able to reproduce such a scenariowithout some
> significant tweaks, so perhaps those of you who have seenthis problem (Burn, and
> anyone else?) could shed some light into thestate of the system when the ordering
> problem occurred.

I tend to have a rigorous auditing posture (see the rules loaded in 
https://github.com/linux-audit/audit-userspace/issues/148) which is not normal for
most. Perhaps, Paul, you have hit the nail on the head by stating that this 'severe
delay' is not that unreasonable given my rules posture and we just need to 'deal
with it' in user space.We still get the event data, I just need to adjust the user
space tools to deal with this occurrence.
As for what the system is doing, in my home case it's a Centos 7 VM running a tomcat
service which only gets busy every 20 minutes and the other is a HPE Z800 running
Centos 8 with 4-5 VM's mostly dormant. I can put any code in these hosts to assist
in 'validating'/testing the delay. Advise and I will run.

[-- Attachment #1.2: Type: text/html, Size: 7401 bytes --]

[-- Attachment #2: Type: text/plain, Size: 102 bytes --]

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-19  8:18                           ` Burn Alting
@ 2021-01-19 15:31                             ` Lenny Bruzenak
  2021-01-19 19:11                             ` Paul Moore
  1 sibling, 0 replies; 32+ messages in thread
From: Lenny Bruzenak @ 2021-01-19 15:31 UTC (permalink / raw)
  To: linux-audit


[-- Attachment #1.1: Type: text/plain, Size: 1337 bytes --]

On 1/19/21 2:18 AM, Burn Alting wrote:

>> Anyway, my test setup isn't likely able to reproduce such a scenario
>> without some significant tweaks, so perhaps those of you who have seen
>> this problem (Burn, and anyone else?) could shed some light into the
>> state of the system when the ordering problem occurred.
>
> I tend to have a rigorous auditing posture (see the rules loaded in
> https://github.com/linux-audit/audit-userspace/issues/148) which is
> not normal for most. Perhaps, Paul, you have hit the nail on the head
> by stating that this 'severe delay' is not that unreasonable given my
> rules posture and we just need to 'deal with it' in user space.
> We still get the event data, I just need to adjust the user space
> tools to deal with this occurrence.
> As for what the system is doing, in my home case it's a Centos 7 VM
> running a tomcat service which only gets busy every 20 minutes and the
> other is a HPE Z800 running Centos 8 with 4-5 VM's mostly dormant. I
> can put any code in these hosts to assist in 'validating'/testing the
> delay. Advise and I will run.


Looking at the records which appear to be delayed (in OP), I see that
they are PATH (w/PROCTITLE). Just curious, is the path involved part of
a networked FS, or something like a VM shared directory?

Thx,

LCB

-- 
Lenny Bruzenak
MagitekLTD


[-- Attachment #1.2: Type: text/html, Size: 2213 bytes --]

[-- Attachment #2: Type: text/plain, Size: 102 bytes --]

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-19  8:18                           ` Burn Alting
  2021-01-19 15:31                             ` Lenny Bruzenak
@ 2021-01-19 19:11                             ` Paul Moore
  2021-01-19 19:38                               ` Burn Alting
  2021-01-19 20:42                               ` Richard Guy Briggs
  1 sibling, 2 replies; 32+ messages in thread
From: Paul Moore @ 2021-01-19 19:11 UTC (permalink / raw)
  To: burn; +Cc: Richard Guy Briggs, Linux Audit

On Tue, Jan 19, 2021 at 3:18 AM Burn Alting <burn.alting@iinet.net.au> wrote:
> I tend to have a rigorous auditing posture (see the rules loaded in https://github.com/linux-audit/audit-userspace/issues/148) which is not normal for most. Perhaps, Paul, you have hit the nail on the head by stating that this 'severe delay' is not that unreasonable given my rules posture and we just need to 'deal with it' in user space.
> We still get the event data, I just need to adjust the user space tools to deal with this occurrence.
> As for what the system is doing, in my home case it's a Centos 7 VM running a tomcat service which only gets busy every 20 minutes and the other is a HPE Z800 running Centos 8 with 4-5 VM's mostly dormant. I can put any code in these hosts to assist in 'validating'/testing the delay. Advise and I will run.

I took a (slightly) closer look at the queuing code just now and the
corner case I suspected doesn't look very promising here.  On failure
to send a record to userspace, the record is put back at the front of
the queue to preserve ordering so the kernel *should* still emit
records in order even when auditd is under pressure, dead, or in the
process of a restart.  Also, in this case the kernel kicks the auditd
connection, disconnecting auditd's netlink socket; I imagine that
would be a notable event on your systems.

However, looking at the timestamps in the audit events you posted, I
noticed something a little odd.  If I align the timestamps a bit
better let's see if it jumps out at you guys ...

type=XXX msg=audit(1609519900.159:44606): ...
type=XXX msg=audit(1609519900.161:44607): ...
type=XXX msg=audit(1609519900.163:44608): ...
type=XXX msg=audit(1609519896.829:44609): ...
type=XXX msg=audit(1609519900.170:44610): ...

Let me guess Burn, you're running NTP/PTP :)  I'm not sure how auditd
handles things like this, but it looks like there was a small negative
time correction between events which caused the odd scenario where
event N+1 actually occurred before event N according to the wall
clock.  In other words, your system is time traveling ;)

-- 
paul moore
www.paul-moore.com


--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-19 19:11                             ` Paul Moore
@ 2021-01-19 19:38                               ` Burn Alting
  2021-01-19 20:26                                 ` Paul Moore
  2021-01-19 20:42                               ` Richard Guy Briggs
  1 sibling, 1 reply; 32+ messages in thread
From: Burn Alting @ 2021-01-19 19:38 UTC (permalink / raw)
  To: Paul Moore; +Cc: Richard Guy Briggs, Linux Audit


[-- Attachment #1.1: Type: text/plain, Size: 5028 bytes --]

On Tue, 2021-01-19 at 14:11 -0500, Paul Moore wrote:
> On Tue, Jan 19, 2021 at 3:18 AM Burn Alting <burn.alting@iinet.net.au> wrote:
> > I tend to have a rigorous auditing posture (see the rules loaded in 
> > https://github.com/linux-audit/audit-userspace/issues/148) which is not normal
> > for most. Perhaps, Paul, you have hit the nail on the head by stating that this
> > 'severe delay' is not that unreasonable given my rules posture and we just need
> > to 'deal with it' in user space.We still get the event data, I just need to
> > adjust the user space tools to deal with this occurrence.As for what the system
> > is doing, in my home case it's a Centos 7 VM running a tomcat service which only
> > gets busy every 20 minutes and the other is a HPE Z800 running Centos 8 with 4-5 
> > VM's mostly dormant. I can put any code in these hosts to assist in
> > 'validating'/testing the delay. Advise and I will run.
> 
> I took a (slightly) closer look at the queuing code just now and thecorner case I
> suspected doesn't look very promising here.  On failureto send a record to
> userspace, the record is put back at the front ofthe queue to preserve ordering so
> the kernel *should* still emitrecords in order even when auditd is under pressure,
> dead, or in theprocess of a restart.  Also, in this case the kernel kicks the
> auditdconnection, disconnecting auditd's netlink socket; I imagine thatwould be a
> notable event on your systems.
> However, looking at the timestamps in the audit events you posted, Inoticed
> something a little odd.  If I align the timestamps a bitbetter let's see if it
> jumps out at you guys ...
> type=XXX msg=audit(1609519900.159:44606): ...type=XXX
> msg=audit(1609519900.161:44607): ...type=XXX msg=audit(1609519900.163:44608):
> ...type=XXX msg=audit(1609519896.829:44609): ...type=XXX
> msg=audit(1609519900.170:44610): ...
> Let me guess Burn, you're running NTP/PTP :)  I'm not sure how auditdhandles
> things like this, but it looks like there was a small negativetime correction
> between events which caused the odd scenario whereevent N+1 actually occurred
> before event N according to the wallclock.  In other words, your system is time
> traveling ;)
. All systems use chrony (current NTP daemon). One is a VM (on top of KVM) and the
other a bare metal deployment. Does the above explain my second data set (in the
issue)  from a bare metal Centos 8 host?Perhaps Lenny's comments bare investigation.
Either way, I will offer a patch to the user space code to, based on a configuration
value, manage correctly such activity.
msg=audit(1609920994.481:1787844):msg=audit(1609920994.481:1787844):msg=audit(160992
0994.481:1787844):msg=audit(1609920994.481:1787844):msg=audit(1609920994.481:1787845
):msg=audit(1609920994.481:1787845):msg=audit(1609920994.481:1787845):msg=audit(1609
920994.481:1787845):msg=audit(1609920994.481:1787845):msg=audit(1609920994.481:17878
45):msg=audit(1609920994.481:1787845):msg=audit(1609920994.481:1787846):msg=audit(16
09920994.481:1787846):msg=audit(1609920994.481:1787846):msg=audit(1609920994.481:178
7846):msg=audit(1609920994.481:1787846):msg=audit(1609920994.482:1787847):msg=audit(
1609920994.482:1787847):msg=audit(1609920994.482:1787847):msg=audit(1609920994.482:1
787847):msg=audit(1609920994.482:1787847):msg=audit(1609920994.483:1787848):msg=audi
t(1609920994.483:1787848):msg=audit(1609920994.483:1787848):msg=audit(1609920994.483
:1787848):msg=audit(1609920994.483:1787848):msg=audit(1609920994.484:1787849):msg=au
dit(1609920994.484:1787849):msg=audit(1609921000.636:1787850):msg=audit(1609921000.6
36:1787850):msg=audit(1609921000.636:1787850):msg=audit(1609921008.456:1787851):msg=
audit(1609921008.456:1787851):msg=audit(1609921008.456:1787851):msg=audit(1609921008
.456:1787851):msg=audit(1609921008.456:1787851):msg=audit(1609921008.456:1787851):ms
g=audit(1609920994.484:1787849):msg=audit(1609920994.484:1787849):msg=audit(16099209
94.484:1787849):msg=audit(1609921010.837:1787852):msg=audit(1609921010.837:1787852):
msg=audit(1609921010.837:1787852):msg=audit(1609921010.837:1787852):msg=audit(160992
1010.837:1787853):msg=audit(1609921010.837:1787853):msg=audit(1609921010.837:1787853
):msg=audit(1609921010.837:1787853):msg=audit(1609921010.837:1787853):msg=audit(1609
921010.837:1787853):msg=audit(1609921010.837:1787853):msg=audit(1609921010.838:17878
54):msg=audit(1609921010.838:1787854):msg=audit(1609921010.838:1787854):msg=audit(16
09921010.838:1787854):msg=audit(1609921010.838:1787854):msg=audit(1609921010.839:178
7855):msg=audit(1609921010.839:1787855):msg=audit(1609921010.839:1787855):msg=audit(
1609921010.839:1787855):msg=audit(1609921010.839:1787855):msg=audit(1609921010.840:1
787856):msg=audit(1609921010.840:1787856):msg=audit(1609921010.840:1787856):msg=audi
t(1609921010.840:1787856):msg=audit(1609921010.840:1787856):msg=audit(1609921010.841
:1787857):msg=audit(1609921010.841:1787857):msg=audit(1609921010.841:1787857):msg=au
dit(1609921010.841:1787857):msg=audit(1609921010.841:1787857):


[-- Attachment #1.2: Type: text/html, Size: 6728 bytes --]

[-- Attachment #2: Type: text/plain, Size: 102 bytes --]

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-19 19:38                               ` Burn Alting
@ 2021-01-19 20:26                                 ` Paul Moore
  2021-01-19 21:51                                   ` Steve Grubb
  0 siblings, 1 reply; 32+ messages in thread
From: Paul Moore @ 2021-01-19 20:26 UTC (permalink / raw)
  To: burn; +Cc: Richard Guy Briggs, Linux Audit

On Tue, Jan 19, 2021 at 2:38 PM Burn Alting <burn.alting@iinet.net.au> wrote:
> All systems use chrony (current NTP daemon). One is a VM (on top of KVM) and the other a bare metal deployment. Does the above explain my second data set (in the issue) from a bare metal Centos 8 host?
> Perhaps Lenny's comments bare investigation. Either way, I will offer a patch to the user space code to, based on a configuration value, manage correctly such activity.

...

> msg=audit(1609920994.483:1787848):
> msg=audit(1609920994.483:1787848):
> msg=audit(1609920994.483:1787848):
> msg=audit(1609920994.483:1787848):
> msg=audit(1609920994.483:1787848):
> msg=audit(1609920994.484:1787849):
> msg=audit(1609920994.484:1787849):
> msg=audit(1609921000.636:1787850):
> msg=audit(1609921000.636:1787850):
> msg=audit(1609921000.636:1787850):
> msg=audit(1609921008.456:1787851):
> msg=audit(1609921008.456:1787851):
> msg=audit(1609921008.456:1787851):
> msg=audit(1609921008.456:1787851):
> msg=audit(1609921008.456:1787851):
> msg=audit(1609921008.456:1787851):
> msg=audit(1609920994.484:1787849):
> msg=audit(1609920994.484:1787849):
> msg=audit(1609920994.484:1787849):
> msg=audit(1609921010.837:1787852):
> msg=audit(1609921010.837:1787852):
> msg=audit(1609921010.837:1787852):
> msg=audit(1609921010.837:1787852):

Looking at the extracted snippet above where event 1787849 is out of
order we see the following timestamps:

> msg=audit(1609920994.483:1787848):
> msg=audit(1609920994.484:1787849):
> msg=audit(1609921000.636:1787850):
> msg=audit(1609921008.456:1787851):
> msg=audit(1609921010.837:1787852):

... which looks correct in as much that the time doesn't appear to go
backwards between events.  As I said before, I'm not sure how Steve's
userspace works so the time may be a red herring.

Barring some weird condition where auditd disconnects and quickly
reconnects to the kernel, and/or dies and is replaced quickly, I'm not
seeing anything obvious in the kernel which would cause this.  I'm not
saying there isn't anything there, just that it isn't obvious to me at
the moment :)

-- 
paul moore
www.paul-moore.com

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-19 19:11                             ` Paul Moore
  2021-01-19 19:38                               ` Burn Alting
@ 2021-01-19 20:42                               ` Richard Guy Briggs
  1 sibling, 0 replies; 32+ messages in thread
From: Richard Guy Briggs @ 2021-01-19 20:42 UTC (permalink / raw)
  To: Paul Moore; +Cc: Linux Audit

On 2021-01-19 14:11, Paul Moore wrote:
> On Tue, Jan 19, 2021 at 3:18 AM Burn Alting <burn.alting@iinet.net.au> wrote:
> > I tend to have a rigorous auditing posture (see the rules loaded in https://github.com/linux-audit/audit-userspace/issues/148) which is not normal for most. Perhaps, Paul, you have hit the nail on the head by stating that this 'severe delay' is not that unreasonable given my rules posture and we just need to 'deal with it' in user space.
> > We still get the event data, I just need to adjust the user space tools to deal with this occurrence.
> > As for what the system is doing, in my home case it's a Centos 7 VM running a tomcat service which only gets busy every 20 minutes and the other is a HPE Z800 running Centos 8 with 4-5 VM's mostly dormant. I can put any code in these hosts to assist in 'validating'/testing the delay. Advise and I will run.
> 
> I took a (slightly) closer look at the queuing code just now and the
> corner case I suspected doesn't look very promising here.  On failure
> to send a record to userspace, the record is put back at the front of
> the queue to preserve ordering so the kernel *should* still emit
> records in order even when auditd is under pressure, dead, or in the
> process of a restart.  Also, in this case the kernel kicks the auditd
> connection, disconnecting auditd's netlink socket; I imagine that
> would be a notable event on your systems.
> 
> However, looking at the timestamps in the audit events you posted, I
> noticed something a little odd.  If I align the timestamps a bit
> better let's see if it jumps out at you guys ...
> 
> type=XXX msg=audit(1609519900.159:44606): ...
> type=XXX msg=audit(1609519900.161:44607): ...
> type=XXX msg=audit(1609519900.163:44608): ...
> type=XXX msg=audit(1609519896.829:44609): ...
> type=XXX msg=audit(1609519900.170:44610): ...
> 
> Let me guess Burn, you're running NTP/PTP :)  I'm not sure how auditd
> handles things like this, but it looks like there was a small negative
> time correction between events which caused the odd scenario where
> event N+1 actually occurred before event N according to the wall
> clock.  In other words, your system is time traveling ;)

Interesting...

The timestamp is assigned on syscall entry.
The serial number is assigned on the creation of the first audit record of an event.

>From these timings above, NTP/PTP could explain this, but the third and
fifth are too close together to make that likely, I think.

I'd blame time namespaces, but the host system is all in the same time
namespace.

I think I'd need to see a larger sequence to be convinced of this...

> paul moore

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-19 20:26                                 ` Paul Moore
@ 2021-01-19 21:51                                   ` Steve Grubb
  2021-01-20  6:38                                     ` Burn Alting
  0 siblings, 1 reply; 32+ messages in thread
From: Steve Grubb @ 2021-01-19 21:51 UTC (permalink / raw)
  To: burn, Paul Moore; +Cc: Richard Guy Briggs, Linux Audit

On Tuesday, January 19, 2021 3:26:04 PM EST Paul Moore wrote:
> On Tue, Jan 19, 2021 at 2:38 PM Burn Alting <burn.alting@iinet.net.au> 
wrote:
> > All systems use chrony (current NTP daemon). One is a VM (on top of KVM)
> > and the other a bare metal deployment. Does the above explain my second
> > data set (in the issue) from a bare metal Centos 8 host? Perhaps Lenny's
> > comments bare investigation. Either way, I will offer a patch to the
> > user space code to, based on a configuration value, manage correctly
> > such activity.
> ...
> 
> > msg=audit(1609920994.483:1787848):
> > msg=audit(1609920994.483:1787848):
> > msg=audit(1609920994.483:1787848):
> > msg=audit(1609920994.483:1787848):
> > msg=audit(1609920994.483:1787848):
> > msg=audit(1609920994.484:1787849):
> > msg=audit(1609920994.484:1787849):
> > msg=audit(1609921000.636:1787850):
> > msg=audit(1609921000.636:1787850):
> > msg=audit(1609921000.636:1787850):
> > msg=audit(1609921008.456:1787851):
> > msg=audit(1609921008.456:1787851):
> > msg=audit(1609921008.456:1787851):
> > msg=audit(1609921008.456:1787851):
> > msg=audit(1609921008.456:1787851):
> > msg=audit(1609921008.456:1787851):
> > msg=audit(1609920994.484:1787849):
> > msg=audit(1609920994.484:1787849):
> > msg=audit(1609920994.484:1787849):
> > msg=audit(1609921010.837:1787852):
> > msg=audit(1609921010.837:1787852):
> > msg=audit(1609921010.837:1787852):
> 
> > msg=audit(1609921010.837:1787852):
> Looking at the extracted snippet above where event 1787849 is out of
> 
> order we see the following timestamps:
> > msg=audit(1609920994.483:1787848):
> > msg=audit(1609920994.484:1787849):
> > msg=audit(1609921000.636:1787850):
> > msg=audit(1609921008.456:1787851):
> 
> > msg=audit(1609921010.837:1787852):
>
> ... which looks correct in as much that the time doesn't appear to go
> backwards between events.  As I said before, I'm not sure how Steve's
> userspace works so the time may be a red herring.

It only handles one record at a time. No chance to mix things up.

The github issue says that 30-stig.rules is being used. If the system time 
changed with chrony, I would expect syscall events with adjtimex. But the 
only ones given are execve.

-Steve

> Barring some weird condition where auditd disconnects and quickly
> reconnects to the kernel, and/or dies and is replaced quickly, I'm not
> seeing anything obvious in the kernel which would cause this.  I'm not
> saying there isn't anything there, just that it isn't obvious to me at
> the moment :)




--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-19 21:51                                   ` Steve Grubb
@ 2021-01-20  6:38                                     ` Burn Alting
  2021-01-20 22:50                                       ` Paul Moore
  0 siblings, 1 reply; 32+ messages in thread
From: Burn Alting @ 2021-01-20  6:38 UTC (permalink / raw)
  To: Steve Grubb, Paul Moore; +Cc: Richard Guy Briggs, Linux Audit


[-- Attachment #1.1: Type: text/plain, Size: 3186 bytes --]

All,
How is the following for a way forward.
a. I will author a patch to the user space code to correctly parse this condition
and submit it on the weekend. It will be via a new configuration item to auditd.conf
just in case placing a fixed extended timeout (15-20 secs) affects memory usage for
users of the auparse library. This solves the initial problem of ausearch/auparse
failing to parse generated audit.b. I am happy to instrument what ever is
recommended on my hosts at home (vm's and bare metal) to provide more information,
should we want to 'explain' the occurrence, given I see this every week or two and
report back.
On Tue, 2021-01-19 at 16:51 -0500, Steve Grubb wrote:
> On Tuesday, January 19, 2021 3:26:04 PM EST Paul Moore wrote:
> > On Tue, Jan 19, 2021 at 2:38 PM Burn Alting <burn.alting@iinet.net.au> 
> wrote:
> > > All systems use chrony (current NTP daemon). One is a VM (on top of KVM)and
> > > the other a bare metal deployment. Does the above explain my seconddata set
> > > (in the issue) from a bare metal Centos 8 host? Perhaps Lenny'scomments bare
> > > investigation. Either way, I will offer a patch to theuser space code to,
> > > based on a configuration value, manage correctlysuch activity.
> > ...
> > > msg=audit(1609920994.483:1787848):msg=audit(1609920994.483:1787848):msg=audit(
> > > 1609920994.483:1787848):msg=audit(1609920994.483:1787848):msg=audit(1609920994
> > > .483:1787848):msg=audit(1609920994.484:1787849):msg=audit(1609920994.484:17878
> > > 49):msg=audit(1609921000.636:1787850):msg=audit(1609921000.636:1787850):msg=au
> > > dit(1609921000.636:1787850):msg=audit(1609921008.456:1787851):msg=audit(160992
> > > 1008.456:1787851):msg=audit(1609921008.456:1787851):msg=audit(1609921008.456:1
> > > 787851):msg=audit(1609921008.456:1787851):msg=audit(1609921008.456:1787851):ms
> > > g=audit(1609920994.484:1787849):msg=audit(1609920994.484:1787849):msg=audit(16
> > > 09920994.484:1787849):msg=audit(1609921010.837:1787852):msg=audit(1609921010.8
> > > 37:1787852):msg=audit(1609921010.837:1787852):
> > > msg=audit(1609921010.837:1787852):
> > Looking at the extracted snippet above where event 1787849 is out of
> > order we see the following timestamps:
> > > msg=audit(1609920994.483:1787848):msg=audit(1609920994.484:1787849):msg=audit(
> > > 1609921000.636:1787850):msg=audit(1609921008.456:1787851):
> > > msg=audit(1609921010.837:1787852):
> > 
> > ... which looks correct in as much that the time doesn't appear to gobackwards
> > between events.  As I said before, I'm not sure how Steve'suserspace works so
> > the time may be a red herring.
> 
> It only handles one record at a time. No chance to mix things up.
> The github issue says that 30-stig.rules is being used. If the system time changed
> with chrony, I would expect syscall events with adjtimex. But the only ones given
> are execve.
> -Steve
> > Barring some weird condition where auditd disconnects and quicklyreconnects to
> > the kernel, and/or dies and is replaced quickly, I'm notseeing anything obvious
> > in the kernel which would cause this.  I'm notsaying there isn't anything there,
> > just that it isn't obvious to me atthe moment :)
> 
> 
> 

[-- Attachment #1.2: Type: text/html, Size: 5009 bytes --]

[-- Attachment #2: Type: text/plain, Size: 102 bytes --]

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-20  6:38                                     ` Burn Alting
@ 2021-01-20 22:50                                       ` Paul Moore
  2021-01-23 22:55                                         ` Burn Alting
  0 siblings, 1 reply; 32+ messages in thread
From: Paul Moore @ 2021-01-20 22:50 UTC (permalink / raw)
  To: burn; +Cc: Richard Guy Briggs, Linux Audit

On Wed, Jan 20, 2021 at 1:38 AM Burn Alting <burn.alting@iinet.net.au> wrote:
> All,
>
> How is the following for a way forward.
>
> a. I will author a patch to the user space code to correctly parse this condition and submit it on the weekend. It will be via a new configuration item to auditd.conf just in case placing a fixed extended timeout (15-20 secs) affects memory usage for users of the auparse library. This solves the initial problem of ausearch/auparse failing to parse generated audit.
> b. I am happy to instrument what ever is recommended on my hosts at home (vm's and bare metal) to provide more information, should we want to 'explain' the occurrence, given I see this every week or two and report back.

Seems reasonable to me.

-- 
paul moore
www.paul-moore.com


--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-20 22:50                                       ` Paul Moore
@ 2021-01-23 22:55                                         ` Burn Alting
  2021-01-25 23:53                                           ` Steve Grubb
  0 siblings, 1 reply; 32+ messages in thread
From: Burn Alting @ 2021-01-23 22:55 UTC (permalink / raw)
  To: Steve Grubb; +Cc: Richard Guy Briggs, Linux Audit


[-- Attachment #1.1: Type: text/plain, Size: 1257 bytes --]

On Wed, 2021-01-20 at 17:50 -0500, Paul Moore wrote:
> On Wed, Jan 20, 2021 at 1:38 AM Burn Alting <burn.alting@iinet.net.au> wrote:
> > All,
> > How is the following for a way forward.
> > a. I will author a patch to the user space code to correctly parse this
> > condition and submit it on the weekend. It will be via a new configuration item
> > to auditd.conf just in case placing a fixed extended timeout (15-20 secs)
> > affects memory usage for users of the auparse library. This solves the initial
> > problem of ausearch/auparse failing to parse generated audit.b. I am happy to
> > instrument what ever is recommended on my hosts at home (vm's and bare metal) to
> > provide more information, should we want to 'explain' the occurrence, given I
> > see this every week or two and report back.
> 
> Seems reasonable to me.
Steve,
I can implement the 'end_of_event_timeout' change either as
i. a command line argument to ausearch/aureport (say --eoetmo secs) and a new pair
of library functions within the  auparse() stable (say auparse_set_eoe_timeout() and
auparse_get_eoe_timeout())
or
ii. a configuration item in /etc/audit/auditd.conf, or


Which is your preference? Mine is i. as this is a user space processing change, not
a demon change.



[-- Attachment #1.2: Type: text/html, Size: 2209 bytes --]

[-- Attachment #2: Type: text/plain, Size: 102 bytes --]

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-23 22:55                                         ` Burn Alting
@ 2021-01-25 23:53                                           ` Steve Grubb
  2021-01-26  0:11                                             ` Burn Alting
  0 siblings, 1 reply; 32+ messages in thread
From: Steve Grubb @ 2021-01-25 23:53 UTC (permalink / raw)
  To: burn; +Cc: Richard Guy Briggs, Linux Audit

On Saturday, January 23, 2021 5:55:44 PM EST Burn Alting wrote:
> > > How is the following for a way forward.
> > > a. I will author a patch to the user space code to correctly parse this
> > > condition and submit it on the weekend. It will be via a new
> > > configuration item to auditd.conf just in case placing a fixed
> > > extended timeout (15-20 secs) affects memory usage for users of the
> > > auparse library. This solves the initial problem of ausearch/auparse
> > > failing to parse generated audit.b. I am happy to instrument what ever
> > > is recommended on my hosts at home (vm's and bare metal) to provide
> > > more information, should we want to 'explain' the occurrence, given I
> > > see this every week or two and report back.
> > 
> > Seems reasonable to me.
> 
> I can implement the 'end_of_event_timeout' change either as
> i. a command line argument to ausearch/aureport (say --eoetmo secs) and a
> new pair of library functions within the  auparse() stable (say
> auparse_set_eoe_timeout() and auparse_get_eoe_timeout())
> or
> ii. a configuration item in /etc/audit/auditd.conf, or
> 
> 
> Which is your preference? Mine is i. as this is a user space processing
> change, not a demon change.

To be honest, I'm not entirely sure what we're seeing. I run some tests today 
on my system. It's seeing issues also. I'd still like to treat the root cause 
of this. But we do need to change the default. That I what I'm trying to 
figure out.

Back to your question, I'm wondering if we should do both? A changeable 
default in auditd.conf and an override on the command line.

-Steve


--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-25 23:53                                           ` Steve Grubb
@ 2021-01-26  0:11                                             ` Burn Alting
  2021-01-26  0:20                                               ` Steve Grubb
  0 siblings, 1 reply; 32+ messages in thread
From: Burn Alting @ 2021-01-26  0:11 UTC (permalink / raw)
  To: Steve Grubb; +Cc: Richard Guy Briggs, Linux Audit


[-- Attachment #1.1: Type: text/plain, Size: 1956 bytes --]

On Mon, 2021-01-25 at 18:53 -0500, Steve Grubb wrote:
> On Saturday, January 23, 2021 5:55:44 PM EST Burn Alting wrote:
> > > > How is the following for a way forward.
> > > > a. I will author a patch to the user space code to correctly parse this
> > > > condition and submit it on the weekend. It will be via a new
> > > > configuration item to auditd.conf just in case placing a fixed
> > > > extended timeout (15-20 secs) affects memory usage for users of the
> > > > auparse library. This solves the initial problem of ausearch/auparse
> > > > failing to parse generated audit.b. I am happy to instrument what ever
> > > > is recommended on my hosts at home (vm's and bare metal) to provide
> > > > more information, should we want to 'explain' the occurrence, given I
> > > > see this every week or two and report back.
> > > 
> > > Seems reasonable to me.
> > 
> > I can implement the 'end_of_event_timeout' change either as
> > i. a command line argument to ausearch/aureport (say --eoetmo secs) and a
> > new pair of library functions within the  auparse() stable (say
> > auparse_set_eoe_timeout() and auparse_get_eoe_timeout())
> > or
> > ii. a configuration item in /etc/audit/auditd.conf, or
> > 
> > 
> > Which is your preference? Mine is i. as this is a user space processing
> > change, not a demon change.
> 
> To be honest, I'm not entirely sure what we're seeing. I run some tests today 
> on my system. It's seeing issues also. I'd still like to treat the root cause 
> of this. But we do need to change the default. That I what I'm trying to 
> figure out.
> 
> Back to your question, I'm wondering if we should do both? A changeable 
> default in auditd.conf and an override on the command line.

So far, all items in /etc/audit/auditd.conf appear to only affect the daemon. Is
this the right location to start adding non-daemon configuration items? (I accept
there is no other place).

Happy to do both, if required.

> -Steve
> 
> 

[-- Attachment #1.2: Type: text/html, Size: 2896 bytes --]

[-- Attachment #2: Type: text/plain, Size: 102 bytes --]

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-26  0:11                                             ` Burn Alting
@ 2021-01-26  0:20                                               ` Steve Grubb
  2021-01-26  0:29                                                 ` Burn Alting
  0 siblings, 1 reply; 32+ messages in thread
From: Steve Grubb @ 2021-01-26  0:20 UTC (permalink / raw)
  To: burn; +Cc: Richard Guy Briggs, Linux Audit

On Monday, January 25, 2021 7:11:45 PM EST Burn Alting wrote:
> On Mon, 2021-01-25 at 18:53 -0500, Steve Grubb wrote:
> > On Saturday, January 23, 2021 5:55:44 PM EST Burn Alting wrote:
> > > > > How is the following for a way forward.
> > > > > a. I will author a patch to the user space code to correctly parse
> > > > > this
> > > > > condition and submit it on the weekend. It will be via a new
> > > > > configuration item to auditd.conf just in case placing a fixed
> > > > > extended timeout (15-20 secs) affects memory usage for users of the
> > > > > auparse library. This solves the initial problem of
> > > > > ausearch/auparse
> > > > > failing to parse generated audit.b. I am happy to instrument what
> > > > > ever
> > > > > is recommended on my hosts at home (vm's and bare metal) to provide
> > > > > more information, should we want to 'explain' the occurrence, given
> > > > > I
> > > > > see this every week or two and report back.
> > > > 
> > > > Seems reasonable to me.
> > > 
> > > I can implement the 'end_of_event_timeout' change either as
> > > i. a command line argument to ausearch/aureport (say --eoetmo secs) and
> > > a
> > > new pair of library functions within the  auparse() stable (say
> > > auparse_set_eoe_timeout() and auparse_get_eoe_timeout())
> > > or
> > > ii. a configuration item in /etc/audit/auditd.conf, or
> > > 
> > > 
> > > Which is your preference? Mine is i. as this is a user space processing
> > > change, not a demon change.
> > 
> > To be honest, I'm not entirely sure what we're seeing. I run some tests
> > today on my system. It's seeing issues also. I'd still like to treat the
> > root cause of this. But we do need to change the default. That I what
> > I'm trying to figure out.
> > 
> > Back to your question, I'm wondering if we should do both? A changeable
> > default in auditd.conf and an override on the command line.
> 
> So far, all items in /etc/audit/auditd.conf appear to only affect the
> daemon. Is this the right location to start adding non-daemon
> configuration items? (I accept there is no other place).

ausearch/report/auparse all read the auditd.conf to find the canonical 
location for where the logs are supposed to be. So, they already read this 
file. I'd rather keep it there than make yet another config. The only drawback 
it that it might again confuse people that auditd really doesn't do anything 
with the records but just some light processing.

-Steve

> Happy to do both, if required.
> 
> > -Steve




--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-26  0:20                                               ` Steve Grubb
@ 2021-01-26  0:29                                                 ` Burn Alting
  2021-01-26 11:53                                                   ` Burn Alting
  0 siblings, 1 reply; 32+ messages in thread
From: Burn Alting @ 2021-01-26  0:29 UTC (permalink / raw)
  To: Steve Grubb; +Cc: Richard Guy Briggs, Linux Audit


[-- Attachment #1.1: Type: text/plain, Size: 2718 bytes --]

On Mon, 2021-01-25 at 19:20 -0500, Steve Grubb wrote:
> On Monday, January 25, 2021 7:11:45 PM EST Burn Alting wrote:
> > On Mon, 2021-01-25 at 18:53 -0500, Steve Grubb wrote:
> > > On Saturday, January 23, 2021 5:55:44 PM EST Burn Alting wrote:
> > > > > > How is the following for a way forward.
> > > > > > a. I will author a patch to the user space code to correctly parse
> > > > > > this
> > > > > > condition and submit it on the weekend. It will be via a new
> > > > > > configuration item to auditd.conf just in case placing a fixed
> > > > > > extended timeout (15-20 secs) affects memory usage for users of the
> > > > > > auparse library. This solves the initial problem of
> > > > > > ausearch/auparse
> > > > > > failing to parse generated audit.b. I am happy to instrument what
> > > > > > ever
> > > > > > is recommended on my hosts at home (vm's and bare metal) to provide
> > > > > > more information, should we want to 'explain' the occurrence, given
> > > > > > I
> > > > > > see this every week or two and report back.
> > > > > 
> > > > > Seems reasonable to me.
> > > > 
> > > > I can implement the 'end_of_event_timeout' change either as
> > > > i. a command line argument to ausearch/aureport (say --eoetmo secs) and
> > > > a
> > > > new pair of library functions within the  auparse() stable (say
> > > > auparse_set_eoe_timeout() and auparse_get_eoe_timeout())
> > > > or
> > > > ii. a configuration item in /etc/audit/auditd.conf, or
> > > > 
> > > > 
> > > > Which is your preference? Mine is i. as this is a user space processing
> > > > change, not a demon change.
> > > 
> > > To be honest, I'm not entirely sure what we're seeing. I run some tests
> > > today on my system. It's seeing issues also. I'd still like to treat the
> > > root cause of this. But we do need to change the default. That I what
> > > I'm trying to figure out.
> > > 
> > > Back to your question, I'm wondering if we should do both? A changeable
> > > default in auditd.conf and an override on the command line.
> > 
> > So far, all items in /etc/audit/auditd.conf appear to only affect the
> > daemon. Is this the right location to start adding non-daemon
> > configuration items? (I accept there is no other place).
> 
> ausearch/report/auparse all read the auditd.conf to find the canonical 
> location for where the logs are supposed to be. So, they already read this 
> file. I'd rather keep it there than make yet another config. The only drawback 
> it that it might again confuse people that auditd really doesn't do anything 
> with the records but just some light processing.

OK. I will put it in /etc/audit/auditd.conf


> -Steve
> 
> > Happy to do both, if required.
> > 
> > > -Steve
> 
> 
> 
> 

[-- Attachment #1.2: Type: text/html, Size: 4108 bytes --]

[-- Attachment #2: Type: text/plain, Size: 102 bytes --]

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-26  0:29                                                 ` Burn Alting
@ 2021-01-26 11:53                                                   ` Burn Alting
  2021-01-26 20:42                                                     ` Steve Grubb
  0 siblings, 1 reply; 32+ messages in thread
From: Burn Alting @ 2021-01-26 11:53 UTC (permalink / raw)
  To: Steve Grubb; +Cc: Richard Guy Briggs, Linux Audit


[-- Attachment #1.1: Type: text/plain, Size: 3231 bytes --]

On Tue, 2021-01-26 at 11:29 +1100, Burn Alting wrote:
> On Mon, 2021-01-25 at 19:20 -0500, Steve Grubb wrote:
> > On Monday, January 25, 2021 7:11:45 PM EST Burn Alting wrote:
> > > On Mon, 2021-01-25 at 18:53 -0500, Steve Grubb wrote:
> > > > On Saturday, January 23, 2021 5:55:44 PM EST Burn Alting wrote:
> > > > > > > How is the following for a way forward.a. I will author a patch to the
> > > > > > > user space code to correctly parsethiscondition and submit it on the
> > > > > > > weekend. It will be via a newconfiguration item to auditd.conf just in
> > > > > > > case placing a fixedextended timeout (15-20 secs) affects memory usage
> > > > > > > for users of theauparse library. This solves the initial problem
> > > > > > > ofausearch/auparsefailing to parse generated audit.b. I am happy to
> > > > > > > instrument whateveris recommended on my hosts at home (vm's and bare
> > > > > > > metal) to providemore information, should we want to 'explain' the
> > > > > > > occurrence, givenIsee this every week or two and report back.
> > > > > > 
> > > > > > Seems reasonable to me.
> > > > > 
> > > > > I can implement the 'end_of_event_timeout' change either asi. a command
> > > > > line argument to ausearch/aureport (say --eoetmo secs) andanew pair of
> > > > > library functions within the  auparse() stable
> > > > > (sayauparse_set_eoe_timeout() and auparse_get_eoe_timeout())orii. a
> > > > > configuration item in /etc/audit/auditd.conf, or
> > > > > 
> > > > > Which is your preference? Mine is i. as this is a user space
> > > > > processingchange, not a demon change.
> > > > 
> > > > To be honest, I'm not entirely sure what we're seeing. I run some teststoday
> > > > on my system. It's seeing issues also. I'd still like to treat theroot cause
> > > > of this. But we do need to change the default. That I whatI'm trying to
> > > > figure out.
> > > > Back to your question, I'm wondering if we should do both? A
> > > > changeabledefault in auditd.conf and an override on the command line.
> > > 
> > > So far, all items in /etc/audit/auditd.conf appear to only affect thedaemon.
> > > Is this the right location to start adding non-daemonconfiguration items? (I
> > > accept there is no other place).
> > 
> > ausearch/report/auparse all read the auditd.conf to find the canonical location
> > for where the logs are supposed to be. So, they already read this file. I'd
> > rather keep it there than make yet another config. The only drawback it that it
> > might again confuse people that auditd really doesn't do anything with the
> > records but just some light processing.
> 
> OK. I will put it in /etc/audit/auditd.conf

One question with this solution. If the user does not have read permission to
/etc/audit/auditd.conf, then any change cannot take effect. The default mode for
this file is 640 to root, so a non-root user could never change the timeout.

Should I also add
- a command line argument to ausearch/aureport (say --eoetmo secs) and,
- a pair of new auparse() functions - auparse_set_eoe_timeout() and
auparse_get_eoe_timeout()
so that non root users can make use of the new configuration item.

Also, do you want the default timeout to be 2 seconds or should I make it higher.



[-- Attachment #1.2: Type: text/html, Size: 4345 bytes --]

[-- Attachment #2: Type: text/plain, Size: 102 bytes --]

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-26 11:53                                                   ` Burn Alting
@ 2021-01-26 20:42                                                     ` Steve Grubb
  2021-01-27 12:12                                                       ` Burn Alting
  0 siblings, 1 reply; 32+ messages in thread
From: Steve Grubb @ 2021-01-26 20:42 UTC (permalink / raw)
  To: burn; +Cc: Richard Guy Briggs, Linux Audit

On Tuesday, January 26, 2021 6:53:31 AM EST Burn Alting wrote:
> On Tue, 2021-01-26 at 11:29 +1100, Burn Alting wrote:
> > On Mon, 2021-01-25 at 19:20 -0500, Steve Grubb wrote:
> > > On Monday, January 25, 2021 7:11:45 PM EST Burn Alting wrote:
> > > > On Mon, 2021-01-25 at 18:53 -0500, Steve Grubb wrote:
> > > > > On Saturday, January 23, 2021 5:55:44 PM EST Burn Alting wrote:
> > > > > > > > How is the following for a way forward.a. I will author a
> > > > > > > > patch to the
> > > > > > > > user space code to correctly parsethiscondition and submit it
> > > > > > > > on the
> > > > > > > > weekend. It will be via a newconfiguration item to
> > > > > > > > auditd.conf just in
> > > > > > > > case placing a fixedextended timeout (15-20 secs) affects
> > > > > > > > memory usage
> > > > > > > > for users of theauparse library. This solves the initial
> > > > > > > > problem
> > > > > > > > ofausearch/auparsefailing to parse generated audit.b. I am
> > > > > > > > happy to
> > > > > > > > instrument whateveris recommended on my hosts at home (vm's
> > > > > > > > and bare
> > > > > > > > metal) to providemore information, should we want to
> > > > > > > > 'explain' the
> > > > > > > > occurrence, givenIsee this every week or two and report back.
> > > > > > > 
> > > > > > > Seems reasonable to me.
> > > > > > 
> > > > > > I can implement the 'end_of_event_timeout' change either asi. a
> > > > > > command
> > > > > > line argument to ausearch/aureport (say --eoetmo secs) andanew
> > > > > > pair of
> > > > > > library functions within the  auparse() stable
> > > > > > (sayauparse_set_eoe_timeout() and auparse_get_eoe_timeout())orii.
> > > > > > a
> > > > > > configuration item in /etc/audit/auditd.conf, or
> > > > > > 
> > > > > > Which is your preference? Mine is i. as this is a user space
> > > > > > processingchange, not a demon change.
> > > > > 
> > > > > To be honest, I'm not entirely sure what we're seeing. I run some
> > > > > teststoday on my system. It's seeing issues also. I'd still like
> > > > > to treat theroot cause of this. But we do need to change the
> > > > > default. That I whatI'm trying to figure out.
> > > > > Back to your question, I'm wondering if we should do both? A
> > > > > changeabledefault in auditd.conf and an override on the command
> > > > > line.
> > > > 
> > > > So far, all items in /etc/audit/auditd.conf appear to only affect
> > > > thedaemon. Is this the right location to start adding
> > > > non-daemonconfiguration items? (I accept there is no other place).
> > > 
> > > ausearch/report/auparse all read the auditd.conf to find the canonical
> > > location for where the logs are supposed to be. So, they already read
> > > this file. I'd rather keep it there than make yet another config. The
> > > only drawback it that it might again confuse people that auditd really
> > > doesn't do anything with the records but just some light processing.
> > 
> > OK. I will put it in /etc/audit/auditd.conf
> 
> One question with this solution. If the user does not have read permission
> to /etc/audit/auditd.conf, then any change cannot take effect. The default
> mode for this file is 640 to root, so a non-root user could never change
> the timeout.

Right, but since they cannot access the logs, it's not a problem in general.  
But if they so happen to have a local copy of logs, then the command line 
override should allow them to correct this. I am also reviewing things to see 
if a better default can be picked.

> Should I also add - a command line argument to ausearch/aureport (say --
> eoetmo secs) and, - a pair of new auparse() functions -
> auparse_set_eoe_timeout() and  auparse_get_eoe_timeout()
> so that non root users can make use of the new configuration item.

Yes, that is what I meant by doing both. We have default in auditd.conf that 
works for everyone with direct audit access. We have a commandline option for 
overriding the auditd.conf value.

Although, I don't know why we would want to get the eoe_timeout value? I 
can't imagine a use for it right now. 

As for ausearch/report, let's just make a long option --eoe-timeout

-Steve

> Also, do you want the default timeout to be 2 seconds or should I make it
> higher.

I'm likely to adjust it, but I'm still looking to see what is happening. Just 
go with the 2 second default for now.

Thanks,
-Steve


--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: Occasional delayed output of events
  2021-01-26 20:42                                                     ` Steve Grubb
@ 2021-01-27 12:12                                                       ` Burn Alting
  0 siblings, 0 replies; 32+ messages in thread
From: Burn Alting @ 2021-01-27 12:12 UTC (permalink / raw)
  To: Steve Grubb; +Cc: Richard Guy Briggs, Linux Audit


[-- Attachment #1.1: Type: text/plain, Size: 4630 bytes --]

On Tue, 2021-01-26 at 15:42 -0500, Steve Grubb wrote:
> On Tuesday, January 26, 2021 6:53:31 AM EST Burn Alting wrote:
> > On Tue, 2021-01-26 at 11:29 +1100, Burn Alting wrote:
> > > On Mon, 2021-01-25 at 19:20 -0500, Steve Grubb wrote:
> > > > On Monday, January 25, 2021 7:11:45 PM EST Burn Alting wrote:
> > > > > On Mon, 2021-01-25 at 18:53 -0500, Steve Grubb wrote:
> > > > > > On Saturday, January 23, 2021 5:55:44 PM EST Burn Alting wrote:
> > > > > > > > > How is the following for a way forward.a. I will author a
> > > > > > > > > patch to the
> > > > > > > > > user space code to correctly parsethiscondition and submit it
> > > > > > > > > on the
> > > > > > > > > weekend. It will be via a newconfiguration item to
> > > > > > > > > auditd.conf just in
> > > > > > > > > case placing a fixedextended timeout (15-20 secs) affects
> > > > > > > > > memory usage
> > > > > > > > > for users of theauparse library. This solves the initial
> > > > > > > > > problem
> > > > > > > > > ofausearch/auparsefailing to parse generated audit.b. I am
> > > > > > > > > happy to
> > > > > > > > > instrument whateveris recommended on my hosts at home (vm's
> > > > > > > > > and bare
> > > > > > > > > metal) to providemore information, should we want to
> > > > > > > > > 'explain' the
> > > > > > > > > occurrence, givenIsee this every week or two and report back.
> > > > > > > > 
> > > > > > > > Seems reasonable to me.
> > > > > > > 
> > > > > > > I can implement the 'end_of_event_timeout' change either asi. a
> > > > > > > command
> > > > > > > line argument to ausearch/aureport (say --eoetmo secs) andanew
> > > > > > > pair of
> > > > > > > library functions within the  auparse() stable
> > > > > > > (sayauparse_set_eoe_timeout() and auparse_get_eoe_timeout())orii.
> > > > > > > a
> > > > > > > configuration item in /etc/audit/auditd.conf, or
> > > > > > > 
> > > > > > > Which is your preference? Mine is i. as this is a user space
> > > > > > > processingchange, not a demon change.
> > > > > > 
> > > > > > To be honest, I'm not entirely sure what we're seeing. I run some
> > > > > > teststoday on my system. It's seeing issues also. I'd still like
> > > > > > to treat theroot cause of this. But we do need to change the
> > > > > > default. That I whatI'm trying to figure out.
> > > > > > Back to your question, I'm wondering if we should do both? A
> > > > > > changeabledefault in auditd.conf and an override on the command
> > > > > > line.
> > > > > 
> > > > > So far, all items in /etc/audit/auditd.conf appear to only affect
> > > > > thedaemon. Is this the right location to start adding
> > > > > non-daemonconfiguration items? (I accept there is no other place).
> > > > 
> > > > ausearch/report/auparse all read the auditd.conf to find the canonical
> > > > location for where the logs are supposed to be. So, they already read
> > > > this file. I'd rather keep it there than make yet another config. The
> > > > only drawback it that it might again confuse people that auditd really
> > > > doesn't do anything with the records but just some light processing.
> > > 
> > > OK. I will put it in /etc/audit/auditd.conf
> > 
> > One question with this solution. If the user does not have read permission
> > to /etc/audit/auditd.conf, then any change cannot take effect. The default
> > mode for this file is 640 to root, so a non-root user could never change
> > the timeout.
> 
> Right, but since they cannot access the logs, it's not a problem in general.  
> But if they so happen to have a local copy of logs, then the command line 
> override should allow them to correct this. I am also reviewing things to see 
> if a better default can be picked.
> 
> > Should I also add - a command line argument to ausearch/aureport (say --
> > eoetmo secs) and, - a pair of new auparse() functions -
> > auparse_set_eoe_timeout() and  auparse_get_eoe_timeout()
> > so that non root users can make use of the new configuration item.
> 
> Yes, that is what I meant by doing both. We have default in auditd.conf that 
> works for everyone with direct audit access. We have a commandline option for 
> overriding the auditd.conf value.
> 
> Although, I don't know why we would want to get the eoe_timeout value? I 
> can't imagine a use for it right now. 
> 
> As for ausearch/report, let's just make a long option --eoe-timeout
> 
> -Steve
> 
> > Also, do you want the default timeout to be 2 seconds or should I make it
> > higher.
> 
> I'm likely to adjust it, but I'm still looking to see what is happening. Just 
> go with the 2 second default for now.

Issued PR just now.

> 
> Thanks,
> -Steve
> 
> 

[-- Attachment #1.2: Type: text/html, Size: 6277 bytes --]

[-- Attachment #2: Type: text/plain, Size: 102 bytes --]

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2021-01-27 12:17 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-01 21:22 Occasional delayed output of events Burn Alting
2021-01-03 15:41 ` Steve Grubb
2021-01-04  7:55   ` Burn Alting
2021-01-04 14:46     ` Steve Grubb
2021-01-04 20:12       ` Burn Alting
2021-01-10  4:39         ` Burn Alting
2021-01-15 22:18           ` Burn Alting
2021-01-16  0:35             ` Richard Guy Briggs
2021-01-16  2:42               ` Burn Alting
2021-01-17 14:07                 ` Paul Moore
2021-01-17 21:12                   ` Steve Grubb
2021-01-18 13:54                     ` Paul Moore
2021-01-18 14:31                       ` Steve Grubb
2021-01-18 20:34                         ` Burn Alting
2021-01-18 20:36                         ` Paul Moore
2021-01-19  8:18                           ` Burn Alting
2021-01-19 15:31                             ` Lenny Bruzenak
2021-01-19 19:11                             ` Paul Moore
2021-01-19 19:38                               ` Burn Alting
2021-01-19 20:26                                 ` Paul Moore
2021-01-19 21:51                                   ` Steve Grubb
2021-01-20  6:38                                     ` Burn Alting
2021-01-20 22:50                                       ` Paul Moore
2021-01-23 22:55                                         ` Burn Alting
2021-01-25 23:53                                           ` Steve Grubb
2021-01-26  0:11                                             ` Burn Alting
2021-01-26  0:20                                               ` Steve Grubb
2021-01-26  0:29                                                 ` Burn Alting
2021-01-26 11:53                                                   ` Burn Alting
2021-01-26 20:42                                                     ` Steve Grubb
2021-01-27 12:12                                                       ` Burn Alting
2021-01-19 20:42                               ` Richard Guy Briggs

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.