All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] audit: report audit wait metric in audit status reply
@ 2020-07-01 21:32 Max Englander
  2020-07-02 20:42 ` Paul Moore
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Max Englander @ 2020-07-01 21:32 UTC (permalink / raw)
  To: Paul Moore, Eric Paris, linux-audit

In environments where the preservation of audit events and predictable
usage of system memory are prioritized, admins may use a combination of
--backlog_wait_time and -b options at the risk of degraded performance
resulting from backlog waiting. In some cases, this risk may be
preferred to lost events or unbounded memory usage. Ideally, this risk
can be mitigated by making adjustments when backlog waiting is detected.

However, detection can be diffult using the currently available metrics.
For example, an admin attempting to debug degraded performance may
falsely believe a full backlog indicates backlog waiting. It may turn
out the backlog frequently fills up but drains quickly.

To make it easier to reliably track degraded performance to backlog
waiting, this patch makes the following changes:

Add a new field backlog_wait_sum to the audit status reply. Initialize
this field to zero. Add to this field the total time spent by the
current task on scheduled timeouts while the backlog limit is exceeded.

Tested on Ubuntu 18.04 using complementary changes to the audit
userspace: https://github.com/linux-audit/audit-userspace/pull/134.

Signed-off-by: Max Englander <max.englander@gmail.com>
---
 Patch changelogs between v1 and v2:
 - Instead of printing a warning when backlog waiting occurs, add
   duration of backlog waiting to cumulative sum, and report this
   sum in audit status reply.

 include/uapi/linux/audit.h | 7 ++++++-
 kernel/audit.c             | 9 +++++++++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index a534d71e689a..ea0cc364beca 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -340,6 +340,7 @@ enum {
 #define AUDIT_STATUS_BACKLOG_LIMIT	0x0010
 #define AUDIT_STATUS_BACKLOG_WAIT_TIME	0x0020
 #define AUDIT_STATUS_LOST		0x0040
+#define AUDIT_STATUS_BACKLOG_WAIT_SUM	0x0080
 
 #define AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT	0x00000001
 #define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME	0x00000002
@@ -348,6 +349,7 @@ enum {
 #define AUDIT_FEATURE_BITMAP_SESSIONID_FILTER	0x00000010
 #define AUDIT_FEATURE_BITMAP_LOST_RESET		0x00000020
 #define AUDIT_FEATURE_BITMAP_FILTER_FS		0x00000040
+#define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM	0x00000080
 
 #define AUDIT_FEATURE_BITMAP_ALL (AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT | \
 				  AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME | \
@@ -355,12 +357,14 @@ enum {
 				  AUDIT_FEATURE_BITMAP_EXCLUDE_EXTEND | \
 				  AUDIT_FEATURE_BITMAP_SESSIONID_FILTER | \
 				  AUDIT_FEATURE_BITMAP_LOST_RESET | \
-				  AUDIT_FEATURE_BITMAP_FILTER_FS)
+				  AUDIT_FEATURE_BITMAP_FILTER_FS | \
+				  AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM)
 
 /* deprecated: AUDIT_VERSION_* */
 #define AUDIT_VERSION_LATEST 		AUDIT_FEATURE_BITMAP_ALL
 #define AUDIT_VERSION_BACKLOG_LIMIT	AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT
 #define AUDIT_VERSION_BACKLOG_WAIT_TIME	AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME
+#define AUDIT_VERSION_BACKLOG_WAIT_SUM	AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM
 
 				/* Failure-to-log actions */
 #define AUDIT_FAIL_SILENT	0
@@ -466,6 +470,7 @@ struct audit_status {
 		__u32	feature_bitmap;	/* bitmap of kernel audit features */
 	};
 	__u32		backlog_wait_time;/* message queue wait timeout */
+	__u32           backlog_wait_sum;/* time spent waiting while message limit exceeded */
 };
 
 struct audit_features {
diff --git a/kernel/audit.c b/kernel/audit.c
index 87f31bf1f0a0..301ea4f3d750 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -136,6 +136,11 @@ u32		audit_sig_sid = 0;
 */
 static atomic_t	audit_lost = ATOMIC_INIT(0);
 
+/* Monotonically increasing sum of time the kernel has spent
+ * waiting while the backlog limit is exceeded.
+ */
+static atomic_t audit_backlog_wait_sum = ATOMIC_INIT(0);
+
 /* Hash for inode-based rules */
 struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
 
@@ -1204,6 +1209,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 		s.backlog		= skb_queue_len(&audit_queue);
 		s.feature_bitmap	= AUDIT_FEATURE_BITMAP_ALL;
 		s.backlog_wait_time	= audit_backlog_wait_time;
+		s.backlog_wait_sum      = atomic_read(&audit_backlog_wait_sum);
 		audit_send_reply(skb, seq, AUDIT_GET, 0, 0, &s, sizeof(s));
 		break;
 	}
@@ -1794,6 +1800,9 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
 				return NULL;
 			}
 		}
+
+		if (stime != audit_backlog_wait_time)
+			atomic_add(audit_backlog_wait_time - stime, &audit_backlog_wait_sum);
 	}
 
 	ab = audit_buffer_alloc(ctx, gfp_mask, type);
-- 
2.17.1

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-07-01 21:32 [PATCH v2] audit: report audit wait metric in audit status reply Max Englander
@ 2020-07-02 20:42 ` Paul Moore
  2020-07-03 21:29   ` Richard Guy Briggs
                     ` (3 more replies)
  2020-12-03  4:33 ` Joe Wulf
  2020-12-08 16:57 ` Lenny Bruzenak
  2 siblings, 4 replies; 31+ messages in thread
From: Paul Moore @ 2020-07-02 20:42 UTC (permalink / raw)
  To: Max Englander; +Cc: linux-audit

On Wed, Jul 1, 2020 at 5:32 PM Max Englander <max.englander@gmail.com> wrote:
>
> In environments where the preservation of audit events and predictable
> usage of system memory are prioritized, admins may use a combination of
> --backlog_wait_time and -b options at the risk of degraded performance
> resulting from backlog waiting. In some cases, this risk may be
> preferred to lost events or unbounded memory usage. Ideally, this risk
> can be mitigated by making adjustments when backlog waiting is detected.
>
> However, detection can be diffult using the currently available metrics.
> For example, an admin attempting to debug degraded performance may
> falsely believe a full backlog indicates backlog waiting. It may turn
> out the backlog frequently fills up but drains quickly.
>
> To make it easier to reliably track degraded performance to backlog
> waiting, this patch makes the following changes:
>
> Add a new field backlog_wait_sum to the audit status reply. Initialize
> this field to zero. Add to this field the total time spent by the
> current task on scheduled timeouts while the backlog limit is exceeded.
>
> Tested on Ubuntu 18.04 using complementary changes to the audit
> userspace: https://github.com/linux-audit/audit-userspace/pull/134.
>
> Signed-off-by: Max Englander <max.englander@gmail.com>
> ---
>  Patch changelogs between v1 and v2:
>  - Instead of printing a warning when backlog waiting occurs, add
>    duration of backlog waiting to cumulative sum, and report this
>    sum in audit status reply.
>
>  include/uapi/linux/audit.h | 7 ++++++-
>  kernel/audit.c             | 9 +++++++++
>  2 files changed, 15 insertions(+), 1 deletion(-)

Hi Max,

In general this looks better than the previous approach, but I do have
a few specific comments (inline).  It also important that in addition
to the requisite userspace patch, we also need a test added to the
audit-testsuite project so we can verify this functionality in the
future.

* https://github.com/linux-audit/audit-testsuite

> diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> index a534d71e689a..ea0cc364beca 100644
> --- a/include/uapi/linux/audit.h
> +++ b/include/uapi/linux/audit.h
> @@ -340,6 +340,7 @@ enum {
>  #define AUDIT_STATUS_BACKLOG_LIMIT     0x0010
>  #define AUDIT_STATUS_BACKLOG_WAIT_TIME 0x0020
>  #define AUDIT_STATUS_LOST              0x0040
> +#define AUDIT_STATUS_BACKLOG_WAIT_SUM  0x0080

Sooo ... you've defined this, but I don't see any of the corresponding
AUDIT_SET code that I would expect, was that an oversight?  If not, it
is something we should support in the kernel as I'm sure admins will
want to reset this value at some point.

>  #define AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT     0x00000001
>  #define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME 0x00000002
> @@ -348,6 +349,7 @@ enum {
>  #define AUDIT_FEATURE_BITMAP_SESSIONID_FILTER  0x00000010
>  #define AUDIT_FEATURE_BITMAP_LOST_RESET                0x00000020
>  #define AUDIT_FEATURE_BITMAP_FILTER_FS         0x00000040
> +#define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM  0x00000080

In an effort not to exhaust the feature bitmap too quickly, I've been
restricting it to only those features that would cause breakage with
userspace.  I haven't looked closely at Steve's userspace in quite a
while, but I'm guessing it can key off the structure size and doesn't
need this entry in the bitmap, right?  Let me rephrase, if userspace
needs to key off anything, it *should* key off the structure size and
not a new flag in the bitmask ;)

Also, I'm assuming that older userspace doesn't blow-up if it sees the
larger structure size?  That's even more important.

>  #define AUDIT_FEATURE_BITMAP_ALL (AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT | \
>                                   AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME | \
> @@ -355,12 +357,14 @@ enum {
>                                   AUDIT_FEATURE_BITMAP_EXCLUDE_EXTEND | \
>                                   AUDIT_FEATURE_BITMAP_SESSIONID_FILTER | \
>                                   AUDIT_FEATURE_BITMAP_LOST_RESET | \
> -                                 AUDIT_FEATURE_BITMAP_FILTER_FS)
> +                                 AUDIT_FEATURE_BITMAP_FILTER_FS | \
> +                                 AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM)
>
>  /* deprecated: AUDIT_VERSION_* */
>  #define AUDIT_VERSION_LATEST           AUDIT_FEATURE_BITMAP_ALL
>  #define AUDIT_VERSION_BACKLOG_LIMIT    AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT
>  #define AUDIT_VERSION_BACKLOG_WAIT_TIME        AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME
> +#define AUDIT_VERSION_BACKLOG_WAIT_SUM AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM
>
>                                 /* Failure-to-log actions */
>  #define AUDIT_FAIL_SILENT      0
> @@ -466,6 +470,7 @@ struct audit_status {
>                 __u32   feature_bitmap; /* bitmap of kernel audit features */
>         };
>         __u32           backlog_wait_time;/* message queue wait timeout */
> +       __u32           backlog_wait_sum;/* time spent waiting while message limit exceeded */

This is very nitpicky, but how about a rename to 'backlog_wait_time_actual'?

> diff --git a/kernel/audit.c b/kernel/audit.c
> index 87f31bf1f0a0..301ea4f3d750 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -136,6 +136,11 @@ u32                audit_sig_sid = 0;
>  */
>  static atomic_t        audit_lost = ATOMIC_INIT(0);
>
> +/* Monotonically increasing sum of time the kernel has spent
> + * waiting while the backlog limit is exceeded.
> + */
> +static atomic_t audit_backlog_wait_sum = ATOMIC_INIT(0);

Needless to say, this should be renamed too so we don't go crazy.

>  /* Hash for inode-based rules */
>  struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
>
> @@ -1204,6 +1209,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
>                 s.backlog               = skb_queue_len(&audit_queue);
>                 s.feature_bitmap        = AUDIT_FEATURE_BITMAP_ALL;
>                 s.backlog_wait_time     = audit_backlog_wait_time;
> +               s.backlog_wait_sum      = atomic_read(&audit_backlog_wait_sum);
>                 audit_send_reply(skb, seq, AUDIT_GET, 0, 0, &s, sizeof(s));
>                 break;
>         }
> @@ -1794,6 +1800,9 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
>                                 return NULL;
>                         }
>                 }
> +
> +               if (stime != audit_backlog_wait_time)
> +                       atomic_add(audit_backlog_wait_time - stime, &audit_backlog_wait_sum);

Since stime can only be different in one place in the code above
(after the schedule_timeout() call), why not move the atomic_add() up
there and drop the "if"?  Yes there is the potential of calling
atomic_add() multiple times in this case, but the thread is waiting
anyway and this way we don't impact other code paths.

>         }
>
>         ab = audit_buffer_alloc(ctx, gfp_mask, type);
> --
> 2.17.1

-- 
paul moore
www.paul-moore.com

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-07-02 20:42 ` Paul Moore
@ 2020-07-03 21:29   ` Richard Guy Briggs
  2020-07-03 22:36     ` Max Englander
  2020-07-03 22:31   ` Max Englander
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 31+ messages in thread
From: Richard Guy Briggs @ 2020-07-03 21:29 UTC (permalink / raw)
  To: Paul Moore; +Cc: linux-audit

On 2020-07-02 16:42, Paul Moore wrote:
> On Wed, Jul 1, 2020 at 5:32 PM Max Englander <max.englander@gmail.com> wrote:
> >
> > In environments where the preservation of audit events and predictable
> > usage of system memory are prioritized, admins may use a combination of
> > --backlog_wait_time and -b options at the risk of degraded performance
> > resulting from backlog waiting. In some cases, this risk may be
> > preferred to lost events or unbounded memory usage. Ideally, this risk
> > can be mitigated by making adjustments when backlog waiting is detected.
> >
> > However, detection can be diffult using the currently available metrics.
> > For example, an admin attempting to debug degraded performance may
> > falsely believe a full backlog indicates backlog waiting. It may turn
> > out the backlog frequently fills up but drains quickly.
> >
> > To make it easier to reliably track degraded performance to backlog
> > waiting, this patch makes the following changes:
> >
> > Add a new field backlog_wait_sum to the audit status reply. Initialize
> > this field to zero. Add to this field the total time spent by the
> > current task on scheduled timeouts while the backlog limit is exceeded.
> >
> > Tested on Ubuntu 18.04 using complementary changes to the audit
> > userspace: https://github.com/linux-audit/audit-userspace/pull/134.
> >
> > Signed-off-by: Max Englander <max.englander@gmail.com>
> > ---
> >  Patch changelogs between v1 and v2:
> >  - Instead of printing a warning when backlog waiting occurs, add
> >    duration of backlog waiting to cumulative sum, and report this
> >    sum in audit status reply.
> >
> >  include/uapi/linux/audit.h | 7 ++++++-
> >  kernel/audit.c             | 9 +++++++++
> >  2 files changed, 15 insertions(+), 1 deletion(-)
> 
> Hi Max,
> 
> In general this looks better than the previous approach, but I do have
> a few specific comments (inline).  It also important that in addition
> to the requisite userspace patch, we also need a test added to the
> audit-testsuite project so we can verify this functionality in the
> future.
> 
> * https://github.com/linux-audit/audit-testsuite
> 
> > diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> > index a534d71e689a..ea0cc364beca 100644
> > --- a/include/uapi/linux/audit.h
> > +++ b/include/uapi/linux/audit.h
> > @@ -340,6 +340,7 @@ enum {
> >  #define AUDIT_STATUS_BACKLOG_LIMIT     0x0010
> >  #define AUDIT_STATUS_BACKLOG_WAIT_TIME 0x0020
> >  #define AUDIT_STATUS_LOST              0x0040
> > +#define AUDIT_STATUS_BACKLOG_WAIT_SUM  0x0080
> 
> Sooo ... you've defined this, but I don't see any of the corresponding
> AUDIT_SET code that I would expect, was that an oversight?  If not, it
> is something we should support in the kernel as I'm sure admins will
> want to reset this value at some point.

Have a look at the lost reset code as an example.  It is tricky since it
does an atomic reset while delivering a value back up the control plane
and issuing a record.  There were some fallout bug fixes because it
wasn't as obvious as it looked.

> >  #define AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT     0x00000001
> >  #define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME 0x00000002
> > @@ -348,6 +349,7 @@ enum {
> >  #define AUDIT_FEATURE_BITMAP_SESSIONID_FILTER  0x00000010
> >  #define AUDIT_FEATURE_BITMAP_LOST_RESET                0x00000020
> >  #define AUDIT_FEATURE_BITMAP_FILTER_FS         0x00000040
> > +#define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM  0x00000080
> 
> In an effort not to exhaust the feature bitmap too quickly, I've been
> restricting it to only those features that would cause breakage with
> userspace.  I haven't looked closely at Steve's userspace in quite a
> while, but I'm guessing it can key off the structure size and doesn't
> need this entry in the bitmap, right?  Let me rephrase, if userspace
> needs to key off anything, it *should* key off the structure size and
> not a new flag in the bitmask ;)

It could key solely off the existance of AUDIT_STATUS_BACKLOG_WAIT_SUM
via HAVE_DECL_AUDIT_STATUS_BACKLOG_WAIT_SUM which would be defined in
configure.ac similar to AUDIT_STATUS_BACKLOG_WAIT_TIME as has already
been done in Max' userspace patch.

It should be possible to drop AUDIT_VERSION_BACKLOG_WAIT_SUM in
userspace and have it work.

> Also, I'm assuming that older userspace doesn't blow-up if it sees the
> larger structure size?  That's even more important.

I believe it won't even notice, but this should be tested.

> >  #define AUDIT_FEATURE_BITMAP_ALL (AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT | \
> >                                   AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME | \
> > @@ -355,12 +357,14 @@ enum {
> >                                   AUDIT_FEATURE_BITMAP_EXCLUDE_EXTEND | \
> >                                   AUDIT_FEATURE_BITMAP_SESSIONID_FILTER | \
> >                                   AUDIT_FEATURE_BITMAP_LOST_RESET | \
> > -                                 AUDIT_FEATURE_BITMAP_FILTER_FS)
> > +                                 AUDIT_FEATURE_BITMAP_FILTER_FS | \
> > +                                 AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM)
> >
> >  /* deprecated: AUDIT_VERSION_* */
> >  #define AUDIT_VERSION_LATEST           AUDIT_FEATURE_BITMAP_ALL
> >  #define AUDIT_VERSION_BACKLOG_LIMIT    AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT
> >  #define AUDIT_VERSION_BACKLOG_WAIT_TIME        AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME
> > +#define AUDIT_VERSION_BACKLOG_WAIT_SUM AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM
> >
> >                                 /* Failure-to-log actions */
> >  #define AUDIT_FAIL_SILENT      0
> > @@ -466,6 +470,7 @@ struct audit_status {
> >                 __u32   feature_bitmap; /* bitmap of kernel audit features */
> >         };
> >         __u32           backlog_wait_time;/* message queue wait timeout */
> > +       __u32           backlog_wait_sum;/* time spent waiting while message limit exceeded */
> 
> This is very nitpicky, but how about a rename to 'backlog_wait_time_actual'?

How about 'backlog_wait_time_cumul' (or 'backlog_wait_time_cumulative')?

> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index 87f31bf1f0a0..301ea4f3d750 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -136,6 +136,11 @@ u32                audit_sig_sid = 0;
> >  */
> >  static atomic_t        audit_lost = ATOMIC_INIT(0);
> >
> > +/* Monotonically increasing sum of time the kernel has spent
> > + * waiting while the backlog limit is exceeded.
> > + */
> > +static atomic_t audit_backlog_wait_sum = ATOMIC_INIT(0);
> 
> Needless to say, this should be renamed too so we don't go crazy.
> 
> >  /* Hash for inode-based rules */
> >  struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
> >
> > @@ -1204,6 +1209,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> >                 s.backlog               = skb_queue_len(&audit_queue);
> >                 s.feature_bitmap        = AUDIT_FEATURE_BITMAP_ALL;
> >                 s.backlog_wait_time     = audit_backlog_wait_time;
> > +               s.backlog_wait_sum      = atomic_read(&audit_backlog_wait_sum);
> >                 audit_send_reply(skb, seq, AUDIT_GET, 0, 0, &s, sizeof(s));
> >                 break;
> >         }
> > @@ -1794,6 +1800,9 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
> >                                 return NULL;
> >                         }
> >                 }
> > +
> > +               if (stime != audit_backlog_wait_time)
> > +                       atomic_add(audit_backlog_wait_time - stime, &audit_backlog_wait_sum);
> 
> Since stime can only be different in one place in the code above
> (after the schedule_timeout() call), why not move the atomic_add() up
> there and drop the "if"?  Yes there is the potential of calling
> atomic_add() multiple times in this case, but the thread is waiting
> anyway and this way we don't impact other code paths.
> 
> >         }
> >
> >         ab = audit_buffer_alloc(ctx, gfp_mask, type);
> 
> paul moore

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-07-02 20:42 ` Paul Moore
  2020-07-03 21:29   ` Richard Guy Briggs
@ 2020-07-03 22:31   ` Max Englander
  2020-12-03  3:52   ` Steve Grubb
  2020-12-07 19:43   ` Lenny Bruzenak
  3 siblings, 0 replies; 31+ messages in thread
From: Max Englander @ 2020-07-03 22:31 UTC (permalink / raw)
  To: Paul Moore; +Cc: linux-audit

On Thu, Jul 02, 2020 at 04:42:13PM -0400, Paul Moore wrote:
> On Wed, Jul 1, 2020 at 5:32 PM Max Englander <max.englander@gmail.com> wrote:
> >
> > In environments where the preservation of audit events and predictable
> > usage of system memory are prioritized, admins may use a combination of
> > --backlog_wait_time and -b options at the risk of degraded performance
> > resulting from backlog waiting. In some cases, this risk may be
> > preferred to lost events or unbounded memory usage. Ideally, this risk
> > can be mitigated by making adjustments when backlog waiting is detected.
> >
> > However, detection can be diffult using the currently available metrics.
> > For example, an admin attempting to debug degraded performance may
> > falsely believe a full backlog indicates backlog waiting. It may turn
> > out the backlog frequently fills up but drains quickly.
> >
> > To make it easier to reliably track degraded performance to backlog
> > waiting, this patch makes the following changes:
> >
> > Add a new field backlog_wait_sum to the audit status reply. Initialize
> > this field to zero. Add to this field the total time spent by the
> > current task on scheduled timeouts while the backlog limit is exceeded.
> >
> > Tested on Ubuntu 18.04 using complementary changes to the audit
> > userspace: https://github.com/linux-audit/audit-userspace/pull/134.
> >
> > Signed-off-by: Max Englander <max.englander@gmail.com>
> > ---
> >  Patch changelogs between v1 and v2:
> >  - Instead of printing a warning when backlog waiting occurs, add
> >    duration of backlog waiting to cumulative sum, and report this
> >    sum in audit status reply.
> >
> >  include/uapi/linux/audit.h | 7 ++++++-
> >  kernel/audit.c             | 9 +++++++++
> >  2 files changed, 15 insertions(+), 1 deletion(-)
> 
> Hi Max,
> 
> In general this looks better than the previous approach, but I do have
> a few specific comments (inline).  It also important that in addition

Thanks for your feedback and comments, Paul. I've prepared a v3 patch
addressing all of your comments, with corresponding changes to the
audit-userspace, which I'll submit once I get a working change to
the test suite. I'll use this thread to respond to some things and ask
a question.

> to the requisite userspace patch, we also need a test added to the
> audit-testsuite project so we can verify this functionality in the
> future.
> 
> * https://github.com/linux-audit/audit-testsuite
> 

I downloaded this test suite and attempted to run it on Ubuntu 18.04,
with the latest audit kernel tree and latest audit-userspace. Many tests
were failing, I assume because I have some issues with my environment.
May I ask what environment (OS, tree, commit) you recommend for running
the test suite? Happy to move this question over to GitHub if that's a
better venue.

> > diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> > index a534d71e689a..ea0cc364beca 100644
> > --- a/include/uapi/linux/audit.h
> > +++ b/include/uapi/linux/audit.h
> > @@ -340,6 +340,7 @@ enum {
> >  #define AUDIT_STATUS_BACKLOG_LIMIT     0x0010
> >  #define AUDIT_STATUS_BACKLOG_WAIT_TIME 0x0020
> >  #define AUDIT_STATUS_LOST              0x0040
> > +#define AUDIT_STATUS_BACKLOG_WAIT_SUM  0x0080
> 
> Sooo ... you've defined this, but I don't see any of the corresponding
> AUDIT_SET code that I would expect, was that an oversight?  If not, it
> is something we should support in the kernel as I'm sure admins will
> want to reset this value at some point.

To be honest I had based this patch off v1 which included a flag for
setting the backlog warn time threshold, but didn't remove it from the
v2 patch (an oversight). I wasn't thinking about admins' need to reset
the value, but since you suggested it I've included support for that in
the v3 patch.

> 
> >  #define AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT     0x00000001
> >  #define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME 0x00000002
> > @@ -348,6 +349,7 @@ enum {
> >  #define AUDIT_FEATURE_BITMAP_SESSIONID_FILTER  0x00000010
> >  #define AUDIT_FEATURE_BITMAP_LOST_RESET                0x00000020
> >  #define AUDIT_FEATURE_BITMAP_FILTER_FS         0x00000040
> > +#define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM  0x00000080
> 
> In an effort not to exhaust the feature bitmap too quickly, I've been
> restricting it to only those features that would cause breakage with
> userspace.  I haven't looked closely at Steve's userspace in quite a
> while, but I'm guessing it can key off the structure size and doesn't
> need this entry in the bitmap, right?  Let me rephrase, if userspace
> needs to key off anything, it *should* key off the structure size and
> not a new flag in the bitmask ;)

I've removed this flag from the v3 patch, but if it's alright with
you I'll key off the absence/presence of
AUDIT_STATUS_BACKLOG_WAIT_TIME_ACTUAL, which follows the pattern
currently used throughout the audit-userspace, and which Richard suggests
in separate message.

> 
> Also, I'm assuming that older userspace doesn't blow-up if it sees the
> larger structure size?  That's even more important.
> 
> >  #define AUDIT_FEATURE_BITMAP_ALL (AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT | \
> >                                   AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME | \
> > @@ -355,12 +357,14 @@ enum {
> >                                   AUDIT_FEATURE_BITMAP_EXCLUDE_EXTEND | \
> >                                   AUDIT_FEATURE_BITMAP_SESSIONID_FILTER | \
> >                                   AUDIT_FEATURE_BITMAP_LOST_RESET | \
> > -                                 AUDIT_FEATURE_BITMAP_FILTER_FS)
> > +                                 AUDIT_FEATURE_BITMAP_FILTER_FS | \
> > +                                 AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM)
> >
> >  /* deprecated: AUDIT_VERSION_* */
> >  #define AUDIT_VERSION_LATEST           AUDIT_FEATURE_BITMAP_ALL
> >  #define AUDIT_VERSION_BACKLOG_LIMIT    AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT
> >  #define AUDIT_VERSION_BACKLOG_WAIT_TIME        AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME
> > +#define AUDIT_VERSION_BACKLOG_WAIT_SUM AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM
> >
> >                                 /* Failure-to-log actions */
> >  #define AUDIT_FAIL_SILENT      0
> > @@ -466,6 +470,7 @@ struct audit_status {
> >                 __u32   feature_bitmap; /* bitmap of kernel audit features */
> >         };
> >         __u32           backlog_wait_time;/* message queue wait timeout */
> > +       __u32           backlog_wait_sum;/* time spent waiting while message limit exceeded */
> 
> This is very nitpicky, but how about a rename to 'backlog_wait_time_actual'?

Sure, will do.

> 
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index 87f31bf1f0a0..301ea4f3d750 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -136,6 +136,11 @@ u32                audit_sig_sid = 0;
> >  */
> >  static atomic_t        audit_lost = ATOMIC_INIT(0);
> >
> > +/* Monotonically increasing sum of time the kernel has spent
> > + * waiting while the backlog limit is exceeded.
> > + */
> > +static atomic_t audit_backlog_wait_sum = ATOMIC_INIT(0);
> 
> Needless to say, this should be renamed too so we don't go crazy.

Of course :)

> 
> >  /* Hash for inode-based rules */
> >  struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
> >
> > @@ -1204,6 +1209,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> >                 s.backlog               = skb_queue_len(&audit_queue);
> >                 s.feature_bitmap        = AUDIT_FEATURE_BITMAP_ALL;
> >                 s.backlog_wait_time     = audit_backlog_wait_time;
> > +               s.backlog_wait_sum      = atomic_read(&audit_backlog_wait_sum);
> >                 audit_send_reply(skb, seq, AUDIT_GET, 0, 0, &s, sizeof(s));
> >                 break;
> >         }
> > @@ -1794,6 +1800,9 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
> >                                 return NULL;
> >                         }
> >                 }
> > +
> > +               if (stime != audit_backlog_wait_time)
> > +                       atomic_add(audit_backlog_wait_time - stime, &audit_backlog_wait_sum);
> 
> Since stime can only be different in one place in the code above
> (after the schedule_timeout() call), why not move the atomic_add() up
> there and drop the "if"?  Yes there is the potential of calling
> atomic_add() multiple times in this case, but the thread is waiting
> anyway and this way we don't impact other code paths.

My concern was calling atomic_add() more than necessary, but happy to
move it up as you suggest.

> 
> >         }
> >
> >         ab = audit_buffer_alloc(ctx, gfp_mask, type);
> > --
> > 2.17.1
> 
> -- 
> paul moore
> www.paul-moore.com

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-07-03 21:29   ` Richard Guy Briggs
@ 2020-07-03 22:36     ` Max Englander
  0 siblings, 0 replies; 31+ messages in thread
From: Max Englander @ 2020-07-03 22:36 UTC (permalink / raw)
  To: Richard Guy Briggs; +Cc: linux-audit

On Fri, Jul 03, 2020 at 05:29:49PM -0400, Richard Guy Briggs wrote:
> On 2020-07-02 16:42, Paul Moore wrote:
> > On Wed, Jul 1, 2020 at 5:32 PM Max Englander <max.englander@gmail.com> wrote:
> > >
> > > In environments where the preservation of audit events and predictable
> > > usage of system memory are prioritized, admins may use a combination of
> > > --backlog_wait_time and -b options at the risk of degraded performance
> > > resulting from backlog waiting. In some cases, this risk may be
> > > preferred to lost events or unbounded memory usage. Ideally, this risk
> > > can be mitigated by making adjustments when backlog waiting is detected.
> > >
> > > However, detection can be diffult using the currently available metrics.
> > > For example, an admin attempting to debug degraded performance may
> > > falsely believe a full backlog indicates backlog waiting. It may turn
> > > out the backlog frequently fills up but drains quickly.
> > >
> > > To make it easier to reliably track degraded performance to backlog
> > > waiting, this patch makes the following changes:
> > >
> > > Add a new field backlog_wait_sum to the audit status reply. Initialize
> > > this field to zero. Add to this field the total time spent by the
> > > current task on scheduled timeouts while the backlog limit is exceeded.
> > >
> > > Tested on Ubuntu 18.04 using complementary changes to the audit
> > > userspace: https://github.com/linux-audit/audit-userspace/pull/134.
> > >
> > > Signed-off-by: Max Englander <max.englander@gmail.com>
> > > ---
> > >  Patch changelogs between v1 and v2:
> > >  - Instead of printing a warning when backlog waiting occurs, add
> > >    duration of backlog waiting to cumulative sum, and report this
> > >    sum in audit status reply.
> > >
> > >  include/uapi/linux/audit.h | 7 ++++++-
> > >  kernel/audit.c             | 9 +++++++++
> > >  2 files changed, 15 insertions(+), 1 deletion(-)
> > 
> > Hi Max,
> > 
> > In general this looks better than the previous approach, but I do have
> > a few specific comments (inline).  It also important that in addition
> > to the requisite userspace patch, we also need a test added to the
> > audit-testsuite project so we can verify this functionality in the
> > future.
> > 
> > * https://github.com/linux-audit/audit-testsuite
> > 
> > > diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> > > index a534d71e689a..ea0cc364beca 100644
> > > --- a/include/uapi/linux/audit.h
> > > +++ b/include/uapi/linux/audit.h
> > > @@ -340,6 +340,7 @@ enum {
> > >  #define AUDIT_STATUS_BACKLOG_LIMIT     0x0010
> > >  #define AUDIT_STATUS_BACKLOG_WAIT_TIME 0x0020
> > >  #define AUDIT_STATUS_LOST              0x0040
> > > +#define AUDIT_STATUS_BACKLOG_WAIT_SUM  0x0080
> > 
> > Sooo ... you've defined this, but I don't see any of the corresponding
> > AUDIT_SET code that I would expect, was that an oversight?  If not, it
> > is something we should support in the kernel as I'm sure admins will
> > want to reset this value at some point.
> 
> Have a look at the lost reset code as an example.  It is tricky since it
> does an atomic reset while delivering a value back up the control plane
> and issuing a record.  There were some fallout bug fixes because it
> wasn't as obvious as it looked.

Thank you for the suggestion. I've copied that approach in the (not yet
submitted) v3 patch.

> 
> > >  #define AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT     0x00000001
> > >  #define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME 0x00000002
> > > @@ -348,6 +349,7 @@ enum {
> > >  #define AUDIT_FEATURE_BITMAP_SESSIONID_FILTER  0x00000010
> > >  #define AUDIT_FEATURE_BITMAP_LOST_RESET                0x00000020
> > >  #define AUDIT_FEATURE_BITMAP_FILTER_FS         0x00000040
> > > +#define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM  0x00000080
> > 
> > In an effort not to exhaust the feature bitmap too quickly, I've been
> > restricting it to only those features that would cause breakage with
> > userspace.  I haven't looked closely at Steve's userspace in quite a
> > while, but I'm guessing it can key off the structure size and doesn't
> > need this entry in the bitmap, right?  Let me rephrase, if userspace
> > needs to key off anything, it *should* key off the structure size and
> > not a new flag in the bitmask ;)
> 
> It could key solely off the existance of AUDIT_STATUS_BACKLOG_WAIT_SUM
> via HAVE_DECL_AUDIT_STATUS_BACKLOG_WAIT_SUM which would be defined in
> configure.ac similar to AUDIT_STATUS_BACKLOG_WAIT_TIME as has already
> been done in Max' userspace patch.
> 
> It should be possible to drop AUDIT_VERSION_BACKLOG_WAIT_SUM in
> userspace and have it work.

I will drop this in the v3 patch.

> 
> > Also, I'm assuming that older userspace doesn't blow-up if it sees the
> > larger structure size?  That's even more important.
> 
> I believe it won't even notice, but this should be tested.
> 
> > >  #define AUDIT_FEATURE_BITMAP_ALL (AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT | \
> > >                                   AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME | \
> > > @@ -355,12 +357,14 @@ enum {
> > >                                   AUDIT_FEATURE_BITMAP_EXCLUDE_EXTEND | \
> > >                                   AUDIT_FEATURE_BITMAP_SESSIONID_FILTER | \
> > >                                   AUDIT_FEATURE_BITMAP_LOST_RESET | \
> > > -                                 AUDIT_FEATURE_BITMAP_FILTER_FS)
> > > +                                 AUDIT_FEATURE_BITMAP_FILTER_FS | \
> > > +                                 AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM)
> > >
> > >  /* deprecated: AUDIT_VERSION_* */
> > >  #define AUDIT_VERSION_LATEST           AUDIT_FEATURE_BITMAP_ALL
> > >  #define AUDIT_VERSION_BACKLOG_LIMIT    AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT
> > >  #define AUDIT_VERSION_BACKLOG_WAIT_TIME        AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME
> > > +#define AUDIT_VERSION_BACKLOG_WAIT_SUM AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM
> > >
> > >                                 /* Failure-to-log actions */
> > >  #define AUDIT_FAIL_SILENT      0
> > > @@ -466,6 +470,7 @@ struct audit_status {
> > >                 __u32   feature_bitmap; /* bitmap of kernel audit features */
> > >         };
> > >         __u32           backlog_wait_time;/* message queue wait timeout */
> > > +       __u32           backlog_wait_sum;/* time spent waiting while message limit exceeded */
> > 
> > This is very nitpicky, but how about a rename to 'backlog_wait_time_actual'?
> 
> How about 'backlog_wait_time_cumul' (or 'backlog_wait_time_cumulative')?

I am happy with either _actual or _cumul(ative), whatever you two
decide.

> 
> > > diff --git a/kernel/audit.c b/kernel/audit.c
> > > index 87f31bf1f0a0..301ea4f3d750 100644
> > > --- a/kernel/audit.c
> > > +++ b/kernel/audit.c
> > > @@ -136,6 +136,11 @@ u32                audit_sig_sid = 0;
> > >  */
> > >  static atomic_t        audit_lost = ATOMIC_INIT(0);
> > >
> > > +/* Monotonically increasing sum of time the kernel has spent
> > > + * waiting while the backlog limit is exceeded.
> > > + */
> > > +static atomic_t audit_backlog_wait_sum = ATOMIC_INIT(0);
> > 
> > Needless to say, this should be renamed too so we don't go crazy.
> > 
> > >  /* Hash for inode-based rules */
> > >  struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
> > >
> > > @@ -1204,6 +1209,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> > >                 s.backlog               = skb_queue_len(&audit_queue);
> > >                 s.feature_bitmap        = AUDIT_FEATURE_BITMAP_ALL;
> > >                 s.backlog_wait_time     = audit_backlog_wait_time;
> > > +               s.backlog_wait_sum      = atomic_read(&audit_backlog_wait_sum);
> > >                 audit_send_reply(skb, seq, AUDIT_GET, 0, 0, &s, sizeof(s));
> > >                 break;
> > >         }
> > > @@ -1794,6 +1800,9 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
> > >                                 return NULL;
> > >                         }
> > >                 }
> > > +
> > > +               if (stime != audit_backlog_wait_time)
> > > +                       atomic_add(audit_backlog_wait_time - stime, &audit_backlog_wait_sum);
> > 
> > Since stime can only be different in one place in the code above
> > (after the schedule_timeout() call), why not move the atomic_add() up
> > there and drop the "if"?  Yes there is the potential of calling
> > atomic_add() multiple times in this case, but the thread is waiting
> > anyway and this way we don't impact other code paths.
> > 
> > >         }
> > >
> > >         ab = audit_buffer_alloc(ctx, gfp_mask, type);
> > 
> > paul moore
> 
> - RGB
> 
> --
> Richard Guy Briggs <rgb@redhat.com>
> Sr. S/W Engineer, Kernel Security, Base Operating Systems
> Remote, Ottawa, Red Hat Canada
> IRC: rgb, SunRaycer
> Voice: +1.647.777.2635, Internal: (81) 32635
> 

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-07-02 20:42 ` Paul Moore
  2020-07-03 21:29   ` Richard Guy Briggs
  2020-07-03 22:31   ` Max Englander
@ 2020-12-03  3:52   ` Steve Grubb
  2020-12-03  4:12     ` Paul Moore
  2020-12-07 19:43   ` Lenny Bruzenak
  3 siblings, 1 reply; 31+ messages in thread
From: Steve Grubb @ 2020-12-03  3:52 UTC (permalink / raw)
  To: Max Englander, linux-audit, Paul Moore

Hello Paul,

On Thursday, July 2, 2020 4:42:13 PM EST Paul Moore wrote:
> > #define AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT     0x00000001
> > #define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME 0x00000002
> > @@ -348,6 +349,7 @@ enum {
> > #define AUDIT_FEATURE_BITMAP_SESSIONID_FILTER  0x00000010
> > #define AUDIT_FEATURE_BITMAP_LOST_RESET                0x00000020
> > #define AUDIT_FEATURE_BITMAP_FILTER_FS         0x00000040
> > +#define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM  0x00000080
> 
> In an effort not to exhaust the feature bitmap too quickly, I've been
> restricting it to only those features that would cause breakage with
> userspace.  I haven't looked closely at Steve's userspace in quite a
> while, but I'm guessing it can key off the structure size and doesn't
> need this entry in the bitmap, right?  Let me rephrase, if userspace
> needs to key off anything, it *should* key off the structure size and
> not a new flag in the bitmask 
> 
> Also, I'm assuming that older userspace doesn't blow-up if it sees the
> larger structure size?  That's even more important.

We need this FEATURE_BITMAP to do anything in userspace. Max's instinct was 
right. Anything that changes the user space API needs to have a 
FEATURE_BITMAP so that user space can do the right thing. The lack of this is 
blocking acceptance of the pull request for the user space piece.

I am in a need of publishing the 3.0 release. This is the only blocker.

-Steve


--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-12-03  3:52   ` Steve Grubb
@ 2020-12-03  4:12     ` Paul Moore
  2020-12-03 12:37       ` Richard Guy Briggs
  2020-12-03 13:31       ` Steve Grubb
  0 siblings, 2 replies; 31+ messages in thread
From: Paul Moore @ 2020-12-03  4:12 UTC (permalink / raw)
  To: Steve Grubb; +Cc: linux-audit

On Wed, Dec 2, 2020 at 10:52 PM Steve Grubb <sgrubb@redhat.com> wrote:
>
> Hello Paul,

Steve.

> On Thursday, July 2, 2020 4:42:13 PM EST Paul Moore wrote:
> > > #define AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT     0x00000001
> > > #define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME 0x00000002
> > > @@ -348,6 +349,7 @@ enum {
> > > #define AUDIT_FEATURE_BITMAP_SESSIONID_FILTER  0x00000010
> > > #define AUDIT_FEATURE_BITMAP_LOST_RESET                0x00000020
> > > #define AUDIT_FEATURE_BITMAP_FILTER_FS         0x00000040
> > > +#define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM  0x00000080
> >
> > In an effort not to exhaust the feature bitmap too quickly, I've been
> > restricting it to only those features that would cause breakage with
> > userspace.  I haven't looked closely at Steve's userspace in quite a
> > while, but I'm guessing it can key off the structure size and doesn't
> > need this entry in the bitmap, right?  Let me rephrase, if userspace
> > needs to key off anything, it *should* key off the structure size and
> > not a new flag in the bitmask
> >
> > Also, I'm assuming that older userspace doesn't blow-up if it sees the
> > larger structure size?  That's even more important.
>
> We need this FEATURE_BITMAP to do anything in userspace. Max's instinct was
> right. Anything that changes the user space API needs to have a
> FEATURE_BITMAP so that user space can do the right thing. The lack of this is
> blocking acceptance of the pull request for the user space piece.

I don't believe you need a new bitmap entry in this case, you should
be able to examine the size of the reply from the AUDIT_GET request
and make a determination from there.

-- 
paul moore
www.paul-moore.com

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-07-01 21:32 [PATCH v2] audit: report audit wait metric in audit status reply Max Englander
  2020-07-02 20:42 ` Paul Moore
@ 2020-12-03  4:33 ` Joe Wulf
  2020-12-07 21:48   ` Max Englander
  2020-12-08 16:57 ` Lenny Bruzenak
  2 siblings, 1 reply; 31+ messages in thread
From: Joe Wulf @ 2020-12-03  4:33 UTC (permalink / raw)
  To: Paul Moore, Eric Paris, linux-audit, Max Englander


[-- Attachment #1.1: Type: text/plain, Size: 2742 bytes --]

I would like to suggest providing a mechanism where admins can query the status or state of backlog issues (wait time, sums, etc...).  Maybe the intent is to expand the output of status checking of auditd.

I believe further clarity is beneficial on the setting of the 'backlog_wait_sum' (or to whatever the name evolves to) initially.-  How it evolves over time-  What the conditions in the system, or auditing, would change it-  What conditions admins should pay attention to for informational understanding of status
-  What conditions admins should realize exist such that adjustments are needed   (and suggestions to what those adjustments should be)-  What new guidance will admins have for building adjusting audit.rules around this

Consider the scenario where auditing has been 'working fine' for days.Little to no active admin monitoring.Events occur to spike the auditing such that backloging of audit records dramatically increases.(for some reason) admins now come looking to investigate.Assuming they do:  'systemctl status auditd' the newly presented 'state' of the 'backlog_wait_sum' will show some evidence.Q:  Is that just a moment in time?Q:  What information here will give the perspective things are good/ok 'now', versus some action needs to be taken?
Maybe that isn't a great scenario, or good questions----it is what occurs to me at the moment.

Thank you.
R,-Joe Wulf

    On Wednesday, July 1, 2020, 5:33:14 PM EDT, Max Englander <max.englander@gmail.com> wrote:  
 >  In environments where the preservation of audit events and predictable
>  usage of system memory are prioritized, admins may use a combination of>  --backlog_wait_time and -b options at the risk of degraded performance>  resulting from backlog waiting. In some cases, this risk may be>  preferred to lost events or unbounded memory usage. Ideally, this risk>  can be mitigated by making adjustments when backlog waiting is detected.>  >  However, detection can be diffult using the currently available metrics.>  For example, an admin attempting to debug degraded performance may>  falsely believe a full backlog indicates backlog waiting. It may turn>  out the backlog frequently fills up but drains quickly.>  >  To make it easier to reliably track degraded performance to backlog>  waiting, this patch makes the following changes:>  >  Add a new field backlog_wait_sum to the audit status reply. Initialize>  this field to zero. Add to this field the total time spent by the>  current task on scheduled timeouts while the backlog limit is exceeded.>  >  Tested on Ubuntu 18.04 using complementary changes to the audit>  userspace: https://github.com/linux-audit/audit-userspace/pull/134.
<snip>
  

[-- Attachment #1.2: Type: text/html, Size: 5031 bytes --]

[-- Attachment #2: Type: text/plain, Size: 102 bytes --]

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-12-03  4:12     ` Paul Moore
@ 2020-12-03 12:37       ` Richard Guy Briggs
  2020-12-03 15:37         ` Paul Moore
  2020-12-03 13:31       ` Steve Grubb
  1 sibling, 1 reply; 31+ messages in thread
From: Richard Guy Briggs @ 2020-12-03 12:37 UTC (permalink / raw)
  To: Paul Moore; +Cc: linux-audit

On 2020-12-02 23:12, Paul Moore wrote:
> On Wed, Dec 2, 2020 at 10:52 PM Steve Grubb <sgrubb@redhat.com> wrote:
> > Hello Paul,
> 
> Steve.
> 
> > On Thursday, July 2, 2020 4:42:13 PM EST Paul Moore wrote:
> > > > #define AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT     0x00000001
> > > > #define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME 0x00000002
> > > > @@ -348,6 +349,7 @@ enum {
> > > > #define AUDIT_FEATURE_BITMAP_SESSIONID_FILTER  0x00000010
> > > > #define AUDIT_FEATURE_BITMAP_LOST_RESET                0x00000020
> > > > #define AUDIT_FEATURE_BITMAP_FILTER_FS         0x00000040
> > > > +#define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM  0x00000080
> > >
> > > In an effort not to exhaust the feature bitmap too quickly, I've been
> > > restricting it to only those features that would cause breakage with
> > > userspace.  I haven't looked closely at Steve's userspace in quite a
> > > while, but I'm guessing it can key off the structure size and doesn't
> > > need this entry in the bitmap, right?  Let me rephrase, if userspace
> > > needs to key off anything, it *should* key off the structure size and
> > > not a new flag in the bitmask
> > >
> > > Also, I'm assuming that older userspace doesn't blow-up if it sees the
> > > larger structure size?  That's even more important.
> >
> > We need this FEATURE_BITMAP to do anything in userspace. Max's instinct was
> > right. Anything that changes the user space API needs to have a
> > FEATURE_BITMAP so that user space can do the right thing. The lack of this is
> > blocking acceptance of the pull request for the user space piece.
> 
> I don't believe you need a new bitmap entry in this case, you should
> be able to examine the size of the reply from the AUDIT_GET request
> and make a determination from there.

The danger I see is if another feature is added to the audit status and
that is backported to a distro rather than this one.  It would be
impossible to determine which feature it was from the size alone.
Keying off specific fields in the kernel should be able to do
this at build time if I understood correctly.

> paul moore

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-12-03  4:12     ` Paul Moore
  2020-12-03 12:37       ` Richard Guy Briggs
@ 2020-12-03 13:31       ` Steve Grubb
  1 sibling, 0 replies; 31+ messages in thread
From: Steve Grubb @ 2020-12-03 13:31 UTC (permalink / raw)
  To: Paul Moore; +Cc: linux-audit

On Wednesday, December 2, 2020 11:12:31 PM EST Paul Moore wrote:
> On Wed, Dec 2, 2020 at 10:52 PM Steve Grubb <sgrubb@redhat.com> wrote:
> > Hello Paul,
> 
> Steve.
> 
> > On Thursday, July 2, 2020 4:42:13 PM EST Paul Moore wrote:
> > > > #define AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT     0x00000001
> > > > #define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME 0x00000002
> > > > @@ -348,6 +349,7 @@ enum {
> > > > #define AUDIT_FEATURE_BITMAP_SESSIONID_FILTER  0x00000010
> > > > #define AUDIT_FEATURE_BITMAP_LOST_RESET                0x00000020
> > > > #define AUDIT_FEATURE_BITMAP_FILTER_FS         0x00000040
> > > > +#define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM  0x00000080
> > > 
> > > In an effort not to exhaust the feature bitmap too quickly, I've been
> > > restricting it to only those features that would cause breakage with
> > > userspace.  I haven't looked closely at Steve's userspace in quite a
> > > while, but I'm guessing it can key off the structure size and doesn't
> > > need this entry in the bitmap, right?  Let me rephrase, if userspace
> > > needs to key off anything, it *should* key off the structure size and
> > > not a new flag in the bitmask
> > > 
> > > Also, I'm assuming that older userspace doesn't blow-up if it sees the
> > > larger structure size?  That's even more important.
> > 
> > We need this FEATURE_BITMAP to do anything in userspace. Max's instinct
> > was right. Anything that changes the user space API needs to have a
> > FEATURE_BITMAP so that user space can do the right thing. The lack of
> > this is blocking acceptance of the pull request for the user space
> > piece.
>
> I don't believe you need a new bitmap entry in this case, you should
> be able to examine the size of the reply from the AUDIT_GET request
> and make a determination from there.

For the upstream kernel, this may be the case. But in the world where people 
backport patches, how do I know that the size is related to this patch and no 
other?

-Steve


--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-12-03 12:37       ` Richard Guy Briggs
@ 2020-12-03 15:37         ` Paul Moore
  2020-12-03 23:10           ` Richard Guy Briggs
  0 siblings, 1 reply; 31+ messages in thread
From: Paul Moore @ 2020-12-03 15:37 UTC (permalink / raw)
  To: Richard Guy Briggs; +Cc: linux-audit

On Thu, Dec 3, 2020 at 7:37 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2020-12-02 23:12, Paul Moore wrote:
> > On Wed, Dec 2, 2020 at 10:52 PM Steve Grubb <sgrubb@redhat.com> wrote:
> > > We need this FEATURE_BITMAP to do anything in userspace. Max's instinct was
> > > right. Anything that changes the user space API needs to have a
> > > FEATURE_BITMAP so that user space can do the right thing. The lack of this is
> > > blocking acceptance of the pull request for the user space piece.
> >
> > I don't believe you need a new bitmap entry in this case, you should
> > be able to examine the size of the reply from the AUDIT_GET request
> > and make a determination from there.
>
> The danger I see is if another feature is added to the audit status and
> that is backported to a distro rather than this one.  It would be
> impossible to determine which feature it was from the size alone.
> Keying off specific fields in the kernel should be able to do
> this at build time if I understood correctly.

...

On Thu, Dec 3, 2020 at 8:31 AM Steve Grubb <sgrubb@redhat.com> wrote:
> For the upstream kernel, this may be the case. But in the world where people
> backport patches, how do I know that the size is related to this patch and no
> other?

We've discussed this in the past, and most of you should already know
my answer to this, but I'll repeat my stance on this again.

My first priority is the upstream kernel, enterprise distribution
kernels are secondary.  The upstream kernels do not generally backport
features, backports are limited to fixes.  If an individual or a
distribution decides to backport kernel features they are responsible
for making things work; it is not upstream's responsibility to enable,
or support, arbitrary combinations of patches.  Any assistance we
provide here is "best effort" and not guaranteed.

Bringing this back to the case at hand, the feature bitmap is a
limited resource and it is my opinion that we need to limit its use to
only those features which can not be determined through other means;
in this case this feature can be determine by the size of the
AUDIT_GET reply buffer (the audit_status struct).  Of course if more
care and thought had been put into the audit kernel/userspace API in
the first place we would not be in this situation, but we are and we
need to deal with it as best we can.

-- 
paul moore
www.paul-moore.com

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-12-03 15:37         ` Paul Moore
@ 2020-12-03 23:10           ` Richard Guy Briggs
  2020-12-03 23:43             ` Paul Moore
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Guy Briggs @ 2020-12-03 23:10 UTC (permalink / raw)
  To: Paul Moore; +Cc: linux-audit

On 2020-12-03 10:37, Paul Moore wrote:
> On Thu, Dec 3, 2020 at 7:37 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> > On 2020-12-02 23:12, Paul Moore wrote:
> > > On Wed, Dec 2, 2020 at 10:52 PM Steve Grubb <sgrubb@redhat.com> wrote:
> > > > We need this FEATURE_BITMAP to do anything in userspace. Max's instinct was
> > > > right. Anything that changes the user space API needs to have a
> > > > FEATURE_BITMAP so that user space can do the right thing. The lack of this is
> > > > blocking acceptance of the pull request for the user space piece.
> > >
> > > I don't believe you need a new bitmap entry in this case, you should
> > > be able to examine the size of the reply from the AUDIT_GET request
> > > and make a determination from there.
> >
> > The danger I see is if another feature is added to the audit status and
> > that is backported to a distro rather than this one.  It would be
> > impossible to determine which feature it was from the size alone.
> > Keying off specific fields in the kernel should be able to do
> > this at build time if I understood correctly.
> 
> ...
> 
> On Thu, Dec 3, 2020 at 8:31 AM Steve Grubb <sgrubb@redhat.com> wrote:
> > For the upstream kernel, this may be the case. But in the world where people
> > backport patches, how do I know that the size is related to this patch and no
> > other?
> 
> We've discussed this in the past, and most of you should already know
> my answer to this, but I'll repeat my stance on this again.
> 
> My first priority is the upstream kernel, enterprise distribution
> kernels are secondary.  The upstream kernels do not generally backport
> features, backports are limited to fixes.  If an individual or a
> distribution decides to backport kernel features they are responsible
> for making things work; it is not upstream's responsibility to enable,
> or support, arbitrary combinations of patches.  Any assistance we
> provide here is "best effort" and not guaranteed.
> 
> Bringing this back to the case at hand, the feature bitmap is a
> limited resource and it is my opinion that we need to limit its use to
> only those features which can not be determined through other means;

So far there are only seven bits used out of 32, so it does not appear we are
in danger of running out anytime soon.

It was introduced with commit 0288d7183c41c0192d2963d44590f346f4aee917
	Author:     Richard Guy Briggs <rgb@redhat.com>
	AuthorDate: 2014-11-17 15:51:01 -0500
	Commit:     Paul Moore <pmoore@redhat.com>
	CommitDate: 2014-11-17 16:53:51 -0500
	("audit: convert status version to a feature bitmap")
It was introduced specifically to enable distributions to selectively
backport features.  It was converted away from AUDIT_VERSION.

There are other ways to detect the presence of backlog_wait_time_actual
as I mentioned above.

configure.ac already has these:
	AC_CHECK_DECLS([AUDIT_FEATURE_VERSION], [], [], [[#include <linux/audit.h>]])
	AC_CHECK_MEMBERS([struct audit_status.feature_bitmap], [], [], [[#include <linux/audit.h>]])
	AC_CHECK_DECLS([AUDIT_VERSION_BACKLOG_WAIT_TIME], [], [], [[#include <linux/audit.h>]])
	AC_CHECK_DECLS([AUDIT_STATUS_BACKLOG_WAIT_TIME], [], [], [[#include <linux/audit.h>]])

Max' userspace patch already has provisions to check for the existance of the
macro AUDIT_STATUS_BACKLOG_WAIT_TIME_ACTUAL, so while it doesn't actually check
for the struct audit_status member backlog_wait_time_actual, it does check
something that should only be present if that member is present, so I believe
we are fine as it currently stands.  If you want to be really paranoid, then
add a check for HAVE_STRUCT_AUDIT_STATUS_BACKLOG_WAIT_TIME_ACTUAL with:

	AC_CHECK_MEMBERS([struct audit_status.backlog_wait_time_actual], [], [], [[#include <linux/audit.h>]])


Don't look at audit_set_feature() since it has an outstanding issue
filed against it to untangle the unfortunately named FEATURE_VERSION vs
FEATURE_BITMAP.
	https://github.com/linux-audit/audit-userspace/issues/10

> in this case this feature can be determine by the size of the
> AUDIT_GET reply buffer (the audit_status struct).  Of course if more
> care and thought had been put into the audit kernel/userspace API in
> the first place we would not be in this situation, but we are and we
> need to deal with it as best we can.
> 
> paul moore

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-12-03 23:10           ` Richard Guy Briggs
@ 2020-12-03 23:43             ` Paul Moore
  2020-12-03 23:55               ` Steve Grubb
  0 siblings, 1 reply; 31+ messages in thread
From: Paul Moore @ 2020-12-03 23:43 UTC (permalink / raw)
  To: Richard Guy Briggs; +Cc: linux-audit

On Thu, Dec 3, 2020 at 6:10 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2020-12-03 10:37, Paul Moore wrote:
> > On Thu, Dec 3, 2020 at 7:37 AM Richard Guy Briggs <rgb@redhat.com> wrote:
> > > On 2020-12-02 23:12, Paul Moore wrote:
> > > > On Wed, Dec 2, 2020 at 10:52 PM Steve Grubb <sgrubb@redhat.com> wrote:
> > > > > We need this FEATURE_BITMAP to do anything in userspace. Max's instinct was
> > > > > right. Anything that changes the user space API needs to have a
> > > > > FEATURE_BITMAP so that user space can do the right thing. The lack of this is
> > > > > blocking acceptance of the pull request for the user space piece.
> > > >
> > > > I don't believe you need a new bitmap entry in this case, you should
> > > > be able to examine the size of the reply from the AUDIT_GET request
> > > > and make a determination from there.
> > >
> > > The danger I see is if another feature is added to the audit status and
> > > that is backported to a distro rather than this one.  It would be
> > > impossible to determine which feature it was from the size alone.
> > > Keying off specific fields in the kernel should be able to do
> > > this at build time if I understood correctly.
> >
> > ...
> >
> > On Thu, Dec 3, 2020 at 8:31 AM Steve Grubb <sgrubb@redhat.com> wrote:
> > > For the upstream kernel, this may be the case. But in the world where people
> > > backport patches, how do I know that the size is related to this patch and no
> > > other?
> >
> > We've discussed this in the past, and most of you should already know
> > my answer to this, but I'll repeat my stance on this again.
> >
> > My first priority is the upstream kernel, enterprise distribution
> > kernels are secondary.  The upstream kernels do not generally backport
> > features, backports are limited to fixes.  If an individual or a
> > distribution decides to backport kernel features they are responsible
> > for making things work; it is not upstream's responsibility to enable,
> > or support, arbitrary combinations of patches.  Any assistance we
> > provide here is "best effort" and not guaranteed.
> >
> > Bringing this back to the case at hand, the feature bitmap is a
> > limited resource and it is my opinion that we need to limit its use to
> > only those features which can not be determined through other means;
>
> So far there are only seven bits used out of 32, so it does not appear we are
> in danger of running out anytime soon.
>
> It was introduced with commit 0288d7183c41c0192d2963d44590f346f4aee917
>         Author:     Richard Guy Briggs <rgb@redhat.com>
>         AuthorDate: 2014-11-17 15:51:01 -0500
>         Commit:     Paul Moore <pmoore@redhat.com>
>         CommitDate: 2014-11-17 16:53:51 -0500
>         ("audit: convert status version to a feature bitmap")
> It was introduced specifically to enable distributions to selectively
> backport features.  It was converted away from AUDIT_VERSION.
>
> There are other ways to detect the presence of backlog_wait_time_actual
> as I mentioned above.

Let me be blunt - I honestly don't care what Steve's audit userspace
does to detect this.  I've got my own opinion, but Steve's audit
userspace is not my project to manage and I think we've established
over the years that Steve and I have very different views on what
constitutes good design.

-- 
paul moore
www.paul-moore.com

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-12-03 23:43             ` Paul Moore
@ 2020-12-03 23:55               ` Steve Grubb
  2020-12-04  2:16                 ` Paul Moore
  0 siblings, 1 reply; 31+ messages in thread
From: Steve Grubb @ 2020-12-03 23:55 UTC (permalink / raw)
  To: Richard Guy Briggs, Paul Moore; +Cc: linux-audit

On Thursday, December 3, 2020 6:43:11 PM EST Paul Moore wrote:
> > So far there are only seven bits used out of 32, so it does not appear we
> > are in danger of running out anytime soon.

Exactly. Even capability bits are easier to get assigned.  :-)

> > It was introduced with commit 0288d7183c41c0192d2963d44590f346f4aee917
> > Author:     Richard Guy Briggs <rgb@redhat.com>
> > AuthorDate: 2014-11-17 15:51:01 -0500
> > Commit:     Paul Moore <pmoore@redhat.com>
> > CommitDate: 2014-11-17 16:53:51 -0500
> > ("audit: convert status version to a feature bitmap")
> > It was introduced specifically to enable distributions to selectively
> > backport features.  It was converted away from AUDIT_VERSION.
> > 
> > There are other ways to detect the presence of backlog_wait_time_actual
> > as I mentioned above.
> 
> Let me be blunt - I honestly don't care what Steve's audit userspace
> does to detect this.  I've got my own opinion, but Steve's audit
> userspace is not my project to manage and I think we've established
> over the years that Steve and I have very different views on what
> constitutes good design.

And guessing what might be in buffers of different sizes is good design? The 
FEATURE_BITMAP was introduced to get rid of this ambiguity.

-Steve


--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-12-03 23:55               ` Steve Grubb
@ 2020-12-04  2:16                 ` Paul Moore
  2020-12-04  2:47                   ` Steve Grubb
  0 siblings, 1 reply; 31+ messages in thread
From: Paul Moore @ 2020-12-04  2:16 UTC (permalink / raw)
  To: Steve Grubb; +Cc: Richard Guy Briggs, linux-audit

On Thu, Dec 3, 2020 at 6:55 PM Steve Grubb <sgrubb@redhat.com> wrote:
> On Thursday, December 3, 2020 6:43:11 PM EST Paul Moore wrote:
> > > So far there are only seven bits used out of 32, so it does not appear we
> > > are in danger of running out anytime soon.
>
> Exactly. Even capability bits are easier to get assigned.  :-)

Another way to look at it is that we've exhausted approximately
one-third of the space in six years.  In reality it is worse than that
as I've been putting the brakes on new feature bits for a while now.

> > > It was introduced with commit 0288d7183c41c0192d2963d44590f346f4aee917
> > > Author:     Richard Guy Briggs <rgb@redhat.com>
> > > AuthorDate: 2014-11-17 15:51:01 -0500
> > > Commit:     Paul Moore <pmoore@redhat.com>
> > > CommitDate: 2014-11-17 16:53:51 -0500
> > > ("audit: convert status version to a feature bitmap")
> > > It was introduced specifically to enable distributions to selectively
> > > backport features.  It was converted away from AUDIT_VERSION.
> > >
> > > There are other ways to detect the presence of backlog_wait_time_actual
> > > as I mentioned above.
> >
> > Let me be blunt - I honestly don't care what Steve's audit userspace
> > does to detect this.  I've got my own opinion, but Steve's audit
> > userspace is not my project to manage and I think we've established
> > over the years that Steve and I have very different views on what
> > constitutes good design.
>
> And guessing what might be in buffers of different sizes is good design? The
> FEATURE_BITMAP was introduced to get rid of this ambiguity.

There is just soo much to unpack in your comment Steve, but let me
keep it short ...

- This is an enterprise distro problem, not an upstream problem.  The
problems you are talking about are not a problem for upstream.

- You can obviously backport things, you just have to ensure you
preserve the structure order/size.  It may require you backporting
multiple features, but if you're already cherry-picking patches you've
already gone out on your own.  This approach is both obvious and
commonly done, if it hasn't occurred to you I don't know what to say.

... and finally, to be blunt again - I'm not merging a patch to add a
feature bit for this.

-- 
paul moore
www.paul-moore.com

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-12-04  2:16                 ` Paul Moore
@ 2020-12-04  2:47                   ` Steve Grubb
  2020-12-04 20:41                     ` Paul Moore
  0 siblings, 1 reply; 31+ messages in thread
From: Steve Grubb @ 2020-12-04  2:47 UTC (permalink / raw)
  To: Paul Moore; +Cc: Richard Guy Briggs, linux-audit

On Thursday, December 3, 2020 9:16:52 PM EST Paul Moore wrote:
> > > > Author:     Richard Guy Briggs <rgb@redhat.com>
> > > > AuthorDate: 2014-11-17 15:51:01 -0500
> > > > Commit:     Paul Moore <pmoore@redhat.com>
> > > > CommitDate: 2014-11-17 16:53:51 -0500
> > > > ("audit: convert status version to a feature bitmap")
> > > > It was introduced specifically to enable distributions to selectively
> > > > backport features.  It was converted away from AUDIT_VERSION.
> > > > 
> > > > There are other ways to detect the presence of
> > > > backlog_wait_time_actual
> > > > as I mentioned above.
> > > 
> > > Let me be blunt - I honestly don't care what Steve's audit userspace
> > > does to detect this.  I've got my own opinion, but Steve's audit
> > > userspace is not my project to manage and I think we've established
> > > over the years that Steve and I have very different views on what
> > > constitutes good design.
> > 
> > And guessing what might be in buffers of different sizes is good design?
> > The FEATURE_BITMAP was introduced to get rid of this ambiguity.
> 
> There is just soo much to unpack in your comment Steve, but let me
> keep it short ...
> 
> - This is an enterprise distro problem, not an upstream problem.  The
> problems you are talking about are not a problem for upstream.

You may look at it that way. I do not. Audit -userspace is also an upstream 
for a lot of distros and I need to make this painless for them. So, while you 
may think of this being a backport problem for Red Hat to solve, I think of 
this as a generic problem that I'd like to solve for Debian, Suse, Ubuntu, 
Arch, Gentoo, anyone using audit. We both are upstream.

-Steve


--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-12-04  2:47                   ` Steve Grubb
@ 2020-12-04 20:41                     ` Paul Moore
  2020-12-07 21:13                       ` Max Englander
  0 siblings, 1 reply; 31+ messages in thread
From: Paul Moore @ 2020-12-04 20:41 UTC (permalink / raw)
  To: Steve Grubb; +Cc: Richard Guy Briggs, linux-audit

On Thu, Dec 3, 2020 at 9:47 PM Steve Grubb <sgrubb@redhat.com> wrote:
> On Thursday, December 3, 2020 9:16:52 PM EST Paul Moore wrote:
> > > > > Author:     Richard Guy Briggs <rgb@redhat.com>
> > > > > AuthorDate: 2014-11-17 15:51:01 -0500
> > > > > Commit:     Paul Moore <pmoore@redhat.com>
> > > > > CommitDate: 2014-11-17 16:53:51 -0500
> > > > > ("audit: convert status version to a feature bitmap")
> > > > > It was introduced specifically to enable distributions to selectively
> > > > > backport features.  It was converted away from AUDIT_VERSION.
> > > > >
> > > > > There are other ways to detect the presence of
> > > > > backlog_wait_time_actual
> > > > > as I mentioned above.
> > > >
> > > > Let me be blunt - I honestly don't care what Steve's audit userspace
> > > > does to detect this.  I've got my own opinion, but Steve's audit
> > > > userspace is not my project to manage and I think we've established
> > > > over the years that Steve and I have very different views on what
> > > > constitutes good design.
> > >
> > > And guessing what might be in buffers of different sizes is good design?
> > > The FEATURE_BITMAP was introduced to get rid of this ambiguity.
> >
> > There is just soo much to unpack in your comment Steve, but let me
> > keep it short ...
> >
> > - This is an enterprise distro problem, not an upstream problem.  The
> > problems you are talking about are not a problem for upstream.
>
> You may look at it that way. I do not. Audit -userspace is also an upstream
> for a lot of distros and I need to make this painless for them. So, while you
> may think of this being a backport problem for Red Hat to solve, I think of
> this as a generic problem that I'd like to solve for Debian, Suse, Ubuntu,
> Arch, Gentoo, anyone using audit. We both are upstream.

I intentionally said "enterprise Linux distributions", I never singled
out RH/IBM.  Contrary to what RH/IBM marketing may have me believe, I
don't consider RHEL to be the only "enterprise Linux distribution" :)

Beyond that, while I haven't looked at all of the distros you list
above, I know a few of them typically only backport fixes, not new
features.  Further, as I mentioned previously in this thread, there is
a way to backport this feature in a safe manner without using the
feature bits.  Eeeeeven further, if there wasn't a way to backport
this feature safely (and let me stress agai that you can backport this
safely), I would still consider that to be a distro problem and not an
upstream kernel problem.  The upstream kernel is not responsible for
enabling or supporting arbitrary combinations of patches.

-- 
paul moore
www.paul-moore.com

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-07-02 20:42 ` Paul Moore
                     ` (2 preceding siblings ...)
  2020-12-03  3:52   ` Steve Grubb
@ 2020-12-07 19:43   ` Lenny Bruzenak
  2020-12-07 21:14     ` Paul Moore
  3 siblings, 1 reply; 31+ messages in thread
From: Lenny Bruzenak @ 2020-12-07 19:43 UTC (permalink / raw)
  To: linux-audit


[-- Attachment #1.1: Type: text/plain, Size: 1837 bytes --]

On 7/2/20 2:42 PM, Paul Moore wrote:

>>   #define AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT     0x00000001
>>   #define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME 0x00000002
>> @@ -348,6 +349,7 @@ enum {
>>   #define AUDIT_FEATURE_BITMAP_SESSIONID_FILTER  0x00000010
>>   #define AUDIT_FEATURE_BITMAP_LOST_RESET                0x00000020
>>   #define AUDIT_FEATURE_BITMAP_FILTER_FS         0x00000040
>> +#define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM  0x00000080
> In an effort not to exhaust the feature bitmap too quickly, I've been
> restricting it to only those features that would cause breakage with
> userspace.  I haven't looked closely at Steve's userspace in quite a
> while, but I'm guessing it can key off the structure size and doesn't
> need this entry in the bitmap, right?  Let me rephrase, if userspace
> needs to key off anything, it*should*  key off the structure size and
> not a new flag in the bitmask;)
>
> Also, I'm assuming that older userspace doesn't blow-up if it sees the
> larger structure size?  That's even more important.
>
Paul,

This change does seem to the untrained eye to be in line with the 
existing FEATURE_BITMAP definitions. I appreciate your intent on not 
exhausting the available space, but at some point if that happens is 
there any reasonable way to expand? I'm sure you have some thoughts, or 
is this "it" as far as features could go (the last available bits)?

Max,

It's a pretty good feature. I agree with your original problem 
assessment; this is an area I'm always looking at. I've got questions 
I'll post separately as they are not germane to this thread.


As an interested user I'm hoping for a resolution on this, so that the 
userspace release can happen, as this seems to be a beneficial change 
which I could make use of when available.


Thx,

LCB


-- 
Lenny Bruzenak
MagitekLTD


[-- Attachment #1.2: Type: text/html, Size: 2649 bytes --]

[-- Attachment #2: Type: text/plain, Size: 102 bytes --]

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-12-04 20:41                     ` Paul Moore
@ 2020-12-07 21:13                       ` Max Englander
  2020-12-07 21:17                         ` Paul Moore
  2020-12-07 21:21                         ` Richard Guy Briggs
  0 siblings, 2 replies; 31+ messages in thread
From: Max Englander @ 2020-12-07 21:13 UTC (permalink / raw)
  To: Paul Moore; +Cc: Richard Guy Briggs, linux-audit


[-- Attachment #1.1: Type: text/plain, Size: 4742 bytes --]

On Fri, Dec 4, 2020 at 3:41 PM Paul Moore <paul@paul-moore.com> wrote:

> On Thu, Dec 3, 2020 at 9:47 PM Steve Grubb <sgrubb@redhat.com> wrote:
> > On Thursday, December 3, 2020 9:16:52 PM EST Paul Moore wrote:
> > > > > > Author:     Richard Guy Briggs <rgb@redhat.com>
> > > > > > AuthorDate: 2014-11-17 15:51:01 -0500
> > > > > > Commit:     Paul Moore <pmoore@redhat.com>
> > > > > > CommitDate: 2014-11-17 16:53:51 -0500
> > > > > > ("audit: convert status version to a feature bitmap")
> > > > > > It was introduced specifically to enable distributions to
> selectively
> > > > > > backport features.  It was converted away from AUDIT_VERSION.
> > > > > >
> > > > > > There are other ways to detect the presence of
> > > > > > backlog_wait_time_actual
> > > > > > as I mentioned above.
> > > > >
> > > > > Let me be blunt - I honestly don't care what Steve's audit
> userspace
> > > > > does to detect this.  I've got my own opinion, but Steve's audit
> > > > > userspace is not my project to manage and I think we've established
> > > > > over the years that Steve and I have very different views on what
> > > > > constitutes good design.
> > > >
> > > > And guessing what might be in buffers of different sizes is good
> design?
> > > > The FEATURE_BITMAP was introduced to get rid of this ambiguity.
> > >
> > > There is just soo much to unpack in your comment Steve, but let me
> > > keep it short ...
> > >
> > > - This is an enterprise distro problem, not an upstream problem.  The
> > > problems you are talking about are not a problem for upstream.
> >
> > You may look at it that way. I do not. Audit -userspace is also an
> upstream
> > for a lot of distros and I need to make this painless for them. So,
> while you
> > may think of this being a backport problem for Red Hat to solve, I think
> of
> > this as a generic problem that I'd like to solve for Debian, Suse,
> Ubuntu,
> > Arch, Gentoo, anyone using audit. We both are upstream.
>
> I intentionally said "enterprise Linux distributions", I never singled
> out RH/IBM.  Contrary to what RH/IBM marketing may have me believe, I
> don't consider RHEL to be the only "enterprise Linux distribution" :)
>
> Beyond that, while I haven't looked at all of the distros you list
> above, I know a few of them typically only backport fixes, not new
> features.  Further, as I mentioned previously in this thread, there is
> a way to backport this feature in a safe manner without using the
> feature bits.  Eeeeeven further, if there wasn't a way to backport
> this feature safely (and let me stress agai that you can backport this
> safely), I would still consider that to be a distro problem and not an
> upstream kernel problem.  The upstream kernel is not responsible for
> enabling or supporting arbitrary combinations of patches.
>
> --
> paul moore
> www.paul-moore.com
>
> --
> Linux-audit mailing list
> Linux-audit@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-audit
>
>
Hi Steve, Paul,

I'm replying with the Gmail UI since I don't have my Mutt setup handy, so
apologies for any formatting which doesn't align with the mailing list best
 practices!

First off, my apologies for being late to the thread, and for submitting
code
to the kernel and user space which aren't playing nicely with each other.

It sounds like there's a decision to be made around whether or not to use
the bitmap feature flags which I probably am probably not in a position to
help decide. However, I'm more than happy to fix my userspace PR so
that it does not rely on the feature flag space using the approach Paul
outlined, in spite of the drawbacks, if that ends up being the decision.

Steve, I understand your preference to rely on the feature bitmap since it
is a more reliable way to determine the availability of a feature than
key size, but if you're open to Paul's recommendations in spite of the
drawbacks, I'll make the changes to my patch as soon as I can to unblock
your work.

Separately, since there is tension between these two approaches
(structure size and bitmap), I wonder if Paul/Steve you would be open
to a third way.

For example, I can imagine adding additional bitmap
spaces (FEATURE_BITMAP_2, FEATURE_BITMAP_3, etc.).
Alternately, I can imagine each feature being assigned a unique u64
ID, and user space programs querying the kernel to see whether a
a particular feature is enabled.

I'm not familiar enough with the kernel to be able to judge how sound
either idea is (or if these have been considered and rejected in the past)
but if you all think a third way is viable, I'd be happy to start a separate
mailing thread to try to thread the competing requirements of the kernel
and userspace, and contribute code if we can find a solution.

Max

[-- Attachment #1.2: Type: text/html, Size: 6504 bytes --]

[-- Attachment #2: Type: text/plain, Size: 102 bytes --]

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-12-07 19:43   ` Lenny Bruzenak
@ 2020-12-07 21:14     ` Paul Moore
  0 siblings, 0 replies; 31+ messages in thread
From: Paul Moore @ 2020-12-07 21:14 UTC (permalink / raw)
  To: Lenny Bruzenak; +Cc: linux-audit

On Mon, Dec 7, 2020 at 2:43 PM Lenny Bruzenak <lenny@magitekltd.com> wrote:
> Paul,
>
> This change does seem to the untrained eye to be in line with the existing FEATURE_BITMAP definitions. I appreciate your intent on not exhausting the available space, but at some point if that happens is there any reasonable way to expand? I'm sure you have some thoughts, or is this "it" as far as features could go (the last available bits)?

I haven't given it a lot of thought, but I'm sure there are ways we
could expand the bitmap if/when it is needed.  I'm also sure it would
be ugly; it almost always is a pain.

My hope is that we've changed the API before then so it is a non-issue.

-- 
paul moore
www.paul-moore.com


--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-12-07 21:13                       ` Max Englander
@ 2020-12-07 21:17                         ` Paul Moore
  2020-12-07 21:21                         ` Richard Guy Briggs
  1 sibling, 0 replies; 31+ messages in thread
From: Paul Moore @ 2020-12-07 21:17 UTC (permalink / raw)
  To: Max Englander; +Cc: Richard Guy Briggs, linux-audit

On Mon, Dec 7, 2020 at 4:13 PM Max Englander <max.englander@gmail.com> wrote:
> It sounds like there's a decision to be made around whether or not to use
> the bitmap feature flags which I probably am probably not in a position to
> help decide. However, I'm more than happy to fix my userspace PR so
> that it does not rely on the feature flag space using the approach Paul
> outlined, in spite of the drawbacks, if that ends up being the decision.

As mentioned several times in the past, I'm not merging a patch which
allocates a bitmap entry for this feature.

> Separately, since there is tension between these two approaches
> (structure size and bitmap), I wonder if Paul/Steve you would be open
> to a third way.
>
> For example, I can imagine adding additional bitmap
> spaces (FEATURE_BITMAP_2, FEATURE_BITMAP_3, etc.).
> Alternately, I can imagine each feature being assigned a unique u64
> ID, and user space programs querying the kernel to see whether a
> a particular feature is enabled.

This isn't attractive to me at this point in time.  NACK.

-- 
paul moore
www.paul-moore.com

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-12-07 21:13                       ` Max Englander
  2020-12-07 21:17                         ` Paul Moore
@ 2020-12-07 21:21                         ` Richard Guy Briggs
  2020-12-07 21:28                           ` Max Englander
  1 sibling, 1 reply; 31+ messages in thread
From: Richard Guy Briggs @ 2020-12-07 21:21 UTC (permalink / raw)
  To: Max Englander; +Cc: linux-audit

On 2020-12-07 16:13, Max Englander wrote:
> On Fri, Dec 4, 2020 at 3:41 PM Paul Moore <paul@paul-moore.com> wrote:
> 
> > On Thu, Dec 3, 2020 at 9:47 PM Steve Grubb <sgrubb@redhat.com> wrote:
> > > On Thursday, December 3, 2020 9:16:52 PM EST Paul Moore wrote:
> > > > > > > Author:     Richard Guy Briggs <rgb@redhat.com>
> > > > > > > AuthorDate: 2014-11-17 15:51:01 -0500
> > > > > > > Commit:     Paul Moore <pmoore@redhat.com>
> > > > > > > CommitDate: 2014-11-17 16:53:51 -0500
> > > > > > > ("audit: convert status version to a feature bitmap")
> > > > > > > It was introduced specifically to enable distributions to
> > selectively
> > > > > > > backport features.  It was converted away from AUDIT_VERSION.
> > > > > > >
> > > > > > > There are other ways to detect the presence of
> > > > > > > backlog_wait_time_actual
> > > > > > > as I mentioned above.
> > > > > >
> > > > > > Let me be blunt - I honestly don't care what Steve's audit
> > userspace
> > > > > > does to detect this.  I've got my own opinion, but Steve's audit
> > > > > > userspace is not my project to manage and I think we've established
> > > > > > over the years that Steve and I have very different views on what
> > > > > > constitutes good design.
> > > > >
> > > > > And guessing what might be in buffers of different sizes is good
> > design?
> > > > > The FEATURE_BITMAP was introduced to get rid of this ambiguity.
> > > >
> > > > There is just soo much to unpack in your comment Steve, but let me
> > > > keep it short ...
> > > >
> > > > - This is an enterprise distro problem, not an upstream problem.  The
> > > > problems you are talking about are not a problem for upstream.
> > >
> > > You may look at it that way. I do not. Audit -userspace is also an
> > upstream
> > > for a lot of distros and I need to make this painless for them. So,
> > while you
> > > may think of this being a backport problem for Red Hat to solve, I think
> > of
> > > this as a generic problem that I'd like to solve for Debian, Suse,
> > Ubuntu,
> > > Arch, Gentoo, anyone using audit. We both are upstream.
> >
> > I intentionally said "enterprise Linux distributions", I never singled
> > out RH/IBM.  Contrary to what RH/IBM marketing may have me believe, I
> > don't consider RHEL to be the only "enterprise Linux distribution" :)
> >
> > Beyond that, while I haven't looked at all of the distros you list
> > above, I know a few of them typically only backport fixes, not new
> > features.  Further, as I mentioned previously in this thread, there is
> > a way to backport this feature in a safe manner without using the
> > feature bits.  Eeeeeven further, if there wasn't a way to backport
> > this feature safely (and let me stress agai that you can backport this
> > safely), I would still consider that to be a distro problem and not an
> > upstream kernel problem.  The upstream kernel is not responsible for
> > enabling or supporting arbitrary combinations of patches.
> >
> > --
> > paul moore
> > www.paul-moore.com
> >
> > --
> > Linux-audit mailing list
> > Linux-audit@redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-audit
> >
> >
> Hi Steve, Paul,
> 
> I'm replying with the Gmail UI since I don't have my Mutt setup handy, so
> apologies for any formatting which doesn't align with the mailing list best
>  practices!
> 
> First off, my apologies for being late to the thread, and for submitting
> code
> to the kernel and user space which aren't playing nicely with each other.
> 
> It sounds like there's a decision to be made around whether or not to use
> the bitmap feature flags which I probably am probably not in a position to
> help decide. However, I'm more than happy to fix my userspace PR so
> that it does not rely on the feature flag space using the approach Paul
> outlined, in spite of the drawbacks, if that ends up being the decision.
> 
> Steve, I understand your preference to rely on the feature bitmap since it
> is a more reliable way to determine the availability of a feature than
> key size, but if you're open to Paul's recommendations in spite of the
> drawbacks, I'll make the changes to my patch as soon as I can to unblock
> your work.
> 
> Separately, since there is tension between these two approaches
> (structure size and bitmap), I wonder if Paul/Steve you would be open
> to a third way.

Max, Steve: have you looked at my reply in this thread from 2020-12-03 18:10?

There are two solutions in there that should work without relying on the
unreliable test for structure size that work without changing the
currently commited kernel code.  Or have I missed something?

> For example, I can imagine adding additional bitmap
> spaces (FEATURE_BITMAP_2, FEATURE_BITMAP_3, etc.).
> Alternately, I can imagine each feature being assigned a unique u64
> ID, and user space programs querying the kernel to see whether a
> a particular feature is enabled.
> 
> I'm not familiar enough with the kernel to be able to judge how sound
> either idea is (or if these have been considered and rejected in the past)
> but if you all think a third way is viable, I'd be happy to start a separate
> mailing thread to try to thread the competing requirements of the kernel
> and userspace, and contribute code if we can find a solution.
> 
> Max

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-12-07 21:21                         ` Richard Guy Briggs
@ 2020-12-07 21:28                           ` Max Englander
  2020-12-07 23:28                             ` Steve Grubb
  0 siblings, 1 reply; 31+ messages in thread
From: Max Englander @ 2020-12-07 21:28 UTC (permalink / raw)
  To: Richard Guy Briggs; +Cc: linux-audit


[-- Attachment #1.1: Type: text/plain, Size: 6065 bytes --]

On Mon, Dec 7, 2020 at 4:21 PM Richard Guy Briggs <rgb@redhat.com> wrote:

> On 2020-12-07 16:13, Max Englander wrote:
> > On Fri, Dec 4, 2020 at 3:41 PM Paul Moore <paul@paul-moore.com> wrote:
> >
> > > On Thu, Dec 3, 2020 at 9:47 PM Steve Grubb <sgrubb@redhat.com> wrote:
> > > > On Thursday, December 3, 2020 9:16:52 PM EST Paul Moore wrote:
> > > > > > > > Author:     Richard Guy Briggs <rgb@redhat.com>
> > > > > > > > AuthorDate: 2014-11-17 15:51:01 -0500
> > > > > > > > Commit:     Paul Moore <pmoore@redhat.com>
> > > > > > > > CommitDate: 2014-11-17 16:53:51 -0500
> > > > > > > > ("audit: convert status version to a feature bitmap")
> > > > > > > > It was introduced specifically to enable distributions to
> > > selectively
> > > > > > > > backport features.  It was converted away from AUDIT_VERSION.
> > > > > > > >
> > > > > > > > There are other ways to detect the presence of
> > > > > > > > backlog_wait_time_actual
> > > > > > > > as I mentioned above.
> > > > > > >
> > > > > > > Let me be blunt - I honestly don't care what Steve's audit
> > > userspace
> > > > > > > does to detect this.  I've got my own opinion, but Steve's
> audit
> > > > > > > userspace is not my project to manage and I think we've
> established
> > > > > > > over the years that Steve and I have very different views on
> what
> > > > > > > constitutes good design.
> > > > > >
> > > > > > And guessing what might be in buffers of different sizes is good
> > > design?
> > > > > > The FEATURE_BITMAP was introduced to get rid of this ambiguity.
> > > > >
> > > > > There is just soo much to unpack in your comment Steve, but let me
> > > > > keep it short ...
> > > > >
> > > > > - This is an enterprise distro problem, not an upstream problem.
> The
> > > > > problems you are talking about are not a problem for upstream.
> > > >
> > > > You may look at it that way. I do not. Audit -userspace is also an
> > > upstream
> > > > for a lot of distros and I need to make this painless for them. So,
> > > while you
> > > > may think of this being a backport problem for Red Hat to solve, I
> think
> > > of
> > > > this as a generic problem that I'd like to solve for Debian, Suse,
> > > Ubuntu,
> > > > Arch, Gentoo, anyone using audit. We both are upstream.
> > >
> > > I intentionally said "enterprise Linux distributions", I never singled
> > > out RH/IBM.  Contrary to what RH/IBM marketing may have me believe, I
> > > don't consider RHEL to be the only "enterprise Linux distribution" :)
> > >
> > > Beyond that, while I haven't looked at all of the distros you list
> > > above, I know a few of them typically only backport fixes, not new
> > > features.  Further, as I mentioned previously in this thread, there is
> > > a way to backport this feature in a safe manner without using the
> > > feature bits.  Eeeeeven further, if there wasn't a way to backport
> > > this feature safely (and let me stress agai that you can backport this
> > > safely), I would still consider that to be a distro problem and not an
> > > upstream kernel problem.  The upstream kernel is not responsible for
> > > enabling or supporting arbitrary combinations of patches.
> > >
> > > --
> > > paul moore
> > > www.paul-moore.com
> > >
> > > --
> > > Linux-audit mailing list
> > > Linux-audit@redhat.com
> > > https://www.redhat.com/mailman/listinfo/linux-audit
> > >
> > >
> > Hi Steve, Paul,
> >
> > I'm replying with the Gmail UI since I don't have my Mutt setup handy, so
> > apologies for any formatting which doesn't align with the mailing list
> best
> >  practices!
> >
> > First off, my apologies for being late to the thread, and for submitting
> > code
> > to the kernel and user space which aren't playing nicely with each other.
> >
> > It sounds like there's a decision to be made around whether or not to use
> > the bitmap feature flags which I probably am probably not in a position
> to
> > help decide. However, I'm more than happy to fix my userspace PR so
> > that it does not rely on the feature flag space using the approach Paul
> > outlined, in spite of the drawbacks, if that ends up being the decision.
> >
> > Steve, I understand your preference to rely on the feature bitmap since
> it
> > is a more reliable way to determine the availability of a feature than
> > key size, but if you're open to Paul's recommendations in spite of the
> > drawbacks, I'll make the changes to my patch as soon as I can to unblock
> > your work.
> >
> > Separately, since there is tension between these two approaches
> > (structure size and bitmap), I wonder if Paul/Steve you would be open
> > to a third way.
>
> Max, Steve: have you looked at my reply in this thread from 2020-12-03
> 18:10?
>
> There are two solutions in there that should work without relying on the
> unreliable test for structure size that work without changing the
> currently commited kernel code.  Or have I missed something?
>
> > For example, I can imagine adding additional bitmap
> > spaces (FEATURE_BITMAP_2, FEATURE_BITMAP_3, etc.).
> > Alternately, I can imagine each feature being assigned a unique u64
> > ID, and user space programs querying the kernel to see whether a
> > a particular feature is enabled.
> >
> > I'm not familiar enough with the kernel to be able to judge how sound
> > either idea is (or if these have been considered and rejected in the
> past)
> > but if you all think a third way is viable, I'd be happy to start a
> separate
> > mailing thread to try to thread the competing requirements of the kernel
> > and userspace, and contribute code if we can find a solution.
> >
> > Max
>
> - RGB
>
> --
> Richard Guy Briggs <rgb@redhat.com>
> Sr. S/W Engineer, Kernel Security, Base Operating Systems
> Remote, Ottawa, Red Hat Canada
> IRC: rgb, SunRaycer
> Voice: +1.647.777.2635, Internal: (81) 32635
>
>
Richard,

I missed that. That suggestion seems reasonable to me. Thanks for
the suggestion.

Steve, I'm happy to make changes to the userspace PR based on
Richard's suggestions, if that sounds good to you. I'll follow up in
the PR to discuss it more

Max

[-- Attachment #1.2: Type: text/html, Size: 8303 bytes --]

[-- Attachment #2: Type: text/plain, Size: 102 bytes --]

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-12-03  4:33 ` Joe Wulf
@ 2020-12-07 21:48   ` Max Englander
  0 siblings, 0 replies; 31+ messages in thread
From: Max Englander @ 2020-12-07 21:48 UTC (permalink / raw)
  To: Joe Wulf; +Cc: linux-audit


[-- Attachment #1.1: Type: text/plain, Size: 4614 bytes --]

On Wed, Dec 2, 2020 at 11:33 PM Joe Wulf <joe_wulf@yahoo.com> wrote:

> I would like to suggest providing a mechanism where admins can query the
> status or state of backlog issues (wait time, sums, etc...).  Maybe the
> intent is to expand the output of status checking of auditd.
>
> I believe further clarity is beneficial on the setting of the
> 'backlog_wait_sum' (or to whatever the name evolves to) initially.
> -  How it evolves over time
> -  What the conditions in the system, or auditing, would change it
> -  What conditions admins should pay attention to for informational
> understanding of status
> -  What conditions admins should realize exist such that adjustments are
> needed
>    (and suggestions to what those adjustments should be)
> -  What new guidance will admins have for building adjusting audit.rules
> around this
>
> Consider the scenario where auditing has been 'working fine' for days.
> Little to no active admin monitoring.
> Events occur to spike the auditing such that backloging of audit records
> dramatically increases.
> (for some reason) admins now come looking to investigate.
> Assuming they do:  'systemctl status auditd' the newly presented 'state'
> of the 'backlog_wait_sum' will show some evidence.
> Q:  Is that just a moment in time?
> Q:  What information here will give the perspective things are good/ok
> 'now', versus some action needs to be taken?
>
> Maybe that isn't a great scenario, or good questions----it is what occurs
> to me at the moment.
>
> Thank you.
>
> R,
> -Joe Wulf
>
>
> On Wednesday, July 1, 2020, 5:33:14 PM EDT, Max Englander <
> max.englander@gmail.com> wrote:
>
> >  In environments where the preservation of audit events and predictable
> >  usage of system memory are prioritized, admins may use a combination of
> >  --backlog_wait_time and -b options at the risk of degraded performance
> >  resulting from backlog waiting. In some cases, this risk may be
> >  preferred to lost events or unbounded memory usage. Ideally, this risk
> >  can be mitigated by making adjustments when backlog waiting is
> detected.
> >
> >  However, detection can be diffult using the currently available
> metrics.
> >  For example, an admin attempting to debug degraded performance may
> >  falsely believe a full backlog indicates backlog waiting. It may turn
> >  out the backlog frequently fills up but drains quickly.
> >
> >  To make it easier to reliably track degraded performance to backlog
> >  waiting, this patch makes the following changes:
> >
> >  Add a new field backlog_wait_sum to the audit status reply. Initialize
> >  this field to zero. Add to this field the total time spent by the
> >  current task on scheduled timeouts while the backlog limit is exceeded.
> >
> >  Tested on Ubuntu 18.04 using complementary changes to the audit
> >  userspace: https://github.com/linux-audit/audit-userspace/pull/134.
>
> <snip>
>

Hi Joe,

Not sure I can address all your points above, but the way that we monitor
Linux audit internals at my employer is to continuously monitor the audit
status
response with short evaluation windows.

- We compute a rate of change on the lost field, and alert if the there are
more than N lost records per second on average
- We compute the backlog utilization by computing backlog/backlog_limit,
and alert if that goes above 75% at any point in time
- If/when we run on a kernel that has backlog_wait_time_actual, we'll
monitor on that as well, setting thresholds around where we'd expect
growth in this value to result in service degradation.

If we get an alert, and it is just a blip that goes away and doesn't come
back,
we probably won't spend a lot of time investigating. However, if we see
that the alert is frequently active across multiple hosts, that will prompt
us
to investigate. As far as what action we would take, it would depend on
the precise values in the audit status reply, as well as other information
we
had gathered from our system. For example, if we observed elevated values
for backlog and backlog_wait_time_actual, we might first investigate other
environmental factors such as whether the auditd daemon was crashed or
starved for CPU time. If we saw that lost was high but backlog was low
that might indicate to us that the rate limit is being exceeded, or that the
kernel is out of memory.

I agree with you that it would help to expand the metrics reported in
audit status. For example, reporting the number of times an audit record was
lost due to rate limit being exceeded would help.

Not sure how responsive this is to your questions. Hope it helps some.

Thanks,
Max

[-- Attachment #1.2: Type: text/html, Size: 7010 bytes --]

[-- Attachment #2: Type: text/plain, Size: 102 bytes --]

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-12-07 21:28                           ` Max Englander
@ 2020-12-07 23:28                             ` Steve Grubb
  2020-12-08  1:34                               ` Richard Guy Briggs
  0 siblings, 1 reply; 31+ messages in thread
From: Steve Grubb @ 2020-12-07 23:28 UTC (permalink / raw)
  To: Max Englander; +Cc: Richard Guy Briggs, linux-audit

Hello Max,

On Monday, December 7, 2020 4:28:14 PM EST Max Englander wrote:
> Steve, I'm happy to make changes to the userspace PR based on
> Richard's suggestions, if that sounds good to you. I'll follow up in
> the PR to discuss it more

The only issue is new userspace on old kernel. I think if we use both the 
configure macro in addition to a size check, then it will at least allow 
forward and backward compatibility.

Other metrics would be good. I'd like to see a max_backlog to know if we are 
wasting memory. It would just record the highwater mark since auditing was 
enabled.

-Steve


--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-12-07 23:28                             ` Steve Grubb
@ 2020-12-08  1:34                               ` Richard Guy Briggs
  2020-12-08  3:34                                 ` Steve Grubb
  2020-12-08 23:08                                 ` Paul Moore
  0 siblings, 2 replies; 31+ messages in thread
From: Richard Guy Briggs @ 2020-12-08  1:34 UTC (permalink / raw)
  To: Steve Grubb; +Cc: linux-audit

On 2020-12-07 18:28, Steve Grubb wrote:
> Hello Max,
> 
> On Monday, December 7, 2020 4:28:14 PM EST Max Englander wrote:
> > Steve, I'm happy to make changes to the userspace PR based on
> > Richard's suggestions, if that sounds good to you. I'll follow up in
> > the PR to discuss it more
> 
> The only issue is new userspace on old kernel. I think if we use both the 
> configure macro in addition to a size check, then it will at least allow 
> forward and backward compatibility.

Are you talking about a new userspace compiled on a new kernel header
file run on an old kernel?  That would be less reliable and need the
size check.  The bitmap would be the most reliable in that scenario.

By configure macro are you talking about the presence of that audit
status mask bit, or the presence of that struct audit_status member?

> Other metrics would be good. I'd like to see a max_backlog to know if we are 
> wasting memory. It would just record the highwater mark since auditing was 
> enabled.

That would be covered with this issue:
	https://github.com/linux-audit/audit-kernel/issues/63

> -Steve

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-12-08  1:34                               ` Richard Guy Briggs
@ 2020-12-08  3:34                                 ` Steve Grubb
  2020-12-08 13:20                                   ` Richard Guy Briggs
  2020-12-08 23:08                                 ` Paul Moore
  1 sibling, 1 reply; 31+ messages in thread
From: Steve Grubb @ 2020-12-08  3:34 UTC (permalink / raw)
  To: Richard Guy Briggs; +Cc: linux-audit

On Monday, December 7, 2020 8:34:35 PM EST Richard Guy Briggs wrote:
> On 2020-12-07 18:28, Steve Grubb wrote:
> > Hello Max,
> > 
> > On Monday, December 7, 2020 4:28:14 PM EST Max Englander wrote:
> > > Steve, I'm happy to make changes to the userspace PR based on
> > > Richard's suggestions, if that sounds good to you. I'll follow up in
> > > the PR to discuss it more
> > 
> > The only issue is new userspace on old kernel. I think if we use both the
> > configure macro in addition to a size check, then it will at least allow
> > forward and backward compatibility.
> 
> Are you talking about a new userspace compiled on a new kernel header
> file run on an old kernel?

Yes. This is my worry. Someone compiles the code and the does a roll back. It 
can happen because the new kernel has some problems that a driver cannot 
handle.

> That would be less reliable and need the
> size check.  The bitmap would be the most reliable in that scenario.

Right, but the person that can make that happen doesn't want to use this 
facility for what it was intended for. So, we are all trying to do the best.


> By configure macro are you talking about the presence of that audit
> status mask bit, or the presence of that struct audit_status member?

Yes. But it doesn't apply to old kernels.

-Steve


--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-12-08  3:34                                 ` Steve Grubb
@ 2020-12-08 13:20                                   ` Richard Guy Briggs
  2020-12-08 13:44                                     ` Steve Grubb
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Guy Briggs @ 2020-12-08 13:20 UTC (permalink / raw)
  To: Steve Grubb; +Cc: linux-audit

On 2020-12-07 22:34, Steve Grubb wrote:
> On Monday, December 7, 2020 8:34:35 PM EST Richard Guy Briggs wrote:
> > On 2020-12-07 18:28, Steve Grubb wrote:
> > > Hello Max,
> > > 
> > > On Monday, December 7, 2020 4:28:14 PM EST Max Englander wrote:
> > > > Steve, I'm happy to make changes to the userspace PR based on
> > > > Richard's suggestions, if that sounds good to you. I'll follow up in
> > > > the PR to discuss it more
> > > 
> > > The only issue is new userspace on old kernel. I think if we use both the
> > > configure macro in addition to a size check, then it will at least allow
> > > forward and backward compatibility.
> > 
> > Are you talking about a new userspace compiled on a new kernel header
> > file run on an old kernel?
> 
> Yes. This is my worry. Someone compiles the code and the does a roll back. It 
> can happen because the new kernel has some problems that a driver cannot 
> handle.

Ok, fair enough.

> > That would be less reliable and need the
> > size check.  The bitmap would be the most reliable in that scenario.
> 
> Right, but the person that can make that happen doesn't want to use this 
> facility for what it was intended for. So, we are all trying to do the best.

Yes, the firmness of that stance is puzzling to me...

> > By configure macro are you talking about the presence of that audit
> > status mask bit, or the presence of that struct audit_status member?
> 
> Yes. But it doesn't apply to old kernels.

An "or" question usually needs one or the other reply unless both are
true...  Which one were you talking about?

> -Steve

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-12-08 13:20                                   ` Richard Guy Briggs
@ 2020-12-08 13:44                                     ` Steve Grubb
  0 siblings, 0 replies; 31+ messages in thread
From: Steve Grubb @ 2020-12-08 13:44 UTC (permalink / raw)
  To: Richard Guy Briggs; +Cc: linux-audit

On Tuesday, December 8, 2020 8:20:03 AM EST Richard Guy Briggs wrote:
> > > By configure macro are you talking about the presence of that audit
> > > status mask bit, or the presence of that struct audit_status member?
> > 
> > Yes. But it doesn't apply to old kernels.
> 
> An "or" question usually needs one or the other reply unless both are
> true...  Which one were you talking about?

Either. You shouldn't have one without the other. We just need to handle old 
user space/new kernel & new userspace/old kernel gracefully.

-Steve


--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* report audit wait metric in audit status reply
  2020-07-01 21:32 [PATCH v2] audit: report audit wait metric in audit status reply Max Englander
  2020-07-02 20:42 ` Paul Moore
  2020-12-03  4:33 ` Joe Wulf
@ 2020-12-08 16:57 ` Lenny Bruzenak
  2 siblings, 0 replies; 31+ messages in thread
From: Lenny Bruzenak @ 2020-12-08 16:57 UTC (permalink / raw)
  To: linux-audit

On 7/1/20 3:32 PM, Max Englander wrote:

> In environments where the preservation of audit events and predictable
> usage of system memory are prioritized, admins may use a combination of
> --backlog_wait_time and -b options at the risk of degraded performance
> resulting from backlog waiting. In some cases, this risk may be
> preferred to lost events or unbounded memory usage. Ideally, this risk
> can be mitigated by making adjustments when backlog waiting is detected.
>
> However, detection can be diffult using the currently available metrics.
> For example, an admin attempting to debug degraded performance may
> falsely believe a full backlog indicates backlog waiting. It may turn
> out the backlog frequently fills up but drains quickly.
>
> To make it easier to reliably track degraded performance to backlog
> waiting, this patch makes the following changes:
>
> Add a new field backlog_wait_sum to the audit status reply. Initialize
> this field to zero. Add to this field the total time spent by the
> current task on scheduled timeouts while the backlog limit is exceeded.
>
> Tested on Ubuntu 18.04 using complementary changes to the audit
> userspace:https://github.com/linux-audit/audit-userspace/pull/134.

Max,

Along those lines, the current failure actions (silent, printk, panic) 
are kind of restrictive. I guess one can filter on the printk messages 
and redirect those to a userspace handler which might do something 
specific to the operating environment? Is that how you would handle it? 
Or would your admin just look for it and report? If you have any 
shareable info I'd appreciate seeing it. Almost no one I can think of 
would want a panic to happen, but only almost. No one who needs some 
level of assurance would want "silent".

FWIW I looked at the kernel printk calls, and although I maybe looked at 
the wrong one, even though on boot I'm seeing drops from the "auditctl 
-s", I do not see any output in the dmesg buffer that appears to be 
indicative of this. My guess there is that on bootup, the auditd 
userspace config has not yet been activated and it's likely using a 
"silent" default...but here I realize was not the focus of your effort. 
Just musing.

I applaud your efforts in this area, and if you are able to share any 
practices about handling the backlogs I'd appreciate seeing that.

V/R,

LCB

-- 
Lenny Bruzenak
MagitekLTD

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2] audit: report audit wait metric in audit status reply
  2020-12-08  1:34                               ` Richard Guy Briggs
  2020-12-08  3:34                                 ` Steve Grubb
@ 2020-12-08 23:08                                 ` Paul Moore
  1 sibling, 0 replies; 31+ messages in thread
From: Paul Moore @ 2020-12-08 23:08 UTC (permalink / raw)
  To: Richard Guy Briggs; +Cc: linux-audit

On Mon, Dec 7, 2020 at 8:34 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2020-12-07 18:28, Steve Grubb wrote:

...

> > Other metrics would be good. I'd like to see a max_backlog to know if we are
> > wasting memory. It would just record the highwater mark since auditing was
> > enabled.
>
> That would be covered with this issue:
>         https://github.com/linux-audit/audit-kernel/issues/63

For those who haven't clicked on the GH issue above, increasing the
queue depth doesn't result in wasted memory; memory is allocated as
needed and released when it is no longer used.  Simply increasing the
backlog size doesn't increase the amount of memory used in the kernel
by audit until the backlog queues start to fill.  Once the backlog is
cleared by auditd then the memory is released.

-- 
paul moore
www.paul-moore.com

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2020-12-08 23:09 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-01 21:32 [PATCH v2] audit: report audit wait metric in audit status reply Max Englander
2020-07-02 20:42 ` Paul Moore
2020-07-03 21:29   ` Richard Guy Briggs
2020-07-03 22:36     ` Max Englander
2020-07-03 22:31   ` Max Englander
2020-12-03  3:52   ` Steve Grubb
2020-12-03  4:12     ` Paul Moore
2020-12-03 12:37       ` Richard Guy Briggs
2020-12-03 15:37         ` Paul Moore
2020-12-03 23:10           ` Richard Guy Briggs
2020-12-03 23:43             ` Paul Moore
2020-12-03 23:55               ` Steve Grubb
2020-12-04  2:16                 ` Paul Moore
2020-12-04  2:47                   ` Steve Grubb
2020-12-04 20:41                     ` Paul Moore
2020-12-07 21:13                       ` Max Englander
2020-12-07 21:17                         ` Paul Moore
2020-12-07 21:21                         ` Richard Guy Briggs
2020-12-07 21:28                           ` Max Englander
2020-12-07 23:28                             ` Steve Grubb
2020-12-08  1:34                               ` Richard Guy Briggs
2020-12-08  3:34                                 ` Steve Grubb
2020-12-08 13:20                                   ` Richard Guy Briggs
2020-12-08 13:44                                     ` Steve Grubb
2020-12-08 23:08                                 ` Paul Moore
2020-12-03 13:31       ` Steve Grubb
2020-12-07 19:43   ` Lenny Bruzenak
2020-12-07 21:14     ` Paul Moore
2020-12-03  4:33 ` Joe Wulf
2020-12-07 21:48   ` Max Englander
2020-12-08 16:57 ` Lenny Bruzenak

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.