From: Richard Guy Briggs <rgb@redhat.com>
To: Max Englander <max.englander@gmail.com>
Cc: linux-audit@redhat.com
Subject: Re: [PATCH] audit: optionally print warning after waiting to enqueue record
Date: Thu, 18 Jun 2020 20:30:10 -0400
Message-ID: <20200619003009.yt5xdcpw6vggiwkl@madcap2.tricolour.ca>
In-Reply-To: <20200618234836.GB3975@linux-kernel-dev>
On 2020-06-18 23:48, Max Englander wrote:
> On Wed, Jun 17, 2020 at 09:06:27PM -0400, Paul Moore wrote:
> > On Wed, Jun 17, 2020 at 6:54 PM Max Englander <max.englander@gmail.com> wrote:
> > > On Wed, Jun 17, 2020 at 02:47:19PM -0400, Paul Moore wrote:
> > > > On Tue, Jun 16, 2020 at 12:58 AM Max Englander <max.englander@gmail.com> wrote:
> > > > >
> > > > > In environments where security is prioritized, users may set
> > > > > --backlog_wait_time to a high value in order to reduce the likelihood
> > > > > that any audit event is lost, even though doing so may result in
> > > > > unpredictable performance if the kernel schedules a timeout when the
> > > > > backlog limit is exceeded. For these users, the next best thing to
> > > > > predictable performance is the ability to quickly detect and react to
> > > > > degraded performance. This patch proposes to aid the detection of kernel
> > > > > audit subsystem pauses through the following changes:
> > > > >
> > > > > Add a variable named audit_backlog_warn_time. Enforce the value of this
> > > > > variable to be no less than zero, and no more than the value of
> > > > > audit_backlog_wait_time.
> > > > >
> > > > > If audit_backlog_warn_time is greater than zero and if the total time
> > > > > spent waiting to enqueue an audit record is greater than or equal to
> > > > > audit_backlog_warn_time, then print a warning with the total time
> > > > > spent waiting.
> > > > >
> > > > > An example configuration:
> > > > >
> > > > > auditctl --backlog_warn_time 50
> > > > >
> > > > > An example warning message:
> > > > >
> > > > > audit: sleep_time=52 >= audit_backlog_warn_time=50
> > > > >
> > > > > Tested on Ubuntu 18.04.4 using complementary changes to the audit
> > > > > userspace: https://github.com/linux-audit/audit-userspace/pull/131.
> > > > >
> > > > > Signed-off-by: Max Englander <max.englander@gmail.com>
> > > > > ---
> > > > > include/uapi/linux/audit.h | 7 ++++++-
> > > > > kernel/audit.c | 35 +++++++++++++++++++++++++++++++++++
> > > > > 2 files changed, 41 insertions(+), 1 deletion(-)
> > > >
> > > > If an admin is prioritizing security, aka don't lose any audit
> > > > records, and there is a concern over variable system latency due to an
> > > > audit queue backlog, why not simply disable the backlog limit?
> > > >
> > > > --
> > > > paul moore
> > > > www.paul-moore.com
> > >
> > > That’s good in some cases, but in other cases unbounded growth of the
> > > backlog could result in memory issues. If the kernel runs out of memory
> > > it would drop the audit event or possibly have other problems. It could
> > > also consume memory in a way that starves user workloads or causes
> > > them to be killed by the OOMKiller.
> > >
> > > To refine my motivating use case a bit, if a Kubernetes admin wants to
> > > prioritize security, and also avoid unbounded growth of the audit
> > > backlog, they may set -b and --backlog_wait_time in a way that limits
> > > kernel memory usage and reduces the likelihood that any audit event is
> > > lost. Occasional performance degradation may be acceptable to the admin,
> > > but they would like a way to be alerted to prolonged kernel pauses, so
> > > that they can investigate and take corrective action (increase backlog,
> > > increase server capacity, move some workloads to other servers, etc.).
> > >
> > > To state it another way: the kernel can currently be configured to print a
> > > message when the backlog limit is exceeded and it must discard the audit
> > > event. This is a useful message for admins, which they can address with
> > > corrective action. I think a message similar to the one proposed by this
> > > patch would be equally useful when the backlog limit is exceeded and the
> > > kernel is configured to wait for the backlog to drain. Admins could
> > > address that message in the same way, but without the cost of lost audit
> > > events.
> >
> > I'm still struggling to understand how this is any better than
> > disabling the backlog limit, or setting it very high, and simply
> > monitoring the size of the audit backlog. This way the admin
> > doesn't have to worry about the latency issues of a full backlog,
> > while still being able to trigger actions based on the state of the
> > backlog. The userspace tooling/scripting to watch the backlog size
> > would be trivial, and would arguably provide much better visibility
> > into the backlog state than a single warning threshold in the kernel.
> >
> > --
> > paul moore
> > www.paul-moore.com
>
> Removing the backlog limit entirely could lead to the memory issues I
> mentioned above (lost audit events, out-of-memory errors), and would
> effectively make the backlog limit a function of free memory. Setting
> the backlog limit higher won’t necessarily prevent it from being
> exceeded on very busy systems where the rate of audit data generation
> can, for long periods of time, outpace the ability of auditd or a
> drop-in replacement to consume it.
>
> The combination of backlog limit and wait time, on the other hand, sets
> a bound on memory while all but ensuring the preservation of audit
> events. The fact that latency can arise from using this combination is,
> for me, an acceptable cost for the predictable use of OS resources and
> reduced probability of lost events. I’m not trying to eliminate the
> possibility of latency, but rather find good means to monitor and
> quickly identify its source when it does occur.
>
> Watching the backlog size with a userspace program, as you suggest, is
> easy enough and a valuable tool for monitoring the audit system. Even
> so, a full backlog may not always indicate long wait times. The backlog
> may fill up 100 times in a second, but drain so quickly as to have
> little impact on kernel performance. On the other hand, a specific
> warning that reports backlog wait times would directly implicate or rule
> out audit backlog waiting as the cause of degraded kernel performance,
> and lead to faster debugging and resolution.
>
> In case you’re any more receptive to the idea, I thought I’d mention
> that the need this patch addresses would be just as well fulfilled if
> wait times were reported in the audit status response along with other
> currently reported metrics like backlog length and lost events. Wait
> times could be reported as a cumulative sum, a moving average, or in
> some other way, and would help directly implicate or rule out backlog
> waiting as the cause in the event that an admin is faced with debugging
> degraded kernel performance. It would eliminate the need for a new flag,
> and fit well with the userspace tooling approach you suggested above.
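
For illustration only, the cumulative sum and moving average mentioned
above could be tracked along these lines. This is a rough userspace
sketch, not kernel code and not an existing audit interface: the class
name BacklogWaitStats, its fields, and the smoothing factor alpha are
all hypothetical, and the exponential moving average is just one
possible choice of "moving average".

```python
from dataclasses import dataclass


@dataclass
class BacklogWaitStats:
    """Hypothetical model of backlog wait statistics (not a real audit API)."""

    alpha: float = 0.2        # assumed smoothing factor for the moving average
    total_wait_ms: int = 0    # cumulative time spent waiting to enqueue records
    wait_count: int = 0       # number of enqueue attempts that actually waited
    avg_wait_ms: float = 0.0  # exponential moving average of wait times

    def record_wait(self, sleep_ms: int) -> None:
        """Fold one enqueue wait (in milliseconds) into the statistics."""
        if sleep_ms <= 0:
            # The record was enqueued without waiting; nothing to account for.
            return
        self.total_wait_ms += sleep_ms
        self.wait_count += 1
        if self.wait_count == 1:
            # Seed the average with the first observed wait.
            self.avg_wait_ms = float(sleep_ms)
        else:
            # Move the average a fraction alpha toward the new sample.
            self.avg_wait_ms += self.alpha * (sleep_ms - self.avg_wait_ms)
```

Counters of this kind would let a monitoring tool tell apart a backlog
that fills often but drains quickly (small total, small average) from
one that causes long waits, without needing a separate warning flag.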
Such as is captured in this upstream issue from 3 years ago:
https://github.com/linux-audit/audit-kernel/issues/63
"RFE: add kernel audit queue statistics"
- RGB
--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635
--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit
Thread overview: 13+ messages
2020-06-16 4:58 [PATCH] audit: optionally print warning after waiting to enqueue record Max Englander
2020-06-17 18:47 ` Paul Moore
2020-06-17 22:54 ` Max Englander
2020-06-18 1:06 ` Paul Moore
2020-06-18 23:48 ` Max Englander
2020-06-19 0:30 ` Richard Guy Briggs [this message]
2020-06-24 0:15 ` Paul Moore
2020-06-25 3:34 ` Max Englander
2020-06-18 13:39 ` Steve Grubb
2020-06-18 13:46 ` Paul Moore
2020-06-18 14:36 ` Steve Grubb
2020-06-18 16:29 ` Paul Moore
2020-06-18 22:57 ` Max Englander