linux-kernel.vger.kernel.org archive mirror
* Re: data-race in audit_log_start / audit_receive
       [not found] <CAEHB248ft0LFUQDHNtB9NN_D3F=12Jndh13Ue=LokajXNhMk5Q@mail.gmail.com>
@ 2022-08-19  1:59 ` Paul Moore
  2022-08-19 12:06   ` Paul Moore
  0 siblings, 1 reply; 4+ messages in thread
From: Paul Moore @ 2022-08-19  1:59 UTC (permalink / raw)
  To: abhishek.shah; +Cc: eparis, linux-audit, linux-kernel, Gabriel Ryan

On Thu, Aug 18, 2022 at 6:23 PM Abhishek Shah
<abhishek.shah@columbia.edu> wrote:
> Hi all,
>
> We found a data race involving the audit_cmd_mutex.owner variable. We think this bug is concerning because audit_ctl_owner_current is used at a location that controls the scheduling of tasks shown here. Please let us know what you think.
>
> Thanks!
>
> -----------------Report----------------------
>
> write to 0xffffffff881d0710 of 8 bytes by task 6541 on cpu 0:
>  audit_ctl_lock kernel/audit.c:237 [inline]

...

> read to 0xffffffff881d0710 of 8 bytes by task 6542 on cpu 1:
>  audit_ctl_owner_current kernel/audit.c:258 [inline]

Yes, technically there is a race condition if/when an auditd instance
is registering itself at the exact same time as another task is
attempting to log an audit record via audit_log_start().  The risk is
that a *very* limited number of audit records could be mishandled
with respect to their queue priority, and that is it; no records
would be lost or misplaced.  Correcting this would likely involve a
more complex locking scheme[1] or a rather severe performance penalty
due to an additional lock in the audit_log_start() code path.  There
may be some value in modifying audit_ctl_owner_current() to use
READ_ONCE(), but it isn't clear to me that this would significantly
improve things, or that it would be free of any performance impact.
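
For reference, the two sides of the race look roughly like this (I'm
paraphrasing kernel/audit.c from memory, so the exact code may differ
slightly between kernel versions):

/* write side: only ever done while holding the control mutex */
static void audit_ctl_lock(void)
{
        mutex_lock(&audit_cmd_mutex.lock);
        audit_cmd_mutex.owner = current;
}

static void audit_ctl_unlock(void)
{
        audit_cmd_mutex.owner = NULL;
        mutex_unlock(&audit_cmd_mutex.lock);
}

/* read side: called locklessly from audit_log_start(); this is the
 * plain load that KCSAN flags, and where the READ_ONCE() mentioned
 * above would go */
static bool audit_ctl_owner_current(void)
{
        return audit_cmd_mutex.owner == current;
}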

Have you noticed any serious problems on your system due to this?  If
you have a reproducer which shows actual harm on the system could you
please share that?

[1] The obvious choice would be to move to an RCU-based scheme, but
even that doesn't totally solve the problem as there would still be a
window where some tasks would have an "old" value.  It might actually
end up extending the race window on large multi-core systems due to
the time needed for all of the critical sections to complete.
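
To illustrate that footnote, a rough and untested sketch of an RCU
style approach might look like the below; audit_ctl_owner here is a
stand-in for the existing audit_cmd_mutex.owner field, and the
synchronize_rcu() is exactly where the extra waiting on large
multi-core systems would come from:

static struct task_struct __rcu *audit_ctl_owner;

static void audit_ctl_lock(void)
{
        mutex_lock(&audit_cmd_mutex.lock);
        rcu_assign_pointer(audit_ctl_owner, current);
}

static void audit_ctl_unlock(void)
{
        RCU_INIT_POINTER(audit_ctl_owner, NULL);
        mutex_unlock(&audit_cmd_mutex.lock);
        /* tasks already inside audit_log_start() can still see the old
         * owner until the grace period ends */
        synchronize_rcu();
}

static bool audit_ctl_owner_current(void)
{
        bool ret;

        rcu_read_lock();
        ret = (rcu_dereference(audit_ctl_owner) == current);
        rcu_read_unlock();
        return ret;
}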

-- 
paul-moore.com


* Re: data-race in audit_log_start / audit_receive
  2022-08-19  1:59 ` data-race in audit_log_start / audit_receive Paul Moore
@ 2022-08-19 12:06   ` Paul Moore
  2022-08-22 20:09     ` Gabriel Ryan
  0 siblings, 1 reply; 4+ messages in thread
From: Paul Moore @ 2022-08-19 12:06 UTC (permalink / raw)
  To: abhishek.shah; +Cc: eparis, linux-audit, linux-kernel, Gabriel Ryan

On Thu, Aug 18, 2022 at 9:59 PM Paul Moore <paul@paul-moore.com> wrote:
> On Thu, Aug 18, 2022 at 6:23 PM Abhishek Shah
> <abhishek.shah@columbia.edu> wrote:
> > Hi all,
> >
> > We found a data race involving the audit_cmd_mutex.owner variable. We think this bug is concerning because audit_ctl_owner_current is used at a location that controls the scheduling of tasks shown here. Please let us know what you think.
> >
> > Thanks!
> >
> > -----------------Report----------------------
> >
> > write to 0xffffffff881d0710 of 8 bytes by task 6541 on cpu 0:
> >  audit_ctl_lock kernel/audit.c:237 [inline]
>
> ...
>
> > read to 0xffffffff881d0710 of 8 bytes by task 6542 on cpu 1:
> >  audit_ctl_owner_current kernel/audit.c:258 [inline]
>
> Yes, technically there is a race condition if/when an auditd instance
> is registering itself at the exact same time as another task is
> attempting to log an audit record via audit_log_start().

I realized after I sent this and turned off my computer last night
that I typed the wrong thing - the race isn't between auditd and
audit_log_start(), it's between the code which changes the audit
subsystem state (see audit_receive() and the audit watch/tree code)
and audit_log_start().
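
To make it concrete, the decision that the racy read feeds into in
audit_log_start() boils down to something like the helper below; the
helper itself is made up for illustration, the real check is
open-coded in audit_log_start():

/* illustration only, not a function in the tree */
static bool audit_skip_backlog_wait(void)
{
        /* auditd itself, and whoever currently holds the audit control
         * mutex, must never sleep on a full backlog or the queue might
         * never drain */
        return auditd_test_task(current) || audit_ctl_owner_current();
}

A mis-read here just means a record is handled with the wrong queue
priority for a moment, which is the limited impact described above.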

> The risk is
> that a *very* limited number of audit records could be mishandled
> with respect to their queue priority, and that is it; no records
> would be lost or misplaced.  Correcting this would likely involve a
> more complex locking scheme[1] or a rather severe performance penalty
> due to an additional lock in the audit_log_start() code path.  There
> may be some value in modifying audit_ctl_owner_current() to use
> READ_ONCE(), but it isn't clear to me that this would significantly
> improve things, or that it would be free of any performance impact.

Another thing I thought of last night - I don't believe READ_ONCE()
adds a memory barrier, which would probably be needed here; my
original statement still stands, though: I'm not sure the marginal
improvement to the audit queue would justify the performance hit.
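
For completeness, if we wanted both the annotation and the ordering,
the usual pattern would be acquire/release accessors rather than a
bare READ_ONCE()/WRITE_ONCE() pair; again just a sketch, not something
I'm proposing:

static void audit_ctl_lock(void)
{
        mutex_lock(&audit_cmd_mutex.lock);
        smp_store_release(&audit_cmd_mutex.owner, current);
}

static bool audit_ctl_owner_current(void)
{
        /* pairs with the smp_store_release() above; READ_ONCE() alone
         * would only avoid load tearing and quiet KCSAN, it imposes no
         * ordering */
        return smp_load_acquire(&audit_cmd_mutex.owner) == current;
}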

> Have you noticed any serious problems on your system due to this?  If
> you have a reproducer which shows actual harm on the system could you
> please share that?
>
> [1] The obvious choice would be to move to an RCU-based scheme, but
> even that doesn't totally solve the problem as there would still be a
> window where some tasks would have an "old" value.  It might actually
> end up extending the race window on large multi-core systems due to
> the time needed for all of the critical sections to complete.

-- 
paul-moore.com


* Re: data-race in audit_log_start / audit_receive
  2022-08-19 12:06   ` Paul Moore
@ 2022-08-22 20:09     ` Gabriel Ryan
  2022-08-22 23:42       ` Paul Moore
  0 siblings, 1 reply; 4+ messages in thread
From: Gabriel Ryan @ 2022-08-22 20:09 UTC (permalink / raw)
  To: Paul Moore; +Cc: abhishek.shah, eparis, linux-audit, linux-kernel

Hi Paul,

Thanks for taking the time to review our report! It sounds like there
are no severe negative impacts, and patching to eliminate the race
would impose an unnecessary performance penalty, so we'll mark this
as benign for future reference.

Thanks,

Gabe

On Fri, Aug 19, 2022 at 8:06 AM Paul Moore <paul@paul-moore.com> wrote:
>
> On Thu, Aug 18, 2022 at 9:59 PM Paul Moore <paul@paul-moore.com> wrote:
> > On Thu, Aug 18, 2022 at 6:23 PM Abhishek Shah
> > <abhishek.shah@columbia.edu> wrote:
> > > Hi all,
> > >
> > > We found a data race involving the audit_cmd_mutex.owner variable. We think this bug is concerning because audit_ctl_owner_current is used at a location that controls the scheduling of tasks shown here. Please let us know what you think.
> > >
> > > Thanks!
> > >
> > > -----------------Report----------------------
> > >
> > > write to 0xffffffff881d0710 of 8 bytes by task 6541 on cpu 0:
> > >  audit_ctl_lock kernel/audit.c:237 [inline]
> >
> > ...
> >
> > > read to 0xffffffff881d0710 of 8 bytes by task 6542 on cpu 1:
> > >  audit_ctl_owner_current kernel/audit.c:258 [inline]
> >
> > Yes, technically there is a race condition if/when an auditd instance
> > is registering itself at the exact same time as another task is
> > attempting to log an audit record via audit_log_start().
>
> I realized after I sent this and turned off my computer last night
> that I typed the wrong thing - the race isn't between auditd and
> audit_log_start(), it's between the code which changes the audit
> subsystem state (see audit_receive() and the audit watch/tree code)
> and audit_log_start().
>
> > The risk is
> > that a *very* limited number of audit records could be mishandled
> > with respect to their queue priority, and that is it; no records
> > would be lost or misplaced.  Correcting this would likely involve a
> > more complex locking scheme[1] or a rather severe performance penalty
> > due to an additional lock in the audit_log_start() code path.  There
> > may be some value in modifying audit_ctl_owner_current() to use
> > READ_ONCE(), but it isn't clear to me that this would significantly
> > improve things, or that it would be free of any performance impact.
>
> Another thing I thought of last night - I don't believe READ_ONCE()
> adds a memory barrier, which would probably be needed here; my
> original statement still stands, though: I'm not sure the marginal
> improvement to the audit queue would justify the performance hit.
>
> > Have you noticed any serious problems on your system due to this?  If
> > you have a reproducer which shows actual harm on the system could you
> > please share that?
> >
> > [1] The obvious choice would be to move to an RCU-based scheme, but
> > even that doesn't totally solve the problem as there would still be a
> > window where some tasks would have an "old" value.  It might actually
> > end up extending the race window on large multi-core systems due to
> > the time needed for all of the critical sections to complete.
>
> --
> paul-moore.com

-- 
Gabriel Ryan
PhD Candidate at Columbia University


* Re: data-race in audit_log_start / audit_receive
  2022-08-22 20:09     ` Gabriel Ryan
@ 2022-08-22 23:42       ` Paul Moore
  0 siblings, 0 replies; 4+ messages in thread
From: Paul Moore @ 2022-08-22 23:42 UTC (permalink / raw)
  To: Gabriel Ryan; +Cc: abhishek.shah, eparis, linux-audit, linux-kernel

On Mon, Aug 22, 2022 at 4:09 PM Gabriel Ryan <gabe@cs.columbia.edu> wrote:
>
> Hi Paul,
>
> Thanks for taking the time to review our report! It sounds like there
> are no severe negative impacts, and patching to eliminate the race
> would impose an unnecessary performance penalty, so we'll mark this
> as benign for future reference.

I just want to reiterate that if you are seeing a serious problem,
please let us know and we'll work with you to find a fix.

-- 
paul-moore.com

