linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Beau Belgrave <beaub@linux.microsoft.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-trace-kernel@vger.kernel.org,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>, bpf <bpf@vger.kernel.org>,
	David Vernet <void@manifault.com>,
	dthaler@microsoft.com, brauner@kernel.org, hch@infradead.org
Subject: Re: [PATCH] tracing/user_events: Run BPF program if attached
Date: Wed, 17 May 2023 10:22:43 -0700	[thread overview]
Message-ID: <20230517172243.GA152@W11-BEAU-MD.localdomain> (raw)
In-Reply-To: <CAHk-=wh_GEr4ehJKwMM3UA0-7CfNpVH7v_T-=1u+gq9VZD70mw@mail.gmail.com>

On Tue, May 16, 2023 at 08:03:09PM -0700, Linus Torvalds wrote:
> On Tue, May 16, 2023 at 7:29 PM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > So this code path is very much in user context (called directly by a
> > write system call). The issue that Alexei had was that it's also in an
> > rcu_read_lock() section.
> >
> > I wonder if this all goes away if we switch to SRCU?
> 
> Yes, SRCU context would work.
> 
> That said, how critical is this code? Because honestly, the *sanest*
> thing to do is to just hold the lock that actually protects the list,
> not try to walk the list in any RCU mode.
> 
> And as far as I can tell, that's the 'event_mutex', which is already held.
> 
> RCU walking of a list is only meaningful when the walk doesn't need
> the lock that guarantees the list integrity.
> 
> But *modification* of a RCU-protected list still requires locking, and
> from a very cursory look, it really looks like 'event_mutex' is
> already the lock that protects the list.
> 
> So the whole use of RCU during the list walking there in
> user_event_enabler_update() _seems_ pointless. You hold event_mutex -
> user_event_enabler_write() that is called in the loop already has a
> lockdep assert to that effect.
> 
> So what is it that could even race and change the list that is the
> cause of that rcu-ness?
> 

Processes that fork() with previous user_events need to be duplicated.

The fork() paths do not acquire the event_mutex. In the middle of a fork
an event could become enabled/disabled, which would call this part of
the code, at that time the list is actively being appended to when we
try to update the bits.

> Other code in that file happily just does
> 
>         mutex_lock(&event_mutex);
> 
>         list_for_each_entry_safe(enabler, next, &mm->enablers, link)
> 
> with no RCU anywhere. Why does user_event_enabler_update() not do that?
> 

This is due to the fork() case above without taking the event_mutex. I
really tried to not cause fork() to stall if a process uses user_events.
This required using RCU, maybe there is a simpler approach:

One approach I can think of is that during fork() we don't add the newly
created mm to the global list until we copy all the enablers. The COW
pages should reflect the bits if a timing window occurs there, since I
believe it's impossible for the newly forked() mm to cause a COW during
that time. Then I can drop this RCU on the enablers.

> Oh, and even those other loops are a bit strange. Why do they use the
> "_safe" variant, even when they just traverse the list without
> changing it? Look at user_event_enabler_exists(), for example.
> 

The other places in the code that do this either will remove the event
depending on the situation during the for_each, or they only hold the
register lock and don't hold the event_mutex. So the disabler could get
removed out from under it.

IE: user_events_ioctl_reg() -> current_user_event_enabler_exists()

This is a place where we could just simply change to grab the
event_mutex, it's pretty isolated and we take the lock anyway further
down the path.

> I must really be missing something. That code is confusing. Or I am
> very confused.
> 
>             Linus

Thanks,
-Beau

  reply	other threads:[~2023-05-17 17:22 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-08 16:37 [PATCH] tracing/user_events: Run BPF program if attached Beau Belgrave
2023-05-09 15:24 ` Alexei Starovoitov
2023-05-09 17:01   ` Steven Rostedt
2023-05-09 20:30     ` Steven Rostedt
2023-05-09 20:42       ` Steven Rostedt
2023-05-15 16:57       ` Alexei Starovoitov
2023-05-15 18:33         ` Steven Rostedt
2023-05-15 19:35           ` Beau Belgrave
2023-05-15 21:38             ` Steven Rostedt
2023-05-15 19:24         ` Beau Belgrave
2023-05-15 21:57           ` Steven Rostedt
2023-05-17  0:36           ` Alexei Starovoitov
2023-05-17  0:56             ` Linus Torvalds
2023-05-17  1:46               ` Linus Torvalds
2023-05-17  2:29                 ` Steven Rostedt
2023-05-17  3:03                   ` Linus Torvalds
2023-05-17 17:22                     ` Beau Belgrave [this message]
2023-05-17 18:15                       ` Linus Torvalds
2023-05-17 19:07                         ` Beau Belgrave
2023-05-17 19:26                           ` Linus Torvalds
2023-05-17 19:36                             ` Beau Belgrave
2023-05-17 19:36                             ` Linus Torvalds
2023-05-17 19:37                               ` Linus Torvalds
2023-05-17 23:00                                 ` Beau Belgrave
2023-05-17 23:14                                   ` Linus Torvalds
2023-05-17 23:25                                     ` Steven Rostedt
2023-05-18  0:14                                       ` Beau Belgrave
2023-05-18  0:23                                         ` Linus Torvalds
2023-05-17 20:08                               ` Linus Torvalds
2023-05-17  1:26             ` Steven Rostedt
2023-05-17 16:50               ` Beau Belgrave
2023-05-18  0:10                 ` Alexei Starovoitov
2023-05-18  0:19                   ` Beau Belgrave
2023-05-18  0:56                     ` Alexei Starovoitov
2023-05-18  1:18                       ` Beau Belgrave
2023-05-18  2:08                         ` Steven Rostedt
2023-05-18  3:14                           ` Alexei Starovoitov
2023-05-18 13:36                             ` Steven Rostedt
2023-05-18 17:28                               ` Beau Belgrave
2023-06-01  9:46                   ` Christian Brauner
2023-06-01 15:24                     ` Beau Belgrave
2023-06-01 15:57                       ` Christian Brauner
2023-06-01 16:29                         ` Beau Belgrave
2023-06-06 13:37                           ` Masami Hiramatsu
2023-06-06 17:05                             ` Beau Belgrave
2023-06-07 14:07                               ` Masami Hiramatsu
2023-06-07 19:26                                 ` Beau Belgrave
2023-06-08  0:25                                   ` Masami Hiramatsu
2023-05-17 17:51             ` Beau Belgrave
2023-06-06 13:57             ` Masami Hiramatsu
2023-06-06 16:57               ` Andrii Nakryiko
2023-06-06 20:57                 ` Beau Belgrave

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230517172243.GA152@W11-BEAU-MD.localdomain \
    --to=beaub@linux.microsoft.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brauner@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=dthaler@microsoft.com \
    --cc=hch@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=torvalds@linux-foundation.org \
    --cc=void@manifault.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).