All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Daniel Borkmann <daniel@iogearbox.net>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>,
	Network Development <netdev@vger.kernel.org>,
	Martin KaFai Lau <kafai@fb.com>,
	Hangbin Liu <liuhangbin@gmail.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Magnus Karlsson <magnus.karlsson@gmail.com>,
	"Paul E . McKenney" <paulmck@kernel.org>
Subject: Re: [PATCH bpf-next 02/17] bpf: allow RCU-protected lookups to happen from bh context
Date: Fri, 11 Jun 2021 00:27:52 +0200	[thread overview]
Message-ID: <874ke5we1j.fsf@toke.dk> (raw)
In-Reply-To: <c5192ab3-1c05-8679-79f2-59d98299095b@iogearbox.net>

Daniel Borkmann <daniel@iogearbox.net> writes:

> Hi Paul,
>
> On 6/10/21 8:38 PM, Alexei Starovoitov wrote:
>> On Wed, Jun 9, 2021 at 7:24 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>
>>> XDP programs are called from a NAPI poll context, which means the RCU
>>> reference liveness is ensured by local_bh_disable(). Add
>>> rcu_read_lock_bh_held() as a condition to the RCU checks for map lookups so
>>> lockdep understands that the dereferences are safe from inside *either* an
>>> rcu_read_lock() section *or* a local_bh_disable() section. This is done in
>>> preparation for removing the redundant rcu_read_lock()s from the drivers.
>>>
>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>> ---
>>>   kernel/bpf/hashtab.c  | 21 ++++++++++++++-------
>>>   kernel/bpf/helpers.c  |  6 +++---
>>>   kernel/bpf/lpm_trie.c |  6 ++++--
>>>   3 files changed, 21 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>>> index 6f6681b07364..72c58cc516a3 100644
>>> --- a/kernel/bpf/hashtab.c
>>> +++ b/kernel/bpf/hashtab.c
>>> @@ -596,7 +596,8 @@ static void *__htab_map_lookup_elem(struct bpf_map *map, void *key)
>>>          struct htab_elem *l;
>>>          u32 hash, key_size;
>>>
>>> -       WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
>>> +       WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
>>> +                    !rcu_read_lock_bh_held());
>> 
>> It's not clear to me whether rcu_read_lock_held() is still needed.
>> All comments sound like rcu_read_lock_bh_held() is a superset of rcu
>> that includes bh.
>> But reading rcu source code it looks like RCU_BH is its own rcu flavor...
>> which is confusing.
>
> The series is a bit confusing to me as well. I recall we had a discussion with
> Paul, but it was back in 2016 aka very early days of XDP to get some clarifications
> about RCU vs RCU-bh flavour on this. Paul, given the series in here, I assume the
> below is not true anymore, and in this case (since we're removing rcu_read_lock()
> from drivers), the RCU-bh acts as a real superset?
>
> Back then from your clarifications this was not the case:
>
>    On Mon, Jul 25, 2016 at 11:26:02AM -0700, Alexei Starovoitov wrote:
>    > On Mon, Jul 25, 2016 at 11:03 AM, Paul E. McKenney
>    > <paulmck@linux.vnet.ibm.com> wrote:
>    [...]
>    >>> The crux of the question is whether a particular driver rx handler, when
>    >>> called from __do_softirq, needs to add an additional rcu_read_lock or
>    >>> whether it can rely on the mechanics of softirq.
>    >>
>    >> If it was rcu_read_lock_bh(), you could.
>    >>
>    >> But you didn't say rcu_read_lock_bh(), you instead said rcu_read_lock(),
>    >> which means that you absolutely cannot rely on softirq semantics.
>    >>
>    >> In particular, in CONFIG_PREEMPT=y kernels, rcu_preempt_check_callbacks()
>    >> will notice that there is no rcu_read_lock() in effect and report
>    >> a quiescent state for that CPU.  Because rcu_preempt_check_callbacks()
>    >> is invoked from the scheduling-clock interrupt, it absolutely can
>    >> execute during do_softirq(), and therefore being in softirq context
>    >> in no way provides rcu_read_lock()-style protection.
>    >>
>    >> Now, Alexei's question was for CONFIG_PREEMPT=n kernels.  However, in
>    >> that case, rcu_read_lock() and rcu_read_unlock() generate no code
>    >> in recent production kernels, so there is no performance penalty for
>    >> using them.  (In older kernels, they implied a barrier().)
>    >>
>    >> So either way, with or without CONFIG_PREEMPT, you should use
>    >> rcu_read_lock() to get RCU protection.
>    >>
>    >> One alternative might be to switch to rcu_read_lock_bh(), but that
>    >> will add local_disable_bh() overhead to your read paths.
>    >>
>    >> Does that help, or am I missing the point of the question?
>    >
>    > thanks a lot for explanation.
>
>    Glad you liked it!
>
>    > I mistakenly assumed that _bh variants are 'stronger' and
>    > act as inclusive, but sounds like they're completely orthogonal
>    > especially with preempt_rcu=y.
>
>    Yes, they are pretty much orthogonal.
>
>    > With preempt_rcu=n and preempt=y, it would be the case, since
>    > bh disables preemption and rcu_read_lock does the same as well,
>    > right? Of course, the code shouldn't be relying on that, so we
>    > have to fix our stuff.
>
>    Indeed, especially given that the kernel currently won't allow you
>    to configure CONFIG_PREEMPT_RCU=n and CONFIG_PREEMPT=y.  If it does,
>    please let me know, as that would be a bug that needs to be fixed.
>    (For one thing, I do not test that combination.)
>
> 							Thanx, Paul
>
> And now, fast-forward again to 2021 ... :)

We covered this in the thread I linked from the cover letter.
Specifically, this seems to have been a change from v4.20, see Paul's
reply here:
https://lore.kernel.org/bpf/20210417002301.GO4212@paulmck-ThinkPad-P17-Gen-1/

and the follow-up covering -rt here:
https://lore.kernel.org/bpf/20210419165837.GA975577@paulmck-ThinkPad-P17-Gen-1/

-Toke


  reply	other threads:[~2021-06-10 22:27 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-09 10:33 [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 01/17] rcu: Create an unrcu_pointer() to remove __rcu from a pointer Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 02/17] bpf: allow RCU-protected lookups to happen from bh context Toke Høiland-Jørgensen
2021-06-10 18:38   ` Alexei Starovoitov
2021-06-10 21:24     ` Daniel Borkmann
2021-06-10 22:27       ` Toke Høiland-Jørgensen [this message]
2021-06-10 19:33   ` Martin KaFai Lau
2021-06-09 10:33 ` [PATCH bpf-next 03/17] dev: add rcu_read_lock_bh_held() as a valid check when getting a RCU dev ref Toke Høiland-Jørgensen
2021-06-10 19:37   ` Martin KaFai Lau
2021-06-10 23:05     ` Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 04/17] xdp: add proper __rcu annotations to redirect map entries Toke Høiland-Jørgensen
2021-06-10 21:09   ` Martin KaFai Lau
2021-06-10 23:19     ` Toke Høiland-Jørgensen
2021-06-10 23:32       ` Martin KaFai Lau
2021-06-10 23:41         ` Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 05/17] ena: remove rcu_read_lock() around XDP program invocation Toke Høiland-Jørgensen
2021-06-09 13:57   ` Paul E. McKenney
2021-06-09 10:33 ` [PATCH bpf-next 06/17] bnxt: " Toke Høiland-Jørgensen
2021-06-09 13:58   ` Paul E. McKenney
2021-06-10  8:47     ` Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 07/17] thunderx: " Toke Høiland-Jørgensen
2021-06-09 10:33   ` Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 08/17] freescale: " Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 09/17] net: intel: " Toke Høiland-Jørgensen
2021-06-09 10:33   ` [Intel-wired-lan] " Toke =?unknown-8bit?q?H=C3=B8iland-J=C3=B8rgensen?=
2021-06-09 10:33 ` [PATCH bpf-next 10/17] marvell: " Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 11/17] mlx4: " Toke Høiland-Jørgensen
2021-06-10  7:10   ` Tariq Toukan
2021-06-09 10:33 ` [PATCH bpf-next 12/17] nfp: " Toke Høiland-Jørgensen
2021-06-11 16:30   ` Simon Horman
2021-06-09 10:33 ` [PATCH bpf-next 13/17] qede: " Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 14/17] sfc: " Toke Høiland-Jørgensen
2021-06-09 12:15   ` Edward Cree
2021-06-09 10:33 ` [PATCH bpf-next 15/17] netsec: " Toke Høiland-Jørgensen
2021-06-10  5:30   ` Ilias Apalodimas
2021-06-09 10:33 ` [PATCH bpf-next 16/17] stmmac: " Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 17/17] net: ti: " Toke Høiland-Jørgensen
2021-06-09 17:04   ` Grygorii Strashko
2021-06-10  0:18 ` [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Yonghong Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=874ke5we1j.fsf@toke.dk \
    --to=toke@redhat.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=daniel@iogearbox.net \
    --cc=kafai@fb.com \
    --cc=liuhangbin@gmail.com \
    --cc=magnus.karlsson@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=paulmck@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.