All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: "Björn Töpel" <bjorn.topel@intel.com>,
	"Björn Töpel" <bjorn.topel@gmail.com>,
	ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org,
	bpf@vger.kernel.org
Cc: magnus.karlsson@intel.com, maciej.fijalkowski@intel.com,
	kuba@kernel.org, jonathan.lemon@gmail.com, maximmi@nvidia.com,
	davem@davemloft.net, hawk@kernel.org, john.fastabend@gmail.com,
	ciara.loftus@intel.com, weqaar.a.janjua@intel.com
Subject: Re: [PATCH bpf-next v2 1/8] xdp: restructure redirect actions
Date: Wed, 20 Jan 2021 17:30:09 +0100	[thread overview]
Message-ID: <87bldj4lm6.fsf@toke.dk> (raw)
In-Reply-To: <0a7d1a0b-de2e-b973-a807-b9377bb89737@intel.com>

Björn Töpel <bjorn.topel@intel.com> writes:

> On 2021-01-20 15:52, Toke Høiland-Jørgensen wrote:
>> Björn Töpel <bjorn.topel@intel.com> writes:
>> 
>>> On 2021-01-20 13:44, Toke Høiland-Jørgensen wrote:
>>>> Björn Töpel <bjorn.topel@gmail.com> writes:
>>>>
>>>>> From: Björn Töpel <bjorn.topel@intel.com>
>>>>>
>>>>> The XDP_REDIRECT implementations for maps and non-maps are fairly
>>>>> similar, but obviously need to take different code paths depending on
>>>>> if the target is using a map or not. Today, the redirect targets for
>>>>> XDP either uses a map, or is based on ifindex.
>>>>>
>>>>> Future commits will introduce yet another redirect target via the a
>>>>> new helper, bpf_redirect_xsk(). To pave the way for that, we introduce
>>>>> an explicit redirect type to bpf_redirect_info. This makes the code
>>>>> easier to follow, and makes it easier to add new redirect targets.
>>>>>
>>>>> Further, using an explicit type in bpf_redirect_info has a slight
>>>>> positive performance impact by avoiding a pointer indirection for the
>>>>> map type lookup, and instead use the hot cacheline for
>>>>> bpf_redirect_info.
>>>>>
>>>>> The bpf_redirect_info flags member is not used by XDP, and not
>>>>> read/written any more. The map member is only written to when
>>>>> required/used, and not unconditionally.
>>>>
>>>> I like the simplification. However, the handling of map clearing becomes
>>>> a bit murky with this change:
>>>>
>>>> You're not changing anything in bpf_clear_redirect_map(), and you're
>>>> removing most of the reads and writes of ri->map. Instead,
>>>> bpf_xdp_redirect_map() will store the bpf_dtab_netdev pointer in
>>>> ri->tgt_value, which xdp_do_redirect() will just read and use without
>>>> checking. But if the map element (or the entire map) has been freed in
>>>> the meantime that will be a dangling pointer. I *think* the RCU callback
>>>> in dev_map_delete_elem() and the rcu_barrier() in dev_map_free()
>>>> protects against this, but that is by no means obvious. So confirming
>>>> this, and explaining it in a comment would be good.
>>>>
>>>
>>> Yes, *most* of the READ_ONCE(ri->map) are removed, it's pretty much only
>>> the bpf_redirect_map(), and as you write, the tracepoints.
>>>
>>> The content/element of the map is RCU protected, and actually even the
>>> map will be around until the XDP processing is complete. Note the
>>> synchronize_rcu() followed after all bpf_clear_redirect_map() calls.
>>>
>>> I'll try to make it clearer in the commit message! Thanks for pointing
>>> that out!
>>>
>>>> Also, as far as I can tell after this, ri->map is only used for the
>>>> tracepoint. So how about just storing the map ID and getting rid of the
>>>> READ/WRITE_ONCE() entirely?
>>>>
>>>
>>> ...and the bpf_redirect_map() helper. Don't you think the current
>>> READ_ONCE(ri->map) scheme is more obvious/clear?
>> 
>> Yeah, after your patch we WRITE_ONCE() the pointer in
>> bpf_redirect_map(), but the only place it is actually *read* is in the
>> tracepoint. So the only purpose of bpf_clear_redirect_map() is to ensure
>> that an invalid pointer is not read in the tracepoint function. Which
>> seems a bit excessive when we could just store the map ID for direct use
>> in the tracepoint and get rid of bpf_clear_redirect_map() entirely, no?
>> 
>> Besides, from a UX point of view, having the tracepoint display the map
>> ID even if that map ID is no longer valid seems to me like it makes more
>> sense than just displaying a map ID of 0 and leaving it up to the user
>> to figure out that this is because the map was cleared. I mean, at the
>> time the redirect was made, that *was* the map ID that was used...
>>
>
> Convinced! Getting rid of bpf_clear_redirect_map() would be good! I'll
> take a stab at this for v3!

Cool!

>> Oh, and as you say due to the synchronize_rcu() call in dev_map_free() I
>> think this whole discussion is superfluous anyway, since it can't
>> actually happen that the map gets freed between the setting and reading
>> of ri->map, no?
>>
>
> It can't be free'd but, ri->map can be cleared via
> bpf_clear_redirect_map(). So, between the helper (setting) and the
> tracepoint in xdp_do_redirect() it can be cleared (say if the XDP
> program is swapped out, prior running xdp_do_redirect()).

But xdp_do_redirect() should be called on driver flush before exiting
the NAPI cycle, so how can the XDP program be swapped out?

> Moving to the scheme you suggested, does make the discussion
> superfluous. :-)

Yup, awesome :)

-Toke


  reply	other threads:[~2021-01-20 16:38 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-19 15:50 [PATCH bpf-next v2 0/8] Introduce bpf_redirect_xsk() helper Björn Töpel
2021-01-19 15:50 ` [PATCH bpf-next v2 1/8] xdp: restructure redirect actions Björn Töpel
2021-01-20 12:44   ` Toke Høiland-Jørgensen
2021-01-20 13:40     ` Björn Töpel
2021-01-20 14:52       ` Toke Høiland-Jørgensen
2021-01-20 15:49         ` Björn Töpel
2021-01-20 16:30           ` Toke Høiland-Jørgensen [this message]
2021-01-20 17:26             ` Björn Töpel
2021-01-19 15:50 ` [PATCH bpf-next v2 2/8] xsk: remove explicit_free parameter from __xsk_rcv() Björn Töpel
2021-01-19 15:50 ` [PATCH bpf-next v2 3/8] xsk: fold xp_assign_dev and __xp_assign_dev Björn Töpel
2021-01-19 15:50 ` [PATCH bpf-next v2 4/8] xsk: register XDP sockets at bind(), and add new AF_XDP BPF helper Björn Töpel
2021-01-20  8:25   ` kernel test robot
2021-01-20  8:25     ` kernel test robot
2021-01-20  8:41     ` Björn Töpel
2021-01-20  8:41       ` Björn Töpel
2021-01-20  8:50   ` kernel test robot
2021-01-20  8:50     ` kernel test robot
2021-01-20 12:50   ` Toke Høiland-Jørgensen
2021-01-20 13:25     ` Björn Töpel
2021-01-20 14:54       ` Toke Høiland-Jørgensen
2021-01-20 15:18         ` Björn Töpel
2021-01-20 17:29           ` Toke Høiland-Jørgensen
2021-01-20 18:22             ` Björn Töpel
2021-01-20 20:26               ` Toke Høiland-Jørgensen
2021-01-20 21:15                 ` Alexei Starovoitov
2021-01-21  8:18                   ` Björn Töpel
2021-01-19 15:50 ` [PATCH bpf-next v2 5/8] libbpf, xsk: select AF_XDP BPF program based on kernel version Björn Töpel
2021-01-20 12:52   ` Toke Høiland-Jørgensen
2021-01-20 13:25     ` Björn Töpel
2021-01-20 14:49       ` Björn Töpel
2021-01-20 15:11         ` Toke Høiland-Jørgensen
2021-01-20 15:27           ` Björn Töpel
2021-01-20 17:30             ` Toke Høiland-Jørgensen
2021-01-20 18:25             ` Alexei Starovoitov
2021-01-20 18:30               ` Björn Töpel
2021-01-20 14:56       ` Toke Høiland-Jørgensen
2021-01-19 15:50 ` [PATCH bpf-next v2 6/8] libbpf, xsk: select bpf_redirect_xsk(), if supported Björn Töpel
2021-01-19 15:50 ` [PATCH bpf-next v2 7/8] selftest/bpf: add XDP socket tests for bpf_redirect_{xsk, map}() Björn Töpel
2021-01-21  7:39   ` Andrii Nakryiko
2021-01-21 12:31     ` Björn Töpel
2021-01-19 15:50 ` [PATCH bpf-next v2 8/8] selftest/bpf: remove a lot of ifobject casting in xdpxceiver Björn Töpel
2021-01-20 13:15 ` [PATCH bpf-next v2 0/8] Introduce bpf_redirect_xsk() helper Maxim Mikityanskiy
2021-01-20 13:27   ` Björn Töpel
2021-01-20 15:57   ` Jesper Dangaard Brouer
2021-01-20 16:19     ` Maciej Fijalkowski
2021-01-21 17:01       ` Jesper Dangaard Brouer
2021-01-22  8:59         ` Magnus Karlsson
2021-01-22  9:45           ` Maciej Fijalkowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87bldj4lm6.fsf@toke.dk \
    --to=toke@redhat.com \
    --cc=ast@kernel.org \
    --cc=bjorn.topel@gmail.com \
    --cc=bjorn.topel@intel.com \
    --cc=bpf@vger.kernel.org \
    --cc=ciara.loftus@intel.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=hawk@kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=jonathan.lemon@gmail.com \
    --cc=kuba@kernel.org \
    --cc=maciej.fijalkowski@intel.com \
    --cc=magnus.karlsson@intel.com \
    --cc=maximmi@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=weqaar.a.janjua@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.