From: "Björn Töpel" <bjorn.topel@intel.com>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Björn Töpel" <bjorn.topel@gmail.com>,
	ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org,
	bpf@vger.kernel.org
Cc: magnus.karlsson@intel.com, maciej.fijalkowski@intel.com,
	kuba@kernel.org, jonathan.lemon@gmail.com, maximmi@nvidia.com,
	davem@davemloft.net, hawk@kernel.org, john.fastabend@gmail.com,
	ciara.loftus@intel.com, weqaar.a.janjua@intel.com
Subject: Re: [PATCH bpf-next v2 1/8] xdp: restructure redirect actions
Date: Wed, 20 Jan 2021 16:49:22 +0100
Message-ID: <0a7d1a0b-de2e-b973-a807-b9377bb89737@intel.com>
In-Reply-To: <87mtx34q48.fsf@toke.dk>

On 2021-01-20 15:52, Toke Høiland-Jørgensen wrote:
> Björn Töpel <bjorn.topel@intel.com> writes:
> 
>> On 2021-01-20 13:44, Toke Høiland-Jørgensen wrote:
>>> Björn Töpel <bjorn.topel@gmail.com> writes:
>>>
>>>> From: Björn Töpel <bjorn.topel@intel.com>
>>>>
>>>> The XDP_REDIRECT implementations for maps and non-maps are fairly
>>>> similar, but obviously need to take different code paths depending on
>>>> if the target is using a map or not. Today, the redirect targets for
>>>> XDP either use a map or are based on ifindex.
>>>>
>>>> Future commits will introduce yet another redirect target via a
>>>> new helper, bpf_redirect_xsk(). To pave the way for that, we introduce
>>>> an explicit redirect type to bpf_redirect_info. This makes the code
>>>> easier to follow, and makes it easier to add new redirect targets.
>>>>
>>>> Further, using an explicit type in bpf_redirect_info has a slight
>>>> positive performance impact by avoiding a pointer indirection for the
>>>> map type lookup, and instead using the hot cacheline for
>>>> bpf_redirect_info.
>>>>
>>>> The bpf_redirect_info flags member is not used by XDP, and not
>>>> read/written any more. The map member is only written to when
>>>> required/used, and not unconditionally.
>>>
>>> I like the simplification. However, the handling of map clearing becomes
>>> a bit murky with this change:
>>>
>>> You're not changing anything in bpf_clear_redirect_map(), and you're
>>> removing most of the reads and writes of ri->map. Instead,
>>> bpf_xdp_redirect_map() will store the bpf_dtab_netdev pointer in
>>> ri->tgt_value, which xdp_do_redirect() will just read and use without
>>> checking. But if the map element (or the entire map) has been freed in
>>> the meantime that will be a dangling pointer. I *think* the RCU callback
>>> in dev_map_delete_elem() and the rcu_barrier() in dev_map_free()
>>> protects against this, but that is by no means obvious. So confirming
>>> this, and explaining it in a comment would be good.
>>>
>>
>> Yes, *most* of the READ_ONCE(ri->map) uses are removed; what remains is
>> pretty much only bpf_redirect_map() and, as you write, the tracepoints.
>>
>> The content/element of the map is RCU protected, and actually even the
>> map will be around until the XDP processing is complete. Note the
>> synchronize_rcu() that follows every bpf_clear_redirect_map() call.
>>
>> I'll try to make it clearer in the commit message! Thanks for pointing
>> that out!
>>
>>> Also, as far as I can tell after this, ri->map is only used for the
>>> tracepoint. So how about just storing the map ID and getting rid of the
>>> READ/WRITE_ONCE() entirely?
>>>
>>
>> ...and the bpf_redirect_map() helper. Don't you think the current
>> READ_ONCE(ri->map) scheme is more obvious/clear?
> 
> Yeah, after your patch we WRITE_ONCE() the pointer in
> bpf_redirect_map(), but the only place it is actually *read* is in the
> tracepoint. So the only purpose of bpf_clear_redirect_map() is to ensure
> that an invalid pointer is not read in the tracepoint function. Which
> seems a bit excessive when we could just store the map ID for direct use
> in the tracepoint and get rid of bpf_clear_redirect_map() entirely, no?
> 
> Besides, from a UX point of view, having the tracepoint display the map
> ID even if that map ID is no longer valid seems to me like it makes more
> sense than just displaying a map ID of 0 and leaving it up to the user
> to figure out that this is because the map was cleared. I mean, at the
> time the redirect was made, that *was* the map ID that was used...
>

Convinced! Getting rid of bpf_clear_redirect_map() would be good! I'll
take a stab at this for v3!
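
A minimal userspace sketch of the "store the map ID" scheme, to make
the idea concrete. Everything here is hypothetical and only models the
discussion: struct fake_map, struct redirect_info, redirect_map() and
trace_map_id() are illustration names, not the kernel's
bpf_redirect_info layout or what v3 will actually look like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the kernel definitions. */
enum redirect_type { REDIR_NONE, REDIR_IFINDEX, REDIR_MAP };

struct fake_map {
	uint32_t id;
};

struct redirect_info {
	enum redirect_type type; /* explicit redirect type */
	void *tgt_value;         /* target looked up by the helper */
	uint32_t map_id;         /* copied by value, no map pointer kept */
};

/* Models the helper: records the target and a copy of the map ID. */
static void redirect_map(struct redirect_info *ri,
			 const struct fake_map *map, void *tgt)
{
	assert(ri != NULL && map != NULL);
	ri->type = REDIR_MAP;
	ri->tgt_value = tgt;
	ri->map_id = map->id;
}

/* Models the tracepoint: only reads the stored copy of the ID. */
static uint32_t trace_map_id(const struct redirect_info *ri)
{
	return ri->type == REDIR_MAP ? ri->map_id : 0;
}
```

Since the tracepoint side never dereferences a map pointer in this
model, nothing needs to be cleared when a map goes away, and the
reported ID is the one that was in use when the redirect was made.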


> Oh, and as you say due to the synchronize_rcu() call in dev_map_free() I
> think this whole discussion is superfluous anyway, since it can't
> actually happen that the map gets freed between the setting and reading
> of ri->map, no?
>

It can't be freed, but ri->map can be cleared via
bpf_clear_redirect_map(). So, between the helper (setting) and the
tracepoint in xdp_do_redirect() it can be cleared (say, if the XDP
program is swapped out prior to running xdp_do_redirect()).

Moving to the scheme you suggested does make the discussion
superfluous. :-)
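
For illustration, the window described above can be modelled in plain
userspace C. This is a single-threaded sketch under stated assumptions:
the names (fake_map, redirect_info_ptr, helper_set_map(),
clear_redirect_map(), tracepoint_map_id()) are made up, and the real
code is per-CPU and uses READ_ONCE()/WRITE_ONCE(), none of which is
modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the kernel definitions. */
struct fake_map {
	uint32_t id;
};

struct redirect_info_ptr {
	const struct fake_map *map; /* pointer scheme: may be cleared */
};

/* Models the helper storing the map pointer. */
static void helper_set_map(struct redirect_info_ptr *ri,
			   const struct fake_map *map)
{
	assert(ri != NULL);
	ri->map = map;
}

/* Models clearing: drop the pointer if it refers to this map. */
static void clear_redirect_map(struct redirect_info_ptr *ri,
			       const struct fake_map *map)
{
	assert(ri != NULL);
	if (ri->map == map)
		ri->map = NULL;
}

/* Models the tracepoint: reports 0 once the pointer is gone. */
static uint32_t tracepoint_map_id(const struct redirect_info_ptr *ri)
{
	return ri->map ? ri->map->id : 0;
}

/* Helper sets the map, a clear slips in, then the tracepoint runs. */
static uint32_t traced_id_after_clear(void)
{
	struct fake_map m = { .id = 7 };
	struct redirect_info_ptr ri = { .map = NULL };

	helper_set_map(&ri, &m);
	clear_redirect_map(&ri, &m); /* happens inside the window */
	return tracepoint_map_id(&ri);
}
```

In this model the map itself is still alive, yet the tracepoint
reports an ID of 0 because the pointer was cleared in between.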


Thanks for the input!
Björn

