All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org,
	"Thomas Ptacek" <thomas@sockpuppet.org>,
	"Adhipati Blambangan" <adhipati@tuta.io>,
	"David Ahern" <dsahern@gmail.com>,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"David S . Miller" <davem@davemloft.net>,
	brouer@redhat.com
Subject: Re: [PATCH net v6] net: xdp: account for layer 3 packets in generic skb handler
Date: Wed, 19 Aug 2020 09:07:56 +0200	[thread overview]
Message-ID: <20200819090756.07f76eb9@carbon> (raw)
In-Reply-To: <20200815074102.5357-1-Jason@zx2c4.com>

On Sat, 15 Aug 2020 09:41:02 +0200
"Jason A. Donenfeld" <Jason@zx2c4.com> wrote:

> A user reported that packets from wireguard were possibly ignored by XDP
> [1]. Another user reported that modifying packets from layer 3
> interfaces results in impossible to diagnose drops.
> 
> Apparently, the generic skb xdp handler path seems to assume that
> packets will always have an ethernet header, which really isn't always
> the case for layer 3 packets, which are produced by multiple drivers.
> This patch fixes the oversight. If the mac_len is 0 and so is
> hard_header_len, then we know that the skb is a layer 3 packet, and in
> that case prepend a pseudo ethhdr to the packet whose h_proto is copied
> from skb->protocol, which will have the appropriate v4 or v6 ethertype.
> This allows us to keep XDP programs' assumption correct about packets
> always having that ethernet header, so that existing code doesn't break,
> while still allowing layer 3 devices to use the generic XDP handler.
> 
> We push on the ethernet header and then pull it right off and set
> mac_len to the ethernet header size, so that the rest of the XDP code
> does not need any changes. That is, it makes it so that the skb has its
> ethernet header just before the data pointer, of size ETH_HLEN.
> 
> Previous discussions have included the point that maybe XDP should just
> be intentionally broken on layer 3 interfaces, by design, and that layer
> 3 people should be using cls_bpf. However, I think there are good
> grounds to reconsider this perspective:
> 
> - Complicated deployments wind up applying XDP modifications to a
>   variety of different devices on a given host, some of which are using
>   specialized ethernet cards and other ones using virtual layer 3
>   interfaces, such as WireGuard. Being able to apply one codebase to
>   each of these winds up being essential.
> 
> - cls_bpf does not support the same feature set as XDP, and operates at
>   a slightly different stage in the networking stack. You may reply,
>   "then add all the features you want to cls_bpf", but that seems to be
>   missing the point, and would still result in there being two ways to
>   do everything, which is not desirable for anyone actually _using_ this
>   code.
> 
> - While XDP was originally made for hardware offloading, and while many
>   look disdainfully upon the generic mode, it nevertheless remains a
>   highly useful and popular way of adding bespoke packet
>   transformations, and from that perspective, a difference between layer
>   2 and layer 3 packets is immaterial if the user is primarily concerned
>   with transformations to layer 3 and beyond.
> 
> - It's not impossible to imagine layer 3 hardware (e.g. a WireGuard PCIe
>   card) including eBPF/XDP functionality built-in. In that case, why
>   limit XDP as a technology to only layer 2? Then, having generic XDP
>   work for layer 3 would naturally fit as well.
> 
> [1] https://lore.kernel.org/wireguard/M5WzVK5--3-2@tuta.io/
> 
> Reported-by: Thomas Ptacek <thomas@sockpuppet.org>
> Reported-by: Adhipati Blambangan <adhipati@tuta.io>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Toke Høiland-Jørgensen <toke@redhat.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Cc: Jesper Dangaard Brouer <brouer@redhat.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
> 
> I had originally dropped this patch, but the issue kept coming up in
> user reports, so here's a v4 of it. Testing of it is still rather slim,
> but hopefully that will change in the coming days.
> 
> Changes v5->v6:
> - The fix to the skb->protocol changing case is now in a separate
>   stand-alone patch, and removed from this one, so that it can be
>   evaluated separately.
> 
> Changes v4->v5:
> - Rather than tracking in a messy manner whether the skb is l3, we just
>   do the check once, and then adjust the skb geometry to be identical to
>   the l2 case. This simplifies the code quite a bit.
> - Fix a preexisting bug where the l2 header remained attached if
>   skb->protocol was updated.
> 
> Changes v3->v4:
> - We now preserve the same logic for XDP_TX/XDP_REDIRECT as before.
> - hard_header_len is checked in addition to mac_len.
> 
>  net/core/dev.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 151f1651439f..79c15f4244e6 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4630,6 +4630,18 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
>  	 * header.
>  	 */
>  	mac_len = skb->data - skb_mac_header(skb);
> +	if (!mac_len && !skb->dev->hard_header_len) {
> +		/* For l3 packets, we push on a fake mac header, and then
> +		 * pull it off again, so that it has the same skb geometry
> +		 * as for the l2 case.
> +		 */
> +		eth = skb_push(skb, ETH_HLEN);
> +		eth_zero_addr(eth->h_source);
> +		eth_zero_addr(eth->h_dest);
> +		eth->h_proto = skb->protocol;
> +		__skb_pull(skb, ETH_HLEN);
> +		mac_len = ETH_HLEN;
> +	}

You are consuming a little bit of the headroom here.

>  	hlen = skb_headlen(skb) + mac_len;
>  	xdp->data = skb->data - mac_len;
>  	xdp->data_meta = xdp->data;

The XDP-prog is allowed to change eth->h_proto.  Later (in code) we
detect this and update skb->protocol with the new protocol.

What will happen if my XDP-prog adds a VLAN header?

The selftest tools/testing/selftests/bpf/test_xdp_vlan.sh test these
situations.  You can use it as an example, and write/construct a test
that does the same for your Wireguard devices.  As minimum you need to
provide such a selftest together with this patch.

Generally speaking, IMHO generic-XDP was a mistake, because it is hard
to maintain and slows down the development of XDP.  (I have a number of
fixes in my TODO backlog for generic-XDP).  Adding this will just give
us more corner-cases that need to be maintained.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


  reply	other threads:[~2020-08-19  7:08 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-13 19:58 [PATCH net v4] net: xdp: account for layer 3 packets in generic skb handler Jason A. Donenfeld
2020-08-13 21:01 ` Jakub Kicinski
2020-08-14  6:56   ` Jason A. Donenfeld
2020-08-14  7:30     ` [PATCH net v5] " Jason A. Donenfeld
2020-08-14 20:55       ` David Miller
2020-08-14 20:57         ` Jason A. Donenfeld
2020-08-15  7:41         ` [PATCH net v6] " Jason A. Donenfeld
2020-08-19  7:07           ` Jesper Dangaard Brouer [this message]
2020-08-19 23:22           ` David Miller
2020-08-20  9:13             ` Jason A. Donenfeld
2020-08-20 18:55               ` David Miller
2020-08-20 20:29                 ` Jason A. Donenfeld
2020-08-14 15:31     ` [PATCH net v4] " Jakub Kicinski
2020-08-14 21:04       ` Jason A. Donenfeld
2020-08-14 21:26         ` David Miller
2020-08-15  7:54           ` Jason A. Donenfeld
2020-08-14 21:14       ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200819090756.07f76eb9@carbon \
    --to=brouer@redhat.com \
    --cc=Jason@zx2c4.com \
    --cc=adhipati@tuta.io \
    --cc=alexei.starovoitov@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dsahern@gmail.com \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=thomas@sockpuppet.org \
    --cc=toke@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.