netdev.vger.kernel.org archive mirror
From: Andy Gospodarek <andrew.gospodarek@broadcom.com>
To: Tariq Toukan <ttoukan.linux@gmail.com>
Cc: Andy Gospodarek <andrew.gospodarek@broadcom.com>,
	ast@kernel.org, daniel@iogearbox.net, davem@davemloft.net,
	kuba@kernel.org, hawk@kernel.org, john.fastabend@gmail.com,
	andrii@kernel.org, kafai@fb.com, songliubraving@fb.com,
	yhs@fb.com, kpsingh@kernel.org, toke@redhat.com,
	lorenzo.bianconi@redhat.com, netdev@vger.kernel.org,
	bpf@vger.kernel.org, Jesper Dangaard Brouer <brouer@redhat.com>,
	Ilias Apalodimas <ilias.apalodimas@linaro.org>,
	Lorenzo Bianconi <lorenzo@kernel.org>,
	gal@nvidia.com, Saeed Mahameed <saeedm@nvidia.com>,
	tariqt@nvidia.com
Subject: Re: [PATCH net-next v2] samples/bpf: fixup some tools to be able to support xdp multibuffer
Date: Thu, 5 Jan 2023 11:18:06 -0500
Message-ID: <Y7b4Pj0ASpV7Z8TS@C02YVCJELVCG.dhcp.broadcom.net>
In-Reply-To: <40fd78fc-2bb1-8eed-0b64-55cb3db71664@gmail.com>

On Tue, Jan 03, 2023 at 02:55:22PM +0200, Tariq Toukan wrote:
> 
> 
> On 21/06/2022 20:54, Andy Gospodarek wrote:
> > This changes the section name for the bpf program embedded in these
> > files to "xdp.frags" to allow the programs to be loaded on drivers that
> > are using an MTU greater than PAGE_SIZE.  Rather than directly accessing
> > the buffers, the packet data is now accessed via xdp helper functions to
> > provide an example for those who may need to write more complex
> > programs.
> > 
> > v2: remove new unnecessary variable
> > 
> 
> Hi,
> 
> I'm trying to understand if there are any assumptions/requirements on the
> length of the xdp_buff linear part when passed to XDP multi-buf programs?
> Can the linear part be empty, with all data residing in the fragments? Is it
> valid?

That's a great question.  The implementation in bnxt_en was based on my
understanding of the implementation in mvneta, where the linear area
contains approximately the first 4k of data, minus the XDP headroom and
dma_offset.  This means that you have something that looks like this
with a 9k MTU:

skb->data	[~3.6k of packet data]
skb->frag[0]	[4k of packet data]
     frag[1]	[remainder of packet data]

At some point, I'd like to take the opportunity to test something like
this:

skb->data	[header only + space for header expansion]
skb->frag[0]	[first 4k of data]
     frag[1]	[second 4k of data]
     frag[2]	[remainder of packet data]

Though this will use a bit more memory, I think it will be much more
performant for data that is ultimately consumed rather than forwarded
by the host, as the actual packet data will be aligned on page boundaries.

With the ability to have packets that are handled by an XDP program
span buffers, I would also like to test out whether or not it would be
worthwhile to have standard MTU packets also look like this:

skb->data	[header only + space for header expansion]
skb->frag[0]	[packet data]

I think the overall system performance would be better in the XDP_PASS
case, but until there is data to back this up, that's just speculation. 
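
To make the arithmetic behind the three layouts above concrete, here is
a minimal userspace sketch.  It is not driver code.  split_packet(),
PAGE_SZ and OTHER_RESERVE are hypothetical names; the 192-byte reserve
is an assumed stand-in for dma_offset and similar overhead, chosen only
so that the linear area comes out near 3.6k, and the 54-byte header
length assumes Ethernet + IPv4 + TCP without options.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only; real drivers differ. */
#define PAGE_SZ        4096u  /* the "4k" used throughout the thread */
#define XDP_HEADROOM   256u   /* matches XDP_PACKET_HEADROOM in the uapi header */
#define OTHER_RESERVE  192u   /* hypothetical stand-in for dma_offset etc. */

/*
 * Split pkt_len bytes into a linear area of at most linear_cap bytes
 * followed by fragments of at most frag_cap bytes each.
 * sizes[0] receives the linear length, sizes[1..] the fragment lengths.
 * Returns the number of entries written, or 0 if max_bufs is too small.
 */
size_t split_packet(size_t pkt_len, size_t linear_cap, size_t frag_cap,
		    size_t *sizes, size_t max_bufs)
{
	size_t n = 0;
	size_t chunk;

	assert(frag_cap > 0);
	if (max_bufs == 0)
		return 0;

	/* The linear area takes as much as it can hold. */
	chunk = pkt_len < linear_cap ? pkt_len : linear_cap;
	sizes[n++] = chunk;
	pkt_len -= chunk;

	/* Whatever is left spills into fixed-size fragments. */
	while (pkt_len > 0) {
		if (n == max_bufs)
			return 0;
		chunk = pkt_len < frag_cap ? pkt_len : frag_cap;
		sizes[n++] = chunk;
		pkt_len -= chunk;
	}
	return n;
}
```

Under these assumptions, a 9000-byte packet with a linear capacity of
PAGE_SZ - XDP_HEADROOM - OTHER_RESERVE (3648 bytes) splits into
3648 + 4096 + 1256 bytes.  With a 54-byte header-only linear area the
same packet becomes 54 + 4096 + 4096 + 754 bytes, and a 1500-byte
packet becomes 54 + 1446 bytes.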

> Per the proposed pattern below (calling bpf_xdp_load_bytes() to memcpy
> packet data into a local buffer), no such assumption is required, and an
> xdp_buff created by the driver with an empty linear part is valid.
> 
> However, in the _xdp_tx_iptunnel example program, it fails (returns
> XDP_DROP) in case the headers are not in the linear part.
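
The difference between the two access patterns can be illustrated with
a small userspace model.  This is only a sketch: struct seg and
load_bytes() are hypothetical and are not the kernel implementation of
bpf_xdp_load_bytes().  The model copies a byte range out of a packet
made of a linear part plus fragments, so the caller never needs to know
which buffer the headers live in.  A program that only dereferences
pointers between data and data_end sees just the linear part.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace model of a multi-buffer packet: segs[0] plays
 * the role of the linear part, segs[1..] the fragments.  This is not a
 * kernel structure.
 */
struct seg {
	const unsigned char *data;
	size_t len;
};

/*
 * Copy len bytes starting at packet offset off into dst, walking across
 * segment boundaries.  Returns 0 on success and -1 if the requested
 * range runs past the end of the packet; dst is left untouched on error.
 */
int load_bytes(const struct seg *segs, size_t nsegs, size_t off,
	       void *dst, size_t len)
{
	unsigned char *out = dst;
	size_t total = 0;
	size_t i;

	assert(segs != NULL || nsegs == 0);

	/* Bounds check against the total packet length first. */
	for (i = 0; i < nsegs; i++)
		total += segs[i].len;
	if (off > total || len > total - off)
		return -1;

	for (i = 0; i < nsegs && len > 0; i++) {
		size_t avail, n;

		if (off >= segs[i].len) {
			/* Range starts in a later segment (or this one is empty). */
			off -= segs[i].len;
			continue;
		}
		avail = segs[i].len - off;
		n = len < avail ? len : avail;
		memcpy(out, segs[i].data + off, n);
		out += n;
		len -= n;
		off = 0;
	}
	return 0;
}
```

In this model a packet whose linear part has length 0 is handled like
any other: the loop skips the empty segment and copies from the
fragments.  A direct bounds check of the form
"data + nh_off > data_end" would fail for the same packet, which is
consistent with the XDP_DROP behaviour described for _xdp_tx_iptunnel.
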
> 
> Regards,
> Tariq
> 
> > Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
> > Acked-by: John Fastabend <john.fastabend@gmail.com>
> > Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > ---
> >   samples/bpf/xdp1_kern.c            | 11 ++++++++---
> >   samples/bpf/xdp2_kern.c            | 11 ++++++++---
> >   samples/bpf/xdp_tx_iptunnel_kern.c |  2 +-
> >   3 files changed, 17 insertions(+), 7 deletions(-)
> > 
> > diff --git a/samples/bpf/xdp1_kern.c b/samples/bpf/xdp1_kern.c
> > index f0c5d95084de..0a5c704badd0 100644
> > --- a/samples/bpf/xdp1_kern.c
> > +++ b/samples/bpf/xdp1_kern.c
> > @@ -39,11 +39,13 @@ static int parse_ipv6(void *data, u64 nh_off, void *data_end)
> >   	return ip6h->nexthdr;
> >   }
> > -SEC("xdp1")
> > +#define XDPBUFSIZE	64
> > +SEC("xdp.frags")
> >   int xdp_prog1(struct xdp_md *ctx)
> >   {
> > -	void *data_end = (void *)(long)ctx->data_end;
> > -	void *data = (void *)(long)ctx->data;
> > +	__u8 pkt[XDPBUFSIZE] = {};
> > +	void *data_end = &pkt[XDPBUFSIZE-1];
> > +	void *data = pkt;
> >   	struct ethhdr *eth = data;
> >   	int rc = XDP_DROP;
> >   	long *value;
> > @@ -51,6 +53,9 @@ int xdp_prog1(struct xdp_md *ctx)
> >   	u64 nh_off;
> >   	u32 ipproto;
> > +	if (bpf_xdp_load_bytes(ctx, 0, pkt, sizeof(pkt)))
> > +		return rc;
> > +
> >   	nh_off = sizeof(*eth);
> >   	if (data + nh_off > data_end)
> >   		return rc;
> > diff --git a/samples/bpf/xdp2_kern.c b/samples/bpf/xdp2_kern.c
> > index d8a64ab077b0..3332ba6bb95f 100644
> > --- a/samples/bpf/xdp2_kern.c
> > +++ b/samples/bpf/xdp2_kern.c
> > @@ -55,11 +55,13 @@ static int parse_ipv6(void *data, u64 nh_off, void *data_end)
> >   	return ip6h->nexthdr;
> >   }
> > -SEC("xdp1")
> > +#define XDPBUFSIZE	64
> > +SEC("xdp.frags")
> >   int xdp_prog1(struct xdp_md *ctx)
> >   {
> > -	void *data_end = (void *)(long)ctx->data_end;
> > -	void *data = (void *)(long)ctx->data;
> > +	__u8 pkt[XDPBUFSIZE] = {};
> > +	void *data_end = &pkt[XDPBUFSIZE-1];
> > +	void *data = pkt;
> >   	struct ethhdr *eth = data;
> >   	int rc = XDP_DROP;
> >   	long *value;
> > @@ -67,6 +69,9 @@ int xdp_prog1(struct xdp_md *ctx)
> >   	u64 nh_off;
> >   	u32 ipproto;
> > +	if (bpf_xdp_load_bytes(ctx, 0, pkt, sizeof(pkt)))
> > +		return rc;
> > +
> >   	nh_off = sizeof(*eth);
> >   	if (data + nh_off > data_end)
> >   		return rc;
> > diff --git a/samples/bpf/xdp_tx_iptunnel_kern.c b/samples/bpf/xdp_tx_iptunnel_kern.c
> > index 575d57e4b8d6..0e2bca3a3fff 100644
> > --- a/samples/bpf/xdp_tx_iptunnel_kern.c
> > +++ b/samples/bpf/xdp_tx_iptunnel_kern.c
> > @@ -212,7 +212,7 @@ static __always_inline int handle_ipv6(struct xdp_md *xdp)
> >   	return XDP_TX;
> >   }
> > -SEC("xdp_tx_iptunnel")
> > +SEC("xdp.frags")
> >   int _xdp_tx_iptunnel(struct xdp_md *xdp)
> >   {
> >   	void *data_end = (void *)(long)xdp->data_end;
