Netdev Archive on lore.kernel.org
 help / color / Atom feed
From: "Jonathan Lemon" <jonathan.lemon@gmail.com>
To: "Jesper Dangaard Brouer" <brouer@redhat.com>
Cc: "Willem de Bruijn" <willemdebruijn.kernel@gmail.com>,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Machulsky, Zorik" <zorik@amazon.com>,
	"Jubran, Samih" <sameehj@amazon.com>,
	davem@davemloft.net, netdev@vger.kernel.org, "Woodhouse,
	David" <dwmw@amazon.co.uk>,
	"Matushevsky, Alexander" <matua@amazon.com>,
	"Bshara, Saeed" <saeedb@amazon.com>,
	"Wilson, Matt" <msw@amazon.com>,
	"Liguori, Anthony" <aliguori@amazon.com>,
	"Bshara, Nafea" <nafea@amazon.com>,
	"Tzalik, Guy" <gtzalik@amazon.com>,
	"Belgazal, Netanel" <netanel@amazon.com>,
	"Saidi, Ali" <alisaidi@amazon.com>,
	"Herrenschmidt, Benjamin" <benh@amazon.com>,
	"Kiyanovski, Arthur" <akiyano@amazon.com>,
	"Daniel Borkmann" <borkmann@iogearbox.net>,
	"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
	"Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
	"Jakub Kicinski" <jakub.kicinski@netronome.com>,
	xdp-newbies@vger.kernel.org
Subject: Re: XDP multi-buffer incl. jumbo-frames (Was: [RFC V1 net-next 1/1] net: ena: implement XDP drop support)
Date: Thu, 27 Jun 2019 15:07:56 -0700
Message-ID: <0D1286E7-02CD-4252-823C-4D1CBB8F2807@gmail.com> (raw)
In-Reply-To: <20190626220028.2bb12196@carbon>

On 26 Jun 2019, at 13:00, Jesper Dangaard Brouer wrote:

> On Wed, 26 Jun 2019 09:42:07 -0700 "Jonathan Lemon" 
> <jonathan.lemon@gmail.com> wrote:
>
>> If all packets are collected together (like the bulk queue does), and
>> then passed to XDP, this could easily be made backwards compatible.
>> If the XDP program isn't 'multi-frag' aware, then each packet is just
>> passed in individually.
>
> My proposal#1 is XDP only access first-buffer[1], as this simplifies 
> things.
>
> (AFAIK) What you are proposing is that all the buffers are passed to
> the XDP prog (in form of a iovec).  I need some more details about 
> your
> suggestion.

I was thinking this over yesterday - and was probably conflating packets
and buffers a bit.  Suppose that for the purposes of this discussion, 
we're
talking about a single packet that is split over multiple buffer areas.

Say, on RX, with header split:
    buf[0] = header
    buf[1] = data

For LRO (hw recv) and jumbo frames (and TSO):
    buf[0] = hdr + data
    buf[1] = data
    buf[n] = data

GRO cases, where individual packets are reassembled by software, aren't
handled here.


> Specifically:
>
> - What is the semantic when a 3 buffer packet is input and XDP prog
> choose to return XDP_DROP for packet #2 ?
>
> - Same situation of packet #2 wants a XDP_TX or redirect?

The collection of buffers represents a single packet, so this isn't
applicable here, right?

However, just thinking about incomplete data words (aka: pullup) gives
me a headache - seems this would complicate the BPF/verifier quite a 
bit.

So perhaps just restricting things to the first entry would do for now?

As far as the exact data structure used to hold the buffers, it would
be nice if it had the same layout as a bio_vec, in case someone wanted
to get clever and start transferring things over directly.
-- 
Jonathan


>> Of course, passing in the equivalent of a iovec requires some form of
>> loop support on the BPF side, doesn't it?
>
> The data structure used for holding these packet buffers/segments also
> needs to be discussed.  I would either use an array of bio_vec[2] or
> skb_frag_t (aka skb_frag_struct).  The skb_frag_t would be most
> obvious, as we already have to write this when creating an SKB, in
> skb_shared_info area. (Structs listed below signature).
>
> The problem is also that size of these structs (16 bytes) per
> buffer/segment, and we likely need to support 17 segments, as this 
> need
> to be compatible with SKBs (size 272 bytes).
>
> My idea here is that we simply use the same memory area, that we have 
> to
> store skb_shared_info into.  As this allow us to get the SKB setup for
> free, when doing XDP_PASS or when doing SKB alloc after XDP_REDIRECT.
>
>
> [1] 
> https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org#proposal1-xdp-only-access-first-buffer
>
> [2] 
> https://lore.kernel.org/netdev/20190501041757.8647-1-willy@infradead.org/
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
>
>
> $ pahole -C skb_frag_struct vmlinux
> struct skb_frag_struct {
> 	struct {
> 		struct page * p;                         /*     0     8 */
> 	} page;                                          /*     0     8 */
> 	__u32                      page_offset;          /*     8     4 */
> 	__u32                      size;                 /*    12     4 */
>
> 	/* size: 16, cachelines: 1, members: 3 */
> 	/* last cacheline: 16 bytes */
> };
>
> $ pahole -C bio_vec vmlinux
> struct bio_vec {
> 	struct page        * bv_page;                    /*     0     8 */
> 	unsigned int               bv_len;               /*     8     4 */
> 	unsigned int               bv_offset;            /*    12     4 */
>
> 	/* size: 16, cachelines: 1, members: 3 */
> 	/* last cacheline: 16 bytes */
> };
>
> $ pahole -C skb_shared_info vmlinux
> struct skb_shared_info {
> 	__u8                       __unused;             /*     0     1 */
> 	__u8                       meta_len;             /*     1     1 */
> 	__u8                       nr_frags;             /*     2     1 */
> 	__u8                       tx_flags;             /*     3     1 */
> 	short unsigned int         gso_size;             /*     4     2 */
> 	short unsigned int         gso_segs;             /*     6     2 */
> 	struct sk_buff     * frag_list;                  /*     8     8 */
> 	struct skb_shared_hwtstamps hwtstamps;           /*    16     8 */
> 	unsigned int               gso_type;             /*    24     4 */
> 	u32                        tskey;                /*    28     4 */
> 	atomic_t                   dataref;              /*    32     0 */
>
> 	/* XXX 8 bytes hole, try to pack */
>
> 	void *                     destructor_arg;       /*    40     8 */
> 	skb_frag_t                 frags[17];            /*    48   272 */
>
> 	/* size: 320, cachelines: 5, members: 13 */
> 	/* sum members: 312, holes: 1, sum holes: 8 */
> };

  reply index

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-23  7:06 [RFC V1 net-next 0/1] Introduce xdp to ena sameehj
2019-06-23  7:06 ` [RFC V1 net-next 1/1] net: ena: implement XDP drop support sameehj
2019-06-23 14:09   ` Jesper Dangaard Brouer
2019-06-23 14:21   ` Jesper Dangaard Brouer
2019-06-25  3:19     ` Machulsky, Zorik
2019-06-26  8:38       ` XDP multi-buffer incl. jumbo-frames (Was: [RFC V1 net-next 1/1] net: ena: implement XDP drop support) Jesper Dangaard Brouer
2019-06-26 11:52         ` Toke Høiland-Jørgensen
2019-06-26 14:40           ` Jesper Dangaard Brouer
2019-06-26 15:01             ` Toke Høiland-Jørgensen
2019-06-26 15:20               ` Willem de Bruijn
2019-06-26 16:42                 ` Jonathan Lemon
2019-06-26 20:00                   ` Jesper Dangaard Brouer
2019-06-27 22:07                     ` Jonathan Lemon [this message]
2019-06-28  8:46                 ` Jesper Dangaard Brouer
2019-06-26 15:14             ` Toke Høiland-Jørgensen
2019-06-26 16:36               ` Jesper Dangaard Brouer
2019-06-28  7:14         ` Eelco Chaudron
2019-06-28  7:46           ` Toke Høiland-Jørgensen
2019-06-28 11:49             ` Eelco Chaudron
2019-06-28  8:22           ` Jesper Dangaard Brouer
2019-06-23 14:28   ` [RFC V1 net-next 1/1] net: ena: implement XDP drop support David Ahern
2019-06-23 14:51   ` Maciej Fijalkowski

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0D1286E7-02CD-4252-823C-4D1CBB8F2807@gmail.com \
    --to=jonathan.lemon@gmail.com \
    --cc=akiyano@amazon.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=aliguori@amazon.com \
    --cc=alisaidi@amazon.com \
    --cc=benh@amazon.com \
    --cc=borkmann@iogearbox.net \
    --cc=brouer@redhat.com \
    --cc=davem@davemloft.net \
    --cc=dwmw@amazon.co.uk \
    --cc=gtzalik@amazon.com \
    --cc=ilias.apalodimas@linaro.org \
    --cc=jakub.kicinski@netronome.com \
    --cc=matua@amazon.com \
    --cc=msw@amazon.com \
    --cc=nafea@amazon.com \
    --cc=netanel@amazon.com \
    --cc=netdev@vger.kernel.org \
    --cc=saeedb@amazon.com \
    --cc=sameehj@amazon.com \
    --cc=toke@redhat.com \
    --cc=willemdebruijn.kernel@gmail.com \
    --cc=xdp-newbies@vger.kernel.org \
    --cc=zorik@amazon.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Netdev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/netdev/0 netdev/git/0.git
	git clone --mirror https://lore.kernel.org/netdev/1 netdev/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 netdev netdev/ https://lore.kernel.org/netdev \
		netdev@vger.kernel.org
	public-inbox-index netdev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.netdev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git