From: Jesper Dangaard Brouer <brouer@redhat.com>
To: "Jonathan Lemon" <jonathan.lemon@gmail.com>
Cc: "Willem de Bruijn" <willemdebruijn.kernel@gmail.com>,
 "Toke Høiland-Jørgensen" <toke@redhat.com>,
 "Machulsky, Zorik" <zorik@amazon.com>, "Jubran, Samih" <sameehj@amazon.com>,
 davem@davemloft.net, netdev@vger.kernel.org,
 "Woodhouse, David" <dwmw@amazon.co.uk>,
 "Matushevsky, Alexander" <matua@amazon.com>,
 "Bshara, Saeed" <saeedb@amazon.com>, "Wilson, Matt" <msw@amazon.com>,
 "Liguori, Anthony" <aliguori@amazon.com>, "Bshara, Nafea" <nafea@amazon.com>,
 "Tzalik, Guy" <gtzalik@amazon.com>, "Belgazal, Netanel" <netanel@amazon.com>,
 "Saidi, Ali" <alisaidi@amazon.com>,
 "Herrenschmidt, Benjamin" <benh@amazon.com>,
 "Kiyanovski, Arthur" <akiyano@amazon.com>,
 "Daniel Borkmann" <borkmann@iogearbox.net>,
 "Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
 "Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
 "Jakub Kicinski" <jakub.kicinski@netronome.com>,
 xdp-newbies@vger.kernel.org, brouer@redhat.com
Subject: Re: XDP multi-buffer incl. jumbo-frames (Was: [RFC V1 net-next 1/1] net: ena: implement XDP drop support)
Date: Wed, 26 Jun 2019 22:00:28 +0200
Message-ID: <20190626220028.2bb12196@carbon> (raw)
In-Reply-To: <99AFC1EE-E27E-4D4D-B9B8-CA2215E68E1B@gmail.com>

On Wed, 26 Jun 2019 09:42:07 -0700
"Jonathan Lemon" <jonathan.lemon@gmail.com> wrote:

> If all packets are collected together (like the bulk queue does), and
> then passed to XDP, this could easily be made backwards compatible.
> If the XDP program isn't 'multi-frag' aware, then each packet is just
> passed in individually.

My proposal #1 is that XDP only accesses the first buffer[1], as this
simplifies things.  What you are proposing (AFAIK) is that all the
buffers are passed to the XDP prog (in the form of an iovec).

I need some more details about your suggestion.  Specifically:

- What is the semantics when a 3-buffer packet is input and the XDP
  prog chooses to return XDP_DROP for buffer #2?
- Same situation, but buffer #2 wants an XDP_TX or a redirect?

> Of course, passing in the equivalent of a iovec requires some form of
> loop support on the BPF side, doesn't it?

The data structure used for holding these packet buffers/segments also
needs to be discussed.  I would use either an array of bio_vec[2] or of
skb_frag_t (aka skb_frag_struct).  The skb_frag_t would be the most
obvious choice, as we already have to write this when creating an SKB,
in the skb_shared_info area.  (Structs listed below my signature.)

The problem is the size of these structs (16 bytes per buffer/segment),
and that we likely need to support 17 segments, as this needs to be
compatible with SKBs (totalling 272 bytes).

My idea here is that we simply use the same memory area that we have to
store skb_shared_info into anyway.  This allows us to get the SKB setup
for free when doing XDP_PASS, or when doing the SKB alloc after
XDP_REDIRECT.

[1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org#proposal1-xdp-only-access-first-buffer
[2] https://lore.kernel.org/netdev/20190501041757.8647-1-willy@infradead.org/

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

$ pahole -C skb_frag_struct vmlinux
struct skb_frag_struct {
	struct {
		struct page * p;                 /*     0     8 */
	} page;                                  /*     0     8 */
	__u32              page_offset;          /*     8     4 */
	__u32              size;                 /*    12     4 */

	/* size: 16, cachelines: 1, members: 3 */
	/* last cacheline: 16 bytes */
};

$ pahole -C bio_vec vmlinux
struct bio_vec {
	struct page *      bv_page;              /*     0     8 */
	unsigned int       bv_len;               /*     8     4 */
	unsigned int       bv_offset;            /*    12     4 */

	/* size: 16, cachelines: 1, members: 3 */
	/* last cacheline: 16 bytes */
};

$ pahole -C skb_shared_info vmlinux
struct skb_shared_info {
	__u8               __unused;             /*     0     1 */
	__u8               meta_len;             /*     1     1 */
	__u8               nr_frags;             /*     2     1 */
	__u8               tx_flags;             /*     3     1 */
	short unsigned int gso_size;             /*     4     2 */
	short unsigned int gso_segs;             /*     6     2 */
	struct sk_buff *   frag_list;            /*     8     8 */
	struct skb_shared_hwtstamps hwtstamps;   /*    16     8 */
	unsigned int       gso_type;             /*    24     4 */
	u32                tskey;                /*    28     4 */
	atomic_t           dataref;              /*    32     0 */

	/* XXX 8 bytes hole, try to pack */

	void *             destructor_arg;       /*    40     8 */
	skb_frag_t         frags[17];            /*    48   272 */

	/* size: 320, cachelines: 5, members: 13 */
	/* sum members: 312, holes: 1, sum holes: 8 */
};
Thread overview: 22+ messages

2019-06-23  7:06 [RFC V1 net-next 0/1] Introduce xdp to ena  sameehj
2019-06-23  7:06 ` [RFC V1 net-next 1/1] net: ena: implement XDP drop support  sameehj
2019-06-23 14:09   ` Jesper Dangaard Brouer
2019-06-23 14:21     ` Jesper Dangaard Brouer
2019-06-25  3:19       ` Machulsky, Zorik
2019-06-26  8:38         ` XDP multi-buffer incl. jumbo-frames (Was: [RFC V1 net-next 1/1] net: ena: implement XDP drop support)  Jesper Dangaard Brouer
2019-06-26 11:52           ` Toke Høiland-Jørgensen
2019-06-26 14:40             ` Jesper Dangaard Brouer
2019-06-26 15:01               ` Toke Høiland-Jørgensen
2019-06-26 15:20                 ` Willem de Bruijn
2019-06-26 16:42                   ` Jonathan Lemon
2019-06-26 20:00                     ` Jesper Dangaard Brouer [this message]
2019-06-27 22:07                       ` Jonathan Lemon
2019-06-28  8:46                         ` Jesper Dangaard Brouer
2019-06-26 15:14             ` Toke Høiland-Jørgensen
2019-06-26 16:36               ` Jesper Dangaard Brouer
2019-06-28  7:14                 ` Eelco Chaudron
2019-06-28  7:46                   ` Toke Høiland-Jørgensen
2019-06-28 11:49                     ` Eelco Chaudron
2019-06-28  8:22                   ` Jesper Dangaard Brouer
2019-06-23 14:28   ` [RFC V1 net-next 1/1] net: ena: implement XDP drop support  David Ahern
2019-06-23 14:51     ` Maciej Fijalkowski