From: Saeed Mahameed <saeed@kernel.org>
To: Jesper Dangaard Brouer <brouer@redhat.com>, David Ahern <dsahern@gmail.com>,
    Frey Alfredsson <freysteinn@freysteinn.com>
Cc: "Maciej Fijalkowski" <maciej.fijalkowski@intel.com>,
    "John Fastabend" <john.fastabend@gmail.com>,
    "Daniel Borkmann" <daniel@iogearbox.net>,
    "Toke Høiland-Jørgensen" <toke@redhat.com>,
    alardam@gmail.com, magnus.karlsson@intel.com, bjorn.topel@intel.com,
    andrii.nakryiko@gmail.com, kuba@kernel.org, ast@kernel.org,
    netdev@vger.kernel.org, davem@davemloft.net, hawk@kernel.org,
    jonathan.lemon@gmail.com, bpf@vger.kernel.org, jeffrey.t.kirsher@intel.com,
    maciejromanfijalkowski@gmail.com, intel-wired-lan@lists.osuosl.org,
    "Marek Majtyka" <marekx.majtyka@intel.com>,
    "Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: Explaining XDP redirect bulk size design (Was: [PATCH v2 bpf 1/5] net: ethtool: add xdp properties flag set)
Date: Thu, 10 Dec 2020 11:20:31 -0800
Message-ID: <e8d17e650f641be4aabf119753aa07cacfda2182.camel@kernel.org>
In-Reply-To: <20201210143211.2490f7f4@carbon>

On Thu, 2020-12-10 at 14:32 +0100, Jesper Dangaard Brouer wrote:
> On Wed, 9 Dec 2020 08:44:33 -0700 David Ahern <dsahern@gmail.com> wrote:
> > On 12/9/20 4:52 AM, Jesper Dangaard Brouer wrote:
> > > But I have redesigned the ndo_xdp_xmit call to take a bulk of packets
> > > (up to 16), so it should not be a problem to solve this by sharing the
> > > TX-queue and taking a lock per 16 packets. I still recommend that, for
> > > the fallback case, you allocate a number of TX-queues and distribute
> > > them across CPUs to avoid hitting a congested lock (the above
> > > measurements are the optimal non-congested atomic lock operation).
> >
> > I have been meaning to ask you why 16 for the XDP batching? If the
> > netdev budget is 64, why not something higher like 32 or 64?
>
> Thank you for asking, as there are multiple good reasons and
> considerations behind this batch size of 16. Notice that cpumap has a
> batch size of 8, which is also an explicit choice. And AF_XDP went in
> the wrong direction IMHO and I think has 256. I designed this to be a
> choice in the map code, for the level of bulking it needs/wants.
>
> The low-level explanation is that these 8 and 16 batch sizes are
> optimized towards cache sizes and Intel's Line-Fill-Buffer (prefetcher
> with 10 elements). I'm betting that the memory backing these 8 or 16
> packets has a higher chance of remaining/being in cache, and that I can
> prefetch them without evicting them from cache again. In some cases the
> pointers to these packets are queued into a ptr_ring, and it is more
> optimal to write cacheline sizes of 1 (8 pointers) or 2 (16 pointers)
> into the ptr_ring.

I've warned people about this once or twice on the mailing list. For
example, when re-populating the RX ring, a common mistake is to use the
NAPI budget, which has exactly the side effects you are explaining here,
Jesper!
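To make the cacheline arithmetic behind those 8/16 choices explicit, here is
a minimal compile-time sketch. The macro names are made up for this
illustration (they are not the exact kernel defines), and it assumes 64-bit
pointers and 64-byte cachelines:

#include <assert.h>

/* Illustrative only: these macro names are assumptions for this sketch,
 * not the symbols used in the kernel sources.  Assumes 64-bit pointers
 * and 64-byte cachelines.
 */
#define CACHELINE_BYTES    64
#define CPUMAP_RX_BATCH     8   /*  8 pointers = 1 cacheline in a ptr_ring */
#define XDP_XMIT_BULK_SIZE 16   /* 16 pointers = 2 cachelines              */

static_assert(CPUMAP_RX_BATCH * sizeof(void *) == 1 * CACHELINE_BYTES,
              "cpumap batch writes exactly one cacheline of pointers");
static_assert(XDP_XMIT_BULK_SIZE * sizeof(void *) == 2 * CACHELINE_BYTES,
              "xdp xmit bulk writes exactly two cachelines of pointers");

A batch size copied from somewhere else (say, the NAPI budget of 64) without
this property silently loses the cache friendliness described above.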
These 8/16 numbers are used in more than one place in the stack (XDP, GRO,
HW buffer re-population, etc.), so how can we enforce such numbers and
uniform handling in all drivers?

1. Have clear documentation? Well-known defines for people to copy?

2. For XDP we must keep track of the memory backing of the bulked XDP data,
as Jesper pointed out, so that whatever bulk size we define always remains
cache friendly. Especially now that people have started working on
multi-buffer and other features that will extend xdp_buff and xdp_frame, do
we need a selftest, perhaps one that runs pahole, to verify that those data
structures stay within a reasonable layout and size?

> The general explanation is my goal to do bulking without adding
> latency. This is explicitly stated in my presentation[1] as of Feb
> 2016, slide 20. Sure, you/we can likely make the micro-benchmarks look
> better by using a batch size of 64, but that will introduce added
> latency and likely shoot ourselves in the foot for real workloads. With
> experience from bufferbloat and real networks, we know that massive TX
> bulking has bad effects. Still, XDP-redirect does massive bulking (the
> NIC flush is after the full 64 budget) and we don't have pushback or a
> queue mechanism (so I know we are already shooting ourselves in the
> foot) ... Fortunately we now have a PhD student working on queueing for
> XDP.
>
> It is also important to understand that this is an adaptive bulking
> scheme, which comes from NAPI. We don't wait for packets that might
> arrive shortly; we pick up what the NIC has available, but by only
> taking 8 or 16 packets (instead of emptying the entire RX-queue), and
> then spending some time sending them along, I'm hoping the NIC could
> have received some more frames in the meantime. For cpumap and veth (in
> some cases) they can start to consume packets from these batches, but
> NIC drivers get the XDP_XMIT_FLUSH signal at NAPI-end (xdp_do_flush).
> Still, the design allows NIC drivers to update their internal queue
> state (and BQL), and if it gets close to full they can choose to
> flush/doorbell the NIC earlier. When doing queueing for XDP we need to
> expose these NIC queue states, and having 4 calls with 16 packets (64
> budget) also gives us more chances to get NIC queue state info which
> the NIC already touches.
>
> [1] https://people.netfilter.org/hawk/presentations/devconf2016/net_stack_challenges_100G_Feb2016.pdf
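To put the adaptive scheme quoted above into driver terms, a rough sketch of
such a poll loop could look like the following. The my_* structs and helpers
are hypothetical placeholders invented for this illustration; only
xdp_do_redirect() and xdp_do_flush() are the actual kernel helpers:

/* Sketch only: my_rxq, my_fetch_rx_frame() and my_run_xdp() are invented
 * for illustration.  xdp_do_redirect() internally enqueues the frame on a
 * per-CPU bulk queue (16 entries for devmap) and calls the target's
 * ndo_xdp_xmit() whenever that queue fills, so a full 64-packet budget
 * naturally turns into roughly 4 bulks of 16.
 */
static int my_napi_poll(struct napi_struct *napi, int budget)
{
        struct my_rxq *rxq = container_of(napi, struct my_rxq, napi);
        int done = 0;

        while (done < budget) {
                struct xdp_buff *xdp = my_fetch_rx_frame(rxq);

                if (!xdp)
                        break;

                if (my_run_xdp(rxq, xdp) == XDP_REDIRECT)
                        xdp_do_redirect(rxq->netdev, xdp, rxq->xdp_prog);
                done++;
        }

        /* One flush at NAPI end: pushes out any partial bulk and lets the
         * target driver ring its TX doorbell once for the whole budget
         * (the XDP_XMIT_FLUSH signal mentioned above).
         */
        xdp_do_flush();

        return done;
}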