From: "Björn Töpel" <bjorn.topel@intel.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: "Jesper Dangaard Brouer" <brouer@redhat.com>,
	"Björn Töpel" <bjorn.topel@gmail.com>,
	"Eric Dumazet" <eric.dumazet@gmail.com>,
	ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org,
	bpf@vger.kernel.org, magnus.karlsson@intel.com, davem@davemloft.net,
	john.fastabend@gmail.com, intel-wired-lan@lists.osuosl.org
Subject: Re: [PATCH bpf-next 0/6] xsk: exit NAPI loop when AF_XDP Rx ring is full
Date: Tue, 8 Sep 2020 08:58:30 +0200
Message-ID: <8f698ac5-916f-9bb0-cce2-f00fba6ba407@intel.com>
In-Reply-To: <20200907114055.27c95483@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On 2020-09-07 20:40, Jakub Kicinski wrote:
> On Mon, 7 Sep 2020 15:37:40 +0200 Björn Töpel wrote:
>> > I've been pondering the exact problem you're solving with Maciej
>> > recently. The efficiency of AF_XDP on one core with the NAPI
>> > processing.
>> >
>> > Your solution (even though it admittedly helps, and is quite simple)
>> > still has the application potentially not able to process packets
>> > until the queue fills up. This will be bad for latency.
>> >
>> > Why don't we move closer to application polling? Never re-arm the
>> > NAPI after RX, let the application ask for packets, re-arm if 0
>> > polled. You'd get max batching, min latency.
>> >
>> > Who's the rambling one now? :-D
>> >
>>
>> :-D No, these are all very good ideas! We've actually experimented
>> with it with the busy-poll series a while back -- NAPI busy-polling
>> does exactly "application polling".
>>
>> However, I wonder if the busy-polling would have better performance
>> than the scenario above (i.e. when the ksoftirqd never kicks in)?
>> Executing the NAPI poll *explicitly* in the syscall, or implicitly
>> from the softirq.
>>
>> Hmm, thinking out loud here.
>> A simple(r) patch enabling busy poll;
>> exporting the napi_id to the AF_XDP socket (xdp->rxq->napi_id to
>> sk->sk_napi_id), and do the sk_busy_poll_loop() in sendmsg.
>>
>> Or did you have something completely different in mind?
>
> My understanding is that busy-polling is allowing application to pick
> up packets from the ring before the IRQ fires.
>
> What we're more concerned about is the IRQ firing in the first place.
>
>  application:  busy     | needs packets | idle
> ------------------------+---------------+----------------------
>  standard     |         | polls NAPI    | keep polling? sleep?
>  busy poll    | IRQ on  | IRQ off       | IRQ off  IRQ on
> --------------+---------+---------------+----------------------
>               |         | polls once    |
>  AF_XDP       | IRQ off | IRQ off       | IRQ on
>
> So busy polling is pretty orthogonal. It only applies to the
> "application needs packets" time. What we'd need is for the
> application to be able to suppress NAPI polls, promising the kernel
> that it will busy poll when appropriate.
>

Ah, nice write-up! Thanks! A strict busy-poll mechanism, not the
opportunistic (existing) NAPI busy-poll. This would be a new kind of
mechanism, and a very much welcome one in AF_XDP-land. More below.

>> As for this patch set, I think it would make sense to pull it in
>> since it makes the single-core scenario *much* better, and it is
>> pretty simple. Then do the application polling as another,
>> potentially, improvement series.
>
> Up to you, it's extra code in the driver so mostly your code to
> maintain.
>
> I think that if we implement what I described above - everyone will
> use that on a single core setup, so this set would be dead code
> (assuming RQ is sized appropriately). But again, your call :)
>

Now, I agree that the busy-poll you describe above would be the best
option, but from my perspective it's a much larger set that involves
experimenting. I will explore that, but I still think this series
should go in sooner to make the single core scenario usable *today*.
Ok, back to the busy-poll ideas. I'll call your idea "strict
busy-poll", i.e. the NAPI loop is *only* driven by userland, and
interrupts stay disabled. "Syscall driven poll-mode driver". :-)

On the driver side (again, only talking Intel here, since that's what
I know the details of), the NAPI context would only cover AF_XDP
queues, so that other queues are not starved.

Any ideas how strict busy-poll would look, API/implementation-wise?
An option only for AF_XDP sockets? Would this make sense to regular
sockets? If so, maybe extend the existing NAPI busy-poll with a
"strict" mode?

I'll start playing around a bit, but again, I think this simple series
should go in just to make AF_XDP single core usable *today*.


Thanks!
Björn