From: "Björn Töpel" <bjorn.topel@intel.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: "Jesper Dangaard Brouer" <brouer@redhat.com>,
	"Björn Töpel" <bjorn.topel@gmail.com>,
	"Eric Dumazet" <eric.dumazet@gmail.com>,
	ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org,
	bpf@vger.kernel.org, magnus.karlsson@intel.com,
	davem@davemloft.net, john.fastabend@gmail.com,
	intel-wired-lan@lists.osuosl.org
Subject: Re: [PATCH bpf-next 0/6] xsk: exit NAPI loop when AF_XDP Rx ring is full
Date: Tue, 8 Sep 2020 08:58:30 +0200	[thread overview]
Message-ID: <8f698ac5-916f-9bb0-cce2-f00fba6ba407@intel.com> (raw)
In-Reply-To: <20200907114055.27c95483@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On 2020-09-07 20:40, Jakub Kicinski wrote:
> On Mon, 7 Sep 2020 15:37:40 +0200 Björn Töpel wrote:
>>   > I've been pondering the exact problem you're solving with Maciej
>>   > recently. The efficiency of AF_XDP on one core with the NAPI processing.
>>   >
>>   > Your solution (even though it admittedly helps, and is quite simple)
>>   > still has the application potentially not able to process packets
>>   > until the queue fills up. This will be bad for latency.
>>   >
>>   > Why don't we move closer to application polling? Never re-arm the NAPI
>>   > after RX, let the application ask for packets, re-arm if 0 polled.
>>   > You'd get max batching, min latency.
>>   >
>>   > Who's the rambling one now? :-D
>>   >
>>
>> :-D No, these are all very good ideas! We actually experimented
>> with this in the busy-poll series a while back -- NAPI busy-polling
>> does exactly "application polling".
>>
>> However, I wonder whether busy-polling would perform better than the
>> scenario above (i.e. when ksoftirqd never kicks in): executing the
>> NAPI poll *explicitly* in the syscall, rather than implicitly from
>> the softirq.
>>
>> Hmm, thinking out loud here. A simple(r) patch enabling busy poll:
>> export the napi_id to the AF_XDP socket (xdp->rxq->napi_id to
>> sk->sk_napi_id), and do the sk_busy_poll_loop() in sendmsg.
>>
>> Or did you have something completely different in mind?
> 
> My understanding is that busy-polling allows the application to pick
> up packets from the ring before the IRQ fires.
> 
> What we're more concerned about is the IRQ firing in the first place.
> 
>   application:   busy    | needs packets | idle
>   -----------------------+---------------+----------------------
>     standard   |         |   polls NAPI  | keep polling? sleep?
>     busy poll  | IRQ on  |    IRQ off    |  IRQ off      IRQ on
>   -------------+---------+---------------+----------------------
>                |         |   polls once  |
>      AF_XDP    | IRQ off |    IRQ off    |  IRQ on
> 
> 
> So busy polling is pretty orthogonal. It only applies to the
> "application needs packets" time. What we'd need is for the application
> to be able to suppress NAPI polls, promising the kernel that it will
> busy poll when appropriate.
>

Ah, nice write-up! Thanks! A strict busy-poll mechanism, not the
opportunistic (existing) NAPI busy-poll.

This would be a new kind of mechanism, and a very much welcome one in
AF_XDP-land. More below.

>> As for this patch set, I think it would make sense to pull it in since
>> it makes the single-core scenario *much* better, and it is pretty
>> simple. Then do the application polling as a separate, potential
>> improvement series.
> 
> Up to you, it's extra code in the driver so mostly your code to
> maintain.
> 
> I think that if we implement what I described above, everyone will
> use that on a single-core setup, so this set would be dead code
> (assuming the RQ is sized appropriately). But again, your call :)
> 

Now, I agree that the busy-poll you describe above would be the best
option, but from my perspective it's a much larger patch set that
involves experimentation. I will explore that, but I still think this
series should go in sooner, to make the single-core scenario usable
*today*.

Ok, back to the busy-poll ideas. I'll call your idea "strict busy-poll",
i.e. the NAPI loop is *only* driven by userland, and interrupts stay
disabled. "Syscall driven poll-mode driver". :-)

On the driver side (again, only talking Intel here, since that's what I
know the details of), the NAPI context would only cover AF_XDP queues,
so that other queues are not starved.
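
Concretely, I would expect the driver change to be concentrated at the
tail of the NAPI poll routine, where the decision to re-arm the
interrupt is made. Something like this (sketch only; everything
suffixed _sketch is a placeholder, not a real i40e/ice/ixgbe helper):

  #include <linux/netdevice.h>

  static int clean_rings_sketch(struct napi_struct *napi, int budget);
  static bool strict_busy_poll_sketch(struct napi_struct *napi);
  static void rearm_irq_sketch(struct napi_struct *napi);

  static int napi_poll_sketch(struct napi_struct *napi, int budget)
  {
          int work_done = clean_rings_sketch(napi, budget);

          if (work_done < budget &&
              napi_complete_done(napi, work_done)) {
                  /* Strict mode: leave the IRQ masked; userland has
                   * promised to keep driving this NAPI via syscalls.
                   */
                  if (!strict_busy_poll_sketch(napi))
                          rearm_irq_sketch(napi);
          }

          return work_done;
  }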

Any ideas on how strict busy-poll would look, API/implementation-wise?
An option only for AF_XDP sockets? Would it make sense for regular
sockets as well? If so, maybe extend the existing NAPI busy-poll with a
"strict" mode?
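
For reference, the simple(r), non-strict variant from the quote above
might be as small as this (sketch; the existing helper is actually
called sk_busy_loop() in include/net/busy_poll.h, and I'm hand-waving
the bind-time plumbing that copies xdp->rxq->napi_id into
sk->sk_napi_id):

  #include <linux/net.h>
  #include <net/busy_poll.h>

  static int xsk_sendmsg(struct socket *sock, struct msghdr *m,
                         size_t total_len)
  {
          struct sock *sk = sock->sk;

          /* Assumes sk->sk_napi_id was set at bind() time from the
           * bound Rx queue's NAPI context (xdp->rxq->napi_id).
           */
          if (sk_can_busy_loop(sk))
                  sk_busy_loop(sk, 1); /* nonblock == 1 */

          /* ...the existing Tx descriptor handling would follow... */
          return 0;
  }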

I'll start playing around a bit, but again, I think this simple series
should go in just to make AF_XDP single-core usable *today*.


Thanks!
Björn

Thread overview: 52+ messages

2020-09-04 13:53 [PATCH bpf-next 0/6] xsk: exit NAPI loop when AF_XDP Rx ring is full Björn Töpel
2020-09-04 13:53 ` [PATCH bpf-next 1/6] xsk: improve xdp_do_redirect() error codes Björn Töpel
2020-09-04 13:53 ` [PATCH bpf-next 2/6] xdp: introduce xdp_do_redirect_ext() function Björn Töpel
2020-09-04 13:53 ` [PATCH bpf-next 3/6] xsk: introduce xsk_do_redirect_rx_full() helper Björn Töpel
2020-09-04 15:11   ` Jesper Dangaard Brouer
2020-09-04 15:39     ` Björn Töpel
2020-09-07 12:45       ` Jesper Dangaard Brouer
2020-09-04 13:53 ` [PATCH bpf-next 4/6] i40e, xsk: finish napi loop if AF_XDP Rx queue is full Björn Töpel
2020-09-04 13:53 ` [PATCH bpf-next 5/6] ice, xsk: finish napi loop if AF_XDP Rx queue is full Björn Töpel
2020-09-04 13:53 ` [PATCH bpf-next 6/6] ixgbe, xsk: finish napi loop if AF_XDP Rx queue is full Björn Töpel
2020-09-04 15:35   ` Jesper Dangaard Brouer
2020-09-04 15:54     ` Björn Töpel
2020-09-04 13:59 ` [PATCH bpf-next 0/6] xsk: exit NAPI loop when AF_XDP Rx ring is full Björn Töpel
2020-09-08 10:32   ` Maxim Mikityanskiy
2020-09-08 11:37     ` Magnus Karlsson
2020-09-08 12:21       ` Björn Töpel
2020-09-09 15:37     ` Jesper Dangaard Brouer
2020-09-04 14:27 ` Jesper Dangaard Brouer
2020-09-04 14:32   ` Björn Töpel
2020-09-04 23:58     ` Jakub Kicinski
2020-09-07 13:37       ` Björn Töpel
2020-09-07 18:40         ` Jakub Kicinski
2020-09-08  6:58           ` Björn Töpel [this message]
2020-09-08 17:24             ` Jakub Kicinski
2020-09-08 18:28               ` Björn Töpel
2020-09-08 18:34                 ` Jakub Kicinski
