All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf] xsk: publish global consumer pointers when NAPI is finsihed
@ 2020-02-07  9:37 Magnus Karlsson
  2020-02-07 12:34 ` Maxim Mikityanskiy
  0 siblings, 1 reply; 6+ messages in thread
From: Magnus Karlsson @ 2020-02-07  9:37 UTC (permalink / raw)
  To: magnus.karlsson, maximmi
  Cc: xdp-newbies, rgoodfel, bjorn.topel, tariqt, saeedm, moshe

The commit 4b638f13bab4 ("xsk: Eliminate the RX batch size")
introduced a much more lazy way of updating the global consumer
pointers from the kernel side, by only doing so when running out of
entries in the fill or Tx rings (the rings consumed by the
kernel). This can result in a deadlock with the user application if
the kernel requires more than one entry to proceed and the application
cannot put these entries in the fill ring because the kernel has not
updated the global consumer pointer since the ring is not empty.

Fix this by publishing the local kernel side consumer pointer whenever
we have copmleted Rx or Tx processing in the kernel. This way, user
space will have an up-to-date view of the consumer pointers whenever it
gets to execute in the one core case (application and driver on the
same core), or after a certain number of packets have been processed
in the two core case (application and driver on different cores).

A side effect of this patch is that the one core case gets better
performance, but the two core case gets worse. The reason that the one
core case improves is that updating the global consumer pointer is
relatively cheap since the application by definition is not running
when the kernel is (they are on the same core) and it is beneficial
for the application, once it gets to run, to have a pointers that are
as up to date as possible since it then can operate on more packets
and buffers. In the two core case, the most important performance
aspect is to minimize the number of accesses to the global pointers
since they are shared between two cores and bounces between the caches
of those cores. This patch results in more updates to global state,
which means lower performance in the two core case.

Fixes: 4b638f13bab4 ("xsk: Eliminate the RX batch size")
Reported-by: Ryan Goodfellow <rgoodfel@isi.edu>
Reported-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
---
 net/xdp/xsk.c       | 2 ++
 net/xdp/xsk_queue.h | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index df60048..356f90e 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -217,6 +217,7 @@ static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 static void xsk_flush(struct xdp_sock *xs)
 {
 	xskq_prod_submit(xs->rx);
+	__xskq_cons_release(xs->umem->fq);
 	sock_def_readable(&xs->sk);
 }
 
@@ -304,6 +305,7 @@ void xsk_umem_consume_tx_done(struct xdp_umem *umem)
 
 	rcu_read_lock();
 	list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
+		__xskq_cons_release(xs->tx);
 		xs->sk.sk_write_space(&xs->sk);
 	}
 	rcu_read_unlock();
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index bec2af1..89a01ac 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -271,7 +271,8 @@ static inline void xskq_cons_release(struct xsk_queue *q)
 {
 	/* To improve performance, only update local state here.
 	 * Reflect this to global state when we get new entries
-	 * from the ring in xskq_cons_get_entries().
+	 * from the ring in xskq_cons_get_entries() and whenever
+	 * Rx or Tx processing are completed in the NAPI loop.
 	 */
 	q->cached_cons++;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH bpf] xsk: publish global consumer pointers when NAPI is finsihed
  2020-02-07  9:37 [PATCH bpf] xsk: publish global consumer pointers when NAPI is finsihed Magnus Karlsson
@ 2020-02-07 12:34 ` Maxim Mikityanskiy
  2020-02-07 21:40   ` Daniel Borkmann
  0 siblings, 1 reply; 6+ messages in thread
From: Maxim Mikityanskiy @ 2020-02-07 12:34 UTC (permalink / raw)
  To: Magnus Karlsson; +Cc: xdp-newbies, rgoodfel, bjorn.topel, tariqt, saeedm, moshe

On 2020-02-07 11:37, Magnus Karlsson wrote:
> The commit 4b638f13bab4 ("xsk: Eliminate the RX batch size")
> introduced a much more lazy way of updating the global consumer
> pointers from the kernel side, by only doing so when running out of
> entries in the fill or Tx rings (the rings consumed by the
> kernel). This can result in a deadlock with the user application if
> the kernel requires more than one entry to proceed and the application
> cannot put these entries in the fill ring because the kernel has not
> updated the global consumer pointer since the ring is not empty.
> 
> Fix this by publishing the local kernel side consumer pointer whenever
> we have copmleted Rx or Tx processing in the kernel. This way, user
> space will have an up-to-date view of the consumer pointers whenever it
> gets to execute in the one core case (application and driver on the
> same core), or after a certain number of packets have been processed
> in the two core case (application and driver on different cores).
> 
> A side effect of this patch is that the one core case gets better
> performance, but the two core case gets worse. The reason that the one
> core case improves is that updating the global consumer pointer is
> relatively cheap since the application by definition is not running
> when the kernel is (they are on the same core) and it is beneficial
> for the application, once it gets to run, to have a pointers that are
> as up to date as possible since it then can operate on more packets
> and buffers. In the two core case, the most important performance
> aspect is to minimize the number of accesses to the global pointers
> since they are shared between two cores and bounces between the caches
> of those cores. This patch results in more updates to global state,
> which means lower performance in the two core case.
> 
> Fixes: 4b638f13bab4 ("xsk: Eliminate the RX batch size")
> Reported-by: Ryan Goodfellow <rgoodfel@isi.edu>
> Reported-by: Maxim Mikityanskiy <maximmi@mellanox.com>
> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
> ---
>   net/xdp/xsk.c       | 2 ++
>   net/xdp/xsk_queue.h | 3 ++-
>   2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index df60048..356f90e 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -217,6 +217,7 @@ static int xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
>   static void xsk_flush(struct xdp_sock *xs)
>   {
>   	xskq_prod_submit(xs->rx);
> +	__xskq_cons_release(xs->umem->fq);
>   	sock_def_readable(&xs->sk);
>   }
>   
> @@ -304,6 +305,7 @@ void xsk_umem_consume_tx_done(struct xdp_umem *umem)
>   
>   	rcu_read_lock();
>   	list_for_each_entry_rcu(xs, &umem->xsk_list, list) {
> +		__xskq_cons_release(xs->tx);
>   		xs->sk.sk_write_space(&xs->sk);
>   	}
>   	rcu_read_unlock();
> diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
> index bec2af1..89a01ac 100644
> --- a/net/xdp/xsk_queue.h
> +++ b/net/xdp/xsk_queue.h
> @@ -271,7 +271,8 @@ static inline void xskq_cons_release(struct xsk_queue *q)
>   {
>   	/* To improve performance, only update local state here.
>   	 * Reflect this to global state when we get new entries
> -	 * from the ring in xskq_cons_get_entries().
> +	 * from the ring in xskq_cons_get_entries() and whenever
> +	 * Rx or Tx processing are completed in the NAPI loop.
>   	 */
>   	q->cached_cons++;
>   }
> 

Acked-by: Maxim Mikityanskiy <maximmi@mellanox.com>

Is it intentional that you didn't send it to bpf and netdev mailing lists?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH bpf] xsk: publish global consumer pointers when NAPI is finsihed
  2020-02-07 12:34 ` Maxim Mikityanskiy
@ 2020-02-07 21:40   ` Daniel Borkmann
  2020-02-09  9:58     ` Magnus Karlsson
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Borkmann @ 2020-02-07 21:40 UTC (permalink / raw)
  To: Maxim Mikityanskiy, Magnus Karlsson
  Cc: xdp-newbies, rgoodfel, bjorn.topel, tariqt, saeedm, moshe

On 2/7/20 1:34 PM, Maxim Mikityanskiy wrote:
> On 2020-02-07 11:37, Magnus Karlsson wrote:
>> The commit 4b638f13bab4 ("xsk: Eliminate the RX batch size")
>> introduced a much more lazy way of updating the global consumer
>> pointers from the kernel side, by only doing so when running out of
>> entries in the fill or Tx rings (the rings consumed by the
>> kernel). This can result in a deadlock with the user application if
>> the kernel requires more than one entry to proceed and the application
>> cannot put these entries in the fill ring because the kernel has not
>> updated the global consumer pointer since the ring is not empty.
[...]
> 
> Acked-by: Maxim Mikityanskiy <maximmi@mellanox.com>
> 
> Is it intentional that you didn't send it to bpf and netdev mailing lists?

Yep, please resend with Maxim's ACK to bpf + netdev in Cc. Thanks!

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH bpf] xsk: publish global consumer pointers when NAPI is finsihed
  2020-02-07 21:40   ` Daniel Borkmann
@ 2020-02-09  9:58     ` Magnus Karlsson
  2020-02-10 13:43       ` Maxim Mikityanskiy
  0 siblings, 1 reply; 6+ messages in thread
From: Magnus Karlsson @ 2020-02-09  9:58 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Maxim Mikityanskiy, Magnus Karlsson, Xdp, Ryan Goodfellow,
	Björn Töpel, Tariq Toukan, Saeed Mahameed,
	Moshe Shemesh

On Fri, Feb 7, 2020 at 10:41 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 2/7/20 1:34 PM, Maxim Mikityanskiy wrote:
> > On 2020-02-07 11:37, Magnus Karlsson wrote:
> >> The commit 4b638f13bab4 ("xsk: Eliminate the RX batch size")
> >> introduced a much more lazy way of updating the global consumer
> >> pointers from the kernel side, by only doing so when running out of
> >> entries in the fill or Tx rings (the rings consumed by the
> >> kernel). This can result in a deadlock with the user application if
> >> the kernel requires more than one entry to proceed and the application
> >> cannot put these entries in the fill ring because the kernel has not
> >> updated the global consumer pointer since the ring is not empty.
> [...]
> >
> > Acked-by: Maxim Mikityanskiy <maximmi@mellanox.com>
> >
> > Is it intentional that you didn't send it to bpf and netdev mailing lists?
>
> Yep, please resend with Maxim's ACK to bpf + netdev in Cc. Thanks!

It was intentional, but maybe confusing. In the mail just before this
patch I suggested that this patch should be part of a patch set that
we send to bpf and netdev. The purpose of sending it was for you Maxim
to ok it, and you did with your ack :-). But I will repeat the other
questions from that mail here.

Does the Mellanox driver set the need_wakeup flag on Rx when it needs
more buffers in the fill ring to form its stride and the HW Rx ring is
empty? If yes, great. If not, please submit a patch for this.

I will produce a documentation patch that clarifies that when the
need_wakeup flag is set for the fill ring, the user need to put more
packets on the fill ring and wakeup the kernel. It is already
mentioned in the documentation, but I will make it more explicit.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH bpf] xsk: publish global consumer pointers when NAPI is finsihed
  2020-02-09  9:58     ` Magnus Karlsson
@ 2020-02-10 13:43       ` Maxim Mikityanskiy
  2020-02-10 14:02         ` Magnus Karlsson
  0 siblings, 1 reply; 6+ messages in thread
From: Maxim Mikityanskiy @ 2020-02-10 13:43 UTC (permalink / raw)
  To: Magnus Karlsson, Magnus Karlsson
  Cc: Daniel Borkmann, Xdp, Ryan Goodfellow, Björn Töpel,
	Tariq Toukan, Saeed Mahameed, Moshe Shemesh

On 2020-02-09 11:58, Magnus Karlsson wrote:
> On Fri, Feb 7, 2020 at 10:41 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>>
>> On 2/7/20 1:34 PM, Maxim Mikityanskiy wrote:
>>> On 2020-02-07 11:37, Magnus Karlsson wrote:
>>>> The commit 4b638f13bab4 ("xsk: Eliminate the RX batch size")
>>>> introduced a much more lazy way of updating the global consumer
>>>> pointers from the kernel side, by only doing so when running out of
>>>> entries in the fill or Tx rings (the rings consumed by the
>>>> kernel). This can result in a deadlock with the user application if
>>>> the kernel requires more than one entry to proceed and the application
>>>> cannot put these entries in the fill ring because the kernel has not
>>>> updated the global consumer pointer since the ring is not empty.
>> [...]
>>>
>>> Acked-by: Maxim Mikityanskiy <maximmi@mellanox.com>
>>>
>>> Is it intentional that you didn't send it to bpf and netdev mailing lists?
>>
>> Yep, please resend with Maxim's ACK to bpf + netdev in Cc. Thanks!
> 
> It was intentional, but maybe confusing. In the mail just before this
> patch I suggested that this patch should be part of a patch set that
> we send to bpf and netdev. The purpose of sending it was for you Maxim
> to ok it, and you did with your ack :-). But I will repeat the other
> questions from that mail here.

OK, I see. Sorry, I didn't see the previous email (and still can't find 
it, BTW).

> Does the Mellanox driver set the need_wakeup flag on Rx when it needs
> more buffers in the fill ring to form its stride and the HW Rx ring is
> empty? If yes, great. If not, please submit a patch for this.

Yes, it does (regardless of emptiness of the HW RX ring). If 
xsk_umem_has_addrs_rq returns false, the driver sets the need_wakeup flag.

> I will produce a documentation patch that clarifies that when the
> need_wakeup flag is set for the fill ring, the user need to put more
> packets on the fill ring and wakeup the kernel. It is already
> mentioned in the documentation, but I will make it more explicit.
> 

Great, thanks!

There's still room for optimization here, though: how many is "more"? If 
the application puts them one by one and wakes up the driver every time, 
it's not very effective comparing to the case if it knew the exact 
amount of missing frames. On the other hand, normally the application 
would put enough frames in advance, and it shouldn't get to this point.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH bpf] xsk: publish global consumer pointers when NAPI is finsihed
  2020-02-10 13:43       ` Maxim Mikityanskiy
@ 2020-02-10 14:02         ` Magnus Karlsson
  0 siblings, 0 replies; 6+ messages in thread
From: Magnus Karlsson @ 2020-02-10 14:02 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: Magnus Karlsson, Daniel Borkmann, Xdp, Ryan Goodfellow,
	Björn Töpel, Tariq Toukan, Saeed Mahameed,
	Moshe Shemesh

On Mon, Feb 10, 2020 at 2:43 PM Maxim Mikityanskiy <maximmi@mellanox.com> wrote:
>
> On 2020-02-09 11:58, Magnus Karlsson wrote:
> > On Fri, Feb 7, 2020 at 10:41 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >>
> >> On 2/7/20 1:34 PM, Maxim Mikityanskiy wrote:
> >>> On 2020-02-07 11:37, Magnus Karlsson wrote:
> >>>> The commit 4b638f13bab4 ("xsk: Eliminate the RX batch size")
> >>>> introduced a much more lazy way of updating the global consumer
> >>>> pointers from the kernel side, by only doing so when running out of
> >>>> entries in the fill or Tx rings (the rings consumed by the
> >>>> kernel). This can result in a deadlock with the user application if
> >>>> the kernel requires more than one entry to proceed and the application
> >>>> cannot put these entries in the fill ring because the kernel has not
> >>>> updated the global consumer pointer since the ring is not empty.
> >> [...]
> >>>
> >>> Acked-by: Maxim Mikityanskiy <maximmi@mellanox.com>
> >>>
> >>> Is it intentional that you didn't send it to bpf and netdev mailing lists?
> >>
> >> Yep, please resend with Maxim's ACK to bpf + netdev in Cc. Thanks!
> >
> > It was intentional, but maybe confusing. In the mail just before this
> > patch I suggested that this patch should be part of a patch set that
> > we send to bpf and netdev. The purpose of sending it was for you Maxim
> > to ok it, and you did with your ack :-). But I will repeat the other
> > questions from that mail here.
>
> OK, I see. Sorry, I didn't see the previous email (and still can't find
> it, BTW).
>
> > Does the Mellanox driver set the need_wakeup flag on Rx when it needs
> > more buffers in the fill ring to form its stride and the HW Rx ring is
> > empty? If yes, great. If not, please submit a patch for this.
>
> Yes, it does (regardless of emptiness of the HW RX ring). If
> xsk_umem_has_addrs_rq returns false, the driver sets the need_wakeup flag.

Perfect. I will then submit the patch to the netdev mailing list.
Please ack it again there.

> > I will produce a documentation patch that clarifies that when the
> > need_wakeup flag is set for the fill ring, the user need to put more
> > packets on the fill ring and wakeup the kernel. It is already
> > mentioned in the documentation, but I will make it more explicit.
> >
>
> Great, thanks!
>
> There's still room for optimization here, though: how many is "more"? If
> the application puts them one by one and wakes up the driver every time,
> it's not very effective comparing to the case if it knew the exact
> amount of missing frames. On the other hand, normally the application
> would put enough frames in advance, and it shouldn't get to this point.

Yes, there are many ways of messing up performance ;-). Ignoring the
advice of using batching is one of them.

Thanks: Magnus

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-02-10 14:02 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-07  9:37 [PATCH bpf] xsk: publish global consumer pointers when NAPI is finsihed Magnus Karlsson
2020-02-07 12:34 ` Maxim Mikityanskiy
2020-02-07 21:40   ` Daniel Borkmann
2020-02-09  9:58     ` Magnus Karlsson
2020-02-10 13:43       ` Maxim Mikityanskiy
2020-02-10 14:02         ` Magnus Karlsson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.