* low latency/busy poll feedback and bugs
@ 2013-08-05 21:22 Shawn Bohrer
  2013-08-05 22:16 ` [PATCH net-next] net: Add low-latency/polling support for UDP multicast Shawn Bohrer
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Shawn Bohrer @ 2013-08-05 21:22 UTC (permalink / raw)
  To: Amir Vadai; +Cc: eliezer.tamir, netdev

I did a little testing of the new low latency/busy poll sockets today
and found a few things that surprised me and at least one bug.

1) I'm testing with a Mellanox ConnectX-3 card.  Currently polling
with mlx4_en is broken when GRO is enabled.  In
mlx4_en_process_rx_cq() when GRO is enabled skb_mark_napi_id() is
never called.  It appears that low latency sockets with GRO are
supposed to work, because the following code checks that we are not
ll_polling:

 /* This packet is eligible for GRO if it is:
   * - DIX Ethernet (type interpretation)
   * - TCP/IP (v4)
   * - without IP options
   * - not an IP fragment
   * - no LLS polling in progress
   */
  if (!mlx4_en_cq_ll_polling(cq) &&
      (dev->features & NETIF_F_GRO)) {

However since we never call skb_mark_napi_id() mlx4_en_cq_ll_polling()
will never be true.
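
For illustration, here is a rough, untested sketch of one way the GRO
path could mark the napi id (the names gro_skb and cq follow
drivers/net/ethernet/mellanox/mlx4/en_rx.c, but the exact placement in
the GRO branch is my guess, not a tested patch):

	/* In the GRO branch of mlx4_en_process_rx_cq(), record the napi
	 * id on the skb before handing it to GRO, mirroring the
	 * skb_mark_napi_id(skb, &cq->napi) call the non-GRO path makes.
	 */
	skb_mark_napi_id(gro_skb, &cq->napi);
	napi_gro_frags(&cq->napi);	/* existing call that passes it up */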

2) Why is LowLatencyRxPackets reported as a TcpExt stat?  Perhaps I've
been confused and misguided but I've always assumed those are
statistics related to TCP and this feature is protocol neutral.  I'm
not entirely sure where it should be moved to; perhaps IpExt?

3) I don't know if this was intentional, an oversight, or simply a
missing feature but UDP multicast currently is not supported.  In
order to add support I believe you would need to call
sk_mark_napi_id() in __udp4_lib_mcast_deliver().  Assuming there isn't
some intentional reason this wasn't done I'd be happy to test this and
send a patch.

--
Shawn


* [PATCH net-next] net: Add low-latency/polling support for UDP multicast
  2013-08-05 21:22 low latency/busy poll feedback and bugs Shawn Bohrer
@ 2013-08-05 22:16 ` Shawn Bohrer
  2013-08-06  7:13   ` Eliezer Tamir
  2013-08-06  7:41 ` low latency/busy poll feedback and bugs Eliezer Tamir
  2013-08-06 12:15 ` Amir Vadai
  2 siblings, 1 reply; 17+ messages in thread
From: Shawn Bohrer @ 2013-08-05 22:16 UTC (permalink / raw)
  To: davem; +Cc: eliezer.tamir, netdev, Amir Vadai, tomk, Shawn Bohrer

Set the napi id for each socket in the multicast path to enable
low-latency/polling support.

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
---
 net/ipv4/udp.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 766e6ba..0d0da17 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1596,6 +1596,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	dif = skb->dev->ifindex;
 	sk = udp_v4_mcast_next(net, sk, uh->dest, daddr, uh->source, saddr, dif);
 	while (sk) {
+		sk_mark_napi_id(sk, skb);
 		stack[count++] = sk;
 		sk = udp_v4_mcast_next(net, sk_nulls_next(sk), uh->dest,
 				       daddr, uh->source, saddr, dif);
-- 
1.7.7.6



* Re: [PATCH net-next] net: Add low-latency/polling support for UDP multicast
  2013-08-05 22:16 ` [PATCH net-next] net: Add low-latency/polling support for UDP multicast Shawn Bohrer
@ 2013-08-06  7:13   ` Eliezer Tamir
  2013-08-06 19:51     ` [PATCH v2 " Shawn Bohrer
  0 siblings, 1 reply; 17+ messages in thread
From: Eliezer Tamir @ 2013-08-06  7:13 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: davem, netdev, Amir Vadai, tomk

On 06/08/2013 01:16, Shawn Bohrer wrote:
> Set the napi id for each socket in the multicast path to enable
> low-latency/polling support.
> 
> Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
> ---
>  net/ipv4/udp.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 766e6ba..0d0da17 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1596,6 +1596,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
>  	dif = skb->dev->ifindex;
>  	sk = udp_v4_mcast_next(net, sk, uh->dest, daddr, uh->source, saddr, dif);
>  	while (sk) {
> +		sk_mark_napi_id(sk, skb);
>  		stack[count++] = sk;
>  		sk = udp_v4_mcast_next(net, sk_nulls_next(sk), uh->dest,
>  				       daddr, uh->source, saddr, dif);
> 

Looks OK.
To be complete, you want to do the same for __udp6_lib_mcast_deliver().

-Eliezer


* Re: low latency/busy poll feedback and bugs
  2013-08-05 21:22 low latency/busy poll feedback and bugs Shawn Bohrer
  2013-08-05 22:16 ` [PATCH net-next] net: Add low-latency/polling support for UDP multicast Shawn Bohrer
@ 2013-08-06  7:41 ` Eliezer Tamir
  2013-08-06 18:08   ` Shawn Bohrer
  2013-08-06 20:39   ` Or Gerlitz
  2013-08-06 12:15 ` Amir Vadai
  2 siblings, 2 replies; 17+ messages in thread
From: Eliezer Tamir @ 2013-08-06  7:41 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: Amir Vadai, netdev

On 06/08/2013 00:22, Shawn Bohrer wrote:
> 1) I'm testing with a Mellanox ConnectX-3 card.

There's your problem ;)

> 2) Why is LowLatencyRxPackets reported as a TcpExt stat?  Perhaps I've
> been confused and misguided but I've always assumed those are
> statistics related to TCP and this feature is protocol neutral.  I'm
> not entirely sure where it should be moved to; perhaps IpExt?

Actually, after all of the rewrite this has gone through,
it's now at the Ethernet level, not even IP specific.

So where should it go?

Should we also rename this to BusyPollRxPackets?

> 3) I don't know if this was intentional, an oversight, or simply a
> missing feature but UDP multicast currently is not supported.  In
> order to add support I believe you would need to call
> sk_mark_napi_id() in __udp4_lib_mcast_deliver().  Assuming there isn't
> some intentional reason this wasn't done I'd be happy to test this and
> send a patch.

This is still WIP, so our goal was to make it easy to extend for new
cases and protocols.

For multicast, it is possible for incoming packets to come from more
than one port (and therefore more than one queue).
I'm not sure how we could handle that, but what we have today won't do
well for that use-case.

What do you use for testing?

-Eliezer


* Re: low latency/busy poll feedback and bugs
  2013-08-05 21:22 low latency/busy poll feedback and bugs Shawn Bohrer
  2013-08-05 22:16 ` [PATCH net-next] net: Add low-latency/polling support for UDP multicast Shawn Bohrer
  2013-08-06  7:41 ` low latency/busy poll feedback and bugs Eliezer Tamir
@ 2013-08-06 12:15 ` Amir Vadai
  2 siblings, 0 replies; 17+ messages in thread
From: Amir Vadai @ 2013-08-06 12:15 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: eliezer.tamir, netdev

On 06/08/2013 00:22, Shawn Bohrer wrote:
> I did a little testing of the new low latency/busy poll sockets today
> and found a few things that surprised me and at least one bug.
> 
> 1) I'm testing with a Mellanox ConnectX-3 card.  
Of course :)

> Currently polling
> with mlx4_en is broken when GRO is enabled.  In
> mlx4_en_process_rx_cq() when GRO is enabled skb_mark_napi_id() is
> never called.  It appears like low latency sockets with GRO is
> supposed to work because the following code checks that we are not
> ll_polling:
> 
>  /* This packet is eligible for GRO if it is:
>    * - DIX Ethernet (type interpretation)
>    * - TCP/IP (v4)
>    * - without IP options
>    * - not an IP fragment
>    * - no LLS polling in progress
>    */
>   if (!mlx4_en_cq_ll_polling(cq) &&
>       (dev->features & NETIF_F_GRO)) {
> 
> However since we never call skb_mark_napi_id() mlx4_en_cq_ll_polling()
> will never be true.

Currently GRO and LLS are mutually exclusive, and shouldn't be used
together.
I will send a fix to make the code clearer soon.

Thanks for reporting,
Amir


* Re: low latency/busy poll feedback and bugs
  2013-08-06  7:41 ` low latency/busy poll feedback and bugs Eliezer Tamir
@ 2013-08-06 18:08   ` Shawn Bohrer
  2013-08-06 18:25     ` Eliezer Tamir
  2013-08-06 20:39   ` Or Gerlitz
  1 sibling, 1 reply; 17+ messages in thread
From: Shawn Bohrer @ 2013-08-06 18:08 UTC (permalink / raw)
  To: Eliezer Tamir; +Cc: Amir Vadai, netdev

On Tue, Aug 06, 2013 at 10:41:48AM +0300, Eliezer Tamir wrote:
> On 06/08/2013 00:22, Shawn Bohrer wrote:
> > 3) I don't know if this was intentional, an oversight, or simply a
> > missing feature but UDP multicast currently is not supported.  In
> > order to add support I believe you would need to call
> > sk_mark_napi_id() in __udp4_lib_mcast_deliver().  Assuming there isn't
> > some intentional reason this wasn't done I'd be happy to test this and
> > send a patch.
> 
> This is still WIP, so our goal was to make it easy to extend for new
> cases and protocols.
> 
> For multicast, it is possible for incoming packets to come from more
> than one port (and therefore more than one queue).
> I'm not sure how we could handle that, but what we have today won't do
> well for that use-case.
 
It is unclear to me exactly what happens in this case.  With my simple
patch I'm assuming it will spin on the receive queue that received the
last packet for that socket.  What happens when a packet arrives on a
different receive queue than the one we were spinning on? I assume it
is still delivered but perhaps the spinning process won't get it until
the spinning time expires?  I'm just guessing and haven't attempted to
figure it out from looking through the code.

I put together a small test case with two senders and a single
receiver, and visually (by watching /proc/interrupts) verified that
their traffic went to two different queues.  The receiver received all
of the packets with busy_read enabled so it appears that it at least
superficially works.  I did not verify the effect on latency.
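
For reference, busy polling can also be requested per socket with the
SO_BUSY_POLL option (value in microseconds) instead of the global
net.core.busy_read sysctl.  A minimal sketch, untested with this setup;
note that raising the value above the sysctl default may require
CAP_NET_ADMIN:

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef SO_BUSY_POLL
#define SO_BUSY_POLL 46	/* asm-generic/socket.h, kernels with busy poll */
#endif

/* Create a UDP socket that busy polls for up to 'usec' microseconds. */
static int busy_poll_udp_socket(int usec)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;
	if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &usec, sizeof(usec)) < 0)
		perror("setsockopt(SO_BUSY_POLL)");
	return fd;
}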

> What do you use for testing?

In fio 2.1.2 [1] I added support for UDP multicast.  It's not quite as
flexible as I would like but you can still test a number of scenarios
like the one above or do a basic pingpong test.

Here are my fio job files for a pingpong test:

$ cat mcudp_rr_receive
[global]
ioengine=net
protocol=udp
bs=64
size=100m
# IP address of interface to receive packets
#interface=10.8.16.21
rw=read

[pingpong]
pingpong=1
port=10000
hostname=239.0.0.0

$ cat mcudp_rr_send
[global]
ioengine=net
protocol=udp
bs=64
size=100m
# IP address of interface to send packets
#interface=10.8.16.22
rw=write

[pingpong]
pingpong=1
port=10000
hostname=239.0.0.0

Just start the receiver on one host then start the sender on a second
host:

[host1] $ fio mcudp_rr_receive

[host2] $ fio mcudp_rr_send

[1] http://brick.kernel.dk/snaps/fio-2.1.2.tar.bz2

--
Shawn


* Re: low latency/busy poll feedback and bugs
  2013-08-06 18:08   ` Shawn Bohrer
@ 2013-08-06 18:25     ` Eliezer Tamir
  2013-08-07 20:05       ` Ben Hutchings
  0 siblings, 1 reply; 17+ messages in thread
From: Eliezer Tamir @ 2013-08-06 18:25 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: Amir Vadai, netdev

On 06/08/2013 21:08, Shawn Bohrer wrote:
> On Tue, Aug 06, 2013 at 10:41:48AM +0300, Eliezer Tamir wrote:
>> For multicast, it is possible for incoming packets to come from more
>> than one port (and therefore more than one queue).
>> I'm not sure how we could handle that, but what we have today won't do
>> well for that use-case.
>  
> It is unclear to me exactly what happens in this case.  With my simple
> patch I'm assuming it will spin on the receive queue that received the
> last packet for that socket.  What happens when a packet arrives on a
> different receive queue than the one we were spinning on? I assume it
> is still delivered but perhaps the spinning process won't get it until
> the spinning time expires?  I'm just guessing and haven't attempted to
> figure it out from looking through the code.

What will happen is that the current code will only busy poll on one
queue, sometimes on this one, sometimes on that one.

packets arriving on the other queue will still be serviced but will
suffer the latency of waiting for NAPI to schedule.

So your avg will be better, but your std. dev. much worse and it's
probably not worth it if you really expect two devices to receive
data at the same time.


* [PATCH v2 net-next] net: Add low-latency/polling support for UDP multicast
  2013-08-06  7:13   ` Eliezer Tamir
@ 2013-08-06 19:51     ` Shawn Bohrer
  2013-08-07 20:22       ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Shawn Bohrer @ 2013-08-06 19:51 UTC (permalink / raw)
  To: davem; +Cc: eliezer.tamir, netdev, Amir Vadai, tomk, Shawn Bohrer

Set the napi id for each socket in the multicast path to enable
low-latency/polling support.

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
---
v2 include ipv6 support

 net/ipv4/udp.c |    1 +
 net/ipv6/udp.c |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 766e6ba..0d0da17 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1596,6 +1596,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	dif = skb->dev->ifindex;
 	sk = udp_v4_mcast_next(net, sk, uh->dest, daddr, uh->source, saddr, dif);
 	while (sk) {
+		sk_mark_napi_id(sk, skb);
 		stack[count++] = sk;
 		sk = udp_v4_mcast_next(net, sk_nulls_next(sk), uh->dest,
 				       daddr, uh->source, saddr, dif);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index f405815..82be372 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -756,6 +756,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	dif = inet6_iif(skb);
 	sk = udp_v6_mcast_next(net, sk, uh->dest, daddr, uh->source, saddr, dif);
 	while (sk) {
+		sk_mark_napi_id(sk, skb);
 		stack[count++] = sk;
 		sk = udp_v6_mcast_next(net, sk_nulls_next(sk), uh->dest, daddr,
 				       uh->source, saddr, dif);
-- 
1.7.7.6



* Re: low latency/busy poll feedback and bugs
  2013-08-06  7:41 ` low latency/busy poll feedback and bugs Eliezer Tamir
  2013-08-06 18:08   ` Shawn Bohrer
@ 2013-08-06 20:39   ` Or Gerlitz
  2013-08-06 21:02     ` Eric Dumazet
  1 sibling, 1 reply; 17+ messages in thread
From: Or Gerlitz @ 2013-08-06 20:39 UTC (permalink / raw)
  To: Eliezer Tamir
  Cc: Shawn Bohrer, Amir Vadai, netdev, Kirsher, Jeffrey T,
	Alexander Duyck, John Fastabend, Rose, Gregory V

On Tue, Aug 6, 2013 at 10:41 AM, Eliezer Tamir
<eliezer.tamir@linux.intel.com> wrote:
> On 06/08/2013 00:22, Shawn Bohrer wrote:
>> 1) I'm testing with a Mellanox ConnectX-3 card.
>
> There's your problem ;)

Can you be more specific? I love testing with Intel Niantic, do code
inspection of the ixgbe driver, and interact with Jeff/Greg/Alex/John
and all the other great folks from the Intel Linux networking team;
actually, that (happily working with your competitors) is a crucial part
of the fun in open source / upstream work. What's wrong with using
NICs which don't have intel.com on their paper cover?

Or.


* Re: low latency/busy poll feedback and bugs
  2013-08-06 20:39   ` Or Gerlitz
@ 2013-08-06 21:02     ` Eric Dumazet
  0 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2013-08-06 21:02 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Eliezer Tamir, Shawn Bohrer, Amir Vadai, netdev, Kirsher,
	Jeffrey T, Alexander Duyck, John Fastabend, Rose, Gregory V

On Tue, 2013-08-06 at 23:39 +0300, Or Gerlitz wrote:
> On Tue, Aug 6, 2013 at 10:41 AM, Eliezer Tamir
> <eliezer.tamir@linux.intel.com> wrote:
> > On 06/08/2013 00:22, Shawn Bohrer wrote:
> >> 1) I'm testing with a Mellanox ConnectX-3 card.
> >
> > There's your problem ;)
> 
> Can you be more specific? I love testing with Intel Niantic, do code
> inspection of the ixgbe driver, and interact with Jeff/Greg/Alex/John
> and all the other great folks from the Intel Linux networking team;
> actually, that (happily working with your competitors) is a crucial part
> of the fun in open source / upstream work. What's wrong with using
> NICs which don't have intel.com on their paper cover?

I don't think there is anything wrong.

Eliezer's tone was quite clear ;)

BTW, I do not have any Niantic cards to play with; Intel guys, feel free
to send me samples ;)

Thanks


* Re: low latency/busy poll feedback and bugs
  2013-08-06 18:25     ` Eliezer Tamir
@ 2013-08-07 20:05       ` Ben Hutchings
  2013-08-07 20:23         ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Ben Hutchings @ 2013-08-07 20:05 UTC (permalink / raw)
  To: Eliezer Tamir; +Cc: Shawn Bohrer, Amir Vadai, netdev

On Tue, 2013-08-06 at 21:25 +0300, Eliezer Tamir wrote:
> On 06/08/2013 21:08, Shawn Bohrer wrote:
> > On Tue, Aug 06, 2013 at 10:41:48AM +0300, Eliezer Tamir wrote:
> >> For multicast, it is possible for incoming packets to come from more
> >> than one port (and therefore more than one queue).
> >> I'm not sure how we could handle that, but what we have today won't do
> >> well for that use-case.
> >  
> > It is unclear to me exactly what happens in this case.  With my simple
> > patch I'm assuming it will spin on the receive queue that received the
> > last packet for that socket.  What happens when a packet arrives on a
> > different receive queue than the one we were spinning on? I assume it
> > is still delivered but perhaps the spinning process won't get it until
> > the spinning time expires?  I'm just guessing and haven't attempted to
> > figure it out from looking through the code.
> 
> What will happen is that the current code will only busy poll on one
> queue, sometimes on this one, sometimes on that one.
> 
> packets arriving on the other queue will still be serviced but will
> suffer the latency of waiting for NAPI to schedule.
> 
> So your avg will be better, but your std. dev. much worse and it's
> probably not worth it if you really expect two devices to receive
> data at the same time.

It seems like sk_mark_napi_id() should only be called on connected
sockets for now.

At least Solarflare controllers have 'wildcard' filters that can match
destination address (host, l4proto, port) only.  Perhaps ARFS could be
extended to include steering based on destination address when there is
a single unconnected socket bound to that address.  When that is
successful, busy-polling a single NAPI context should work nicely.

Where there are multiple unconnected sockets bound to the same unicast
address, it might make sense for polling on those sockets to prefer the
'local' NAPI context according to CPU topology.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH v2 net-next] net: Add low-latency/polling support for UDP multicast
  2013-08-06 19:51     ` [PATCH v2 " Shawn Bohrer
@ 2013-08-07 20:22       ` Eric Dumazet
  2013-08-08  8:46         ` Eliezer Tamir
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2013-08-07 20:22 UTC (permalink / raw)
  To: Shawn Bohrer; +Cc: davem, eliezer.tamir, netdev, Amir Vadai, tomk

On Tue, 2013-08-06 at 14:51 -0500, Shawn Bohrer wrote:
> Set the napi id for each socket in the multicast path to enable
> low-latency/polling support.
> 
> Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
> ---
> v2 include ipv6 support

This might help your workload, but I doubt it is generic enough.

One UDP socket is supposed to receive traffic from many endpoints,
so we have no guarantee all the received traffic will end on a single RX
queue on the NIC.

That's the same logic as RFS here.

sk_mark_napi_id() in UDP are wrong IMHO.

It should be guarded by the following test in  
__udp_queue_rcv_skb()

if (inet_sk(sk)->inet_daddr) {
    sock_rps_save_rxhash(sk, skb);
    sk_mark_napi_id(sk, skb);
}

(To occur only for connected UDP sockets, where we are 100% sure all
packets will use this same rxhash/rx queue)
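
Expressed as a patch that would look roughly like the following (sketch
only; the surrounding __udp_queue_rcv_skb() context is from memory, not
a diff against current net-next):

--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
-	if (inet_sk(sk)->inet_daddr)
+	if (inet_sk(sk)->inet_daddr) {
 		sock_rps_save_rxhash(sk, skb);
+		sk_mark_napi_id(sk, skb);
+	}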


* Re: low latency/busy poll feedback and bugs
  2013-08-07 20:05       ` Ben Hutchings
@ 2013-08-07 20:23         ` Eric Dumazet
  2013-08-07 23:41           ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2013-08-07 20:23 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Eliezer Tamir, Shawn Bohrer, Amir Vadai, netdev

On Wed, 2013-08-07 at 22:05 +0200, Ben Hutchings wrote:

> It seems like sk_mark_napi_id() should only be called on connected
> sockets for now.

I concur.


* Re: low latency/busy poll feedback and bugs
  2013-08-07 20:23         ` Eric Dumazet
@ 2013-08-07 23:41           ` David Miller
  0 siblings, 0 replies; 17+ messages in thread
From: David Miller @ 2013-08-07 23:41 UTC (permalink / raw)
  To: eric.dumazet; +Cc: bhutchings, eliezer.tamir, sbohrer, amirv, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 07 Aug 2013 13:23:25 -0700

> On Wed, 2013-08-07 at 22:05 +0200, Ben Hutchings wrote:
> 
>> It seems like sk_mark_napi_id() should only be called on connected
>> sockets for now.
> 
> I concur.

Me too.


* Re: [PATCH v2 net-next] net: Add low-latency/polling support for UDP multicast
  2013-08-07 20:22       ` Eric Dumazet
@ 2013-08-08  8:46         ` Eliezer Tamir
  2013-08-08 23:55           ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Eliezer Tamir @ 2013-08-08  8:46 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Shawn Bohrer, davem, netdev, Amir Vadai, tomk


On 07/08/2013 23:22, Eric Dumazet wrote:
> On Tue, 2013-08-06 at 14:51 -0500, Shawn Bohrer wrote:
>> Set the napi id for each socket in the multicast path to enable
>> low-latency/polling support.
>>
>> Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
>> ---
>> v2 include ipv6 support
> 
> This might help your workload, but I doubt it is generic enough.
> 
> One UDP socket is supposed to receive traffic from many endpoints,
> so we have no guarantee all the received traffic will end on a single RX
> queue on the NIC.
> 
> That's the same logic as RFS here.
> 
> sk_mark_napi_id() in UDP are wrong IMHO.
> 
> It should be guarded by the following test in  
> __udp_queue_rcv_skb()
> 
> if (inet_sk(sk)->inet_daddr) {
>     sock_rps_save_rxhash(sk, skb);
>     sk_mark_napi_id(sk, skb);
> }
> 
> (To occur only for connected UDP sockets, where we are 100% sure all
> packets will use this same rxhash/rx queue)

This would also be safe if there is only one NIC and said NIC was
programmed to always place this socket's data on the same queue.

I don't have a good suggestion on how to detect this.


* Re: [PATCH v2 net-next] net: Add low-latency/polling support for UDP multicast
  2013-08-08  8:46         ` Eliezer Tamir
@ 2013-08-08 23:55           ` Eric Dumazet
  2013-08-11  7:59             ` Eliezer Tamir
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2013-08-08 23:55 UTC (permalink / raw)
  To: Eliezer Tamir; +Cc: Shawn Bohrer, davem, netdev, Amir Vadai, tomk

On Thu, 2013-08-08 at 11:46 +0300, Eliezer Tamir wrote:
> On 07/08/2013 23:22, Eric Dumazet wrote:
> > On Tue, 2013-08-06 at 14:51 -0500, Shawn Bohrer wrote:
> >> Set the napi id for each socket in the multicast path to enable
> >> low-latency/polling support.
> >>
> >> Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
> >> ---
> >> v2 include ipv6 support
> > 
> > This might help your workload, but I doubt it is generic enough.
> > 
> > One UDP socket is supposed to receive traffic from many endpoints,
> > so we have no guarantee all the received traffic will end on a single RX
> > queue on the NIC.
> > 
> > That's the same logic as RFS here.
> > 
> > sk_mark_napi_id() in UDP are wrong IMHO.
> > 
> > It should be guarded by the following test in  
> > __udp_queue_rcv_skb()
> > 
> > if (inet_sk(sk)->inet_daddr) {
> >     sock_rps_save_rxhash(sk, skb);
> >     sk_mark_napi_id(sk, skb);
> > }
> > 
> > (To occur only for connected UDP sockets, where we are 100% sure all
> > packets will use this same rxhash/rx queue)
> 
> This would also be safe if there is only one NIC and said NIC was
> programmed to always place this socket's data on the same queue.
> 
> I don't have a good suggestion on how to detect this.

Well, this stuff relies on flows being correctly steered.

TCP stack performs much better if we avoid reorders ;)


* Re: [PATCH v2 net-next] net: Add low-latency/polling support for UDP multicast
  2013-08-08 23:55           ` Eric Dumazet
@ 2013-08-11  7:59             ` Eliezer Tamir
  0 siblings, 0 replies; 17+ messages in thread
From: Eliezer Tamir @ 2013-08-11  7:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Shawn Bohrer, davem, netdev, Amir Vadai, tomk

On 09/08/2013 02:55, Eric Dumazet wrote:
> On Thu, 2013-08-08 at 11:46 +0300, Eliezer Tamir wrote:
>> On 07/08/2013 23:22, Eric Dumazet wrote:
>>> sk_mark_napi_id() in UDP are wrong IMHO.
>>>
>>> It should be guarded by the following test in  
>>> __udp_queue_rcv_skb()
>>>
>>> if (inet_sk(sk)->inet_daddr) {
>>>     sock_rps_save_rxhash(sk, skb);
>>>     sk_mark_napi_id(sk, skb);
>>> }
>>>
>>> (To occur only for connected UDP sockets, where we are 100% sure all
>>> packets will use this same rxhash/rx queue)
>>
>> This would also be safe if there is only one NIC and said NIC was
>> programmed to always place this socket's data on the same queue.
>>
>> I don't have a good suggestion on how to detect this.
> 
> Well, this stuff relies on flows being correctly steered.
> 
> TCP stack performs much better if we avoid reorders ;)

Let's limit the discussion to UDP for now.
If you are getting packets on multiple queues for a TCP socket,
it's hard to see how you can avoid causing reordering.

Maybe it would be enough to have a socket option that the user
can set to tell the stack "I know what I'm doing, please allow busy
polling on this UDP socket, even though it's not bound".

