netdev.vger.kernel.org archive mirror
* udp: Question about busy_poll change
@ 2014-04-09 14:13 Edward Cree
  2014-04-09 14:51 ` Shawn Bohrer
From: Edward Cree @ 2014-04-09 14:13 UTC
  To: netdev, Shawn Bohrer; +Cc: Jonathan Cooper

Commit 005ec9743394010cd37d86c3fd2e81978231cdbf, "udp: Only allow busy
read/poll on connected sockets",
causes a performance regression (increased latency) on some
micro-benchmarks which don't connect() their UDP socket, and might well
have a similar effect on real applications that do the same thing.
As far as I can tell, the change is only needed in the case where the
UDP socket is bound to INADDR_ANY, _and_ traffic on the chosen UDP port
is received through more than one NIC (or perhaps through a NIC with
separate NAPI contexts per receive queue?).
In particular, busy polling makes sense for a client (which will only be
receiving packets from one remote address even though the socket is
unconnected), or for a socket which has been bound to an interface's
address (at least in the case of sfc, where we have one NAPI context for
all the receive queues on an interface).

So, what was the deeper rationale for this change?  Is there a
correctness issue or does the old behaviour just affect performance
through unnecessary busy_polling?  Or have I just misunderstood things
completely?

* Re: udp: Question about busy_poll change
  2014-04-09 14:13 udp: Question about busy_poll change Edward Cree
@ 2014-04-09 14:51 ` Shawn Bohrer
  2014-04-09 16:20   ` Edward Cree
From: Shawn Bohrer @ 2014-04-09 14:51 UTC
  To: Edward Cree; +Cc: netdev, Shawn Bohrer, Jonathan Cooper, eric.dumazet

On Wed, Apr 09, 2014 at 03:13:21PM +0100, Edward Cree wrote:
> Commit 005ec9743394010cd37d86c3fd2e81978231cdbf, "udp: Only allow busy
> read/poll on connected sockets",
> causes a performance regression (increased latency) on some
> micro-benchmarks which don't connect() their UDP socket, and might well
> have a similar effect on real applications that do the same thing.
> As far as I can tell, the change is only needed in the case where the
> UDP socket is bound to INADDR_ANY, _and_ traffic on the chosen UDP port
> is received through more than one NIC (or perhaps through a NIC with
> separate NAPI contexts per receive queue?).

Yep.  The separate NAPI contexts per receive queue part was one of the
main reasons, though I suppose the INADDR_ANY case is relevant as
well.

> In particular, busy polling makes sense for a client (which will only be
> receiving packets from one remote address even though the socket is
> unconnected), or for a socket which has been bound to an interface's
> address (at least in the case of sfc, where we have one NAPI context for
> all the receive queues on an interface).

I agree that it makes sense in this case, but if you meet these
requirements then you can also connect your UDP socket.  The real
problem is that there is no way for the kernel to know that you will
only receive packets from a single remote address, so you have to
connect.
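
For illustration, here's a minimal userspace sketch of that (the peer
address and port are made up, and most error handling is omitted).
connect() is what lets the kernel record sk->sk_napi_id from received
packets, and SO_BUSY_POLL (Linux 3.11+) sets a per-socket busy-read
budget as an alternative to the global net.core.busy_read sysctl:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_BUSY_POLL
#define SO_BUSY_POLL 46	/* asm-generic value; differs on a few archs */
#endif

static int make_busy_poll_client(void)
{
	struct sockaddr_in peer;
	int busy_usecs = 50;	/* the equivalent of net.core.busy_read=50 */
	int fd;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	memset(&peer, 0, sizeof(peer));
	peer.sin_family = AF_INET;
	peer.sin_port = htons(7777);			/* made-up port */
	peer.sin_addr.s_addr = inet_addr("192.0.2.1");	/* made-up peer */

	/* Fixing the remote address lets the receive path record the
	 * NAPI id on this socket, so busy polling spins on the right
	 * context. */
	if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
		close(fd);
		return -1;
	}

	/* Per-socket busy-read time in microseconds.  Raising it above
	 * the sysctl default needs CAP_NET_ADMIN. */
	setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &busy_usecs,
		   sizeof(busy_usecs));

	return fd;	/* a blocking recv() on fd can now busy poll */
}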

I believe the sfc case where you only have a single NAPI context is
also valid, and it seems reasonable to me that, if you can detect that
specific case, busy polling could be allowed.  I'm not sure how to
detect this; I'm sure patches are welcome.

> So, what was the deeper rationale for this change?  Is there a
> correctness issue or does the old behaviour just affect performance
> through unnecessary busy_polling?  Or have I just misunderstood things
> completely?

If we are spinning on a NAPI context and a packet arrives on a
different rx queue, we'll get unpredictable latencies and
out-of-order packets.  For the people using this feature that is
probably not desirable.

--
Shawn

* Re: udp: Question about busy_poll change
  2014-04-09 14:51 ` Shawn Bohrer
@ 2014-04-09 16:20   ` Edward Cree
  2014-04-10 18:04     ` [RFC PATCH] udp: allow busy_poll on some unconnected sockets Edward Cree
From: Edward Cree @ 2014-04-09 16:20 UTC
  To: Shawn Bohrer; +Cc: netdev, Shawn Bohrer, Jonathan Cooper, eric.dumazet

On 09/04/14 15:51, Shawn Bohrer wrote:
> I believe the sfc case where you only have a single NAPI context is
> also valid, and it seems reasonable to me that, if you can detect that
> specific case, busy polling could be allowed.  I'm not sure how to
> detect this; I'm sure patches are welcome.
I think that to detect this we would have to have (a) a flag in the
struct sock to say "this socket is bound to a real address" (i.e.
not INADDR_ANY, multicast or anycast), and (b) a flag in the skb to say
"the driver that received this packet only uses one NAPI context per
interface".  Then if a && b you can fill in sk->sk_napi_id even on
unconnected sockets.  If this sounds reasonable, I'll put a patch together.
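
Roughly, and purely as a sketch with placeholder names (neither flag
exists in the kernel today), the receive-path check might look like:

/* Sketch only: SOCK_BOUND_REAL_ADDR and skb->single_napi stand in for
 * the proposed (a) and (b) flags; they are not existing kernel symbols. */
static bool udp_sk_can_mark_napi(struct sock *sk, struct sk_buff *skb)
{
	/* (a): socket is bound to a real local address */
	bool bound_real = sock_flag(sk, SOCK_BOUND_REAL_ADDR);
	/* (b): the receiving driver has one NAPI context per interface */
	bool single_napi = skb->single_napi;

	/* connected sockets are already safe; a && b would extend that */
	return inet_sk(sk)->inet_daddr || (bound_real && single_napi);
}

	/* ...and then in __udp_queue_rcv_skb(): */
	if (udp_sk_can_mark_napi(sk, skb))
		sk_mark_napi_id(sk, skb);

(The RFC patch later in the thread simplifies (b) by following skb->dev
rather than adding a per-skb flag.)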
> If we are spinning on a NAPI context and a packet arrives on a
> different rx queue, we'll get unpredictable latencies and
> out-of-order packets.  For the people using this feature that is
> probably not desirable.
A further question is, what happens if the user removes the IP address
from one interface and adds it to another?  AFAICT they will keep
busy_polling the old NAPI context until a packet is received on the new
interface and updates sk->sk_napi_id; and this is still the case even if
the socket is connected.  Of course this could be a case of "don't do
that then", but I can imagine some kind of hot-failover setup doing this.
This is also why we'd need a flag in the skb and can't just find out at
bind() time what the driver is and whether it uses multiple NAPI
contexts: when the socket moves to a different interface, it could end
up on a different driver.

* [RFC PATCH] udp: allow busy_poll on some unconnected sockets
  2014-04-09 16:20   ` Edward Cree
@ 2014-04-10 18:04     ` Edward Cree
  2014-04-10 18:32       ` Eric Dumazet
From: Edward Cree @ 2014-04-10 18:04 UTC
  To: netdev; +Cc: Shawn Bohrer, Shawn Bohrer, Jonathan Cooper, eric.dumazet

Specifically, if the socket has been bound to a local address, and the
 net_device only uses a single NAPI context, then we don't need to be
 connected for busy_poll to do the right thing, since all our received
 packets have to come through the same NAPI context.

[ I previously thought we would need a flag in the skb to determine if
  the net_device uses a single NAPI context, but in fact we can just
  follow skb->dev. ]

Being bound to a local address is determined by the sk's inet_saddr,
 since that appears to be set to 0 when binding to a multicast or
 broadcast address (which is what we want, because they might get packets
 from any interface).  But I'm not entirely confident this is correct.

Only implemented for IPv4, because I'm even less sure about the "are we
 bound to a real unicast address" logic in the IPv6 case.

Tested by setting IFF_SINGLE_NAPI in sfc; a UDP ping-pong test showed a
 performance benefit from sysctl net.core.busy_{read,poll}=50 in both the
 connected and unconnected case, where previously it only saw the benefit
 when the socket had been connected.
---
 include/linux/netdevice.h |  3 +++
 net/ipv4/udp.c            | 11 +++++++++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 775cc95..d63e10e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1173,6 +1173,7 @@ struct net_device_ops {
  * @IFF_LIVE_ADDR_CHANGE: device supports hardware address
  *	change when it's running
  * @IFF_MACVLAN: Macvlan device
+ * @IFF_SINGLE_NAPI: device uses a single NAPI context for all rx queues
  */
 enum netdev_priv_flags {
 	IFF_802_1Q_VLAN			= 1<<0,
@@ -1197,6 +1198,7 @@ enum netdev_priv_flags {
 	IFF_SUPP_NOFCS			= 1<<19,
 	IFF_LIVE_ADDR_CHANGE		= 1<<20,
 	IFF_MACVLAN			= 1<<21,
+	IFF_SINGLE_NAPI			= 1<<22,
 };
 
 #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
@@ -1221,6 +1223,7 @@ enum netdev_priv_flags {
 #define IFF_SUPP_NOFCS			IFF_SUPP_NOFCS
 #define IFF_LIVE_ADDR_CHANGE		IFF_LIVE_ADDR_CHANGE
 #define IFF_MACVLAN			IFF_MACVLAN
+#define IFF_SINGLE_NAPI			IFF_SINGLE_NAPI
 
 /*
  *	The DEVICE structure.
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 4468e1a..f4561fa 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1420,14 +1420,21 @@ static void udp_v4_rehash(struct sock *sk)
 	udp_lib_rehash(sk, new_hash);
 }
 
+/* If the device only uses one NAPI context for all receive queues, then
+ * we only need to be bound, not connected, to safely sk_mark_napi_id */
+static inline bool udp_single_napi(struct sock *sk, struct sk_buff *skb)
+{
+	return inet_sk(sk)->inet_saddr && (skb->dev->priv_flags & IFF_SINGLE_NAPI);
+}
+
 static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 {
 	int rc;
 
-	if (inet_sk(sk)->inet_daddr) {
+	if (inet_sk(sk)->inet_daddr)
 		sock_rps_save_rxhash(sk, skb);
+	if (inet_sk(sk)->inet_daddr || udp_single_napi(sk, skb))
 		sk_mark_napi_id(sk, skb);
-	}
 
 	rc = sock_queue_rcv_skb(sk, skb);
 	if (rc < 0) {
-- 
1.7.11.7

* Re: [RFC PATCH] udp: allow busy_poll on some unconnected sockets
  2014-04-10 18:04     ` [RFC PATCH] udp: allow busy_poll on some unconnected sockets Edward Cree
@ 2014-04-10 18:32       ` Eric Dumazet
  2014-04-10 18:38         ` Edward Cree
From: Eric Dumazet @ 2014-04-10 18:32 UTC
  To: Edward Cree; +Cc: netdev, Shawn Bohrer, Shawn Bohrer, Jonathan Cooper

On Thu, 2014-04-10 at 19:04 +0100, Edward Cree wrote:

> Tested by setting IFF_SINGLE_NAPI in sfc; a UDP ping-pong test showed a
>  performance benefit from sysctl net.core.busy_{read,poll}=50 in both the
>  connected and unconnected case, where previously it only saw the benefit
>  when the socket had been connected.

Right, but how often do we have single NAPI devices on hosts wanting
very low latencies?

* Re: [RFC PATCH] udp: allow busy_poll on some unconnected sockets
  2014-04-10 18:32       ` Eric Dumazet
@ 2014-04-10 18:38         ` Edward Cree
  2014-04-10 19:04           ` Eric Dumazet
From: Edward Cree @ 2014-04-10 18:38 UTC
  To: Eric Dumazet; +Cc: netdev, Shawn Bohrer, Shawn Bohrer, Jonathan Cooper

On 10/04/14 19:32, Eric Dumazet wrote:
> On Thu, 2014-04-10 at 19:04 +0100, Edward Cree wrote:
>
>> Tested by setting IFF_SINGLE_NAPI in sfc; a UDP ping-pong test showed a
>>  performance benefit from sysctl net.core.busy_{read,poll}=50 in both the
>>  connected and unconnected case, where previously it only saw the benefit
>>  when the socket had been connected.
> Right, but how often do we have single NAPI devices on hosts wanting
> very low latencies?
>
Well, sfc only has a single NAPI context per device, and I'm fairly sure
most sfc users want very low latencies.
Or have I misunderstood?

Note that it doesn't matter if there are other NAPI-using devices on
the host, since the socket is bound to a local address and thus is only
going to receive packets from the one device that has that address.  So
maybe IFF_SINGLE_NAPI is a bad name and it should be
IFF_DEVICE_ONLY_HAS_A_SINGLE_NAPI_CONTEXT.  But that's a bit unwieldy ;)

* Re: [RFC PATCH] udp: allow busy_poll on some unconnected sockets
  2014-04-10 18:38         ` Edward Cree
@ 2014-04-10 19:04           ` Eric Dumazet
  2014-04-11 10:44             ` Jonathan Cooper
From: Eric Dumazet @ 2014-04-10 19:04 UTC
  To: Edward Cree; +Cc: netdev, Shawn Bohrer, Shawn Bohrer, Jonathan Cooper

On Thu, 2014-04-10 at 19:38 +0100, Edward Cree wrote:
> On 10/04/14 19:32, Eric Dumazet wrote:
> > On Thu, 2014-04-10 at 19:04 +0100, Edward Cree wrote:
> >
> >> Tested by setting IFF_SINGLE_NAPI in sfc; a UDP ping-pong test showed a
> >>  performance benefit from sysctl net.core.busy_{read,poll}=50 in both the
> >>  connected and unconnected case, where previously it only saw the benefit
> >>  when the socket had been connected.
> > Right, but how often do we have single NAPI devices on hosts wanting
> > very low latencies?
> >
> Well, sfc only has a single NAPI context per device, and I'm fairly sure
> most sfc users want very low latencies.
> Or have I misunderstood?

sfc is multi-queue/channel, but has a single NAPI instance?

Sounds weird.

Please explain to me how GRO can be efficient.

I believe you have one NAPI per channel.

* Re: [RFC PATCH] udp: allow busy_poll on some unconnected sockets
  2014-04-10 19:04           ` Eric Dumazet
@ 2014-04-11 10:44             ` Jonathan Cooper
From: Jonathan Cooper @ 2014-04-11 10:44 UTC
  To: Eric Dumazet; +Cc: Edward Cree, netdev, Shawn Bohrer, Shawn Bohrer

On 10/04/14 20:04, Eric Dumazet wrote:
> On Thu, 2014-04-10 at 19:38 +0100, Edward Cree wrote:
>> On 10/04/14 19:32, Eric Dumazet wrote:
>>> On Thu, 2014-04-10 at 19:04 +0100, Edward Cree wrote:
>>>
>>>> Tested by setting IFF_SINGLE_NAPI in sfc; a UDP ping-pong test showed a
>>>>   performance benefit from sysctl net.core.busy_{read,poll}=50 in both the
>>>>   connected and unconnected case, where previously it only saw the benefit
>>>>   when the socket had been connected.
>>> Right, but how often do we have single NAPI devices on hosts wanting
>>> very low latencies?
>>>
>> Well, sfc only has a single NAPI context per device, and I'm fairly sure
>> most sfc users want very low latencies.
>> Or have I misunderstood?
> sfc is multi-queue/channel, but has a single NAPI instance?
>
> Sounds weird.
>
> Please explain to me how GRO can be efficient.
>
> I believe you have one NAPI per channel.
>
Whoops, this is my fault: I misread our NAPI code.  You are entirely
correct, we do have multiple NAPI instances per device, so our proposed
scheme is unworkable.  We'll just have to connect the socket.
Sorry to have wasted your time.

Jon
