netdev.vger.kernel.org archive mirror
* [PATCH net-next v3 0/3] Improve UDP multicast receive latency
@ 2013-10-07 16:01 Shawn Bohrer
  2013-10-07 16:01 ` [PATCH net-next v3 1/3] udp: Only allow busy read/poll on connected sockets Shawn Bohrer
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Shawn Bohrer @ 2013-10-07 16:01 UTC
  To: David Miller; +Cc: netdev, tomk, Eric Dumazet, Shawn Bohrer

From: Shawn Bohrer <sbohrer@rgmadvisors.com>

The removal of the routing cache in 3.6 had impacted the latency of our
UDP multicast workload.  This patch series brings down the latency to
what we were seeing with 3.4.

Patch 1 "udp: Only allow busy read/poll on connected sockets" is mostly
done for correctness and because it allows unifying the unicast and
multicast paths when a socket is found in early demux.  It can also
improve latency for a connected multicast socket if busy read/poll is
used.

Patches 2&3 remove the fib lookups and restore latency for our workload
to the pre 3.6 levels.

Benchmark results from a netperf UDP_RR test:
v3.12-rc3-447-g40dc9ab kernel   87961.22 transactions/s
v3.12-rc3-447-g40dc9ab + series 90587.62 transactions/s

Benchmark results from a fio 1 byte UDP multicast pingpong test
(Multicast one way unicast response):
v3.12-rc3-447-g40dc9ab kernel   12.97us RTT
v3.12-rc3-447-g40dc9ab + series 12.48us RTT

v2 Changes:
- Unicast UDP early demux now requires an exact socket match and only
  tests first socket in UDP hash chain.
- ipv4_pktinfo_prepare() now takes a const struct sock*

v3 Changes:
- Use secondary hash for UDP unicast early demux lookup
- Double check socket match after increasing refcount in both unicast
  and multicast early demux lookups

Shawn Bohrer (3):
  udp: Only allow busy read/poll on connected sockets
  udp: ipv4: Add udp early demux
  net: ipv4 only populate IP_PKTINFO when needed

 include/net/ip.h       |    2 +-
 include/net/sock.h     |    2 +-
 include/net/udp.h      |    1 +
 net/ipv4/af_inet.c     |    1 +
 net/ipv4/ip_sockglue.c |    5 +-
 net/ipv4/raw.c         |    2 +-
 net/ipv4/udp.c         |  209 +++++++++++++++++++++++++++++++++++++++++++-----
 net/ipv6/udp.c         |    5 +-
 8 files changed, 199 insertions(+), 28 deletions(-)

-- 
1.7.7.6

* [PATCH net-next v3 1/3] udp: Only allow busy read/poll on connected sockets
  2013-10-07 16:01 [PATCH net-next v3 0/3] Improve UDP multicast receive latency Shawn Bohrer
@ 2013-10-07 16:01 ` Shawn Bohrer
  2013-10-07 16:01 ` [PATCH net-next v3 2/3] udp: ipv4: Add udp early demux Shawn Bohrer
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Shawn Bohrer @ 2013-10-07 16:01 UTC
  To: David Miller; +Cc: netdev, tomk, Eric Dumazet, Shawn Bohrer

From: Shawn Bohrer <sbohrer@rgmadvisors.com>

UDP sockets can receive packets from multiple endpoints, and thus packets
for a single socket may arrive on multiple receive queues.  Since packets
can arrive on multiple receive queues we should not mark the napi_id for
all packets.  This makes busy read/poll only work for connected UDP sockets.

This additionally enables busy read/poll for UDP multicast packets as
long as the socket is connected by moving the check into
__udp_queue_rcv_skb().
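
From user space, "connected" here means the socket has had connect()
called on it, which sets the remote address that the check above tests.
The following is a minimal sketch of preparing such a socket and asking
for busy polling.  The helper names udp_connect_peer() and
set_busy_poll() are hypothetical, and the fallback value for
SO_BUSY_POLL is an assumption for older headers on architectures that
use the generic socket option numbering.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SO_BUSY_POLL
#define SO_BUSY_POLL 46	/* assumption: asm-generic value, for old headers */
#endif

/* Hypothetical helper: connect a UDP socket to one peer.  After a
 * successful connect() the socket has a remote address, so it counts
 * as a connected socket for busy read/poll purposes.
 * Returns 0 on success, -1 on failure.
 */
int udp_connect_peer(int fd, const char *ip, unsigned short port)
{
	struct sockaddr_in peer;

	assert(ip != NULL);
	memset(&peer, 0, sizeof(peer));
	peer.sin_family = AF_INET;
	peer.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &peer.sin_addr) != 1)
		return -1;	/* not a valid dotted-quad address */
	return connect(fd, (struct sockaddr *)&peer, sizeof(peer));
}

/* Hypothetical helper: request busy polling for up to usec
 * microseconds on blocking receives.  Returns 0 on success, -1 on
 * failure (errno is set by setsockopt()).
 */
int set_busy_poll(int fd, int usec)
{
	return setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &usec, sizeof(usec));
}
```

A caller would create the socket with socket(AF_INET, SOCK_DGRAM, 0),
bind it, call udp_connect_peer(), and then set_busy_poll().  Raising the
value above the system default may require CAP_NET_ADMIN, so the return
value should be checked.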

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/udp.c |    5 +++--
 net/ipv6/udp.c |    5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c41833e..5950e12 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1405,8 +1405,10 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 {
 	int rc;
 
-	if (inet_sk(sk)->inet_daddr)
+	if (inet_sk(sk)->inet_daddr) {
 		sock_rps_save_rxhash(sk, skb);
+		sk_mark_napi_id(sk, skb);
+	}
 
 	rc = sock_queue_rcv_skb(sk, skb);
 	if (rc < 0) {
@@ -1716,7 +1718,6 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	if (sk != NULL) {
 		int ret;
 
-		sk_mark_napi_id(sk, skb);
 		ret = udp_queue_rcv_skb(sk, skb);
 		sock_put(sk);
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 8119791..3753247 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -549,8 +549,10 @@ static int __udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 {
 	int rc;
 
-	if (!ipv6_addr_any(&inet6_sk(sk)->daddr))
+	if (!ipv6_addr_any(&inet6_sk(sk)->daddr)) {
 		sock_rps_save_rxhash(sk, skb);
+		sk_mark_napi_id(sk, skb);
+	}
 
 	rc = sock_queue_rcv_skb(sk, skb);
 	if (rc < 0) {
@@ -844,7 +846,6 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	if (sk != NULL) {
 		int ret;
 
-		sk_mark_napi_id(sk, skb);
 		ret = udpv6_queue_rcv_skb(sk, skb);
 		sock_put(sk);
 
-- 
1.7.7.6

* [PATCH net-next v3 2/3] udp: ipv4: Add udp early demux
  2013-10-07 16:01 [PATCH net-next v3 0/3] Improve UDP multicast receive latency Shawn Bohrer
  2013-10-07 16:01 ` [PATCH net-next v3 1/3] udp: Only allow busy read/poll on connected sockets Shawn Bohrer
@ 2013-10-07 16:01 ` Shawn Bohrer
  2013-10-07 16:01 ` [PATCH net-next v3 3/3] net: ipv4 only populate IP_PKTINFO when needed Shawn Bohrer
  2013-10-08 20:51 ` [PATCH net-next v3 0/3] Improve UDP multicast receive latency David Miller
  3 siblings, 0 replies; 7+ messages in thread
From: Shawn Bohrer @ 2013-10-07 16:01 UTC
  To: David Miller; +Cc: netdev, tomk, Eric Dumazet, Shawn Bohrer

From: Shawn Bohrer <sbohrer@rgmadvisors.com>

The removal of the routing cache introduced a performance regression for
some UDP workloads since a dst lookup must be done for each packet.
This change caches the dst per socket in a similar manner to what we do
for TCP by implementing early_demux.

For UDP multicast we can only cache the dst if there is only one
receiving socket on the host.  Since caching only works when there is
one receiving socket we do the multicast socket lookup using RCU.

For UDP unicast we only demux sockets with an exact match in order to
not break forwarding setups.  Additionally, since the hash chains may be
long, we only check the first socket to see if it is a match and do not
waste extra time searching the whole chain when we might not find an
exact match.
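
The two lookup rules described above can be illustrated with a small
user-space model.  This is only a sketch under simplifying assumptions:
struct model_sock, struct model_pkt and every function name below are
hypothetical, and the model leaves out RCU, reference counting, network
namespaces, bound devices and multicast source filters, all of which
the real lookups must handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a socket's addressing state. */
struct model_sock {
	uint32_t local_addr;	/* 0 means "any local address" */
	uint16_t local_port;
	uint32_t remote_addr;	/* 0 means "not connected" */
	uint16_t remote_port;	/* 0 means "not connected" */
};

/* Hypothetical stand-in for the addresses of a received packet. */
struct model_pkt {
	uint32_t saddr;
	uint16_t sport;
	uint32_t daddr;
	uint16_t dport;
};

/* Multicast-style match: unset socket fields act as wildcards. */
bool mcast_match(const struct model_sock *s, const struct model_pkt *p)
{
	if (s->local_port != p->dport)
		return false;
	if (s->local_addr && s->local_addr != p->daddr)
		return false;
	if (s->remote_addr && s->remote_addr != p->saddr)
		return false;
	if (s->remote_port && s->remote_port != p->sport)
		return false;
	return true;
}

/* Unicast-style match: the full 4-tuple must be equal. */
bool exact_match(const struct model_sock *s, const struct model_pkt *p)
{
	return s->local_port == p->dport && s->local_addr == p->daddr &&
	       s->remote_port == p->sport && s->remote_addr == p->saddr;
}

/* Multicast: usable only if exactly one socket on the chain matches. */
const struct model_sock *mcast_demux(const struct model_sock *chain,
				     size_t n, const struct model_pkt *p)
{
	const struct model_sock *result = NULL;
	size_t i, count = 0;

	assert(chain != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (mcast_match(&chain[i], p)) {
			result = &chain[i];
			count++;
		}
	}
	return count == 1 ? result : NULL;
}

/* Unicast: look at the first socket on the chain only. */
const struct model_sock *ucast_demux(const struct model_sock *chain,
				     size_t n, const struct model_pkt *p)
{
	assert(chain != NULL || n == 0);
	if (n == 0)
		return NULL;
	return exact_match(&chain[0], p) ? &chain[0] : NULL;
}
```

In this model a NULL result simply means "fall back to the normal
receive path", which is always correct, only slower.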

Benchmark results from a netperf UDP_RR test:
Before 87961.22 transactions/s
After  89789.68 transactions/s

Benchmark results from a fio 1 byte UDP multicast pingpong test
(Multicast one way unicast response):
Before 12.97us RTT
After  12.63us RTT

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
---
v3 Changes:
- Use secondary hash for UDP unicast early demux lookup
- Double check socket match after increasing refcount in both unicast
  and multicast early demux lookups

 include/net/sock.h |    2 +-
 include/net/udp.h  |    1 +
 net/ipv4/af_inet.c |    1 +
 net/ipv4/udp.c     |  202 +++++++++++++++++++++++++++++++++++++++++++++++-----
 4 files changed, 187 insertions(+), 19 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index e3bf213..7953254 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -218,7 +218,7 @@ struct cg_proto;
   *	@sk_lock:	synchronizer
   *	@sk_rcvbuf: size of receive buffer in bytes
   *	@sk_wq: sock wait queue and async head
-  *	@sk_rx_dst: receive input route used by early tcp demux
+  *	@sk_rx_dst: receive input route used by early demux
   *	@sk_dst_cache: destination cache
   *	@sk_dst_lock: destination cache lock
   *	@sk_policy: flow policy
diff --git a/include/net/udp.h b/include/net/udp.h
index 510b8cb..fe4ba9f 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -175,6 +175,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 		     unsigned int hash2_nulladdr);
 
 /* net/ipv4/udp.c */
+void udp_v4_early_demux(struct sk_buff *skb);
 int udp_get_port(struct sock *sk, unsigned short snum,
 		 int (*saddr_cmp)(const struct sock *,
 				  const struct sock *));
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index cfeb85c..35913fb 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1546,6 +1546,7 @@ static const struct net_protocol tcp_protocol = {
 };
 
 static const struct net_protocol udp_protocol = {
+	.early_demux =	udp_v4_early_demux,
 	.handler =	udp_rcv,
 	.err_handler =	udp_err,
 	.no_policy =	1,
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 5950e12..262ea39 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -103,6 +103,7 @@
 #include <linux/seq_file.h>
 #include <net/net_namespace.h>
 #include <net/icmp.h>
+#include <net/inet_hashtables.h>
 #include <net/route.h>
 #include <net/checksum.h>
 #include <net/xfrm.h>
@@ -565,6 +566,26 @@ struct sock *udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
 }
 EXPORT_SYMBOL_GPL(udp4_lib_lookup);
 
+static inline bool __udp_is_mcast_sock(struct net *net, struct sock *sk,
+				       __be16 loc_port, __be32 loc_addr,
+				       __be16 rmt_port, __be32 rmt_addr,
+				       int dif, unsigned short hnum)
+{
+	struct inet_sock *inet = inet_sk(sk);
+
+	if (!net_eq(sock_net(sk), net) ||
+	    udp_sk(sk)->udp_port_hash != hnum ||
+	    (inet->inet_daddr && inet->inet_daddr != rmt_addr) ||
+	    (inet->inet_dport != rmt_port && inet->inet_dport) ||
+	    (inet->inet_rcv_saddr && inet->inet_rcv_saddr != loc_addr) ||
+	    ipv6_only_sock(sk) ||
+	    (sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif))
+		return false;
+	if (!ip_mc_sf_allow(sk, loc_addr, rmt_addr, dif))
+		return false;
+	return true;
+}
+
 static inline struct sock *udp_v4_mcast_next(struct net *net, struct sock *sk,
 					     __be16 loc_port, __be32 loc_addr,
 					     __be16 rmt_port, __be32 rmt_addr,
@@ -575,20 +596,11 @@ static inline struct sock *udp_v4_mcast_next(struct net *net, struct sock *sk,
 	unsigned short hnum = ntohs(loc_port);
 
 	sk_nulls_for_each_from(s, node) {
-		struct inet_sock *inet = inet_sk(s);
-
-		if (!net_eq(sock_net(s), net) ||
-		    udp_sk(s)->udp_port_hash != hnum ||
-		    (inet->inet_daddr && inet->inet_daddr != rmt_addr) ||
-		    (inet->inet_dport != rmt_port && inet->inet_dport) ||
-		    (inet->inet_rcv_saddr &&
-		     inet->inet_rcv_saddr != loc_addr) ||
-		    ipv6_only_sock(s) ||
-		    (s->sk_bound_dev_if && s->sk_bound_dev_if != dif))
-			continue;
-		if (!ip_mc_sf_allow(s, loc_addr, rmt_addr, dif))
-			continue;
-		goto found;
+		if (__udp_is_mcast_sock(net, s,
+					loc_port, loc_addr,
+					rmt_port, rmt_addr,
+					dif, hnum))
+			goto found;
 	}
 	s = NULL;
 found:
@@ -1581,6 +1593,14 @@ static void flush_stack(struct sock **stack, unsigned int count,
 		kfree_skb(skb1);
 }
 
+static void udp_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
+{
+	struct dst_entry *dst = skb_dst(skb);
+
+	dst_hold(dst);
+	sk->sk_rx_dst = dst;
+}
+
 /*
  *	Multicasts and broadcasts go to each listener.
  *
@@ -1709,11 +1729,28 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	if (udp4_csum_init(skb, uh, proto))
 		goto csum_error;
 
-	if (rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
-		return __udp4_lib_mcast_deliver(net, skb, uh,
-				saddr, daddr, udptable);
+	if (skb->sk) {
+		int ret;
+		sk = skb->sk;
 
-	sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
+		if (unlikely(sk->sk_rx_dst == NULL))
+			udp_sk_rx_dst_set(sk, skb);
+
+		ret = udp_queue_rcv_skb(sk, skb);
+
+		/* a return value > 0 means to resubmit the input, but
+		 * it wants the return to be -protocol, or 0
+		 */
+		if (ret > 0)
+			return -ret;
+		return 0;
+	} else {
+		if (rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
+			return __udp4_lib_mcast_deliver(net, skb, uh,
+					saddr, daddr, udptable);
+
+		sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
+	}
 
 	if (sk != NULL) {
 		int ret;
@@ -1771,6 +1808,135 @@ drop:
 	return 0;
 }
 
+/* We can only early demux multicast if there is a single matching socket.
+ * If more than one socket found returns NULL
+ */
+static struct sock *__udp4_lib_mcast_demux_lookup(struct net *net,
+						  __be16 loc_port, __be32 loc_addr,
+						  __be16 rmt_port, __be32 rmt_addr,
+						  int dif)
+{
+	struct sock *sk, *result;
+	struct hlist_nulls_node *node;
+	unsigned short hnum = ntohs(loc_port);
+	unsigned int count, slot = udp_hashfn(net, hnum, udp_table.mask);
+	struct udp_hslot *hslot = &udp_table.hash[slot];
+
+	rcu_read_lock();
+begin:
+	count = 0;
+	result = NULL;
+	sk_nulls_for_each_rcu(sk, node, &hslot->head) {
+		if (__udp_is_mcast_sock(net, sk,
+					loc_port, loc_addr,
+					rmt_port, rmt_addr,
+					dif, hnum)) {
+			result = sk;
+			++count;
+		}
+	}
+	/*
+	 * if the nulls value we got at the end of this lookup is
+	 * not the expected one, we must restart lookup.
+	 * We probably met an item that was moved to another chain.
+	 */
+	if (get_nulls_value(node) != slot)
+		goto begin;
+
+	if (result) {
+		if (count != 1 ||
+		    unlikely(!atomic_inc_not_zero_hint(&result->sk_refcnt, 2)))
+			result = NULL;
+		else if (unlikely(!__udp_is_mcast_sock(net, sk,
+						       loc_port, loc_addr,
+						       rmt_port, rmt_addr,
+						       dif, hnum))) {
+			sock_put(result);
+			result = NULL;
+		}
+	}
+	rcu_read_unlock();
+	return result;
+}
+
+/* For unicast we should only early demux connected sockets or we can
+ * break forwarding setups.  The chains here can be long so only check
+ * if the first socket is an exact match and if not move on.
+ */
+static struct sock *__udp4_lib_demux_lookup(struct net *net,
+					    __be16 loc_port, __be32 loc_addr,
+					    __be16 rmt_port, __be32 rmt_addr,
+					    int dif)
+{
+	struct sock *sk, *result;
+	struct hlist_nulls_node *node;
+	unsigned short hnum = ntohs(loc_port);
+	unsigned int hash2 = udp4_portaddr_hash(net, loc_addr, hnum);
+	unsigned int slot2 = hash2 & udp_table.mask;
+	struct udp_hslot *hslot2 = &udp_table.hash2[slot2];
+	INET_ADDR_COOKIE(acookie, rmt_addr, loc_addr)
+	const __portpair ports = INET_COMBINED_PORTS(rmt_port, hnum);
+
+	rcu_read_lock();
+	result = NULL;
+	udp_portaddr_for_each_entry_rcu(sk, node, &hslot2->head) {
+		if (INET_MATCH(sk, net, acookie,
+			       rmt_addr, loc_addr, ports, dif))
+			result = sk;
+		/* Only check first socket in chain */
+		break;
+	}
+
+	if (result) {
+		if (unlikely(!atomic_inc_not_zero_hint(&result->sk_refcnt, 2)))
+			result = NULL;
+		else if (unlikely(!INET_MATCH(sk, net, acookie,
+					      rmt_addr, loc_addr,
+					      ports, dif))) {
+			sock_put(result);
+			result = NULL;
+		}
+	}
+	rcu_read_unlock();
+	return result;
+}
+
+void udp_v4_early_demux(struct sk_buff *skb)
+{
+	const struct iphdr *iph = ip_hdr(skb);
+	const struct udphdr *uh = udp_hdr(skb);
+	struct sock *sk;
+	struct dst_entry *dst;
+	struct net *net = dev_net(skb->dev);
+	int dif = skb->dev->ifindex;
+
+	/* validate the packet */
+	if (!pskb_may_pull(skb, skb_transport_offset(skb) + sizeof(struct udphdr)))
+		return;
+
+	if (skb->pkt_type == PACKET_BROADCAST ||
+	    skb->pkt_type == PACKET_MULTICAST)
+		sk = __udp4_lib_mcast_demux_lookup(net, uh->dest, iph->daddr,
+						   uh->source, iph->saddr, dif);
+	else if (skb->pkt_type == PACKET_HOST)
+		sk = __udp4_lib_demux_lookup(net, uh->dest, iph->daddr,
+					     uh->source, iph->saddr, dif);
+	else
+		return;
+
+	if (!sk)
+		return;
+
+	skb->sk = sk;
+	skb->destructor = sock_edemux;
+	dst = sk->sk_rx_dst;
+
+	if (dst)
+		dst = dst_check(dst, 0);
+	if (dst)
+		skb_dst_set_noref(skb, dst);
+}
+
 int udp_rcv(struct sk_buff *skb)
 {
 	return __udp4_lib_rcv(skb, &udp_table, IPPROTO_UDP);
-- 
1.7.7.6

* [PATCH net-next v3 3/3] net: ipv4 only populate IP_PKTINFO when needed
  2013-10-07 16:01 [PATCH net-next v3 0/3] Improve UDP multicast receive latency Shawn Bohrer
  2013-10-07 16:01 ` [PATCH net-next v3 1/3] udp: Only allow busy read/poll on connected sockets Shawn Bohrer
  2013-10-07 16:01 ` [PATCH net-next v3 2/3] udp: ipv4: Add udp early demux Shawn Bohrer
@ 2013-10-07 16:01 ` Shawn Bohrer
  2013-10-08 20:51 ` [PATCH net-next v3 0/3] Improve UDP multicast receive latency David Miller
  3 siblings, 0 replies; 7+ messages in thread
From: Shawn Bohrer @ 2013-10-07 16:01 UTC
  To: David Miller; +Cc: netdev, tomk, Eric Dumazet, Shawn Bohrer

From: Shawn Bohrer <sbohrer@rgmadvisors.com>

Since the removal of the routing cache, fib_compute_spec_dst() does a
fib_table lookup for each UDP multicast packet received.  This has
introduced a performance regression for some UDP workloads.

This change skips populating the packet info for sockets that do not have
IP_PKTINFO set.
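
For reference, a socket "has IP_PKTINFO set" when the application has
enabled that option with setsockopt(); the packet info is then delivered
as a control message on recvmsg().  The following is a minimal
user-space sketch of both halves.  The helper names enable_pktinfo() and
find_pktinfo() are hypothetical and not part of this series.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for struct in_pktinfo with strict -std= modes */
#endif
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: ask the kernel to attach an IP_PKTINFO control
 * message to every datagram received on fd.
 * Returns 0 on success, -1 on failure.
 */
int enable_pktinfo(int fd)
{
	int on = 1;

	return setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));
}

/* Hypothetical helper: walk the control messages filled in by
 * recvmsg() and copy the first IP_PKTINFO payload into *out.
 * Returns 1 if one was found, 0 otherwise.
 */
int find_pktinfo(struct msghdr *msg, struct in_pktinfo *out)
{
	struct cmsghdr *c;

	assert(msg != NULL && out != NULL);
	for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
		if (c->cmsg_level == IPPROTO_IP &&
		    c->cmsg_type == IP_PKTINFO) {
			memcpy(out, CMSG_DATA(c), sizeof(*out));
			return 1;
		}
	}
	return 0;
}
```

Applications that never call something like enable_pktinfo() will get 0
from find_pktinfo(), and those are the sockets for which the per-packet
lookup is now skipped.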

Benchmark results from a netperf UDP_RR test:
Before 89789.68 transactions/s
After  90587.62 transactions/s

Benchmark results from a fio 1 byte UDP multicast pingpong test
(Multicast one way unicast response):
Before 12.63us RTT
After  12.48us RTT

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 include/net/ip.h       |    2 +-
 net/ipv4/ip_sockglue.c |    5 +++--
 net/ipv4/raw.c         |    2 +-
 net/ipv4/udp.c         |    2 +-
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 16078f4..b39ebe5 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -459,7 +459,7 @@ int ip_options_rcv_srr(struct sk_buff *skb);
  *	Functions provided by ip_sockglue.c
  */
 
-void ipv4_pktinfo_prepare(struct sk_buff *skb);
+void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb);
 void ip_cmsg_recv(struct msghdr *msg, struct sk_buff *skb);
 int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc);
 int ip_setsockopt(struct sock *sk, int level, int optname, char __user *optval,
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 56e3445..0626f2c 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1052,11 +1052,12 @@ e_inval:
  * destination in skb->cb[] before dst drop.
  * This way, receiver doesnt make cache line misses to read rtable.
  */
-void ipv4_pktinfo_prepare(struct sk_buff *skb)
+void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb)
 {
 	struct in_pktinfo *pktinfo = PKTINFO_SKB_CB(skb);
 
-	if (skb_rtable(skb)) {
+	if ((inet_sk(sk)->cmsg_flags & IP_CMSG_PKTINFO) &&
+	    skb_rtable(skb)) {
 		pktinfo->ipi_ifindex = inet_iif(skb);
 		pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb);
 	} else {
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index b2fa14c..41e1d28 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -299,7 +299,7 @@ static int raw_rcv_skb(struct sock *sk, struct sk_buff *skb)
 {
 	/* Charge it to the socket. */
 
-	ipv4_pktinfo_prepare(skb);
+	ipv4_pktinfo_prepare(sk, skb);
 	if (sock_queue_rcv_skb(sk, skb) < 0) {
 		kfree_skb(skb);
 		return NET_RX_DROP;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 262ea39..4226c53 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1544,7 +1544,7 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 
 	rc = 0;
 
-	ipv4_pktinfo_prepare(skb);
+	ipv4_pktinfo_prepare(sk, skb);
 	bh_lock_sock(sk);
 	if (!sock_owned_by_user(sk))
 		rc = __udp_queue_rcv_skb(sk, skb);
-- 
1.7.7.6

* Re: [PATCH net-next v3 0/3] Improve UDP multicast receive latency
  2013-10-07 16:01 [PATCH net-next v3 0/3] Improve UDP multicast receive latency Shawn Bohrer
                   ` (2 preceding siblings ...)
  2013-10-07 16:01 ` [PATCH net-next v3 3/3] net: ipv4 only populate IP_PKTINFO when needed Shawn Bohrer
@ 2013-10-08 20:51 ` David Miller
  2013-10-09  4:47   ` [PATCH net-next] udp: fix a typo in __udp4_lib_mcast_demux_lookup Eric Dumazet
  3 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2013-10-08 20:51 UTC
  To: shawn.bohrer; +Cc: netdev, tomk, eric.dumazet, sbohrer

From: Shawn Bohrer <shawn.bohrer@gmail.com>
Date: Mon,  7 Oct 2013 11:01:37 -0500

> From: Shawn Bohrer <sbohrer@rgmadvisors.com>
> 
> The removal of the routing cache in 3.6 had impacted the latency of our
> UDP multicast workload.  This patch series brings down the latency to
> what we were seeing with 3.4.
> 
> Patch 1 "udp: Only allow busy read/poll on connected sockets" is mostly
> done for correctness and because it allows unifying the unicast and
> multicast paths when a socket is found in early demux.  It can also
> improve latency for a connected multicast socket if busy read/poll is
> used.
> 
> Patches 2&3 remove the fib lookups and restore latency for our workload
> to the pre 3.6 levels.
> 
> Benchmark results from a netperf UDP_RR test:
> v3.12-rc3-447-g40dc9ab kernel   87961.22 transactions/s
> v3.12-rc3-447-g40dc9ab + series 90587.62 transactions/s
> 
> Benchmark results from a fio 1 byte UDP multicast pingpong test
> (Multicast one way unicast response):
> v3.12-rc3-447-g40dc9ab kernel   12.97us RTT
> v3.12-rc3-447-g40dc9ab + series 12.48us RTT

Great work, all applied to net-next, thanks!

* [PATCH net-next] udp: fix a typo in __udp4_lib_mcast_demux_lookup
  2013-10-08 20:51 ` [PATCH net-next v3 0/3] Improve UDP multicast receive latency David Miller
@ 2013-10-09  4:47   ` Eric Dumazet
  2013-10-09  6:01     ` David Miller
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2013-10-09  4:47 UTC
  To: David Miller; +Cc: shawn.bohrer, netdev, tomk, sbohrer

From: Eric Dumazet <edumazet@google.com>

At this point sk might contain garbage.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 4226c53..9f27bb8 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1847,7 +1847,7 @@ begin:
 		if (count != 1 ||
 		    unlikely(!atomic_inc_not_zero_hint(&result->sk_refcnt, 2)))
 			result = NULL;
-		else if (unlikely(!__udp_is_mcast_sock(net, sk,
+		else if (unlikely(!__udp_is_mcast_sock(net, result,
 						       loc_port, loc_addr,
 						       rmt_port, rmt_addr,
 						       dif, hnum))) {

* Re: [PATCH net-next] udp: fix a typo in __udp4_lib_mcast_demux_lookup
  2013-10-09  4:47   ` [PATCH net-next] udp: fix a typo in __udp4_lib_mcast_demux_lookup Eric Dumazet
@ 2013-10-09  6:01     ` David Miller
  0 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2013-10-09  6:01 UTC
  To: eric.dumazet; +Cc: shawn.bohrer, netdev, tomk, sbohrer

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 08 Oct 2013 21:47:29 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> At this point sk might contain garbage.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.
