All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next V2 0/3] net: Add GRO support for UDP encapsulating protocols
@ 2014-01-07 15:29 Or Gerlitz
  2014-01-07 15:29 ` [PATCH net-next V2 1/3] " Or Gerlitz
                   ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: Or Gerlitz @ 2014-01-07 15:29 UTC (permalink / raw)
  To: hkchu, edumazet, herbert; +Cc: netdev, davem, yanb, shlomop, Or Gerlitz

These series adds GRO handlers for protocols that do UDP encapsulation, with the 
intent of being able to coalesce packets which encapsulate packets belonging to
the same TCP session.

For GRO purposes, the destination UDP port takes the role of the ether type 
field in the ethernet header or the next protocol in the IP header.

The UDP GRO handler will only attempt to coalesce packets whose destination
port is registered to have gro handler.

Patches are against net-next 4a8deae2f4653 "r8152: correct some messages" plus
Jerry's gro-gre patch which adds the gro_find_receive/complete_by_type helpers

On my setup, which is net-next (now with the mlx4 vxlan offloads patches) -- 
for single TCP session that goes through vxlan tunneling I got nice improvement
from 6.8Gbs to 11.5Gbs

TODO:
 - identify udp encapsulated packets whose inner VM packet is udp and happen
   to carry a port which has registered offloads - and flush it.

 - invoke the udp offload protocol de-registration from the vxlan driver from 
   sleepable context

Or Gerlitz (3):
  net: Add GRO support for UDP encapsulating protocols
  net: Export gro_find_by_type helpers
  net: Add GRO support for vxlan traffic

 drivers/net/vxlan.c    |  101 +++++++++++++++++++++++++++++++++++++++++++++++-
 include/net/protocol.h |    6 +++
 net/core/dev.c         |    2 +
 net/ipv4/protocol.c    |   21 ++++++++++
 net/ipv4/udp_offload.c |   69 ++++++++++++++++++++++++++++++++
 5 files changed, 197 insertions(+), 2 deletions(-)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH net-next V2 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-07 15:29 [PATCH net-next V2 0/3] net: Add GRO support for UDP encapsulating protocols Or Gerlitz
@ 2014-01-07 15:29 ` Or Gerlitz
  2014-01-07 16:33   ` Eric Dumazet
                     ` (2 more replies)
  2014-01-07 15:29 ` [PATCH net-next V2 2/3] net: Export gro_find_by_type helpers Or Gerlitz
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 36+ messages in thread
From: Or Gerlitz @ 2014-01-07 15:29 UTC (permalink / raw)
  To: hkchu, edumazet, herbert; +Cc: netdev, davem, yanb, shlomop, Or Gerlitz

Add GRO handlers for protocols that do UDP encapsulation, with the intent of
being able to coalesce packets which encapsulate packets belonging to
the same TCP session.

For GRO purposes, the destination UDP port takes the role of the ether type 
field in the ethernet header or the next protocol in the IP header.

The UDP GRO handler will only attempt to coalesce packets whose destination
port is registered to have gro handler.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 include/net/protocol.h |    6 ++++
 net/ipv4/protocol.c    |   21 ++++++++++++++
 net/ipv4/udp_offload.c |   69 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 96 insertions(+), 0 deletions(-)

diff --git a/include/net/protocol.h b/include/net/protocol.h
index fbf7676..d776c08 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -92,6 +92,10 @@ extern const struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS];
 extern const struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS];
 extern const struct net_offload __rcu *inet6_offloads[MAX_INET_PROTOS];
 
+
+#define MAX_UDP_PORT (1 << 16)
+extern const struct net_offload __rcu *udp_offloads[MAX_UDP_PORT];
+
 #if IS_ENABLED(CONFIG_IPV6)
 extern const struct inet6_protocol __rcu *inet6_protos[MAX_INET_PROTOS];
 #endif
@@ -102,6 +106,8 @@ int inet_add_offload(const struct net_offload *prot, unsigned char num);
 int inet_del_offload(const struct net_offload *prot, unsigned char num);
 void inet_register_protosw(struct inet_protosw *p);
 void inet_unregister_protosw(struct inet_protosw *p);
+int udp_add_offload(const struct net_offload *prot, __be16 port);
+int udp_del_offload(const struct net_offload *prot, __be16 port);
 
 #if IS_ENABLED(CONFIG_IPV6)
 int inet6_add_protocol(const struct inet6_protocol *prot, unsigned char num);
diff --git a/net/ipv4/protocol.c b/net/ipv4/protocol.c
index 46d6a1c..426eae5 100644
--- a/net/ipv4/protocol.c
+++ b/net/ipv4/protocol.c
@@ -30,6 +30,7 @@
 
 const struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS] __read_mostly;
 const struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS] __read_mostly;
+const struct net_offload __rcu *udp_offloads[MAX_UDP_PORT] __read_mostly;
 
 int inet_add_protocol(const struct net_protocol *prot, unsigned char protocol)
 {
@@ -51,6 +52,13 @@ int inet_add_offload(const struct net_offload *prot, unsigned char protocol)
 }
 EXPORT_SYMBOL(inet_add_offload);
 
+int udp_add_offload(const struct net_offload *prot, __be16 port)
+{
+	return !cmpxchg((const struct net_offload **)&udp_offloads[port],
+			NULL, prot) ? 0 : -1;
+}
+EXPORT_SYMBOL(udp_add_offload);
+
 int inet_del_protocol(const struct net_protocol *prot, unsigned char protocol)
 {
 	int ret;
@@ -76,3 +84,16 @@ int inet_del_offload(const struct net_offload *prot, unsigned char protocol)
 	return ret;
 }
 EXPORT_SYMBOL(inet_del_offload);
+
+int udp_del_offload(const struct net_offload *prot, __be16 port)
+{
+	int ret;
+
+	ret = (cmpxchg((const struct net_offload **)&udp_offloads[port],
+		       prot, NULL) == prot) ? 0 : -1;
+
+	synchronize_net();
+
+	return ret;
+}
+EXPORT_SYMBOL(udp_del_offload);
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 79c62bd..0a8fdd6 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -89,10 +89,79 @@ out:
 	return segs;
 }
 
+
+static struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
+{
+	const struct net_offload *ops;
+	struct sk_buff *p, **pp = NULL;
+	struct udphdr *uh, *uh2;
+	unsigned int hlen, off;
+	int flush = 1;
+
+	off  = skb_gro_offset(skb);
+	hlen = off + sizeof(*uh);
+	uh   = skb_gro_header_fast(skb, off);
+	if (skb_gro_header_hard(skb, hlen)) {
+		uh = skb_gro_header_slow(skb, hlen, off);
+		if (unlikely(!uh))
+			goto out;
+	}
+
+	rcu_read_lock();
+	ops = rcu_dereference(udp_offloads[uh->dest]);
+	if (!ops || !ops->callbacks.gro_receive)
+		goto out_unlock;
+
+	flush = 0;
+
+	for (p = *head; p; p = p->next) {
+		if (!NAPI_GRO_CB(p)->same_flow)
+			continue;
+
+		uh2 = (struct udphdr   *)(p->data + off);
+		if ((*(u32 *)&uh->source ^ *(u32 *)&uh2->source)) {
+			NAPI_GRO_CB(p)->same_flow = 0;
+			continue;
+		}
+		goto found;
+	}
+
+found:
+	skb_gro_pull(skb, sizeof(struct udphdr)); /* pull encapsulating udp header */
+	pp = ops->callbacks.gro_receive(head, skb);
+
+out_unlock:
+	rcu_read_unlock();
+out:
+	NAPI_GRO_CB(skb)->flush |= flush;
+
+	return pp;
+}
+
+static int udp_gro_complete(struct sk_buff *skb, int nhoff)
+{
+	const struct net_offload *ops;
+	__be16 newlen = htons(skb->len - nhoff);
+	struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
+	int err = -ENOSYS;
+
+	uh->len = newlen;
+
+	rcu_read_lock();
+	ops = rcu_dereference(udp_offloads[uh->dest]);
+	if (ops && ops->callbacks.gro_complete)
+		err = ops->callbacks.gro_complete(skb, nhoff + sizeof(struct udphdr));
+
+	rcu_read_unlock();
+	return err;
+}
+
 static const struct net_offload udpv4_offload = {
 	.callbacks = {
 		.gso_send_check = udp4_ufo_send_check,
 		.gso_segment = udp4_ufo_fragment,
+		.gro_receive  =	udp_gro_receive,
+		.gro_complete =	udp_gro_complete,
 	},
 };
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH net-next V2 2/3] net: Export gro_find_by_type helpers
  2014-01-07 15:29 [PATCH net-next V2 0/3] net: Add GRO support for UDP encapsulating protocols Or Gerlitz
  2014-01-07 15:29 ` [PATCH net-next V2 1/3] " Or Gerlitz
@ 2014-01-07 15:29 ` Or Gerlitz
  2014-01-07 15:29 ` [PATCH net-next V2 3/3] net: Add GRO support for vxlan traffic Or Gerlitz
  2014-01-07 16:45 ` [PATCH net-next V2 0/3] net: Add GRO support for UDP encapsulating protocols Eric Dumazet
  3 siblings, 0 replies; 36+ messages in thread
From: Or Gerlitz @ 2014-01-07 15:29 UTC (permalink / raw)
  To: hkchu, edumazet, herbert; +Cc: netdev, davem, yanb, shlomop, Or Gerlitz

Export the gro_find_receive/complete_by_type helpers to they can be invoked
by the gro callbacks of encapsulation protocols such as vxlan.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 net/core/dev.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index ca520ed..03c249f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3935,6 +3935,7 @@ struct packet_offload *gro_find_receive_by_type(__be16 type)
 	}
 	return NULL;
 }
+EXPORT_SYMBOL(gro_find_receive_by_type);
 
 struct packet_offload *gro_find_complete_by_type(__be16 type)
 {
@@ -3948,6 +3949,7 @@ struct packet_offload *gro_find_complete_by_type(__be16 type)
 	}
 	return NULL;
 }
+EXPORT_SYMBOL(gro_find_complete_by_type);
 
 static gro_result_t napi_skb_finish(gro_result_t ret, struct sk_buff *skb)
 {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH net-next V2 3/3] net: Add GRO support for vxlan traffic
  2014-01-07 15:29 [PATCH net-next V2 0/3] net: Add GRO support for UDP encapsulating protocols Or Gerlitz
  2014-01-07 15:29 ` [PATCH net-next V2 1/3] " Or Gerlitz
  2014-01-07 15:29 ` [PATCH net-next V2 2/3] net: Export gro_find_by_type helpers Or Gerlitz
@ 2014-01-07 15:29 ` Or Gerlitz
  2014-01-07 16:34   ` Eric Dumazet
  2014-01-07 18:08   ` Tom Herbert
  2014-01-07 16:45 ` [PATCH net-next V2 0/3] net: Add GRO support for UDP encapsulating protocols Eric Dumazet
  3 siblings, 2 replies; 36+ messages in thread
From: Or Gerlitz @ 2014-01-07 15:29 UTC (permalink / raw)
  To: hkchu, edumazet, herbert; +Cc: netdev, davem, yanb, shlomop, Or Gerlitz

Add gro handlers for vxlan using the udp gro infrastructure

On my setup, which is net-next (now with the mlx4 vxlan offloads patches) -- 
for single TCP session that goes through vxlan tunneling I got nice improvement
from 6.8Gbs to 11.5Gbs

--> UDP/VXLAN GRO disabled
$ netperf  -H 192.168.52.147 -c -C

$ netperf -t TCP_STREAM -H 192.168.52.147 -c -C
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  65536  65536    10.00      6799.75   12.54    24.79    0.604   1.195

--> UDP/VXLAN GRO enabled

$ netperf -t TCP_STREAM -H 192.168.52.147 -c -C
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  65536  65536    10.00      11562.72   24.90    20.34    0.706   0.577

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/vxlan.c |  101 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 481f85d..b51823b 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -40,6 +40,7 @@
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 #include <net/vxlan.h>
+#include <net/protocol.h>
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ipv6.h>
 #include <net/addrconf.h>
@@ -554,6 +555,98 @@ static int vxlan_fdb_append(struct vxlan_fdb *f,
 	return 1;
 }
 
+static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct sk_buff *skb)
+{
+	struct sk_buff *p, **pp = NULL;
+	struct vxlanhdr *vh, *vh2;
+	struct ethhdr *eh;
+	unsigned int hlen, off, off_eth;
+	const struct packet_offload *ptype;
+	__be16 type;
+	int flush = 1;
+
+	off  = skb_gro_offset(skb);
+	hlen = off + sizeof(*vh);
+	vh   = skb_gro_header_fast(skb, off);
+	if (skb_gro_header_hard(skb, hlen)) {
+		vh = skb_gro_header_slow(skb, hlen, off);
+		if (unlikely(!vh))
+			goto out;
+	}
+
+	flush = 0;
+
+	for (p = *head; p; p = p->next) {
+		if (!NAPI_GRO_CB(p)->same_flow)
+			continue;
+
+		vh2 = (struct vxlanhdr   *)(p->data + off);
+		if (vh->vx_vni ^ vh2->vx_vni) {
+			NAPI_GRO_CB(p)->same_flow = 0;
+			continue;
+		}
+		goto found;
+	}
+
+found:
+	skb_gro_pull(skb, sizeof(struct vxlanhdr)); /* pull vxlan header */
+
+	off_eth = skb_gro_offset(skb);
+	hlen = off_eth + sizeof(*eh);
+	eh   = skb_gro_header_fast(skb, off_eth);
+	if (skb_gro_header_hard(skb, hlen)) {
+		eh = skb_gro_header_slow(skb, hlen, off_eth);
+		if (unlikely(!eh))
+			goto out;
+	}
+	type = eh->h_proto;
+
+	rcu_read_lock();
+	ptype = gro_find_receive_by_type(type);
+	if (ptype == NULL) {
+		flush = 1;
+		goto out_unlock;
+	}
+
+	skb_gro_pull(skb, sizeof(*eh)); /* pull inner eth header */
+	pp = ptype->callbacks.gro_receive(head, skb);
+
+out_unlock:
+	rcu_read_unlock();
+out:
+	NAPI_GRO_CB(skb)->flush |= flush;
+
+	return pp;
+}
+
+static int vxlan_gro_complete(struct sk_buff *skb, int nhoff)
+{
+	struct ethhdr *eh;
+	struct packet_offload *ptype;
+	__be16 type;
+	/* 22 = 8 bytes for the vlxan header + 14 bytes for the inner eth header */
+	int vxlan_len  = 22;
+	int err = -ENOSYS;
+
+	eh = (struct ethhdr *)(skb->data + nhoff + sizeof (struct vxlanhdr));
+	type = eh->h_proto;
+
+	rcu_read_lock();
+	ptype = gro_find_complete_by_type(type);
+	if (ptype != NULL)
+		err = ptype->callbacks.gro_complete(skb, nhoff + vxlan_len);
+
+	rcu_read_unlock();
+	return err;
+}
+
+static const struct net_offload vxlan_offload = {
+	.callbacks = {
+		.gro_receive  =	vxlan_gro_receive,
+		.gro_complete =	vxlan_gro_complete,
+	},
+};
+
 /* Notify netdevs that UDP port started listening */
 static void vxlan_notify_add_rx_port(struct sock *sk)
 {
@@ -568,6 +661,8 @@ static void vxlan_notify_add_rx_port(struct sock *sk)
 			dev->netdev_ops->ndo_add_vxlan_port(dev, sa_family,
 							    port);
 	}
+	if (sa_family == AF_INET)
+		udp_add_offload(&vxlan_offload, port);
 	rcu_read_unlock();
 }
 
@@ -585,6 +680,8 @@ static void vxlan_notify_del_rx_port(struct sock *sk)
 			dev->netdev_ops->ndo_del_vxlan_port(dev, sa_family,
 							    port);
 	}
+	if (sa_family == AF_INET)
+		udp_del_offload(&vxlan_offload, port);
 	rcu_read_unlock();
 }
 
@@ -1125,8 +1222,8 @@ static void vxlan_rcv(struct vxlan_sock *vs,
 	 * leave the CHECKSUM_UNNECESSARY, the device checksummed it
 	 * for us. Otherwise force the upper layers to verify it.
 	 */
-	if (skb->ip_summed != CHECKSUM_UNNECESSARY || !skb->encapsulation ||
-	    !(vxlan->dev->features & NETIF_F_RXCSUM))
+	if ((skb->ip_summed != CHECKSUM_UNNECESSARY && skb->ip_summed != CHECKSUM_PARTIAL) ||
+	    !skb->encapsulation || !(vxlan->dev->features & NETIF_F_RXCSUM))
 		skb->ip_summed = CHECKSUM_NONE;
 
 	skb->encapsulation = 0;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-07 15:29 ` [PATCH net-next V2 1/3] " Or Gerlitz
@ 2014-01-07 16:33   ` Eric Dumazet
  2014-01-07 20:19     ` Or Gerlitz
  2014-01-07 18:44   ` Tom Herbert
  2014-01-07 21:19   ` David Miller
  2 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2014-01-07 16:33 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: hkchu, edumazet, herbert, netdev, davem, yanb, shlomop

On Tue, 2014-01-07 at 17:29 +0200, Or Gerlitz wrote:
>  
> +
> +#define MAX_UDP_PORT (1 << 16)
> +extern const struct net_offload __rcu *udp_offloads[MAX_UDP_PORT];

Thats 512 KB of memory.

This will greatly impact forwarding performance of UDP packets with
random ports, and will increase kernel memory size for embedded devices.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 3/3] net: Add GRO support for vxlan traffic
  2014-01-07 15:29 ` [PATCH net-next V2 3/3] net: Add GRO support for vxlan traffic Or Gerlitz
@ 2014-01-07 16:34   ` Eric Dumazet
  2014-01-07 19:43     ` Or Gerlitz
  2014-01-07 18:08   ` Tom Herbert
  1 sibling, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2014-01-07 16:34 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: hkchu, edumazet, herbert, netdev, davem, yanb, shlomop

On Tue, 2014-01-07 at 17:29 +0200, Or Gerlitz wrote:
> Add gro handlers for vxlan using the udp gro infrastructure
> 


>  static void vxlan_notify_add_rx_port(struct sock *sk)
>  {
> @@ -568,6 +661,8 @@ static void vxlan_notify_add_rx_port(struct sock *sk)
>  			dev->netdev_ops->ndo_add_vxlan_port(dev, sa_family,
>  							    port);
>  	}
> +	if (sa_family == AF_INET)
> +		udp_add_offload(&vxlan_offload, port);
>  	rcu_read_unlock();
>  }
>  

This means two vxlan tunnels can not share same port.

Is that a valid assertion ?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 0/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-07 15:29 [PATCH net-next V2 0/3] net: Add GRO support for UDP encapsulating protocols Or Gerlitz
                   ` (2 preceding siblings ...)
  2014-01-07 15:29 ` [PATCH net-next V2 3/3] net: Add GRO support for vxlan traffic Or Gerlitz
@ 2014-01-07 16:45 ` Eric Dumazet
  3 siblings, 0 replies; 36+ messages in thread
From: Eric Dumazet @ 2014-01-07 16:45 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: hkchu, edumazet, herbert, netdev, davem, yanb, shlomop

On Tue, 2014-01-07 at 17:29 +0200, Or Gerlitz wrote:

> On my setup, which is net-next (now with the mlx4 vxlan offloads patches) -- 
> for single TCP session that goes through vxlan tunneling I got nice improvement
> from 6.8Gbs to 11.5Gbs

It would be nice to check how performance is impacted on a router, using
pktgen to inject UDP traffic on random UDP ports.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 3/3] net: Add GRO support for vxlan traffic
  2014-01-07 15:29 ` [PATCH net-next V2 3/3] net: Add GRO support for vxlan traffic Or Gerlitz
  2014-01-07 16:34   ` Eric Dumazet
@ 2014-01-07 18:08   ` Tom Herbert
  2014-01-07 19:43     ` Or Gerlitz
       [not found]     ` <CAJZOPZLsMvmHwmMjhsuKb__2HncMXMm=p6UFnT4XX5d8hZnGxw@mail.gmail.com>
  1 sibling, 2 replies; 36+ messages in thread
From: Tom Herbert @ 2014-01-07 18:08 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Jerry Chu, Eric Dumazet, Herbert Xu, Linux Netdev List,
	David Miller, Yan Burman, Shlomo Pongratz

On Tue, Jan 7, 2014 at 7:29 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
> Add gro handlers for vxlan using the udp gro infrastructure
>
> On my setup, which is net-next (now with the mlx4 vxlan offloads patches) --
> for single TCP session that goes through vxlan tunneling I got nice improvement
> from 6.8Gbs to 11.5Gbs
>
> --> UDP/VXLAN GRO disabled
> $ netperf  -H 192.168.52.147 -c -C
>
> $ netperf -t TCP_STREAM -H 192.168.52.147 -c -C
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
>
>  87380  65536  65536    10.00      6799.75   12.54    24.79    0.604   1.195
>
> --> UDP/VXLAN GRO enabled
>
> $ netperf -t TCP_STREAM -H 192.168.52.147 -c -C
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
>
>  87380  65536  65536    10.00      11562.72   24.90    20.34    0.706   0.577
>
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
> ---
>  drivers/net/vxlan.c |  101 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 files changed, 99 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 481f85d..b51823b 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -40,6 +40,7 @@
>  #include <net/net_namespace.h>
>  #include <net/netns/generic.h>
>  #include <net/vxlan.h>
> +#include <net/protocol.h>
>  #if IS_ENABLED(CONFIG_IPV6)
>  #include <net/ipv6.h>
>  #include <net/addrconf.h>
> @@ -554,6 +555,98 @@ static int vxlan_fdb_append(struct vxlan_fdb *f,
>         return 1;
>  }
>
> +static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct sk_buff *skb)
> +{
> +       struct sk_buff *p, **pp = NULL;
> +       struct vxlanhdr *vh, *vh2;
> +       struct ethhdr *eh;
> +       unsigned int hlen, off, off_eth;
> +       const struct packet_offload *ptype;
> +       __be16 type;
> +       int flush = 1;
> +
> +       off  = skb_gro_offset(skb);
> +       hlen = off + sizeof(*vh);
> +       vh   = skb_gro_header_fast(skb, off);
> +       if (skb_gro_header_hard(skb, hlen)) {
> +               vh = skb_gro_header_slow(skb, hlen, off);
> +               if (unlikely(!vh))
> +                       goto out;
> +       }
> +
> +       flush = 0;
> +
> +       for (p = *head; p; p = p->next) {
> +               if (!NAPI_GRO_CB(p)->same_flow)
> +                       continue;
> +
> +               vh2 = (struct vxlanhdr   *)(p->data + off);
> +               if (vh->vx_vni ^ vh2->vx_vni) {

Why ^ instead of != ?

> +                       NAPI_GRO_CB(p)->same_flow = 0;
> +                       continue;
> +               }
> +               goto found;
> +       }
> +
> +found:
> +       skb_gro_pull(skb, sizeof(struct vxlanhdr)); /* pull vxlan header */
> +
> +       off_eth = skb_gro_offset(skb);
> +       hlen = off_eth + sizeof(*eh);
> +       eh   = skb_gro_header_fast(skb, off_eth);
> +       if (skb_gro_header_hard(skb, hlen)) {
> +               eh = skb_gro_header_slow(skb, hlen, off_eth);
> +               if (unlikely(!eh))
> +                       goto out;
> +       }
> +       type = eh->h_proto;
> +
> +       rcu_read_lock();
> +       ptype = gro_find_receive_by_type(type);
> +       if (ptype == NULL) {
> +               flush = 1;
> +               goto out_unlock;
> +       }
> +
> +       skb_gro_pull(skb, sizeof(*eh)); /* pull inner eth header */
> +       pp = ptype->callbacks.gro_receive(head, skb);
> +
> +out_unlock:
> +       rcu_read_unlock();
> +out:
> +       NAPI_GRO_CB(skb)->flush |= flush;
> +
> +       return pp;
> +}
> +
> +static int vxlan_gro_complete(struct sk_buff *skb, int nhoff)
> +{
> +       struct ethhdr *eh;
> +       struct packet_offload *ptype;
> +       __be16 type;
> +       /* 22 = 8 bytes for the vlxan header + 14 bytes for the inner eth header */
> +       int vxlan_len  = 22;
> +       int err = -ENOSYS;
> +
> +       eh = (struct ethhdr *)(skb->data + nhoff + sizeof (struct vxlanhdr));
> +       type = eh->h_proto;
> +
> +       rcu_read_lock();
> +       ptype = gro_find_complete_by_type(type);
> +       if (ptype != NULL)
> +               err = ptype->callbacks.gro_complete(skb, nhoff + vxlan_len);
> +
> +       rcu_read_unlock();
> +       return err;
> +}
> +
> +static const struct net_offload vxlan_offload = {
> +       .callbacks = {
> +               .gro_receive  = vxlan_gro_receive,
> +               .gro_complete = vxlan_gro_complete,
> +       },
> +};
> +
>  /* Notify netdevs that UDP port started listening */
>  static void vxlan_notify_add_rx_port(struct sock *sk)
>  {
> @@ -568,6 +661,8 @@ static void vxlan_notify_add_rx_port(struct sock *sk)
>                         dev->netdev_ops->ndo_add_vxlan_port(dev, sa_family,
>                                                             port);
>         }
> +       if (sa_family == AF_INET)
> +               udp_add_offload(&vxlan_offload, port);
>         rcu_read_unlock();
>  }
>
> @@ -585,6 +680,8 @@ static void vxlan_notify_del_rx_port(struct sock *sk)
>                         dev->netdev_ops->ndo_del_vxlan_port(dev, sa_family,
>                                                             port);
>         }
> +       if (sa_family == AF_INET)
> +               udp_del_offload(&vxlan_offload, port);
>         rcu_read_unlock();
>  }
>
> @@ -1125,8 +1222,8 @@ static void vxlan_rcv(struct vxlan_sock *vs,
>          * leave the CHECKSUM_UNNECESSARY, the device checksummed it
>          * for us. Otherwise force the upper layers to verify it.
>          */
> -       if (skb->ip_summed != CHECKSUM_UNNECESSARY || !skb->encapsulation ||
> -           !(vxlan->dev->features & NETIF_F_RXCSUM))
> +       if ((skb->ip_summed != CHECKSUM_UNNECESSARY && skb->ip_summed != CHECKSUM_PARTIAL) ||
> +           !skb->encapsulation || !(vxlan->dev->features & NETIF_F_RXCSUM))
>                 skb->ip_summed = CHECKSUM_NONE;
>
>         skb->encapsulation = 0;
> --
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-07 15:29 ` [PATCH net-next V2 1/3] " Or Gerlitz
  2014-01-07 16:33   ` Eric Dumazet
@ 2014-01-07 18:44   ` Tom Herbert
  2014-01-07 20:21     ` Or Gerlitz
  2014-01-07 21:19   ` David Miller
  2 siblings, 1 reply; 36+ messages in thread
From: Tom Herbert @ 2014-01-07 18:44 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Jerry Chu, Eric Dumazet, Herbert Xu, Linux Netdev List,
	David Miller, Yan Burman, Shlomo Pongratz

Or, thanks for posting the patches!

We should also support the case where direct encapsulation is being
done, that is there is no encapsulation header after UDP and the
protocol of the encapsulated packet is inferred by the port number
(e.g. GRE/UDP, TCP/UDP, SCTP/UDP, etc.). This is probably an
additional field in net_offload struct for next protocol, a little
more API, and pretty trivial handlers in UDP code.


On Tue, Jan 7, 2014 at 7:29 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
> Add GRO handlers for protocols that do UDP encapsulation, with the intent of
> being able to coalesce packets which encapsulate packets belonging to
> the same TCP session.
>
> For GRO purposes, the destination UDP port takes the role of the ether type
> field in the ethernet header or the next protocol in the IP header.
>
> The UDP GRO handler will only attempt to coalesce packets whose destination
> port is registered to have gro handler.
>
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
> ---
>  include/net/protocol.h |    6 ++++
>  net/ipv4/protocol.c    |   21 ++++++++++++++
>  net/ipv4/udp_offload.c |   69 ++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 96 insertions(+), 0 deletions(-)
>
> diff --git a/include/net/protocol.h b/include/net/protocol.h
> index fbf7676..d776c08 100644
> --- a/include/net/protocol.h
> +++ b/include/net/protocol.h
> @@ -92,6 +92,10 @@ extern const struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS];
>  extern const struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS];
>  extern const struct net_offload __rcu *inet6_offloads[MAX_INET_PROTOS];
>
> +
> +#define MAX_UDP_PORT (1 << 16)
> +extern const struct net_offload __rcu *udp_offloads[MAX_UDP_PORT];
> +
>  #if IS_ENABLED(CONFIG_IPV6)
>  extern const struct inet6_protocol __rcu *inet6_protos[MAX_INET_PROTOS];
>  #endif
> @@ -102,6 +106,8 @@ int inet_add_offload(const struct net_offload *prot, unsigned char num);
>  int inet_del_offload(const struct net_offload *prot, unsigned char num);
>  void inet_register_protosw(struct inet_protosw *p);
>  void inet_unregister_protosw(struct inet_protosw *p);
> +int udp_add_offload(const struct net_offload *prot, __be16 port);
> +int udp_del_offload(const struct net_offload *prot, __be16 port);
>
>  #if IS_ENABLED(CONFIG_IPV6)
>  int inet6_add_protocol(const struct inet6_protocol *prot, unsigned char num);
> diff --git a/net/ipv4/protocol.c b/net/ipv4/protocol.c
> index 46d6a1c..426eae5 100644
> --- a/net/ipv4/protocol.c
> +++ b/net/ipv4/protocol.c
> @@ -30,6 +30,7 @@
>
>  const struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS] __read_mostly;
>  const struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS] __read_mostly;
> +const struct net_offload __rcu *udp_offloads[MAX_UDP_PORT] __read_mostly;
>
>  int inet_add_protocol(const struct net_protocol *prot, unsigned char protocol)
>  {
> @@ -51,6 +52,13 @@ int inet_add_offload(const struct net_offload *prot, unsigned char protocol)
>  }
>  EXPORT_SYMBOL(inet_add_offload);
>
> +int udp_add_offload(const struct net_offload *prot, __be16 port)
> +{
> +       return !cmpxchg((const struct net_offload **)&udp_offloads[port],
> +                       NULL, prot) ? 0 : -1;
> +}
> +EXPORT_SYMBOL(udp_add_offload);
> +
>  int inet_del_protocol(const struct net_protocol *prot, unsigned char protocol)
>  {
>         int ret;
> @@ -76,3 +84,16 @@ int inet_del_offload(const struct net_offload *prot, unsigned char protocol)
>         return ret;
>  }
>  EXPORT_SYMBOL(inet_del_offload);
> +
> +int udp_del_offload(const struct net_offload *prot, __be16 port)
> +{
> +       int ret;
> +
> +       ret = (cmpxchg((const struct net_offload **)&udp_offloads[port],
> +                      prot, NULL) == prot) ? 0 : -1;
> +
> +       synchronize_net();
> +
> +       return ret;
> +}
> +EXPORT_SYMBOL(udp_del_offload);
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 79c62bd..0a8fdd6 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -89,10 +89,79 @@ out:
>         return segs;
>  }
>
> +
> +static struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
> +{
> +       const struct net_offload *ops;
> +       struct sk_buff *p, **pp = NULL;
> +       struct udphdr *uh, *uh2;
> +       unsigned int hlen, off;
> +       int flush = 1;
> +
> +       off  = skb_gro_offset(skb);
> +       hlen = off + sizeof(*uh);
> +       uh   = skb_gro_header_fast(skb, off);
> +       if (skb_gro_header_hard(skb, hlen)) {
> +               uh = skb_gro_header_slow(skb, hlen, off);
> +               if (unlikely(!uh))
> +                       goto out;
> +       }
> +
> +       rcu_read_lock();
> +       ops = rcu_dereference(udp_offloads[uh->dest]);
> +       if (!ops || !ops->callbacks.gro_receive)
> +               goto out_unlock;
> +
> +       flush = 0;
> +
> +       for (p = *head; p; p = p->next) {
> +               if (!NAPI_GRO_CB(p)->same_flow)
> +                       continue;
> +
> +               uh2 = (struct udphdr   *)(p->data + off);
> +               if ((*(u32 *)&uh->source ^ *(u32 *)&uh2->source)) {
> +                       NAPI_GRO_CB(p)->same_flow = 0;
> +                       continue;
> +               }
> +               goto found;
> +       }
> +
> +found:
> +       skb_gro_pull(skb, sizeof(struct udphdr)); /* pull encapsulating udp header */
> +       pp = ops->callbacks.gro_receive(head, skb);
> +
> +out_unlock:
> +       rcu_read_unlock();
> +out:
> +       NAPI_GRO_CB(skb)->flush |= flush;
> +
> +       return pp;
> +}
> +
> +static int udp_gro_complete(struct sk_buff *skb, int nhoff)
> +{
> +       const struct net_offload *ops;
> +       __be16 newlen = htons(skb->len - nhoff);
> +       struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
> +       int err = -ENOSYS;
> +
> +       uh->len = newlen;
> +
> +       rcu_read_lock();
> +       ops = rcu_dereference(udp_offloads[uh->dest]);
> +       if (ops && ops->callbacks.gro_complete)
> +               err = ops->callbacks.gro_complete(skb, nhoff + sizeof(struct udphdr));
> +
> +       rcu_read_unlock();
> +       return err;
> +}
> +
>  static const struct net_offload udpv4_offload = {
>         .callbacks = {
>                 .gso_send_check = udp4_ufo_send_check,
>                 .gso_segment = udp4_ufo_fragment,
> +               .gro_receive  = udp_gro_receive,
> +               .gro_complete = udp_gro_complete,
>         },
>  };
>
> --
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 3/3] net: Add GRO support for vxlan traffic
  2014-01-07 18:08   ` Tom Herbert
@ 2014-01-07 19:43     ` Or Gerlitz
  2014-01-07 20:02       ` Eric Dumazet
       [not found]     ` <CAJZOPZLsMvmHwmMjhsuKb__2HncMXMm=p6UFnT4XX5d8hZnGxw@mail.gmail.com>
  1 sibling, 1 reply; 36+ messages in thread
From: Or Gerlitz @ 2014-01-07 19:43 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Or Gerlitz, Jerry Chu, Eric Dumazet, Herbert Xu,
	Linux Netdev List, David Miller, Yan Burman, Shlomo Pongratz

On Tue, Jan 7, 2014 at 8:08 PM, Tom Herbert <therbert@google.com> wrote:
> On Tue, Jan 7, 2014 at 7:29 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
>> +static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct sk_buff *skb)
>> +{
>> +       struct sk_buff *p, **pp = NULL;
>> +       struct vxlanhdr *vh, *vh2;
>> +       struct ethhdr *eh;
>> +       unsigned int hlen, off, off_eth;
>> +       const struct packet_offload *ptype;
>> +       __be16 type;
>> +       int flush = 1;
>> +
>> +       off  = skb_gro_offset(skb);
>> +       hlen = off + sizeof(*vh);
>> +       vh   = skb_gro_header_fast(skb, off);
>> +       if (skb_gro_header_hard(skb, hlen)) {
>> +               vh = skb_gro_header_slow(skb, hlen, off);
>> +               if (unlikely(!vh))
>> +                       goto out;
>> +       }
>> +
>> +       flush = 0;
>> +
>> +       for (p = *head; p; p = p->next) {
>> +               if (!NAPI_GRO_CB(p)->same_flow)
>> +                       continue;
>> +
>> +               vh2 = (struct vxlanhdr   *)(p->data + off);
>> +               if (vh->vx_vni ^ vh2->vx_vni) {
>
> Why ^ instead of != ?

The XOR approach is very popular in the GRO stack, e.g see the IPv4 chain
of inet_gro_receive() && tcp_gro_receive(), I guess this might relates
to more efficient assembly code for ^ vs. != and/or the fast/elegant
transitive nature of that operator

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 3/3] net: Add GRO support for vxlan traffic
  2014-01-07 16:34   ` Eric Dumazet
@ 2014-01-07 19:43     ` Or Gerlitz
  2014-01-07 20:04       ` Eric Dumazet
  0 siblings, 1 reply; 36+ messages in thread
From: Or Gerlitz @ 2014-01-07 19:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Or Gerlitz, Jerry Chu, Eric Dumazet, Herbert Xu, netdev,
	David Miller, Yan Burman, Shlomo Pongratz

On Tue, Jan 7, 2014 at 6:34 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2014-01-07 at 17:29 +0200, Or Gerlitz wrote:
>> Add gro handlers for vxlan using the udp gro infrastructure
>>
>
>
>>  static void vxlan_notify_add_rx_port(struct sock *sk)
>>  {
>> @@ -568,6 +661,8 @@ static void vxlan_notify_add_rx_port(struct sock *sk)
>>                       dev->netdev_ops->ndo_add_vxlan_port(dev, sa_family,
>>                                                           port);
>>       }
>> +     if (sa_family == AF_INET)
>> +             udp_add_offload(&vxlan_offload, port);
>>       rcu_read_unlock();
>>  }
>>
>
> This means two vxlan tunnels can not share same port.
> Is that a valid assertion ?

nope -- the vxlan driver opens a listener udp socket per per listening
port, but N > 1 vxlan tunnels can sit on that port, the driver does
further demuxing based on the vnid carried in the vxlan header, see
the call to vxlan_find_vni() from vxlan_rcv()

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 3/3] net: Add GRO support for vxlan traffic
       [not found]     ` <CAJZOPZLsMvmHwmMjhsuKb__2HncMXMm=p6UFnT4XX5d8hZnGxw@mail.gmail.com>
@ 2014-01-07 19:52       ` Tom Herbert
  0 siblings, 0 replies; 36+ messages in thread
From: Tom Herbert @ 2014-01-07 19:52 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Or Gerlitz, Jerry Chu, Eric Dumazet, Herbert Xu,
	Linux Netdev List, David Miller, Yan Burman, Shlomo Pongratz

On Tue, Jan 7, 2014 at 11:37 AM, Or Gerlitz <or.gerlitz@gmail.com> wrote:
> On Tue, Jan 7, 2014 at 8:08 PM, Tom Herbert <therbert@google.com> wrote:
>>
>> On Tue, Jan 7, 2014 at 7:29 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
>> > +static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct
>> > sk_buff *skb)
>> > +{
>> > +       struct sk_buff *p, **pp = NULL;
>> > +       struct vxlanhdr *vh, *vh2;
>> > +       struct ethhdr *eh;
>> > +       unsigned int hlen, off, off_eth;
>> > +       const struct packet_offload *ptype;
>> > +       __be16 type;
>> > +       int flush = 1;
>> > +
>> > +       off  = skb_gro_offset(skb);
>> > +       hlen = off + sizeof(*vh);
>> > +       vh   = skb_gro_header_fast(skb, off);
>> > +       if (skb_gro_header_hard(skb, hlen)) {
>> > +               vh = skb_gro_header_slow(skb, hlen, off);
>> > +               if (unlikely(!vh))
>> > +                       goto out;
>> > +       }
>> > +
>> > +       flush = 0;
>> > +
>> > +       for (p = *head; p; p = p->next) {
>> > +               if (!NAPI_GRO_CB(p)->same_flow)
>> > +                       continue;
>> > +
>> > +               vh2 = (struct vxlanhdr   *)(p->data + off);
>> > +               if (vh->vx_vni ^ vh2->vx_vni) {
>>
>> Why ^ instead of != ?
>
>
> The XOR approach is very popular in the GRO stack, e.g see the IPv4 chain of
> inet_gro_receive() && tcp_gro_receive(), I guess this might relates to more
> efficient assembly code for ^ vs. != and/or the fast/elegant transitive
> nature of that operator
>
These arguments may have applied twenty years ago, but I seriously
doubt this is going to outwit any modern compiler into producing more
efficient code. To me, it's just making code less readable.

>>
>> > +                       NAPI_GRO_CB(p)->same_flow = 0;
>> > +                       continue;
>> > +               }
>> > +               goto found;
>> > +       }
>
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 3/3] net: Add GRO support for vxlan traffic
  2014-01-07 19:43     ` Or Gerlitz
@ 2014-01-07 20:02       ` Eric Dumazet
  2014-01-07 20:12         ` Or Gerlitz
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2014-01-07 20:02 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Tom Herbert, Or Gerlitz, Jerry Chu, Eric Dumazet, Herbert Xu,
	Linux Netdev List, David Miller, Yan Burman, Shlomo Pongratz

On Tue, 2014-01-07 at 21:43 +0200, Or Gerlitz wrote:
> On Tue, Jan 7, 2014 at 8:08 PM, Tom Herbert <therbert@google.com> wrote:

> > Why ^ instead of != ?
> 
> The XOR approach is very popular in the GRO stack, e.g see the IPv4 chain
> of inet_gro_receive() && tcp_gro_receive(), I guess this might relates
> to more efficient assembly code for ^ vs. != and/or the fast/elegant
> transitive nature of that operator

This trick is only needed/used when many compares are folded into a
single conditional :

if (a->f1 != b->f1 || a->f2 != b->f2)

->

if (((a->f1 ^ b->f1) | (a->f2 ^ b->f2)) != 0)

Please do not use XOR for a single compare.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 3/3] net: Add GRO support for vxlan traffic
  2014-01-07 19:43     ` Or Gerlitz
@ 2014-01-07 20:04       ` Eric Dumazet
  2014-01-07 20:10         ` Or Gerlitz
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2014-01-07 20:04 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Or Gerlitz, Jerry Chu, Eric Dumazet, Herbert Xu, netdev,
	David Miller, Yan Burman, Shlomo Pongratz

On Tue, 2014-01-07 at 21:43 +0200, Or Gerlitz wrote:
> On Tue, Jan 7, 2014 at 6:34 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Tue, 2014-01-07 at 17:29 +0200, Or Gerlitz wrote:
> >> Add gro handlers for vxlan using the udp gro infrastructure
> >>
> >
> >
> >>  static void vxlan_notify_add_rx_port(struct sock *sk)
> >>  {
> >> @@ -568,6 +661,8 @@ static void vxlan_notify_add_rx_port(struct sock *sk)
> >>                       dev->netdev_ops->ndo_add_vxlan_port(dev, sa_family,
> >>                                                           port);
> >>       }
> >> +     if (sa_family == AF_INET)
> >> +             udp_add_offload(&vxlan_offload, port);
> >>       rcu_read_unlock();
> >>  }
> >>
> >
> > This means two vxlan tunnels can not share same port.
> > Is that a valid assertion ?
> 
> nope -- the vxlan driver opens a listener udp socket per per listening
> port, but N > 1 vxlan tunnels can sit on that port, the driver does
> further demuxing based on the vnid carried in the vxlan header, see
> the call to vxlan_find_vni() from vxlan_rcv()
> --

So if we use same port, it seems to me udp_del_offload() of the first
dismantled tunnel will remove the offload for the remaining tunnel.

You forgot to implement a refcount somehow.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 3/3] net: Add GRO support for vxlan traffic
  2014-01-07 20:04       ` Eric Dumazet
@ 2014-01-07 20:10         ` Or Gerlitz
  0 siblings, 0 replies; 36+ messages in thread
From: Or Gerlitz @ 2014-01-07 20:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Or Gerlitz, Jerry Chu, Eric Dumazet, Herbert Xu, netdev,
	David Miller, Yan Burman, Shlomo Pongratz

On Tue, Jan 7, 2014 at 10:04 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2014-01-07 at 21:43 +0200, Or Gerlitz wrote:
>> On Tue, Jan 7, 2014 at 6:34 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Tue, 2014-01-07 at 17:29 +0200, Or Gerlitz wrote:
>> >> Add gro handlers for vxlan using the udp gro infrastructure
>> >>
>> >
>> >
>> >>  static void vxlan_notify_add_rx_port(struct sock *sk)
>> >>  {
>> >> @@ -568,6 +661,8 @@ static void vxlan_notify_add_rx_port(struct sock *sk)
>> >>                       dev->netdev_ops->ndo_add_vxlan_port(dev, sa_family,
>> >>                                                           port);
>> >>       }
>> >> +     if (sa_family == AF_INET)
>> >> +             udp_add_offload(&vxlan_offload, port);
>> >>       rcu_read_unlock();
>> >>  }
>> >>
>> >
>> > This means two vxlan tunnels can not share same port.
>> > Is that a valid assertion ?
>>
>> nope -- the vxlan driver opens a listener udp socket per per listening
>> port, but N > 1 vxlan tunnels can sit on that port, the driver does
>> further demuxing based on the vnid carried in the vxlan header, see
>> the call to vxlan_find_vni() from vxlan_rcv()
>> --
>
> So if we use same port, it seems to me udp_del_offload() of the first
> dismantled tunnel will remove the offload for the remaining tunnel.
>
> You forgot to implement a refcount somehow.

no, the vxlan driver does it for me on a higher level, you can see in
vxlan_sock_release() that vxlan_notify_del_rx_port() is called only if
the ref count which is associated with the listener socket drops to
zero.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 3/3] net: Add GRO support for vxlan traffic
  2014-01-07 20:02       ` Eric Dumazet
@ 2014-01-07 20:12         ` Or Gerlitz
  2014-01-07 21:09           ` Tom Herbert
  0 siblings, 1 reply; 36+ messages in thread
From: Or Gerlitz @ 2014-01-07 20:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Tom Herbert, Or Gerlitz, Jerry Chu, Eric Dumazet, Herbert Xu,
	Linux Netdev List, David Miller, Yan Burman, Shlomo Pongratz

On Tue, Jan 7, 2014 at 10:02 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2014-01-07 at 21:43 +0200, Or Gerlitz wrote:
>> On Tue, Jan 7, 2014 at 8:08 PM, Tom Herbert <therbert@google.com> wrote:
>
>> > Why ^ instead of != ?
>>
>> The XOR approach is very popular in the GRO stack, e.g see the IPv4 chain
>> of inet_gro_receive() && tcp_gro_receive(), I guess this might relates
>> to more efficient assembly code for ^ vs. != and/or the fast/elegant
>> transitive nature of that operator
>
> This trick is only needed/used when many compares are folded into a
> single conditional :
>
> if (a->f1 != b->f1 || a->f2 != b->f2)
>
> ->
>
> if (((a->f1 ^ b->f1) | (a->f2 ^ b->f2)) != 0)
>
> Please do not use XOR for a single compare.

OK, but just out of curiosity -- what's the reasoning? clarity or
efficiency or both?
>
>
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-07 16:33   ` Eric Dumazet
@ 2014-01-07 20:19     ` Or Gerlitz
  2014-01-07 20:32       ` Eric Dumazet
  0 siblings, 1 reply; 36+ messages in thread
From: Or Gerlitz @ 2014-01-07 20:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Or Gerlitz, Jerry Chu, Eric Dumazet, Herbert Xu, netdev,
	David Miller, Yan Burman, Shlomo Pongratz

On Tue, Jan 7, 2014 at 6:33 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2014-01-07 at 17:29 +0200, Or Gerlitz wrote:
>>
>> +
>> +#define MAX_UDP_PORT (1 << 16)
>> +extern const struct net_offload __rcu *udp_offloads[MAX_UDP_PORT];
>
> Thats 512 KB of memory.
> This will greatly impact forwarding performance of UDP packets with
> random ports, and will increase kernel memory size for embedded devices.

Re forwarding, are you referring to the case where the forwarded
packets are encapsulated? packets which are not encapusalted will be
flushed in the gro receive handler (this went out by mistake in V2 but
exists in V1)  if skb->encapsulation isn't set.

As for encapsulated packets, when you say random ports, are you
referring to a router which has multiple udp encapsulating protocols
where each uses different udp port? for this case and also to reduce
the memory footprint, we can use lookup in a list as done for the L2
protocols gro handlers in the list_for_each loop of dev_gro_receive(),
makes sense?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-07 18:44   ` Tom Herbert
@ 2014-01-07 20:21     ` Or Gerlitz
  2014-01-07 23:04       ` Tom Herbert
  0 siblings, 1 reply; 36+ messages in thread
From: Or Gerlitz @ 2014-01-07 20:21 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Or Gerlitz, Jerry Chu, Eric Dumazet, Herbert Xu,
	Linux Netdev List, David Miller, Yan Burman, Shlomo Pongratz

On Tue, Jan 7, 2014 at 8:44 PM, Tom Herbert <therbert@google.com> wrote:
> Or, thanks for posting the patches!
>
> We should also support the case where direct encapsulation is being
> done, that is there is no encapsulation header after UDP and the
> protocol of the encapsulated packet is inferred by the port number
> (e.g. GRE/UDP, TCP/UDP, SCTP/UDP, etc.). This is probably an
> additional field in net_offload struct for next protocol, a little
> more API, and pretty trivial handlers in UDP code.

The way I have set that follows your guideline under which the
encapsulating method is derived from the udp destination port in the
sense that the encapsulating protocol can do what they want in the
gro_receive/complete handlers entry they plant per that udp port,
isn't that generic enough?


>
>
> On Tue, Jan 7, 2014 at 7:29 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
>> Add GRO handlers for protocols that do UDP encapsulation, with the intent of
>> being able to coalesce packets which encapsulate packets belonging to
>> the same TCP session.
>>
>> For GRO purposes, the destination UDP port takes the role of the ether type
>> field in the ethernet header or the next protocol in the IP header.
>>
>> The UDP GRO handler will only attempt to coalesce packets whose destination
>> port is registered to have gro handler.
>>
>> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
>> ---
>>  include/net/protocol.h |    6 ++++
>>  net/ipv4/protocol.c    |   21 ++++++++++++++
>>  net/ipv4/udp_offload.c |   69 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 96 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/net/protocol.h b/include/net/protocol.h
>> index fbf7676..d776c08 100644
>> --- a/include/net/protocol.h
>> +++ b/include/net/protocol.h
>> @@ -92,6 +92,10 @@ extern const struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS];
>>  extern const struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS];
>>  extern const struct net_offload __rcu *inet6_offloads[MAX_INET_PROTOS];
>>
>> +
>> +#define MAX_UDP_PORT (1 << 16)
>> +extern const struct net_offload __rcu *udp_offloads[MAX_UDP_PORT];
>> +
>>  #if IS_ENABLED(CONFIG_IPV6)
>>  extern const struct inet6_protocol __rcu *inet6_protos[MAX_INET_PROTOS];
>>  #endif
>> @@ -102,6 +106,8 @@ int inet_add_offload(const struct net_offload *prot, unsigned char num);
>>  int inet_del_offload(const struct net_offload *prot, unsigned char num);
>>  void inet_register_protosw(struct inet_protosw *p);
>>  void inet_unregister_protosw(struct inet_protosw *p);
>> +int udp_add_offload(const struct net_offload *prot, __be16 port);
>> +int udp_del_offload(const struct net_offload *prot, __be16 port);
>>
>>  #if IS_ENABLED(CONFIG_IPV6)
>>  int inet6_add_protocol(const struct inet6_protocol *prot, unsigned char num);
>> diff --git a/net/ipv4/protocol.c b/net/ipv4/protocol.c
>> index 46d6a1c..426eae5 100644
>> --- a/net/ipv4/protocol.c
>> +++ b/net/ipv4/protocol.c
>> @@ -30,6 +30,7 @@
>>
>>  const struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS] __read_mostly;
>>  const struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS] __read_mostly;
>> +const struct net_offload __rcu *udp_offloads[MAX_UDP_PORT] __read_mostly;
>>
>>  int inet_add_protocol(const struct net_protocol *prot, unsigned char protocol)
>>  {
>> @@ -51,6 +52,13 @@ int inet_add_offload(const struct net_offload *prot, unsigned char protocol)
>>  }
>>  EXPORT_SYMBOL(inet_add_offload);
>>
>> +int udp_add_offload(const struct net_offload *prot, __be16 port)
>> +{
>> +       return !cmpxchg((const struct net_offload **)&udp_offloads[port],
>> +                       NULL, prot) ? 0 : -1;
>> +}
>> +EXPORT_SYMBOL(udp_add_offload);
>> +
>>  int inet_del_protocol(const struct net_protocol *prot, unsigned char protocol)
>>  {
>>         int ret;
>> @@ -76,3 +84,16 @@ int inet_del_offload(const struct net_offload *prot, unsigned char protocol)
>>         return ret;
>>  }
>>  EXPORT_SYMBOL(inet_del_offload);
>> +
>> +int udp_del_offload(const struct net_offload *prot, __be16 port)
>> +{
>> +       int ret;
>> +
>> +       ret = (cmpxchg((const struct net_offload **)&udp_offloads[port],
>> +                      prot, NULL) == prot) ? 0 : -1;
>> +
>> +       synchronize_net();
>> +
>> +       return ret;
>> +}
>> +EXPORT_SYMBOL(udp_del_offload);
>> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
>> index 79c62bd..0a8fdd6 100644
>> --- a/net/ipv4/udp_offload.c
>> +++ b/net/ipv4/udp_offload.c
>> @@ -89,10 +89,79 @@ out:
>>         return segs;
>>  }
>>
>> +
>> +static struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
>> +{
>> +       const struct net_offload *ops;
>> +       struct sk_buff *p, **pp = NULL;
>> +       struct udphdr *uh, *uh2;
>> +       unsigned int hlen, off;
>> +       int flush = 1;
>> +
>> +       off  = skb_gro_offset(skb);
>> +       hlen = off + sizeof(*uh);
>> +       uh   = skb_gro_header_fast(skb, off);
>> +       if (skb_gro_header_hard(skb, hlen)) {
>> +               uh = skb_gro_header_slow(skb, hlen, off);
>> +               if (unlikely(!uh))
>> +                       goto out;
>> +       }
>> +
>> +       rcu_read_lock();
>> +       ops = rcu_dereference(udp_offloads[uh->dest]);
>> +       if (!ops || !ops->callbacks.gro_receive)
>> +               goto out_unlock;
>> +
>> +       flush = 0;
>> +
>> +       for (p = *head; p; p = p->next) {
>> +               if (!NAPI_GRO_CB(p)->same_flow)
>> +                       continue;
>> +
>> +               uh2 = (struct udphdr   *)(p->data + off);
>> +               if ((*(u32 *)&uh->source ^ *(u32 *)&uh2->source)) {
>> +                       NAPI_GRO_CB(p)->same_flow = 0;
>> +                       continue;
>> +               }
>> +               goto found;
>> +       }
>> +
>> +found:
>> +       skb_gro_pull(skb, sizeof(struct udphdr)); /* pull encapsulating udp header */
>> +       pp = ops->callbacks.gro_receive(head, skb);
>> +
>> +out_unlock:
>> +       rcu_read_unlock();
>> +out:
>> +       NAPI_GRO_CB(skb)->flush |= flush;
>> +
>> +       return pp;
>> +}
>> +
>> +static int udp_gro_complete(struct sk_buff *skb, int nhoff)
>> +{
>> +       const struct net_offload *ops;
>> +       __be16 newlen = htons(skb->len - nhoff);
>> +       struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
>> +       int err = -ENOSYS;
>> +
>> +       uh->len = newlen;
>> +
>> +       rcu_read_lock();
>> +       ops = rcu_dereference(udp_offloads[uh->dest]);
>> +       if (ops && ops->callbacks.gro_complete)
>> +               err = ops->callbacks.gro_complete(skb, nhoff + sizeof(struct udphdr));
>> +
>> +       rcu_read_unlock();
>> +       return err;
>> +}
>> +
>>  static const struct net_offload udpv4_offload = {
>>         .callbacks = {
>>                 .gso_send_check = udp4_ufo_send_check,
>>                 .gso_segment = udp4_ufo_fragment,
>> +               .gro_receive  = udp_gro_receive,
>> +               .gro_complete = udp_gro_complete,
>>         },
>>  };
>>
>> --
>> 1.7.1
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-07 20:19     ` Or Gerlitz
@ 2014-01-07 20:32       ` Eric Dumazet
  2014-01-07 20:37         ` Or Gerlitz
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2014-01-07 20:32 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Or Gerlitz, Jerry Chu, Eric Dumazet, Herbert Xu, netdev,
	David Miller, Yan Burman, Shlomo Pongratz

On Tue, 2014-01-07 at 22:19 +0200, Or Gerlitz wrote:
> On Tue, Jan 7, 2014 at 6:33 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Tue, 2014-01-07 at 17:29 +0200, Or Gerlitz wrote:
> >>
> >> +
> >> +#define MAX_UDP_PORT (1 << 16)
> >> +extern const struct net_offload __rcu *udp_offloads[MAX_UDP_PORT];
> >
> > Thats 512 KB of memory.
> > This will greatly impact forwarding performance of UDP packets with
> > random ports, and will increase kernel memory size for embedded devices.
> 
> Re forwarding, are you referring to the case where the forwarded
> packets are encapsulated? packets which are not encapusalted will be
> flushed in the gro receive handler (this went out by mistake in V2 but
> exists in V1)  if skb->encapsulation isn't set.
> 

How do you know encapsulation must be tried for a given incoming
packet ? NIC do not magically sets skb->encapsulation I think...

You access udp_offloads[XXX], with XXX being in 0..65535 range, right ?


> As for encapsulated packets, when you say random ports, are you
> referring to a router which has multiple udp encapsulating protocols
> where each uses different udp port? for this case and also to reduce
> the memory footprint, we can use lookup in a list as done for the L2
> protocols gro handlers in the list_for_each loop of dev_gro_receive(),
> makes sense?

I am speaking of a normal router, running linux kernel, and having
GRO/TSO enabled.

If each incoming UDP packet has to access one extra cache line in a
512KB array, its likely to be an extra cache line miss, if UDP dest
port is mostly random (compared to ports used by very recent UDP
packets)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-07 20:32       ` Eric Dumazet
@ 2014-01-07 20:37         ` Or Gerlitz
  2014-01-07 21:38           ` Eric Dumazet
  2014-01-07 22:11           ` Jerry Chu
  0 siblings, 2 replies; 36+ messages in thread
From: Or Gerlitz @ 2014-01-07 20:37 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Or Gerlitz, Jerry Chu, Eric Dumazet, Herbert Xu, netdev,
	David Miller, Yan Burman, Shlomo Pongratz

On Tue, Jan 7, 2014 at 10:32 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2014-01-07 at 22:19 +0200, Or Gerlitz wrote:
>> On Tue, Jan 7, 2014 at 6:33 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Tue, 2014-01-07 at 17:29 +0200, Or Gerlitz wrote:
>> >>
>> >> +
>> >> +#define MAX_UDP_PORT (1 << 16)
>> >> +extern const struct net_offload __rcu *udp_offloads[MAX_UDP_PORT];
>> >
>> > Thats 512 KB of memory.
>> > This will greatly impact forwarding performance of UDP packets with
>> > random ports, and will increase kernel memory size for embedded devices.
>>
>> Re forwarding, are you referring to the case where the forwarded
>> packets are encapsulated? packets which are not encapusalted will be
>> flushed in the gro receive handler (this went out by mistake in V2 but
>> exists in V1)  if skb->encapsulation isn't set.
>>
>
> How do you know encapsulation must be tried for a given incoming
> packet ? NIC do not magically sets skb->encapsulation I think...

So here's the thing, per my understanding we want to GRO only received
**encapsulated** packets whose checksum status is != CHECKSUM_NONE
which means the NIC has some support for doing RX checksum of
encapsulated packets. Per the current convension, in that case the NIC
RX code has to set skb->encapsulation see 6a674e9c75b17 "net: Add
support for hardware-offloaded encapsulation" this convension is
implemented in the current drivers that have HW offloads for
encapsulated packets (bnx2x, i40e and mlx4)


>
> You access udp_offloads[XXX], with XXX being in 0..65535 range, right ?
>
>
>> As for encapsulated packets, when you say random ports, are you
>> referring to a router which has multiple udp encapsulating protocols
>> where each uses different udp port? for this case and also to reduce
>> the memory footprint, we can use lookup in a list as done for the L2
>> protocols gro handlers in the list_for_each loop of dev_gro_receive(),
>> makes sense?
>
> I am speaking of a normal router, running linux kernel, and having
> GRO/TSO enabled.
>
> If each incoming UDP packet has to access one extra cache line in a
> 512KB array, its likely to be an extra cache line miss, if UDP dest
> port is mostly random (compared to ports used by very recent UDP
> packets)
>
>
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 3/3] net: Add GRO support for vxlan traffic
  2014-01-07 20:12         ` Or Gerlitz
@ 2014-01-07 21:09           ` Tom Herbert
  2014-01-08  9:45             ` Or Gerlitz
  0 siblings, 1 reply; 36+ messages in thread
From: Tom Herbert @ 2014-01-07 21:09 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Eric Dumazet, Or Gerlitz, Jerry Chu, Eric Dumazet, Herbert Xu,
	Linux Netdev List, David Miller, Yan Burman, Shlomo Pongratz

On Tue, Jan 7, 2014 at 12:12 PM, Or Gerlitz <or.gerlitz@gmail.com> wrote:
> On Tue, Jan 7, 2014 at 10:02 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Tue, 2014-01-07 at 21:43 +0200, Or Gerlitz wrote:
>>> On Tue, Jan 7, 2014 at 8:08 PM, Tom Herbert <therbert@google.com> wrote:
>>
>>> > Why ^ instead of != ?
>>>
>>> The XOR approach is very popular in the GRO stack, e.g see the IPv4 chain
>>> of inet_gro_receive() && tcp_gro_receive(), I guess this might relates
>>> to more efficient assembly code for ^ vs. != and/or the fast/elegant
>>> transitive nature of that operator
>>
>> This trick is only needed/used when many compares are folded into a
>> single conditional :
>>
>> if (a->f1 != b->f1 || a->f2 != b->f2)
>>
>> ->
>>
>> if (((a->f1 ^ b->f1) | (a->f2 ^ b->f2)) != 0)
>>
>> Please do not use XOR for a single compare.
>
> OK, but just out of curiosity -- what's the reasoning? clarity or
> efficiency or both?

Both. Compiling a simple program and comparing alternatives: gcc
produced the identical code for the single conditional (^ vs !=)
using the cmp instruction. Testing the two conditional case like Eric
provided; the second method (using ^) resulted in 4 more instructions,
but only one branch as opposed to two in the first method (!=). Method
#1 has the advantage of short circuiting when the first condition is
true, so organizing the conditionals to maximize the probability of
short circuit could be beneficial.


>>
>>
>>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-07 15:29 ` [PATCH net-next V2 1/3] " Or Gerlitz
  2014-01-07 16:33   ` Eric Dumazet
  2014-01-07 18:44   ` Tom Herbert
@ 2014-01-07 21:19   ` David Miller
  2014-01-07 21:40     ` Tom Herbert
  2 siblings, 1 reply; 36+ messages in thread
From: David Miller @ 2014-01-07 21:19 UTC (permalink / raw)
  To: ogerlitz; +Cc: hkchu, edumazet, herbert, netdev, yanb, shlomop

From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Tue,  7 Jan 2014 17:29:52 +0200

> +#define MAX_UDP_PORT (1 << 16)
> +extern const struct net_offload __rcu *udp_offloads[MAX_UDP_PORT];

Bloating the kernel up by half a megabyte just for this feature is
beyond unacceptable.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-07 20:37         ` Or Gerlitz
@ 2014-01-07 21:38           ` Eric Dumazet
  2014-01-08  8:04             ` Or Gerlitz
  2014-01-07 22:11           ` Jerry Chu
  1 sibling, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2014-01-07 21:38 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Or Gerlitz, Jerry Chu, Eric Dumazet, Herbert Xu, netdev,
	David Miller, Yan Burman, Shlomo Pongratz

On Tue, 2014-01-07 at 22:37 +0200, Or Gerlitz wrote:

> So here's the thing, per my understanding we want to GRO only received
> **encapsulated** packets whose checksum status is != CHECKSUM_NONE
> which means the NIC has some support for doing RX checksum of
> encapsulated packets. Per the current convension, in that case the NIC
> RX code has to set skb->encapsulation see 6a674e9c75b17 "net: Add
> support for hardware-offloaded encapsulation" this convension is
> implemented in the current drivers that have HW offloads for
> encapsulated packets (bnx2x, i40e and mlx4)

I do not think its true.

Some drivers set CHECKSUM_COMPLETE even for regular UDP frames...

git grep -n CHECKSUM_COMPLETE -- drivers/net

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-07 21:19   ` David Miller
@ 2014-01-07 21:40     ` Tom Herbert
  0 siblings, 0 replies; 36+ messages in thread
From: Tom Herbert @ 2014-01-07 21:40 UTC (permalink / raw)
  To: David Miller
  Cc: Or Gerlitz, Jerry Chu, Eric Dumazet, Herbert Xu,
	Linux Netdev List, Yan Burman, Shlomo Pongratz

On Tue, Jan 7, 2014 at 1:19 PM, David Miller <davem@davemloft.net> wrote:
> From: Or Gerlitz <ogerlitz@mellanox.com>
> Date: Tue,  7 Jan 2014 17:29:52 +0200
>
>> +#define MAX_UDP_PORT (1 << 16)
>> +extern const struct net_offload __rcu *udp_offloads[MAX_UDP_PORT];
>
> Bloating the kernel up by half a megabyte just for this feature is
> beyond unacceptable.

For now we should have at most handful of encapsulation ports (maybe
5?) so a linear lookup in a simple array should suffice. If we get
into more complex scenarios, like using connected UDP sockets to get
encapsulation through NAT boxes, then we would need something more
complex (maybe socket lookup).

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-07 20:37         ` Or Gerlitz
  2014-01-07 21:38           ` Eric Dumazet
@ 2014-01-07 22:11           ` Jerry Chu
  2014-01-08  8:02             ` Or Gerlitz
  1 sibling, 1 reply; 36+ messages in thread
From: Jerry Chu @ 2014-01-07 22:11 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Eric Dumazet, Or Gerlitz, Eric Dumazet, Herbert Xu, netdev,
	David Miller, Yan Burman, Shlomo Pongratz

On Tue, Jan 7, 2014 at 12:37 PM, Or Gerlitz <or.gerlitz@gmail.com> wrote:
> On Tue, Jan 7, 2014 at 10:32 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Tue, 2014-01-07 at 22:19 +0200, Or Gerlitz wrote:
>>> On Tue, Jan 7, 2014 at 6:33 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> > On Tue, 2014-01-07 at 17:29 +0200, Or Gerlitz wrote:
>>> >>
>>> >> +
>>> >> +#define MAX_UDP_PORT (1 << 16)
>>> >> +extern const struct net_offload __rcu *udp_offloads[MAX_UDP_PORT];
>>> >
>>> > Thats 512 KB of memory.
>>> > This will greatly impact forwarding performance of UDP packets with
>>> > random ports, and will increase kernel memory size for embedded devices.
>>>
>>> Re forwarding, are you referring to the case where the forwarded
>>> packets are encapsulated? packets which are not encapusalted will be
>>> flushed in the gro receive handler (this went out by mistake in V2 but
>>> exists in V1)  if skb->encapsulation isn't set.
>>>
>>
>> How do you know encapsulation must be tried for a given incoming
>> packet ? NIC do not magically sets skb->encapsulation I think...
>
> So here's the thing, per my understanding we want to GRO only received
> **encapsulated** packets whose checksum status is != CHECKSUM_NONE

What's wrong with GRO'ing pkts whose csum == CHECKSUM_NONE?

Also "udp_offload" is a little misleading - you are not trying to GRO UDP
pkts where UDP is the real transport. You are only trying to GRO UDP
encapped TCP pkts.

Jerry

> which means the NIC has some support for doing RX checksum of
> encapsulated packets. Per the current convension, in that case the NIC
> RX code has to set skb->encapsulation see 6a674e9c75b17 "net: Add
> support for hardware-offloaded encapsulation" this convension is
> implemented in the current drivers that have HW offloads for
> encapsulated packets (bnx2x, i40e and mlx4)
>
>
>>
>> You access udp_offloads[XXX], with XXX being in 0..65535 range, right ?
>>
>>
>>> As for encapsulated packets, when you say random ports, are you
>>> referring to a router which has multiple udp encapsulating protocols
>>> where each uses different udp port? for this case and also to reduce
>>> the memory footprint, we can use lookup in a list as done for the L2
>>> protocols gro handlers in the list_for_each loop of dev_gro_receive(),
>>> makes sense?
>>
>> I am speaking of a normal router, running linux kernel, and having
>> GRO/TSO enabled.
>>
>> If each incoming UDP packet has to access one extra cache line in a
>> 512KB array, its likely to be an extra cache line miss, if UDP dest
>> port is mostly random (compared to ports used by very recent UDP
>> packets)
>>
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-07 20:21     ` Or Gerlitz
@ 2014-01-07 23:04       ` Tom Herbert
  2014-01-08 16:11         ` Or Gerlitz
  0 siblings, 1 reply; 36+ messages in thread
From: Tom Herbert @ 2014-01-07 23:04 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Or Gerlitz, Jerry Chu, Eric Dumazet, Herbert Xu,
	Linux Netdev List, David Miller, Yan Burman, Shlomo Pongratz

On Tue, Jan 7, 2014 at 12:21 PM, Or Gerlitz <or.gerlitz@gmail.com> wrote:
> On Tue, Jan 7, 2014 at 8:44 PM, Tom Herbert <therbert@google.com> wrote:
>> Or, thanks for posting the patches!
>>
>> We should also support the case where direct encapsulation is being
>> done, that is there is no encapsulation header after UDP and the
>> protocol of the encapsulated packet is inferred by the port number
>> (e.g. GRE/UDP, TCP/UDP, SCTP/UDP, etc.). This is probably an
>> additional field in net_offload struct for next protocol, a little
>> more API, and pretty trivial handlers in UDP code.
>
> The way I have set that follows your guideline under which the
> encapsulating method is derived from the udp destination port in the
> sense that the encapsulating protocol can do what they want in the
> gro_receive/complete handlers entry they plant per that udp port,
> isn't that generic enough?

Direct encapsulation of different protocols could be done using the
same callback functions. Somehow, we just need to pass the protocol
number (or more generally pass private data to the callbacks). In lieu
adding this to net_offload, we could just make the net_offload an
argument to the callbacks and use container_of to access a private
structure.

>
>
>>
>>
>> On Tue, Jan 7, 2014 at 7:29 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
>>> Add GRO handlers for protocols that do UDP encapsulation, with the intent of
>>> being able to coalesce packets which encapsulate packets belonging to
>>> the same TCP session.
>>>
>>> For GRO purposes, the destination UDP port takes the role of the ether type
>>> field in the ethernet header or the next protocol in the IP header.
>>>
>>> The UDP GRO handler will only attempt to coalesce packets whose destination
>>> port is registered to have gro handler.
>>>
>>> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
>>> ---
>>>  include/net/protocol.h |    6 ++++
>>>  net/ipv4/protocol.c    |   21 ++++++++++++++
>>>  net/ipv4/udp_offload.c |   69 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>  3 files changed, 96 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/include/net/protocol.h b/include/net/protocol.h
>>> index fbf7676..d776c08 100644
>>> --- a/include/net/protocol.h
>>> +++ b/include/net/protocol.h
>>> @@ -92,6 +92,10 @@ extern const struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS];
>>>  extern const struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS];
>>>  extern const struct net_offload __rcu *inet6_offloads[MAX_INET_PROTOS];
>>>
>>> +
>>> +#define MAX_UDP_PORT (1 << 16)
>>> +extern const struct net_offload __rcu *udp_offloads[MAX_UDP_PORT];
>>> +
>>>  #if IS_ENABLED(CONFIG_IPV6)
>>>  extern const struct inet6_protocol __rcu *inet6_protos[MAX_INET_PROTOS];
>>>  #endif
>>> @@ -102,6 +106,8 @@ int inet_add_offload(const struct net_offload *prot, unsigned char num);
>>>  int inet_del_offload(const struct net_offload *prot, unsigned char num);
>>>  void inet_register_protosw(struct inet_protosw *p);
>>>  void inet_unregister_protosw(struct inet_protosw *p);
>>> +int udp_add_offload(const struct net_offload *prot, __be16 port);
>>> +int udp_del_offload(const struct net_offload *prot, __be16 port);
>>>
>>>  #if IS_ENABLED(CONFIG_IPV6)
>>>  int inet6_add_protocol(const struct inet6_protocol *prot, unsigned char num);
>>> diff --git a/net/ipv4/protocol.c b/net/ipv4/protocol.c
>>> index 46d6a1c..426eae5 100644
>>> --- a/net/ipv4/protocol.c
>>> +++ b/net/ipv4/protocol.c
>>> @@ -30,6 +30,7 @@
>>>
>>>  const struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS] __read_mostly;
>>>  const struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS] __read_mostly;
>>> +const struct net_offload __rcu *udp_offloads[MAX_UDP_PORT] __read_mostly;
>>>
>>>  int inet_add_protocol(const struct net_protocol *prot, unsigned char protocol)
>>>  {
>>> @@ -51,6 +52,13 @@ int inet_add_offload(const struct net_offload *prot, unsigned char protocol)
>>>  }
>>>  EXPORT_SYMBOL(inet_add_offload);
>>>
>>> +int udp_add_offload(const struct net_offload *prot, __be16 port)
>>> +{
>>> +       return !cmpxchg((const struct net_offload **)&udp_offloads[port],
>>> +                       NULL, prot) ? 0 : -1;
>>> +}
>>> +EXPORT_SYMBOL(udp_add_offload);
>>> +
>>>  int inet_del_protocol(const struct net_protocol *prot, unsigned char protocol)
>>>  {
>>>         int ret;
>>> @@ -76,3 +84,16 @@ int inet_del_offload(const struct net_offload *prot, unsigned char protocol)
>>>         return ret;
>>>  }
>>>  EXPORT_SYMBOL(inet_del_offload);
>>> +
>>> +int udp_del_offload(const struct net_offload *prot, __be16 port)
>>> +{
>>> +       int ret;
>>> +
>>> +       ret = (cmpxchg((const struct net_offload **)&udp_offloads[port],
>>> +                      prot, NULL) == prot) ? 0 : -1;
>>> +
>>> +       synchronize_net();
>>> +
>>> +       return ret;
>>> +}
>>> +EXPORT_SYMBOL(udp_del_offload);
>>> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
>>> index 79c62bd..0a8fdd6 100644
>>> --- a/net/ipv4/udp_offload.c
>>> +++ b/net/ipv4/udp_offload.c
>>> @@ -89,10 +89,79 @@ out:
>>>         return segs;
>>>  }
>>>
>>> +
>>> +static struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
>>> +{
>>> +       const struct net_offload *ops;
>>> +       struct sk_buff *p, **pp = NULL;
>>> +       struct udphdr *uh, *uh2;
>>> +       unsigned int hlen, off;
>>> +       int flush = 1;
>>> +
>>> +       off  = skb_gro_offset(skb);
>>> +       hlen = off + sizeof(*uh);
>>> +       uh   = skb_gro_header_fast(skb, off);
>>> +       if (skb_gro_header_hard(skb, hlen)) {
>>> +               uh = skb_gro_header_slow(skb, hlen, off);
>>> +               if (unlikely(!uh))
>>> +                       goto out;
>>> +       }
>>> +
>>> +       rcu_read_lock();
>>> +       ops = rcu_dereference(udp_offloads[uh->dest]);
>>> +       if (!ops || !ops->callbacks.gro_receive)
>>> +               goto out_unlock;
>>> +
>>> +       flush = 0;
>>> +
>>> +       for (p = *head; p; p = p->next) {
>>> +               if (!NAPI_GRO_CB(p)->same_flow)
>>> +                       continue;
>>> +
>>> +               uh2 = (struct udphdr   *)(p->data + off);
>>> +               if ((*(u32 *)&uh->source ^ *(u32 *)&uh2->source)) {
>>> +                       NAPI_GRO_CB(p)->same_flow = 0;
>>> +                       continue;
>>> +               }
>>> +               goto found;
>>> +       }
>>> +
>>> +found:
>>> +       skb_gro_pull(skb, sizeof(struct udphdr)); /* pull encapsulating udp header */
>>> +       pp = ops->callbacks.gro_receive(head, skb);
>>> +
>>> +out_unlock:
>>> +       rcu_read_unlock();
>>> +out:
>>> +       NAPI_GRO_CB(skb)->flush |= flush;
>>> +
>>> +       return pp;
>>> +}
>>> +
>>> +static int udp_gro_complete(struct sk_buff *skb, int nhoff)
>>> +{
>>> +       const struct net_offload *ops;
>>> +       __be16 newlen = htons(skb->len - nhoff);
>>> +       struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
>>> +       int err = -ENOSYS;
>>> +
>>> +       uh->len = newlen;
>>> +
>>> +       rcu_read_lock();
>>> +       ops = rcu_dereference(udp_offloads[uh->dest]);
>>> +       if (ops && ops->callbacks.gro_complete)
>>> +               err = ops->callbacks.gro_complete(skb, nhoff + sizeof(struct udphdr));
>>> +
>>> +       rcu_read_unlock();
>>> +       return err;
>>> +}
>>> +
>>>  static const struct net_offload udpv4_offload = {
>>>         .callbacks = {
>>>                 .gso_send_check = udp4_ufo_send_check,
>>>                 .gso_segment = udp4_ufo_fragment,
>>> +               .gro_receive  = udp_gro_receive,
>>> +               .gro_complete = udp_gro_complete,
>>>         },
>>>  };
>>>
>>> --
>>> 1.7.1
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-07 22:11           ` Jerry Chu
@ 2014-01-08  8:02             ` Or Gerlitz
  2014-01-09  3:12               ` Jerry Chu
  0 siblings, 1 reply; 36+ messages in thread
From: Or Gerlitz @ 2014-01-08  8:02 UTC (permalink / raw)
  To: Jerry Chu, Or Gerlitz
  Cc: Eric Dumazet, Eric Dumazet, Herbert Xu, netdev, David Miller,
	Yan Burman, Shlomo Pongratz

On 08/01/2014 00:11, Jerry Chu wrote:
> On Tue, Jan 7, 2014 at 12:37 PM, Or Gerlitz <or.gerlitz@gmail.com> wrote:
>> On Tue, Jan 7, 2014 at 10:32 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> On Tue, 2014-01-07 at 22:19 +0200, Or Gerlitz wrote:
>>>> On Tue, Jan 7, 2014 at 6:33 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>> On Tue, 2014-01-07 at 17:29 +0200, Or Gerlitz wrote:
>>>>>> +
>>>>>> +#define MAX_UDP_PORT (1 << 16)
>>>>>> +extern const struct net_offload __rcu *udp_offloads[MAX_UDP_PORT];
>>>>> Thats 512 KB of memory.
>>>>> This will greatly impact forwarding performance of UDP packets with
>>>>> random ports, and will increase kernel memory size for embedded devices.
>>>> Re forwarding, are you referring to the case where the forwarded
>>>> packets are encapsulated? packets which are not encapusalted will be
>>>> flushed in the gro receive handler (this went out by mistake in V2 but
>>>> exists in V1)  if skb->encapsulation isn't set.
>>>>
>>> How do you know encapsulation must be tried for a given incoming
>>> packet ? NIC do not magically sets skb->encapsulation I think...
>> So here's the thing, per my understanding we want to GRO only received
>> **encapsulated** packets whose checksum status is != CHECKSUM_NONE
> What's wrong with GRO'ing pkts whose csum == CHECKSUM_NONE?

I am not sure, intuitively it sounds a bit wrong to me, empirically, it 
doesn't work for udp encapsulated  / vxlan
traffic, I got drops from the tcp stack in tcp_rcv_established() -- if  
GRO-ed packets carry CHECKSUM_NONE
we arrive to the csum_error label, which means that 
tcp_checksum_complete_user() failed for them


> Also "udp_offload" is a little misleading - you are not trying to GRO UDP
> pkts where UDP is the real transport. You are only trying to GRO UDP encapped TCP pkts.

Indeed -- however, I just plugged into what was there for GSO, e.g stack 
will not do GSO for plain UDP
packets, only for those who encapsulate something the code that does 
this is udp_offloads.c -- any suggestion
how to phrase/frame the change you envision?


>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-07 21:38           ` Eric Dumazet
@ 2014-01-08  8:04             ` Or Gerlitz
       [not found]               ` <1389182291.26646.79.camel@edumazet-glaptop2.roam.corp.google.com>
  0 siblings, 1 reply; 36+ messages in thread
From: Or Gerlitz @ 2014-01-08  8:04 UTC (permalink / raw)
  To: Eric Dumazet, Or Gerlitz
  Cc: Jerry Chu, Eric Dumazet, Herbert Xu, netdev, David Miller,
	Yan Burman, Shlomo Pongratz

On 07/01/2014 23:38, Eric Dumazet wrote:
> On Tue, 2014-01-07 at 22:37 +0200, Or Gerlitz wrote:
>
>> So here's the thing, per my understanding we want to GRO only received
>> **encapsulated** packets whose checksum status is != CHECKSUM_NONE
>> which means the NIC has some support for doing RX checksum of
>> encapsulated packets. Per the current convension, in that case the NIC
>> RX code has to set skb->encapsulation see 6a674e9c75b17 "net: Add
>> support for hardware-offloaded encapsulation" this convension is
>> implemented in the current drivers that have HW offloads for
>> encapsulated packets (bnx2x, i40e and mlx4)
> I do not think its true.
>
> Some drivers set CHECKSUM_COMPLETE even for regular UDP frames...
>
> git grep -n CHECKSUM_COMPLETE -- drivers/net
>
>
>
Eric, the point I was trying to make is that as long as the driver set a 
value which is different from CHECKSUM_NONE
for an skb who carry encapsulated packet, we want skb->encapsulation to 
be set, per the architecture dictated by the above commit.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 3/3] net: Add GRO support for vxlan traffic
  2014-01-07 21:09           ` Tom Herbert
@ 2014-01-08  9:45             ` Or Gerlitz
  0 siblings, 0 replies; 36+ messages in thread
From: Or Gerlitz @ 2014-01-08  9:45 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Eric Dumazet, Jerry Chu, Eric Dumazet, Herbert Xu,
	Linux Netdev List, Yan Burman, Shlomo Pongratz

On 07/01/2014 23:09, Tom Herbert wrote:
> On Tue, Jan 7, 2014 at 12:12 PM, Or Gerlitz <or.gerlitz@gmail.com> wrote:
>> On Tue, Jan 7, 2014 at 10:02 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> On Tue, 2014-01-07 at 21:43 +0200, Or Gerlitz wrote:
>>>> On Tue, Jan 7, 2014 at 8:08 PM, Tom Herbert <therbert@google.com> wrote:
>>>>> Why ^ instead of != ?
>>>> The XOR approach is very popular in the GRO stack, e.g see the IPv4 chain
>>>> of inet_gro_receive() && tcp_gro_receive(), I guess this might relates
>>>> to more efficient assembly code for ^ vs. != and/or the fast/elegant
>>>> transitive nature of that operator
>>> This trick is only needed/used when many compares are folded into a
>>> single conditional :
>>>
>>> if (a->f1 != b->f1 || a->f2 != b->f2)
>>>
>>> ->
>>>
>>> if (((a->f1 ^ b->f1) | (a->f2 ^ b->f2)) != 0)
>>>
>>> Please do not use XOR for a single compare.
>> OK, but just out of curiosity -- what's the reasoning? clarity or
>> efficiency or both?
> Both. Compiling a simple program and comparing alternatives: gcc
> produced the identical code for the single conditional (^ vs !=)
> using the cmp instruction. Testing the two conditional case like Eric
> provided; the second method (using ^) resulted in 4 more instructions,
> but only one branch as opposed to two in the first method (!=). Method
> #1 has the advantage of short circuiting when the first condition is
> true, so organizing the conditionals to maximize the probability of
> short circuit could be beneficial.

OK, I will follow that.

Or.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 1/3] net: Add GRO support for UDP encapsulating protocols
       [not found]               ` <1389182291.26646.79.camel@edumazet-glaptop2.roam.corp.google.com>
@ 2014-01-08 12:15                 ` Or Gerlitz
  0 siblings, 0 replies; 36+ messages in thread
From: Or Gerlitz @ 2014-01-08 12:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Or Gerlitz, netdev

On 08/01/2014 13:58, Eric Dumazet wrote:
> On Wed, 2014-01-08 at 10:04 +0200, Or Gerlitz wrote:
>> On 07/01/2014 23:38, Eric Dumazet wrote:
>>> On Tue, 2014-01-07 at 22:37 +0200, Or Gerlitz wrote:
>>>
>>>> So here's the thing, per my understanding we want to GRO only received
>>>> **encapsulated** packets whose checksum status is != CHECKSUM_NONE
>>>> which means the NIC has some support for doing RX checksum of
>>>> encapsulated packets. Per the current convension, in that case the NIC
>>>> RX code has to set skb->encapsulation see 6a674e9c75b17 "net: Add
>>>> support for hardware-offloaded encapsulation" this convension is
>>>> implemented in the current drivers that have HW offloads for
>>>> encapsulated packets (bnx2x, i40e and mlx4)
>>> I do not think its true.
>>>
>>> Some drivers set CHECKSUM_COMPLETE even for regular UDP frames...
>>>
>>> git grep -n CHECKSUM_COMPLETE -- drivers/net
>>>
>>>
>>>
>> Eric, the point I was trying to make is that as long as the driver set a
>> value which is different from CHECKSUM_NONE
>> for an skb who carry encapsulated packet, we want skb->encapsulation to
>> be set, per the architecture dictated by the above commit.
> Then this point is obviously wrong. Your interpretation is wrong, I am very sorry.
> skb->encapsulation _might_ be set, only if the NIC is performing header
> analysis. CHECKSUM_COMPLETE support doesn't require header analysis.
>
>
>
>
fair enough && thanks for shedding more light on this - so in that 
respect, the gro handler for udp encapsulated packets will not flush skb 
who is either marked with CHECKSUM_COMPLETE or has skb->encapsulation set.

Or.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-07 23:04       ` Tom Herbert
@ 2014-01-08 16:11         ` Or Gerlitz
  2014-01-08 16:29           ` Tom Herbert
  0 siblings, 1 reply; 36+ messages in thread
From: Or Gerlitz @ 2014-01-08 16:11 UTC (permalink / raw)
  To: Tom Herbert, Or Gerlitz
  Cc: Jerry Chu, Eric Dumazet, Herbert Xu, Linux Netdev List,
	David Miller, Yan Burman, Shlomo Pongratz

On 08/01/2014 01:04, Tom Herbert wrote:
> On Tue, Jan 7, 2014 at 12:21 PM, Or Gerlitz<or.gerlitz@gmail.com>  wrote:
>> >On Tue, Jan 7, 2014 at 8:44 PM, Tom Herbert<therbert@google.com>  wrote:
>>> >>Or, thanks for posting the patches!
>>> >>
>>> >>We should also support the case where direct encapsulation is being
>>> >>done, that is there is no encapsulation header after UDP and the
>>> >>protocol of the encapsulated packet is inferred by the port number
>>> >>(e.g. GRE/UDP, TCP/UDP, SCTP/UDP, etc.). This is probably an
>>> >>additional field in net_offload struct for next protocol, a little
>>> >>more API, and pretty trivial handlers in UDP code.
>> >
>> >The way I have set that follows your guideline under which the
>> >encapsulating method is derived from the udp destination port in the
>> >sense that the encapsulating protocol can do what they want in the
>> >gro_receive/complete handlers entry they plant per that udp port,
>> >isn't that generic enough?
> Direct encapsulation of different protocols could be done using the
> same callback functions. Somehow, we just need to pass the protocol
> number (or more generally pass private data to the callbacks). In lieu
> adding this to net_offload, we could just make the net_offload an
> argument to the callbacks and use container_of to access a private structure.
>

Tom,

OK, so I understand that you want the infrastructure to allow for direct 
udp encapsulation in the sense that there is no specific encapsulation 
header following the udp header.

Take for example encapsulating TCP over UDP port X, the layer/driver 
that wants to enable gro for such traffic will register a udp gro 
handler for port X and in their gro receive/complete
callbacks would just assume that following the UDP header there's TCP 
header and they will then issue a lookup in the inet gro handlers array 
to get the TCP gro handler, problem solved, agree?

Or.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-08 16:11         ` Or Gerlitz
@ 2014-01-08 16:29           ` Tom Herbert
  2014-01-08 16:31             ` Or Gerlitz
  0 siblings, 1 reply; 36+ messages in thread
From: Tom Herbert @ 2014-01-08 16:29 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Or Gerlitz, Jerry Chu, Eric Dumazet, Herbert Xu,
	Linux Netdev List, David Miller, Yan Burman, Shlomo Pongratz

> Tom,
>
> OK, so I understand that you want the infrastructure to allow for direct udp
> encapsulation in the sense that there is no specific encapsulation header
> following the udp header.
>
> Take for example encapsulating TCP over UDP port X, the layer/driver that
> wants to enable gro for such traffic will register a udp gro handler for
> port X and in their gro receive/complete
> callbacks would just assume that following the UDP header there's TCP header
> and they will then issue a lookup in the inet gro handlers array to get the
> TCP gro handler, problem solved, agree?
>
Yes, that could work, but except for specifying a different protocol
number, the code to do direct encapsulation with GRO would be
identical for TCP/UDP, GRE/UDP, IPIP/UDP, ... we should be able to
avoid redundancy.  Let's get the basic UDP GRO support in, and then I
think adding support for direct encapsulation can be done without API
change in your proposed scheme.

> Or.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-08 16:29           ` Tom Herbert
@ 2014-01-08 16:31             ` Or Gerlitz
  0 siblings, 0 replies; 36+ messages in thread
From: Or Gerlitz @ 2014-01-08 16:31 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Or Gerlitz, Jerry Chu, Eric Dumazet, Herbert Xu,
	Linux Netdev List, David Miller, Yan Burman, Shlomo Pongratz

On 08/01/2014 18:29, Tom Herbert wrote:
> [...] Let's get the basic UDP GRO support in, and then I think adding support for direct encapsulation can be done without API change in your proposed scheme
sounds good to me

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-08  8:02             ` Or Gerlitz
@ 2014-01-09  3:12               ` Jerry Chu
  2014-01-09  6:35                 ` Or Gerlitz
  2014-01-09  7:19                 ` Or Gerlitz
  0 siblings, 2 replies; 36+ messages in thread
From: Jerry Chu @ 2014-01-09  3:12 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Or Gerlitz, Eric Dumazet, Eric Dumazet, Herbert Xu, netdev,
	David Miller, Yan Burman, Shlomo Pongratz

On Wed, Jan 8, 2014 at 12:02 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
> On 08/01/2014 00:11, Jerry Chu wrote:
>>
>> On Tue, Jan 7, 2014 at 12:37 PM, Or Gerlitz <or.gerlitz@gmail.com> wrote:
>>>
>>> On Tue, Jan 7, 2014 at 10:32 PM, Eric Dumazet <eric.dumazet@gmail.com>
>>> wrote:
>>>>
>>>> On Tue, 2014-01-07 at 22:19 +0200, Or Gerlitz wrote:
>>>>>
>>>>> On Tue, Jan 7, 2014 at 6:33 PM, Eric Dumazet <eric.dumazet@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> On Tue, 2014-01-07 at 17:29 +0200, Or Gerlitz wrote:
>>>>>>>
>>>>>>> +
>>>>>>> +#define MAX_UDP_PORT (1 << 16)
>>>>>>> +extern const struct net_offload __rcu *udp_offloads[MAX_UDP_PORT];
>>>>>>
>>>>>> Thats 512 KB of memory.
>>>>>> This will greatly impact forwarding performance of UDP packets with
>>>>>> random ports, and will increase kernel memory size for embedded
>>>>>> devices.
>>>>>
>>>>> Re forwarding, are you referring to the case where the forwarded
>>>>> packets are encapsulated? packets which are not encapusalted will be
>>>>> flushed in the gro receive handler (this went out by mistake in V2 but
>>>>> exists in V1)  if skb->encapsulation isn't set.
>>>>>
>>>> How do you know encapsulation must be tried for a given incoming
>>>> packet ? NIC do not magically sets skb->encapsulation I think...
>>>
>>> So here's the thing, per my understanding we want to GRO only received
>>> **encapsulated** packets whose checksum status is != CHECKSUM_NONE
>>
>> What's wrong with GRO'ing pkts whose csum == CHECKSUM_NONE?
>
>
> I am not sure, intuitively it sounds a bit wrong to me, empirically, it
> doesn't work for udp encapsulated  / vxlan
> traffic, I got drops from the tcp stack in tcp_rcv_established() -- if
> GRO-ed packets carry CHECKSUM_NONE
> we arrive to the csum_error label, which means that
> tcp_checksum_complete_user() failed for them

This is odd because if pkts have been aggregated successfully,
tcp4_gro_receive() should've skb_checksum() and turned CHECKSUM_NONE
into CHECKSUM_UNNECESSARY. (I think i've already tested this
case with my GRE-GRO patch on a NIC that sends up pkts w/ CHECKSUM_NONE.

But granted there are a lot of csum related bugs in the current code. I just
spent half a day scratching my head on a very low thruput number with
my GRE patch over a GRE tunnel w/ csum flag on. I just tracked it down
to be buggy TSO/GRE code that will produce bad csum on the tx side.

>
>
>
>> Also "udp_offload" is a little misleading - you are not trying to GRO UDP
>> pkts where UDP is the real transport. You are only trying to GRO UDP
>> encapped TCP pkts.
>
>
> Indeed -- however, I just plugged into what was there for GSO, e.g stack
> will not do GSO for plain UDP
> packets, only for those who encapsulate something the code that does this is
> udp_offloads.c -- any suggestion
> how to phrase/frame the change you envision?

There is already udpv4_offload for real udp gso (ufo) offload. How about
"udp_encap_offload" for your stuff?

>
>
>>
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-09  3:12               ` Jerry Chu
@ 2014-01-09  6:35                 ` Or Gerlitz
  2014-01-09  7:19                 ` Or Gerlitz
  1 sibling, 0 replies; 36+ messages in thread
From: Or Gerlitz @ 2014-01-09  6:35 UTC (permalink / raw)
  To: Jerry Chu, Eric Dumazet, David Miller
  Cc: Eric Dumazet, Herbert Xu, netdev, Yan Burman, Shlomo Pongratz

On 09/01/2014 05:12, Jerry Chu wrote:
> On Wed, Jan 8, 2014 at 12:02 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
>> On 08/01/2014 00:11, Jerry Chu wrote:
>>> On Tue, Jan 7, 2014 at 12:37 PM, Or Gerlitz <or.gerlitz@gmail.com> wrote:
>>>> On Tue, Jan 7, 2014 at 10:32 PM, Eric Dumazet <eric.dumazet@gmail.com>
>>>> wrote:
>>>>> On Tue, 2014-01-07 at 22:19 +0200, Or Gerlitz wrote:
>>>>>> On Tue, Jan 7, 2014 at 6:33 PM, Eric Dumazet <eric.dumazet@gmail.com>
>>>>>> wrote:
>>>>>>> On Tue, 2014-01-07 at 17:29 +0200, Or Gerlitz wrote:
>>>>>>>> +
>>>>>>>> +#define MAX_UDP_PORT (1 << 16)
>>>>>>>> +extern const struct net_offload __rcu *udp_offloads[MAX_UDP_PORT];
>>>>>>> Thats 512 KB of memory.
>>>>>>> This will greatly impact forwarding performance of UDP packets with
>>>>>>> random ports, and will increase kernel memory size for embedded
>>>>>>> devices.
>>>>>> Re forwarding, are you referring to the case where the forwarded
>>>>>> packets are encapsulated? packets which are not encapusalted will be
>>>>>> flushed in the gro receive handler (this went out by mistake in V2 but
>>>>>> exists in V1)  if skb->encapsulation isn't set.
>>>>>>
>>>>> How do you know encapsulation must be tried for a given incoming
>>>>> packet ? NIC do not magically sets skb->encapsulation I think...
>>>> So here's the thing, per my understanding we want to GRO only received
>>>> **encapsulated** packets whose checksum status is != CHECKSUM_NONE
>>> What's wrong with GRO'ing pkts whose csum == CHECKSUM_NONE?
>>
>> I am not sure, intuitively it sounds a bit wrong to me, empirically, it
>> doesn't work for udp encapsulated  / vxlan
>> traffic, I got drops from the tcp stack in tcp_rcv_established() -- if
>> GRO-ed packets carry CHECKSUM_NONE
>> we arrive to the csum_error label, which means that
>> tcp_checksum_complete_user() failed for them
> This is odd because if pkts have been aggregated successfully,
> tcp4_gro_receive() should've skb_checksum() and turned CHECKSUM_NONE
> into CHECKSUM_UNNECESSARY. (I think i've already tested this
> case with my GRE-GRO patch on a NIC that sends up pkts w/ CHECKSUM_NONE.
>
> But granted there are a lot of csum related bugs in the current code. I just spent half a day scratching my head on a very low thruput number with my GRE patch over a GRE tunnel w/ csum flag on. I just tracked it down to be buggy TSO/GRE code that will produce bad csum on the tx side.

The "there are a lot of csum related bugs in the current code" comment 
sounds really bad, how do we get into a better place? can you point on 
the buggy TSO code that produced bad csum on the tx side?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next V2 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-09  3:12               ` Jerry Chu
  2014-01-09  6:35                 ` Or Gerlitz
@ 2014-01-09  7:19                 ` Or Gerlitz
  1 sibling, 0 replies; 36+ messages in thread
From: Or Gerlitz @ 2014-01-09  7:19 UTC (permalink / raw)
  To: Jerry Chu
  Cc: Eric Dumazet, Eric Dumazet, Herbert Xu, netdev, David Miller,
	Yan Burman, Shlomo Pongratz

On 09/01/2014 05:12, Jerry Chu wrote:
>> >
>> >
>>> >>Also "udp_offload" is a little misleading - you are not trying to GRO UDP
>>> >>pkts where UDP is the real transport. You are only trying to GRO UDP
>>> >>encapped TCP pkts.
>> >
>> >
>> >Indeed -- however, I just plugged into what was there for GSO, e.g stack
>> >will not do GSO for plain UDP
>> >packets, only for those who encapsulate something the code that does this is
>> >udp_offloads.c -- any suggestion
>> >how to phrase/frame the change you envision?
> There is already udpv4_offload for real udp gso (ufo) offload. How about "udp_encap_offload" for your stuff?
>

well, I don't see how it can work... the problem with your suggestion is 
that we'll have two structures to compete on the UDP protocol entry in 
the inet offloads structure, one that holds the current udp gso offloads 
and one with the udp gro offloads.

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2014-01-09  7:19 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-07 15:29 [PATCH net-next V2 0/3] net: Add GRO support for UDP encapsulating protocols Or Gerlitz
2014-01-07 15:29 ` [PATCH net-next V2 1/3] " Or Gerlitz
2014-01-07 16:33   ` Eric Dumazet
2014-01-07 20:19     ` Or Gerlitz
2014-01-07 20:32       ` Eric Dumazet
2014-01-07 20:37         ` Or Gerlitz
2014-01-07 21:38           ` Eric Dumazet
2014-01-08  8:04             ` Or Gerlitz
     [not found]               ` <1389182291.26646.79.camel@edumazet-glaptop2.roam.corp.google.com>
2014-01-08 12:15                 ` Or Gerlitz
2014-01-07 22:11           ` Jerry Chu
2014-01-08  8:02             ` Or Gerlitz
2014-01-09  3:12               ` Jerry Chu
2014-01-09  6:35                 ` Or Gerlitz
2014-01-09  7:19                 ` Or Gerlitz
2014-01-07 18:44   ` Tom Herbert
2014-01-07 20:21     ` Or Gerlitz
2014-01-07 23:04       ` Tom Herbert
2014-01-08 16:11         ` Or Gerlitz
2014-01-08 16:29           ` Tom Herbert
2014-01-08 16:31             ` Or Gerlitz
2014-01-07 21:19   ` David Miller
2014-01-07 21:40     ` Tom Herbert
2014-01-07 15:29 ` [PATCH net-next V2 2/3] net: Export gro_find_by_type helpers Or Gerlitz
2014-01-07 15:29 ` [PATCH net-next V2 3/3] net: Add GRO support for vxlan traffic Or Gerlitz
2014-01-07 16:34   ` Eric Dumazet
2014-01-07 19:43     ` Or Gerlitz
2014-01-07 20:04       ` Eric Dumazet
2014-01-07 20:10         ` Or Gerlitz
2014-01-07 18:08   ` Tom Herbert
2014-01-07 19:43     ` Or Gerlitz
2014-01-07 20:02       ` Eric Dumazet
2014-01-07 20:12         ` Or Gerlitz
2014-01-07 21:09           ` Tom Herbert
2014-01-08  9:45             ` Or Gerlitz
     [not found]     ` <CAJZOPZLsMvmHwmMjhsuKb__2HncMXMm=p6UFnT4XX5d8hZnGxw@mail.gmail.com>
2014-01-07 19:52       ` Tom Herbert
2014-01-07 16:45 ` [PATCH net-next V2 0/3] net: Add GRO support for UDP encapsulating protocols Eric Dumazet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.