netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next V3 0/3] net: Add GRO support for UDP encapsulating protocols
@ 2014-01-08 20:34 Or Gerlitz
  2014-01-08 20:34 ` [PATCH net-next V3 1/3] " Or Gerlitz
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Or Gerlitz @ 2014-01-08 20:34 UTC (permalink / raw)
  To: hkchu, edumazet, herbert; +Cc: netdev, davem, yanb, shlomop, Or Gerlitz

This series adds GRO handlers for protocols that do UDP encapsulation, with the
intent of being able to coalesce packets which encapsulate packets belonging to
the same TCP session.

For GRO purposes, the destination UDP port takes the role of the ether type
field in the ethernet header or the next protocol in the IP header.

The UDP GRO handler will only attempt to coalesce packets whose destination
port is registered to have gro handler.

On my setup, which is net-next (now with the mlx4 vxlan offloads patches) --
for single TCP session that goes through vxlan tunneling I got nice improvement
from 6.8Gbs to 11.5Gbs

patches done against net-next 80077935cad "Merge branch 'master' of
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next"

v2 --> v3 changes:

 - moved to use linked list to store the udp gro handlers, this solves the
   problem of consuming 512KB of memory for the handlers.

 - use a mark on the skb GRO CB data to disallow running the udp gro_receive twice
   on a packet, this solves the problem of udp encapsulated packets whose inner VM
   packet is udp and happen to carry a port which has registered offloads - and flush it.

 - invoke the udp offload protocol registration and de-registration from the vxlan driver
   in a sleepable context 

For unclear some reason I got this warning when the vxlan driver deletes the
udp offload structure 

-----------[ cut here ]------------
WARNING: CPU: 2 PID: 19 at kernel/rcu/tree.c:2127 rcu_do_batch+0x359/0x370()
Modules linked in: veth vxlan ip_tunnel bridge stp llc rdma_ucm rdma_cm ib_cm iw_cm ib_addr ib_uverbs ib_umad mlx4_en ptp pps_core mlx4_ib ib_sa ib_mad ib_core mlx4_core nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 nfs lockd autofs4 sunrpc ipv6 dm_mirror dm_region_hash dm_log dm_mod joydev microcode pcspkr virtio_balloon virtio_net i2c_piix4 button ext3 jbd virtio_pci virtio_ring virtio uhci_hcd [last unloaded: ib_core]
CPU: 2 PID: 19 Comm: rcuc/2 Not tainted 3.13.0-rc6+ #278
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
 000000000000084f ffff88021651dd68 ffffffff81498b52 000000000000084f
 0000000000000000 ffff88021651dda8 ffffffff81059f2c ffff88021651dd98
 ffff88022010d6a0 ffff88022010d6c8 0000000000000246 ffffffff81844200
Call Trace:
 [<ffffffff81498b52>] dump_stack+0x51/0x77
 [<ffffffff81059f2c>] warn_slowpath_common+0x8c/0xc0
 [<ffffffff81059f7a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff810a49c9>] rcu_do_batch+0x359/0x370
 [<ffffffff810a4ade>] rcu_cpu_kthread+0xfe/0x2b0
 [<ffffffff81083697>] smpboot_thread_fn+0x167/0x260
 [<ffffffff81083530>] ? smpboot_create_threads+0x80/0x80
 [<ffffffff8107c36e>] kthread+0xce/0xf0
 [<ffffffff8107c2a0>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff8149de7c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8107c2a0>] ? kthread_freezable_should_stop+0x70/0x70
---[ end trace f98588ea45ef3f4c ]---

Or


Or Gerlitz (3):
  net: Add GRO support for UDP encapsulating protocols
  net: Export gro_find_by_type helpers
  net: Add GRO support for vxlan traffic

 drivers/net/vxlan.c       |  129 ++++++++++++++++++++++++++++++++++++++++++---
 include/linux/netdevice.h |   10 +++-
 include/net/protocol.h    |    3 +
 include/net/vxlan.h       |    1 +
 net/core/dev.c            |    3 +
 net/ipv4/udp_offload.c    |  129 +++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 267 insertions(+), 8 deletions(-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next V3 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-08 20:34 [PATCH net-next V3 0/3] net: Add GRO support for UDP encapsulating protocols Or Gerlitz
@ 2014-01-08 20:34 ` Or Gerlitz
  2014-01-08 20:39   ` Or Gerlitz
                     ` (2 more replies)
  2014-01-08 20:34 ` [PATCH net-next V3 2/3] net: Export gro_find_by_type helpers Or Gerlitz
  2014-01-08 20:34 ` [PATCH net-next V3 3/3] net: Add GRO support for vxlan traffic Or Gerlitz
  2 siblings, 3 replies; 15+ messages in thread
From: Or Gerlitz @ 2014-01-08 20:34 UTC (permalink / raw)
  To: hkchu, edumazet, herbert; +Cc: netdev, davem, yanb, shlomop, Or Gerlitz

Add GRO handlers for protocols that do UDP encapsulation, with the intent of
being able to coalesce packets which encapsulate packets belonging to
the same TCP session.

For GRO purposes, the destination UDP port takes the role of the ether type
field in the ethernet header or the next protocol in the IP header.

The UDP GRO handler will only attempt to coalesce packets whose destination
port is registered to have gro handler.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 include/linux/netdevice.h |   10 +++-
 include/net/protocol.h    |    3 +
 net/core/dev.c            |    1 +
 net/ipv4/udp_offload.c    |  129 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 142 insertions(+), 1 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a2a70cc..360551a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1652,7 +1652,9 @@ struct napi_gro_cb {
 	unsigned long age;
 
 	/* Used in ipv6_gro_receive() */
-	int	proto;
+	u16	proto;
+
+	u16	udp_mark;
 
 	/* used to support CHECKSUM_COMPLETE for tunneling protocols */
 	__wsum	csum;
@@ -1691,6 +1693,12 @@ struct packet_offload {
 	struct list_head	 list;
 };
 
+struct udp_offload {
+	__be16			 port;
+	struct offload_callbacks callbacks;
+	struct list_head	 list;
+};
+
 /* often modified stats are per cpu, other are shared (netdev->stats) */
 struct pcpu_sw_netstats {
 	u64     rx_packets;
diff --git a/include/net/protocol.h b/include/net/protocol.h
index fbf7676..fe9af94 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -103,6 +103,9 @@ int inet_del_offload(const struct net_offload *prot, unsigned char num);
 void inet_register_protosw(struct inet_protosw *p);
 void inet_unregister_protosw(struct inet_protosw *p);
 
+void udp_add_offload(struct udp_offload *prot);
+void udp_del_offload(struct udp_offload *prot);
+
 #if IS_ENABLED(CONFIG_IPV6)
 int inet6_add_protocol(const struct inet6_protocol *prot, unsigned char num);
 int inet6_del_protocol(const struct inet6_protocol *prot, unsigned char num);
diff --git a/net/core/dev.c b/net/core/dev.c
index ce01847..11f7acf 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3858,6 +3858,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 		NAPI_GRO_CB(skb)->same_flow = 0;
 		NAPI_GRO_CB(skb)->flush = 0;
 		NAPI_GRO_CB(skb)->free = 0;
+		NAPI_GRO_CB(skb)->udp_mark = 0;
 
 		pp = ptype->callbacks.gro_receive(&napi->gro_list, skb);
 		break;
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 79c62bd..2846ade 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -13,6 +13,16 @@
 #include <linux/skbuff.h>
 #include <net/udp.h>
 #include <net/protocol.h>
+/*
+struct udp_offload {
+	__be16			 port;
+	struct offload_callbacks callbacks;
+	struct list_head	 list;
+};
+*/
+
+static DEFINE_SPINLOCK(udp_offload_lock);
+static struct list_head udp_offload_base __read_mostly;
 
 static int udp4_ufo_send_check(struct sk_buff *skb)
 {
@@ -89,14 +99,133 @@ out:
 	return segs;
 }
 
+void udp_add_offload(struct udp_offload *uo)
+{
+	struct list_head *head = &udp_offload_base;
+
+	spin_lock(&udp_offload_lock);
+	list_add_rcu(&uo->list, head);
+	spin_unlock(&udp_offload_lock);
+}
+EXPORT_SYMBOL(udp_add_offload);
+
+void udp_del_offload(struct udp_offload *uo)
+{
+	struct list_head *head = &udp_offload_base;
+	struct udp_offload *uo1;
+
+	spin_lock(&udp_offload_lock);
+	list_for_each_entry(uo1, head, list) {
+		if (uo == uo1) {
+			list_del_rcu(&uo->list);
+			goto out;
+		}
+	}
+
+	pr_warn("udp_remove_offload: %p not found port %d\n", uo, htons(uo->port));
+out:
+	spin_unlock(&udp_offload_lock);
+
+	synchronize_net();
+}
+EXPORT_SYMBOL(udp_del_offload);
+
+static struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
+{
+	struct list_head *ohead = &udp_offload_base;
+	struct udp_offload *poffload;
+	struct sk_buff *p, **pp = NULL;
+	struct udphdr *uh, *uh2;
+	unsigned int hlen, off;
+	int flush = 1;
+
+	if (NAPI_GRO_CB(skb)->udp_mark ||
+	    (!skb->encapsulation && skb->ip_summed != CHECKSUM_COMPLETE))
+		goto out;
+
+	/* mark that this skb passed once through the udp gro layer */
+	NAPI_GRO_CB(skb)->udp_mark = 1;
+
+	off  = skb_gro_offset(skb);
+	hlen = off + sizeof(*uh);
+	uh   = skb_gro_header_fast(skb, off);
+	if (skb_gro_header_hard(skb, hlen)) {
+		uh = skb_gro_header_slow(skb, hlen, off);
+		if (unlikely(!uh))
+			goto out;
+	}
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(poffload, ohead, list) {
+		if (poffload->port != uh->dest || !poffload->callbacks.gro_receive)
+			continue;
+		break;
+	}
+
+	if (&poffload->list == ohead)
+		goto out_unlock;
+
+	flush = 0;
+
+	for (p = *head; p; p = p->next) {
+		if (!NAPI_GRO_CB(p)->same_flow)
+			continue;
+
+		uh2 = (struct udphdr   *)(p->data + off);
+		if ((*(u32 *)&uh->source != *(u32 *)&uh2->source)) {
+			NAPI_GRO_CB(p)->same_flow = 0;
+			continue;
+		}
+		goto found;
+	}
+
+found:
+	skb_gro_pull(skb, sizeof(struct udphdr)); /* pull encapsulating udp header */
+	pp = poffload->callbacks.gro_receive(head, skb);
+
+out_unlock:
+	rcu_read_unlock();
+out:
+	NAPI_GRO_CB(skb)->flush |= flush;
+
+	return pp;
+}
+
+static int udp_gro_complete(struct sk_buff *skb, int nhoff)
+{
+	struct list_head *ohead = &udp_offload_base;
+	struct udp_offload *poffload;
+	__be16 newlen = htons(skb->len - nhoff);
+	struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
+	int err = -ENOSYS;
+
+	uh->len = newlen;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(poffload, ohead, list) {
+		if (poffload->port != uh->dest || !poffload->callbacks.gro_complete)
+			continue;
+		break;
+	}
+
+	if (&poffload->list != ohead)
+		err = poffload->callbacks.gro_complete(skb, nhoff + sizeof(struct udphdr));
+
+	rcu_read_unlock();
+	return err;
+}
+
 static const struct net_offload udpv4_offload = {
 	.callbacks = {
 		.gso_send_check = udp4_ufo_send_check,
 		.gso_segment = udp4_ufo_fragment,
+		.gro_receive  =	udp_gro_receive,
+		.gro_complete =	udp_gro_complete,
 	},
 };
 
 int __init udpv4_offload_init(void)
 {
+	INIT_LIST_HEAD(&udp_offload_base);
 	return inet_add_offload(&udpv4_offload, IPPROTO_UDP);
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next V3 2/3] net: Export gro_find_by_type helpers
  2014-01-08 20:34 [PATCH net-next V3 0/3] net: Add GRO support for UDP encapsulating protocols Or Gerlitz
  2014-01-08 20:34 ` [PATCH net-next V3 1/3] " Or Gerlitz
@ 2014-01-08 20:34 ` Or Gerlitz
  2014-01-08 20:34 ` [PATCH net-next V3 3/3] net: Add GRO support for vxlan traffic Or Gerlitz
  2 siblings, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2014-01-08 20:34 UTC (permalink / raw)
  To: hkchu, edumazet, herbert; +Cc: netdev, davem, yanb, shlomop, Or Gerlitz

Export the gro_find_receive/complete_by_type helpers to they can be invoked
by the gro callbacks of encapsulation protocols such as vxlan.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 net/core/dev.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 11f7acf..a052e54 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3936,6 +3936,7 @@ struct packet_offload *gro_find_receive_by_type(__be16 type)
 	}
 	return NULL;
 }
+EXPORT_SYMBOL(gro_find_receive_by_type);
 
 struct packet_offload *gro_find_complete_by_type(__be16 type)
 {
@@ -3949,6 +3950,7 @@ struct packet_offload *gro_find_complete_by_type(__be16 type)
 	}
 	return NULL;
 }
+EXPORT_SYMBOL(gro_find_complete_by_type);
 
 static gro_result_t napi_skb_finish(gro_result_t ret, struct sk_buff *skb)
 {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next V3 3/3] net: Add GRO support for vxlan traffic
  2014-01-08 20:34 [PATCH net-next V3 0/3] net: Add GRO support for UDP encapsulating protocols Or Gerlitz
  2014-01-08 20:34 ` [PATCH net-next V3 1/3] " Or Gerlitz
  2014-01-08 20:34 ` [PATCH net-next V3 2/3] net: Export gro_find_by_type helpers Or Gerlitz
@ 2014-01-08 20:34 ` Or Gerlitz
  2014-01-08 22:09   ` Eric Dumazet
  2014-01-08 22:11   ` Eric Dumazet
  2 siblings, 2 replies; 15+ messages in thread
From: Or Gerlitz @ 2014-01-08 20:34 UTC (permalink / raw)
  To: hkchu, edumazet, herbert; +Cc: netdev, davem, yanb, shlomop, Or Gerlitz

Add gro handlers for vxlan using the udp gro infrastructure

On my setup, which is net-next (now with the mlx4 vxlan offloads patches) --
for single TCP session that goes through vxlan tunneling I got nice improvement
from 6.8Gbs to 11.5Gbs

--> UDP/VXLAN GRO disabled
$ netperf  -H 192.168.52.147 -c -C

$ netperf -t TCP_STREAM -H 192.168.52.147 -c -C
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  65536  65536    10.00      6799.75   12.54    24.79    0.604   1.195

--> UDP/VXLAN GRO enabled

$ netperf -t TCP_STREAM -H 192.168.52.147 -c -C
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  65536  65536    10.00      11562.72   24.90    20.34    0.706   0.577

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/vxlan.c |  129 ++++++++++++++++++++++++++++++++++++++++++++++++---
 include/net/vxlan.h |    1 +
 2 files changed, 123 insertions(+), 7 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 481f85d..e132f19 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -40,6 +40,7 @@
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 #include <net/vxlan.h>
+#include <net/protocol.h>
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ipv6.h>
 #include <net/addrconf.h>
@@ -554,10 +555,111 @@ static int vxlan_fdb_append(struct vxlan_fdb *f,
 	return 1;
 }
 
+static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct sk_buff *skb)
+{
+	struct sk_buff *p, **pp = NULL;
+	struct vxlanhdr *vh, *vh2;
+	struct ethhdr *eh, *eh2;
+	unsigned int hlen, off_vx, off_eth;
+	const struct packet_offload *ptype;
+	__be16 type;
+	int flush = 1;
+
+	off_vx = skb_gro_offset(skb);
+	hlen = off_vx + sizeof(*vh);
+	vh   = skb_gro_header_fast(skb, off_vx);
+	if (skb_gro_header_hard(skb, hlen)) {
+		vh = skb_gro_header_slow(skb, hlen, off_vx);
+		if (unlikely(!vh))
+			goto out;
+	}
+	skb_gro_pull(skb, sizeof(struct vxlanhdr)); /* pull vxlan header */
+
+	off_eth = skb_gro_offset(skb);
+	hlen = off_eth + sizeof(*eh);
+	eh   = skb_gro_header_fast(skb, off_eth);
+	if (skb_gro_header_hard(skb, hlen)) {
+		eh = skb_gro_header_slow(skb, hlen, off_eth);
+		if (unlikely(!eh))
+			goto out;
+	}
+
+	flush = 0;
+
+	for (p = *head; p; p = p->next) {
+		if (!NAPI_GRO_CB(p)->same_flow)
+			continue;
+
+		vh2 = (struct vxlanhdr *)(p->data + off_vx);
+		eh2 = (struct ethhdr   *)(p->data + off_eth);
+		if (vh->vx_vni != vh2->vx_vni || compare_ether_header(eh, eh2)) {
+			NAPI_GRO_CB(p)->same_flow = 0;
+			continue;
+		}
+		goto found;
+	}
+
+found:
+	type = eh->h_proto;
+
+	rcu_read_lock();
+	ptype = gro_find_receive_by_type(type);
+	if (ptype == NULL) {
+		flush = 1;
+		goto out_unlock;
+	}
+
+	skb_gro_pull(skb, sizeof(*eh)); /* pull inner eth header */
+	pp = ptype->callbacks.gro_receive(head, skb);
+
+out_unlock:
+	rcu_read_unlock();
+out:
+	NAPI_GRO_CB(skb)->flush |= flush;
+
+	return pp;
+}
+
+static int vxlan_gro_complete(struct sk_buff *skb, int nhoff)
+{
+	struct ethhdr *eh;
+	struct packet_offload *ptype;
+	__be16 type;
+	/* 22 = 8 bytes for the vlxan header + 14 bytes for the inner eth header */
+	int vxlan_len  = 22;
+	int err = -ENOSYS;
+
+	eh = (struct ethhdr *)(skb->data + nhoff + sizeof (struct vxlanhdr));
+	type = eh->h_proto;
+
+	rcu_read_lock();
+	ptype = gro_find_complete_by_type(type);
+	if (ptype != NULL)
+		err = ptype->callbacks.gro_complete(skb, nhoff + vxlan_len);
+
+	rcu_read_unlock();
+	return err;
+}
+
+static void vxlan_add_udp_offload(struct rcu_head *head)
+{
+	struct vxlan_sock *vs = container_of(head, struct vxlan_sock, rcu);
+
+	udp_add_offload(&vs->udp_offloads);
+}
+
+static void vxlan_del_udp_offload(struct rcu_head *head)
+{
+	struct vxlan_sock *vs = container_of(head, struct vxlan_sock, rcu);
+
+	udp_del_offload(&vs->udp_offloads);
+}
+
 /* Notify netdevs that UDP port started listening */
-static void vxlan_notify_add_rx_port(struct sock *sk)
+static void vxlan_notify_add_rx_port(struct vxlan_sock *vs)
 {
 	struct net_device *dev;
+	struct sock *sk = vs->sock->sk;
 	struct net *net = sock_net(sk);
 	sa_family_t sa_family = sk->sk_family;
 	__be16 port = inet_sk(sk)->inet_sport;
@@ -569,12 +671,16 @@ static void vxlan_notify_add_rx_port(struct sock *sk)
 							    port);
 	}
 	rcu_read_unlock();
+
+	if (sa_family == AF_INET)
+		call_rcu(&vs->rcu, vxlan_add_udp_offload);
 }
 
 /* Notify netdevs that UDP port is no more listening */
-static void vxlan_notify_del_rx_port(struct sock *sk)
+static void vxlan_notify_del_rx_port(struct vxlan_sock *vs)
 {
 	struct net_device *dev;
+	struct sock *sk = vs->sock->sk;
 	struct net *net = sock_net(sk);
 	sa_family_t sa_family = sk->sk_family;
 	__be16 port = inet_sk(sk)->inet_sport;
@@ -586,6 +692,9 @@ static void vxlan_notify_del_rx_port(struct sock *sk)
 							    port);
 	}
 	rcu_read_unlock();
+
+	if (sa_family == AF_INET)
+		call_rcu(&vs->rcu, vxlan_del_udp_offload);
 }
 
 /* Add new entry to forwarding table -- assumes lock held */
@@ -964,7 +1073,7 @@ void vxlan_sock_release(struct vxlan_sock *vs)
 	spin_lock(&vn->sock_lock);
 	hlist_del_rcu(&vs->hlist);
 	rcu_assign_sk_user_data(vs->sock->sk, NULL);
-	vxlan_notify_del_rx_port(sk);
+	vxlan_notify_del_rx_port(vs);
 	spin_unlock(&vn->sock_lock);
 
 	queue_work(vxlan_wq, &vs->del_work);
@@ -1125,8 +1234,8 @@ static void vxlan_rcv(struct vxlan_sock *vs,
 	 * leave the CHECKSUM_UNNECESSARY, the device checksummed it
 	 * for us. Otherwise force the upper layers to verify it.
 	 */
-	if (skb->ip_summed != CHECKSUM_UNNECESSARY || !skb->encapsulation ||
-	    !(vxlan->dev->features & NETIF_F_RXCSUM))
+	if ((skb->ip_summed != CHECKSUM_UNNECESSARY && skb->ip_summed != CHECKSUM_PARTIAL) ||
+	    !skb->encapsulation || !(vxlan->dev->features & NETIF_F_RXCSUM))
 		skb->ip_summed = CHECKSUM_NONE;
 
 	skb->encapsulation = 0;
@@ -2304,7 +2413,7 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
 	struct sock *sk;
 	unsigned int h;
 
-	vs = kmalloc(sizeof(*vs), GFP_KERNEL);
+	vs = kzalloc(sizeof(*vs), GFP_KERNEL);
 	if (!vs)
 		return ERR_PTR(-ENOMEM);
 
@@ -2329,9 +2438,15 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
 	vs->data = data;
 	rcu_assign_sk_user_data(vs->sock->sk, vs);
 
+	/* Initialize the vxlan udp offloads structure */
+	vs->udp_offloads.port = port;
+	vs->udp_offloads.callbacks.gro_receive  = vxlan_gro_receive;
+	vs->udp_offloads.callbacks.gro_complete = vxlan_gro_complete;
+	INIT_LIST_HEAD(&vs->udp_offloads.list);
+
 	spin_lock(&vn->sock_lock);
 	hlist_add_head_rcu(&vs->hlist, vs_head(net, port));
-	vxlan_notify_add_rx_port(sk);
+	vxlan_notify_add_rx_port(vs);
 	spin_unlock(&vn->sock_lock);
 
 	/* Mark socket as an encapsulation socket. */
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 6b6d180..5deef1a 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -21,6 +21,7 @@ struct vxlan_sock {
 	struct rcu_head	  rcu;
 	struct hlist_head vni_list[VNI_HASH_SIZE];
 	atomic_t	  refcnt;
+	struct udp_offload udp_offloads;
 };
 
 struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next V3 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-08 20:34 ` [PATCH net-next V3 1/3] " Or Gerlitz
@ 2014-01-08 20:39   ` Or Gerlitz
  2014-01-08 21:58   ` Tom Herbert
  2014-01-09  7:09   ` Tom Herbert
  2 siblings, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2014-01-08 20:39 UTC (permalink / raw)
  To: hkchu, edumazet, herbert; +Cc: netdev, davem, yanb, shlomop, Or Gerlitz

On 08/01/2014 22:34, Or Gerlitz wrote:
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -13,6 +13,16 @@
>   #include <linux/skbuff.h>
>   #include <net/udp.h>
>   #include <net/protocol.h>
> +/*
> +struct udp_offload {
> +	__be16			 port;
> +	struct offload_callbacks callbacks;
> +	struct list_head	 list;
> +};
> +*/
sorry for this small mess, will clean it up

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next V3 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-08 20:34 ` [PATCH net-next V3 1/3] " Or Gerlitz
  2014-01-08 20:39   ` Or Gerlitz
@ 2014-01-08 21:58   ` Tom Herbert
  2014-01-09  6:25     ` Or Gerlitz
  2014-01-09  7:09   ` Tom Herbert
  2 siblings, 1 reply; 15+ messages in thread
From: Tom Herbert @ 2014-01-08 21:58 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Jerry Chu, Eric Dumazet, Herbert Xu, Linux Netdev List,
	David Miller, Yan Burman, Shlomo Pongratz

On Wed, Jan 8, 2014 at 12:34 PM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
> Add GRO handlers for protocols that do UDP encapsulation, with the intent of
> being able to coalesce packets which encapsulate packets belonging to
> the same TCP session.
>
> For GRO purposes, the destination UDP port takes the role of the ether type
> field in the ethernet header or the next protocol in the IP header.
>
> The UDP GRO handler will only attempt to coalesce packets whose destination
> port is registered to have gro handler.
>
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
> ---
>  include/linux/netdevice.h |   10 +++-
>  include/net/protocol.h    |    3 +
>  net/core/dev.c            |    1 +
>  net/ipv4/udp_offload.c    |  129 +++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 142 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index a2a70cc..360551a 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1652,7 +1652,9 @@ struct napi_gro_cb {
>         unsigned long age;
>
>         /* Used in ipv6_gro_receive() */
> -       int     proto;
> +       u16     proto;
> +
> +       u16     udp_mark;
>
>         /* used to support CHECKSUM_COMPLETE for tunneling protocols */
>         __wsum  csum;
> @@ -1691,6 +1693,12 @@ struct packet_offload {
>         struct list_head         list;
>  };
>
> +struct udp_offload {
> +       __be16                   port;
> +       struct offload_callbacks callbacks;
> +       struct list_head         list;
> +};
> +
>  /* often modified stats are per cpu, other are shared (netdev->stats) */
>  struct pcpu_sw_netstats {
>         u64     rx_packets;
> diff --git a/include/net/protocol.h b/include/net/protocol.h
> index fbf7676..fe9af94 100644
> --- a/include/net/protocol.h
> +++ b/include/net/protocol.h
> @@ -103,6 +103,9 @@ int inet_del_offload(const struct net_offload *prot, unsigned char num);
>  void inet_register_protosw(struct inet_protosw *p);
>  void inet_unregister_protosw(struct inet_protosw *p);
>
> +void udp_add_offload(struct udp_offload *prot);
> +void udp_del_offload(struct udp_offload *prot);
> +
>  #if IS_ENABLED(CONFIG_IPV6)
>  int inet6_add_protocol(const struct inet6_protocol *prot, unsigned char num);
>  int inet6_del_protocol(const struct inet6_protocol *prot, unsigned char num);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index ce01847..11f7acf 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3858,6 +3858,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
>                 NAPI_GRO_CB(skb)->same_flow = 0;
>                 NAPI_GRO_CB(skb)->flush = 0;
>                 NAPI_GRO_CB(skb)->free = 0;
> +               NAPI_GRO_CB(skb)->udp_mark = 0;
>
>                 pp = ptype->callbacks.gro_receive(&napi->gro_list, skb);
>                 break;
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 79c62bd..2846ade 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -13,6 +13,16 @@
>  #include <linux/skbuff.h>
>  #include <net/udp.h>
>  #include <net/protocol.h>
> +/*
> +struct udp_offload {
> +       __be16                   port;
> +       struct offload_callbacks callbacks;
> +       struct list_head         list;
> +};
> +*/
> +
> +static DEFINE_SPINLOCK(udp_offload_lock);
> +static struct list_head udp_offload_base __read_mostly;
>
>  static int udp4_ufo_send_check(struct sk_buff *skb)
>  {
> @@ -89,14 +99,133 @@ out:
>         return segs;
>  }
>
> +void udp_add_offload(struct udp_offload *uo)
> +{
> +       struct list_head *head = &udp_offload_base;
> +
> +       spin_lock(&udp_offload_lock);
> +       list_add_rcu(&uo->list, head);
> +       spin_unlock(&udp_offload_lock);
> +}
> +EXPORT_SYMBOL(udp_add_offload);
> +
> +void udp_del_offload(struct udp_offload *uo)
> +{
> +       struct list_head *head = &udp_offload_base;
> +       struct udp_offload *uo1;
> +
> +       spin_lock(&udp_offload_lock);
> +       list_for_each_entry(uo1, head, list) {
> +               if (uo == uo1) {
> +                       list_del_rcu(&uo->list);
> +                       goto out;
> +               }
> +       }
> +
> +       pr_warn("udp_remove_offload: %p not found port %d\n", uo, htons(uo->port));
> +out:
> +       spin_unlock(&udp_offload_lock);
> +
> +       synchronize_net();
> +}
> +EXPORT_SYMBOL(udp_del_offload);
> +
> +static struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
> +{
> +       struct list_head *ohead = &udp_offload_base;
> +       struct udp_offload *poffload;
> +       struct sk_buff *p, **pp = NULL;
> +       struct udphdr *uh, *uh2;
> +       unsigned int hlen, off;
> +       int flush = 1;
> +
> +       if (NAPI_GRO_CB(skb)->udp_mark ||
> +           (!skb->encapsulation && skb->ip_summed != CHECKSUM_COMPLETE))
> +               goto out;
> +
> +       /* mark that this skb passed once through the udp gro layer */
> +       NAPI_GRO_CB(skb)->udp_mark = 1;
> +
> +       off  = skb_gro_offset(skb);
> +       hlen = off + sizeof(*uh);
> +       uh   = skb_gro_header_fast(skb, off);
> +       if (skb_gro_header_hard(skb, hlen)) {
> +               uh = skb_gro_header_slow(skb, hlen, off);
> +               if (unlikely(!uh))
> +                       goto out;
> +       }
> +
> +       rcu_read_lock();
> +       list_for_each_entry_rcu(poffload, ohead, list) {
> +               if (poffload->port != uh->dest || !poffload->callbacks.gro_receive)

Is gro_receive == NULL ever valid? Maybe we can assert on registration
instead of checking on every packet.

Maybe make this poffload->port == uh->dest and goto "flush = 0".
Check below that list end was reached becomes unnecessary.

> +                       continue;
> +               break;
> +       }
> +
> +       if (&poffload->list == ohead)
> +               goto out_unlock;
> +
> +       flush = 0;
> +
> +       for (p = *head; p; p = p->next) {
> +               if (!NAPI_GRO_CB(p)->same_flow)
> +                       continue;
> +
> +               uh2 = (struct udphdr   *)(p->data + off);
> +               if ((*(u32 *)&uh->source != *(u32 *)&uh2->source)) {
> +                       NAPI_GRO_CB(p)->same_flow = 0;
> +                       continue;
> +               }
> +               goto found;
> +       }
> +
> +found:
> +       skb_gro_pull(skb, sizeof(struct udphdr)); /* pull encapsulating udp header */
> +       pp = poffload->callbacks.gro_receive(head, skb);
> +
> +out_unlock:
> +       rcu_read_unlock();
> +out:
> +       NAPI_GRO_CB(skb)->flush |= flush;
> +
> +       return pp;
> +}
> +
> +static int udp_gro_complete(struct sk_buff *skb, int nhoff)
> +{
> +       struct list_head *ohead = &udp_offload_base;
> +       struct udp_offload *poffload;
> +       __be16 newlen = htons(skb->len - nhoff);
> +       struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
> +       int err = -ENOSYS;
> +
> +       uh->len = newlen;
> +
> +       rcu_read_lock();
> +       list_for_each_entry_rcu(poffload, ohead, list) {
> +               if (poffload->port != uh->dest || !poffload->callbacks.gro_complete)
> +                       continue;
> +               break;
> +       }
> +
> +       if (&poffload->list != ohead)
> +               err = poffload->callbacks.gro_complete(skb, nhoff + sizeof(struct udphdr));
> +
> +       rcu_read_unlock();
> +       return err;
> +}
> +
>  static const struct net_offload udpv4_offload = {
>         .callbacks = {
>                 .gso_send_check = udp4_ufo_send_check,
>                 .gso_segment = udp4_ufo_fragment,
> +               .gro_receive  = udp_gro_receive,
> +               .gro_complete = udp_gro_complete,
>         },
>  };
>
>  int __init udpv4_offload_init(void)
>  {
> +       INIT_LIST_HEAD(&udp_offload_base);
>         return inet_add_offload(&udpv4_offload, IPPROTO_UDP);
>  }
> --
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next V3 3/3] net: Add GRO support for vxlan traffic
  2014-01-08 20:34 ` [PATCH net-next V3 3/3] net: Add GRO support for vxlan traffic Or Gerlitz
@ 2014-01-08 22:09   ` Eric Dumazet
  2014-01-09  6:28     ` Or Gerlitz
  2014-01-08 22:11   ` Eric Dumazet
  1 sibling, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2014-01-08 22:09 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: hkchu, edumazet, herbert, netdev, davem, yanb, shlomop

On Wed, 2014-01-08 at 22:34 +0200, Or Gerlitz wrote:
> +
> +static int vxlan_gro_complete(struct sk_buff *skb, int nhoff)
> +{
> +	struct ethhdr *eh;
> +	struct packet_offload *ptype;
> +	__be16 type;
> +	/* 22 = 8 bytes for the vlxan header + 14 bytes for the inner eth header */
> +	int vxlan_len  = 22;



I am pretty sure this can use existing macros or sizeof(...)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next V3 3/3] net: Add GRO support for vxlan traffic
  2014-01-08 20:34 ` [PATCH net-next V3 3/3] net: Add GRO support for vxlan traffic Or Gerlitz
  2014-01-08 22:09   ` Eric Dumazet
@ 2014-01-08 22:11   ` Eric Dumazet
  2014-01-09  6:32     ` Or Gerlitz
  1 sibling, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2014-01-08 22:11 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: hkchu, edumazet, herbert, netdev, davem, yanb, shlomop

On Wed, 2014-01-08 at 22:34 +0200, Or Gerlitz wrote:

> +
>  /* Notify netdevs that UDP port started listening */
> -static void vxlan_notify_add_rx_port(struct sock *sk)
> +static void vxlan_notify_add_rx_port(struct vxlan_sock *vs)
>  {
>  	struct net_device *dev;
> +	struct sock *sk = vs->sock->sk;
>  	struct net *net = sock_net(sk);
>  	sa_family_t sa_family = sk->sk_family;
>  	__be16 port = inet_sk(sk)->inet_sport;
> @@ -569,12 +671,16 @@ static void vxlan_notify_add_rx_port(struct sock *sk)
>  							    port);
>  	}
>  	rcu_read_unlock();
> +
> +	if (sa_family == AF_INET)
> +		call_rcu(&vs->rcu, vxlan_add_udp_offload);

Why waiting RCU grace period here ?

>  }
>  
>  /* Notify netdevs that UDP port is no more listening */
> -static void vxlan_notify_del_rx_port(struct sock *sk)
> +static void vxlan_notify_del_rx_port(struct vxlan_sock *vs)
>  {
>  	struct net_device *dev;
> +	struct sock *sk = vs->sock->sk;
>  	struct net *net = sock_net(sk);
>  	sa_family_t sa_family = sk->sk_family;
>  	__be16 port = inet_sk(sk)->inet_sport;
> @@ -586,6 +692,9 @@ static void vxlan_notify_del_rx_port(struct sock *sk)
>  							    port);
>  	}
>  	rcu_read_unlock();
> +
> +	if (sa_family == AF_INET)
> +		call_rcu(&vs->rcu, vxlan_del_udp_offload);
>  }

This looks buggy.

You need to :

1) remove the offload structure from list
2) Then wait rcu grace period, and finally free the memory

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next V3 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-08 21:58   ` Tom Herbert
@ 2014-01-09  6:25     ` Or Gerlitz
  0 siblings, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2014-01-09  6:25 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Jerry Chu, Eric Dumazet, Herbert Xu, Linux Netdev List,
	David Miller, Yan Burman, Shlomo Pongratz

On 08/01/2014 23:58, Tom Herbert wrote:
>> +static struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
>> >+{
>> >+       struct list_head *ohead = &udp_offload_base;
>> >+       struct udp_offload *poffload;
>> >+       struct sk_buff *p, **pp = NULL;
>> >+       struct udphdr *uh, *uh2;
>> >+       unsigned int hlen, off;
>> >+       int flush = 1;
>> >+
>> >+       if (NAPI_GRO_CB(skb)->udp_mark ||
>> >+           (!skb->encapsulation && skb->ip_summed != CHECKSUM_COMPLETE))
>> >+               goto out;
>> >+
>> >+       /* mark that this skb passed once through the udp gro layer */
>> >+       NAPI_GRO_CB(skb)->udp_mark = 1;
>> >+
>> >+       off  = skb_gro_offset(skb);
>> >+       hlen = off + sizeof(*uh);
>> >+       uh   = skb_gro_header_fast(skb, off);
>> >+       if (skb_gro_header_hard(skb, hlen)) {
>> >+               uh = skb_gro_header_slow(skb, hlen, off);
>> >+               if (unlikely(!uh))
>> >+                       goto out;
>> >+       }
>> >+
>> >+       rcu_read_lock();
>> >+       list_for_each_entry_rcu(poffload, ohead, list) {
>> >+               if (poffload->port != uh->dest || !poffload->callbacks.gro_receive)
> Is gro_receive == NULL ever valid? Maybe we can assert on registration instead of checking on every packet.

I see your point, however, the offload structure contains entries for 
both gro and  gso, asserting on registration could somehow limit the use 
cases, isn't that?


> Maybe make this poffload->port == uh->dest and goto "flush = 0". Check below that list end was reached becomes unnecessary.

Sure, will use goto "flush = 0" and if we didn't  go there we'll go to 
out_unlock

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next V3 3/3] net: Add GRO support for vxlan traffic
  2014-01-08 22:09   ` Eric Dumazet
@ 2014-01-09  6:28     ` Or Gerlitz
  0 siblings, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2014-01-09  6:28 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On 09/01/2014 00:09, Eric Dumazet wrote:
> On Wed, 2014-01-08 at 22:34 +0200, Or Gerlitz wrote:
>> +
>> +static int vxlan_gro_complete(struct sk_buff *skb, int nhoff)
>> +{
>> +	struct ethhdr *eh;
>> +	struct packet_offload *ptype;
>> +	__be16 type;
>> +	/* 22 = 8 bytes for the vlxan header + 14 bytes for the inner eth header */
>> +	int vxlan_len  = 22;
>
>
> I am pretty sure this can use existing macros or sizeof(...)

sure, will fix



>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next V3 3/3] net: Add GRO support for vxlan traffic
  2014-01-08 22:11   ` Eric Dumazet
@ 2014-01-09  6:32     ` Or Gerlitz
  0 siblings, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2014-01-09  6:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: hkchu, edumazet, herbert, netdev, davem, yanb, shlomop

On 09/01/2014 00:11, Eric Dumazet wrote:
> On Wed, 2014-01-08 at 22:34 +0200, Or Gerlitz wrote:
>
>> +
>>   /* Notify netdevs that UDP port started listening */
>> -static void vxlan_notify_add_rx_port(struct sock *sk)
>> +static void vxlan_notify_add_rx_port(struct vxlan_sock *vs)
>>   {
>>   	struct net_device *dev;
>> +	struct sock *sk = vs->sock->sk;
>>   	struct net *net = sock_net(sk);
>>   	sa_family_t sa_family = sk->sk_family;
>>   	__be16 port = inet_sk(sk)->inet_sport;
>> @@ -569,12 +671,16 @@ static void vxlan_notify_add_rx_port(struct sock *sk)
>>   							    port);
>>   	}
>>   	rcu_read_unlock();
>> +
>> +	if (sa_family == AF_INET)
>> +		call_rcu(&vs->rcu, vxlan_add_udp_offload);
> Why waiting RCU grace period here?

Basically the add operation can be done right away, however, since the 
delete operation can't be done
instantly when we want it, I wanted to protect against a series of 
add/del/add in times T1 < T2 < T3

T1 add(X)
T2 del(X)
T3 add(X)

where the delete is deferred and as a result the 2nd add is done before 
the delete and @ the end offload X is not added in the 2nd time.From 
your other comment below I conclude that I probably miss something about 
the rcu usage here, so will give it further thought.



>
>>   }
>>   
>>   /* Notify netdevs that UDP port is no more listening */
>> -static void vxlan_notify_del_rx_port(struct sock *sk)
>> +static void vxlan_notify_del_rx_port(struct vxlan_sock *vs)
>>   {
>>   	struct net_device *dev;
>> +	struct sock *sk = vs->sock->sk;
>>   	struct net *net = sock_net(sk);
>>   	sa_family_t sa_family = sk->sk_family;
>>   	__be16 port = inet_sk(sk)->inet_sport;
>> @@ -586,6 +692,9 @@ static void vxlan_notify_del_rx_port(struct sock *sk)
>>   							    port);
>>   	}
>>   	rcu_read_unlock();
>> +
>> +	if (sa_family == AF_INET)
>> +		call_rcu(&vs->rcu, vxlan_del_udp_offload);
>>   }
> This looks buggy.
>
> You need to :
>
> 1) remove the offload structure from list
> 2) Then wait rcu grace period, and finally free the memory
>
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next V3 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-08 20:34 ` [PATCH net-next V3 1/3] " Or Gerlitz
  2014-01-08 20:39   ` Or Gerlitz
  2014-01-08 21:58   ` Tom Herbert
@ 2014-01-09  7:09   ` Tom Herbert
  2014-01-09  7:14     ` Or Gerlitz
  2 siblings, 1 reply; 15+ messages in thread
From: Tom Herbert @ 2014-01-09  7:09 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Jerry Chu, Eric Dumazet, Herbert Xu, Linux Netdev List,
	David Miller, Yan Burman, Shlomo Pongratz

On Wed, Jan 8, 2014 at 12:34 PM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
> Add GRO handlers for protocols that do UDP encapsulation, with the intent of
> being able to coalesce packets which encapsulate packets belonging to
> the same TCP session.
>
> For GRO purposes, the destination UDP port takes the role of the ether type
> field in the ethernet header or the next protocol in the IP header.
>
> The UDP GRO handler will only attempt to coalesce packets whose destination
> port is registered to have gro handler.
>
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
> ---
>  include/linux/netdevice.h |   10 +++-
>  include/net/protocol.h    |    3 +
>  net/core/dev.c            |    1 +
>  net/ipv4/udp_offload.c    |  129 +++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 142 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index a2a70cc..360551a 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1652,7 +1652,9 @@ struct napi_gro_cb {
>         unsigned long age;
>
>         /* Used in ipv6_gro_receive() */
> -       int     proto;
> +       u16     proto;
> +
> +       u16     udp_mark;
>
>         /* used to support CHECKSUM_COMPLETE for tunneling protocols */
>         __wsum  csum;
> @@ -1691,6 +1693,12 @@ struct packet_offload {
>         struct list_head         list;
>  };
>
> +struct udp_offload {
> +       __be16                   port;
> +       struct offload_callbacks callbacks;
> +       struct list_head         list;
> +};
> +
>  /* often modified stats are per cpu, other are shared (netdev->stats) */
>  struct pcpu_sw_netstats {
>         u64     rx_packets;
> diff --git a/include/net/protocol.h b/include/net/protocol.h
> index fbf7676..fe9af94 100644
> --- a/include/net/protocol.h
> +++ b/include/net/protocol.h
> @@ -103,6 +103,9 @@ int inet_del_offload(const struct net_offload *prot, unsigned char num);
>  void inet_register_protosw(struct inet_protosw *p);
>  void inet_unregister_protosw(struct inet_protosw *p);
>
> +void udp_add_offload(struct udp_offload *prot);
> +void udp_del_offload(struct udp_offload *prot);
> +
>  #if IS_ENABLED(CONFIG_IPV6)
>  int inet6_add_protocol(const struct inet6_protocol *prot, unsigned char num);
>  int inet6_del_protocol(const struct inet6_protocol *prot, unsigned char num);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index ce01847..11f7acf 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3858,6 +3858,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
>                 NAPI_GRO_CB(skb)->same_flow = 0;
>                 NAPI_GRO_CB(skb)->flush = 0;
>                 NAPI_GRO_CB(skb)->free = 0;
> +               NAPI_GRO_CB(skb)->udp_mark = 0;
>
>                 pp = ptype->callbacks.gro_receive(&napi->gro_list, skb);
>                 break;
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 79c62bd..2846ade 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -13,6 +13,16 @@
>  #include <linux/skbuff.h>
>  #include <net/udp.h>
>  #include <net/protocol.h>
> +/*
> +struct udp_offload {
> +       __be16                   port;
> +       struct offload_callbacks callbacks;
> +       struct list_head         list;
> +};
> +*/
> +
> +static DEFINE_SPINLOCK(udp_offload_lock);
> +static struct list_head udp_offload_base __read_mostly;
>
>  static int udp4_ufo_send_check(struct sk_buff *skb)
>  {
> @@ -89,14 +99,133 @@ out:
>         return segs;
>  }
>
> +void udp_add_offload(struct udp_offload *uo)
> +{
> +       struct list_head *head = &udp_offload_base;
> +
> +       spin_lock(&udp_offload_lock);
> +       list_add_rcu(&uo->list, head);
> +       spin_unlock(&udp_offload_lock);
> +}
> +EXPORT_SYMBOL(udp_add_offload);
> +
> +void udp_del_offload(struct udp_offload *uo)
> +{
> +       struct list_head *head = &udp_offload_base;
> +       struct udp_offload *uo1;
> +
> +       spin_lock(&udp_offload_lock);
> +       list_for_each_entry(uo1, head, list) {
> +               if (uo == uo1) {
> +                       list_del_rcu(&uo->list);
> +                       goto out;
> +               }
> +       }
> +
> +       pr_warn("udp_remove_offload: %p not found port %d\n", uo, htons(uo->port));
> +out:
> +       spin_unlock(&udp_offload_lock);
> +
> +       synchronize_net();
> +}
> +EXPORT_SYMBOL(udp_del_offload);
> +
> +static struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
> +{
> +       struct list_head *ohead = &udp_offload_base;
> +       struct udp_offload *poffload;
> +       struct sk_buff *p, **pp = NULL;
> +       struct udphdr *uh, *uh2;
> +       unsigned int hlen, off;
> +       int flush = 1;
> +
> +       if (NAPI_GRO_CB(skb)->udp_mark ||
> +           (!skb->encapsulation && skb->ip_summed != CHECKSUM_COMPLETE))
> +               goto out;
> +
> +       /* mark that this skb passed once through the udp gro layer */
> +       NAPI_GRO_CB(skb)->udp_mark = 1;
> +
> +       off  = skb_gro_offset(skb);
> +       hlen = off + sizeof(*uh);
> +       uh   = skb_gro_header_fast(skb, off);
> +       if (skb_gro_header_hard(skb, hlen)) {
> +               uh = skb_gro_header_slow(skb, hlen, off);
> +               if (unlikely(!uh))
> +                       goto out;
> +       }
> +
> +       rcu_read_lock();
> +       list_for_each_entry_rcu(poffload, ohead, list) {
> +               if (poffload->port != uh->dest || !poffload->callbacks.gro_receive)
> +                       continue;
> +               break;
> +       }
> +
> +       if (&poffload->list == ohead)
> +               goto out_unlock;
> +
> +       flush = 0;
> +
> +       for (p = *head; p; p = p->next) {
> +               if (!NAPI_GRO_CB(p)->same_flow)
> +                       continue;
> +
> +               uh2 = (struct udphdr   *)(p->data + off);
> +               if ((*(u32 *)&uh->source != *(u32 *)&uh2->source)) {
> +                       NAPI_GRO_CB(p)->same_flow = 0;
> +                       continue;
> +               }
> +               goto found;
I don't believe this is correct. If you exit on the first match, skb's
that follow in the list can still be marked as same_flow. You need to
walk the whole list I believe (just get rid of the goto).

> +       }
> +
> +found:
> +       skb_gro_pull(skb, sizeof(struct udphdr)); /* pull encapsulating udp header */
> +       pp = poffload->callbacks.gro_receive(head, skb);
> +
> +out_unlock:
> +       rcu_read_unlock();
> +out:
> +       NAPI_GRO_CB(skb)->flush |= flush;
> +
> +       return pp;
> +}
> +
> +static int udp_gro_complete(struct sk_buff *skb, int nhoff)
> +{
> +       struct list_head *ohead = &udp_offload_base;
> +       struct udp_offload *poffload;
> +       __be16 newlen = htons(skb->len - nhoff);
> +       struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
> +       int err = -ENOSYS;
> +
> +       uh->len = newlen;
> +
> +       rcu_read_lock();
> +       list_for_each_entry_rcu(poffload, ohead, list) {
> +               if (poffload->port != uh->dest || !poffload->callbacks.gro_complete)
> +                       continue;
> +               break;
> +       }
> +
> +       if (&poffload->list != ohead)
> +               err = poffload->callbacks.gro_complete(skb, nhoff + sizeof(struct udphdr));
> +
> +       rcu_read_unlock();
> +       return err;
> +}
> +
>  static const struct net_offload udpv4_offload = {
>         .callbacks = {
>                 .gso_send_check = udp4_ufo_send_check,
>                 .gso_segment = udp4_ufo_fragment,
> +               .gro_receive  = udp_gro_receive,
> +               .gro_complete = udp_gro_complete,
>         },
>  };
>
>  int __init udpv4_offload_init(void)
>  {
> +       INIT_LIST_HEAD(&udp_offload_base);
>         return inet_add_offload(&udpv4_offload, IPPROTO_UDP);
>  }
> --
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next V3 1/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-09  7:09   ` Tom Herbert
@ 2014-01-09  7:14     ` Or Gerlitz
  0 siblings, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2014-01-09  7:14 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Jerry Chu, Eric Dumazet, Herbert Xu, Linux Netdev List,
	David Miller, Yan Burman, Shlomo Pongratz

On 09/01/2014 09:09, Tom Herbert wrote:
>> +       for (p = *head; p; p = p->next) {
>> >+               if (!NAPI_GRO_CB(p)->same_flow)
>> >+                       continue;
>> >+
>> >+               uh2 = (struct udphdr   *)(p->data + off);
>> >+               if ((*(u32 *)&uh->source != *(u32 *)&uh2->source)) {
>> >+                       NAPI_GRO_CB(p)->same_flow = 0;
>> >+                       continue;
>> >+               }
>> >+               goto found;
> I don't believe this is correct. If you exit on the first match, skb's
> that follow in the list can still be marked as same_flow. You need to
> walk the whole list I believe (just get rid of the goto).
>

Good catch, will fix.

Looking on the ipv4 and gre gro_receive callbacks they indeed go over 
all the list where the tcp gro_receive code does allow itself do goto 
found, probably b/c they are last in the chain?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next V3 0/3] net: Add GRO support for UDP encapsulating protocols
  2014-01-14 16:00 [PATCH net-next V3 0/3] net: Add GRO support for UDP encapsulating protocols Or Gerlitz
@ 2014-01-14 16:06 ` Or Gerlitz
  0 siblings, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2014-01-14 16:06 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On 14/01/2014 18:00, Or Gerlitz wrote:
> This series adds GRO handlers for protocols that do UDP encapsulation, with the
> intent of being able to coalesce packets which encapsulate packets belonging to
> the same TCP session.
>
> For GRO purposes, the destination UDP port takes the role of the ether type
> field in the ethernet header or the next protocol in the IP header.
>
> The UDP GRO handler will only attempt to coalesce packets whose destination
> port is registered to have gro handler.
>
> The patches done against net-next ae237b3ede64 "net: 3com: fix
> warning for incorrect type in argument"
>
> Or.
>
>
> v3 --> v4 changes:
>
>    - applied feedback from Tom on some micro-optimizations that save
>      branches and goto directives in the udp gro logic
>
>   - applied feedback from Eric on correct RCU programming for the
>     add/remove flow of the upper protocols udp gro handlers
>
>
> v2 --> v3 changes:
>
>   - moved to use linked list to store the udp gro handlers, this solves the
>     problem of consuming 512KB of memory for the handlers.
>
>   - use a mark on the skb GRO CB data to disallow running the udp gro_receive twice
>     on a packet, this solves the problem of udp encapsulated packets whose inner VM
>     packet is udp and happen to carry a port which has registered offloads - and flush it.
>
>   - invoke the udp offload protocol registration and de-registration from the vxlan driver
>     in a sleepable context
>
> For unclear some reason I got this warning when the vxlan driver deletes the
> udp offload structure
> *** BLURB HERE ***

Sorry for the spam, the above three lines are leftovers from the V3 
cover letter, same for the subject line of this
cover-letter which carries "V3" this *is* V4, will make sure to avoid 
such flushes (....) in the future.

Or.

>
> Or Gerlitz (3):
>    net: Add GRO support for UDP encapsulating protocols
>    net: Export gro_find_by_type helpers
>    net: Add GRO support for vxlan traffic
>
>   drivers/net/vxlan.c       |  117 +++++++++++++++++++++++++++++++--
>   include/linux/netdevice.h |   10 +++-
>   include/net/protocol.h    |    3 +
>   include/net/vxlan.h       |    1 +
>   net/core/dev.c            |    3 +
>   net/ipv4/udp_offload.c    |  157 +++++++++++++++++++++++++++++++++++++++++++++
>   6 files changed, 283 insertions(+), 8 deletions(-)
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next V3 0/3] net: Add GRO support for UDP encapsulating protocols
@ 2014-01-14 16:00 Or Gerlitz
  2014-01-14 16:06 ` Or Gerlitz
  0 siblings, 1 reply; 15+ messages in thread
From: Or Gerlitz @ 2014-01-14 16:00 UTC (permalink / raw)
  To: davem
  Cc: netdev, hkchu, edumazet, herbert, yanb, shlomop, therbert, Or Gerlitz

This series adds GRO handlers for protocols that do UDP encapsulation, with the
intent of being able to coalesce packets which encapsulate packets belonging to
the same TCP session.

For GRO purposes, the destination UDP port takes the role of the ether type
field in the ethernet header or the next protocol in the IP header.

The UDP GRO handler will only attempt to coalesce packets whose destination
port is registered to have gro handler.

The patches done against net-next ae237b3ede64 "net: 3com: fix 
warning for incorrect type in argument"

Or.


v3 --> v4 changes:

  - applied feedback from Tom on some micro-optimizations that save 
    branches and goto directives in the udp gro logic

 - applied feedback from Eric on correct RCU programming for the 
   add/remove flow of the upper protocols udp gro handlers


v2 --> v3 changes:

 - moved to use linked list to store the udp gro handlers, this solves the
   problem of consuming 512KB of memory for the handlers.

 - use a mark on the skb GRO CB data to disallow running the udp gro_receive twice
   on a packet, this solves the problem of udp encapsulated packets whose inner VM
   packet is udp and happen to carry a port which has registered offloads - and flush it.

 - invoke the udp offload protocol registration and de-registration from the vxlan driver
   in a sleepable context 

For unclear some reason I got this warning when the vxlan driver deletes the
udp offload structure 
*** BLURB HERE ***

Or Gerlitz (3):
  net: Add GRO support for UDP encapsulating protocols
  net: Export gro_find_by_type helpers
  net: Add GRO support for vxlan traffic

 drivers/net/vxlan.c       |  117 +++++++++++++++++++++++++++++++--
 include/linux/netdevice.h |   10 +++-
 include/net/protocol.h    |    3 +
 include/net/vxlan.h       |    1 +
 net/core/dev.c            |    3 +
 net/ipv4/udp_offload.c    |  157 +++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 283 insertions(+), 8 deletions(-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-01-14 16:10 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-08 20:34 [PATCH net-next V3 0/3] net: Add GRO support for UDP encapsulating protocols Or Gerlitz
2014-01-08 20:34 ` [PATCH net-next V3 1/3] " Or Gerlitz
2014-01-08 20:39   ` Or Gerlitz
2014-01-08 21:58   ` Tom Herbert
2014-01-09  6:25     ` Or Gerlitz
2014-01-09  7:09   ` Tom Herbert
2014-01-09  7:14     ` Or Gerlitz
2014-01-08 20:34 ` [PATCH net-next V3 2/3] net: Export gro_find_by_type helpers Or Gerlitz
2014-01-08 20:34 ` [PATCH net-next V3 3/3] net: Add GRO support for vxlan traffic Or Gerlitz
2014-01-08 22:09   ` Eric Dumazet
2014-01-09  6:28     ` Or Gerlitz
2014-01-08 22:11   ` Eric Dumazet
2014-01-09  6:32     ` Or Gerlitz
2014-01-14 16:00 [PATCH net-next V3 0/3] net: Add GRO support for UDP encapsulating protocols Or Gerlitz
2014-01-14 16:06 ` Or Gerlitz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).