* [PATCH net-next V3 0/3] net: Add GRO support for UDP encapsulating protocols
@ 2014-01-08 20:34 Or Gerlitz
2014-01-08 20:34 ` [PATCH net-next V3 1/3] " Or Gerlitz
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Or Gerlitz @ 2014-01-08 20:34 UTC (permalink / raw)
To: hkchu, edumazet, herbert; +Cc: netdev, davem, yanb, shlomop, Or Gerlitz
This series adds GRO handlers for protocols that do UDP encapsulation, with the
intent of being able to coalesce packets which encapsulate packets belonging to
the same TCP session.
For GRO purposes, the destination UDP port takes the role of the ether type
field in the ethernet header or the next protocol in the IP header.
The UDP GRO handler will only attempt to coalesce packets whose destination
port is registered to have gro handler.
On my setup, which is net-next (now with the mlx4 vxlan offloads patches) --
for single TCP session that goes through vxlan tunneling I got nice improvement
from 6.8Gbs to 11.5Gbs
patches done against net-next 80077935cad "Merge branch 'master' of
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next"
v2 --> v3 changes:
- moved to use linked list to store the udp gro handlers, this solves the
problem of consuming 512KB of memory for the handlers.
- use a mark on the skb GRO CB data to disallow running the udp gro_receive twice
on a packet, this solves the problem of udp encapsulated packets whose inner VM
packet is udp and happen to carry a port which has registered offloads - and flush it.
- invoke the udp offload protocol registration and de-registration from the vxlan driver
in a sleepable context
For unclear some reason I got this warning when the vxlan driver deletes the
udp offload structure
-----------[ cut here ]------------
WARNING: CPU: 2 PID: 19 at kernel/rcu/tree.c:2127 rcu_do_batch+0x359/0x370()
Modules linked in: veth vxlan ip_tunnel bridge stp llc rdma_ucm rdma_cm ib_cm iw_cm ib_addr ib_uverbs ib_umad mlx4_en ptp pps_core mlx4_ib ib_sa ib_mad ib_core mlx4_core nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 nfs lockd autofs4 sunrpc ipv6 dm_mirror dm_region_hash dm_log dm_mod joydev microcode pcspkr virtio_balloon virtio_net i2c_piix4 button ext3 jbd virtio_pci virtio_ring virtio uhci_hcd [last unloaded: ib_core]
CPU: 2 PID: 19 Comm: rcuc/2 Not tainted 3.13.0-rc6+ #278
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
000000000000084f ffff88021651dd68 ffffffff81498b52 000000000000084f
0000000000000000 ffff88021651dda8 ffffffff81059f2c ffff88021651dd98
ffff88022010d6a0 ffff88022010d6c8 0000000000000246 ffffffff81844200
Call Trace:
[<ffffffff81498b52>] dump_stack+0x51/0x77
[<ffffffff81059f2c>] warn_slowpath_common+0x8c/0xc0
[<ffffffff81059f7a>] warn_slowpath_null+0x1a/0x20
[<ffffffff810a49c9>] rcu_do_batch+0x359/0x370
[<ffffffff810a4ade>] rcu_cpu_kthread+0xfe/0x2b0
[<ffffffff81083697>] smpboot_thread_fn+0x167/0x260
[<ffffffff81083530>] ? smpboot_create_threads+0x80/0x80
[<ffffffff8107c36e>] kthread+0xce/0xf0
[<ffffffff8107c2a0>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff8149de7c>] ret_from_fork+0x7c/0xb0
[<ffffffff8107c2a0>] ? kthread_freezable_should_stop+0x70/0x70
---[ end trace f98588ea45ef3f4c ]---
Or
Or Gerlitz (3):
net: Add GRO support for UDP encapsulating protocols
net: Export gro_find_by_type helpers
net: Add GRO support for vxlan traffic
drivers/net/vxlan.c | 129 ++++++++++++++++++++++++++++++++++++++++++---
include/linux/netdevice.h | 10 +++-
include/net/protocol.h | 3 +
include/net/vxlan.h | 1 +
net/core/dev.c | 3 +
net/ipv4/udp_offload.c | 129 +++++++++++++++++++++++++++++++++++++++++++++
6 files changed, 267 insertions(+), 8 deletions(-)
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH net-next V3 1/3] net: Add GRO support for UDP encapsulating protocols
2014-01-08 20:34 [PATCH net-next V3 0/3] net: Add GRO support for UDP encapsulating protocols Or Gerlitz
@ 2014-01-08 20:34 ` Or Gerlitz
2014-01-08 20:39 ` Or Gerlitz
` (2 more replies)
2014-01-08 20:34 ` [PATCH net-next V3 2/3] net: Export gro_find_by_type helpers Or Gerlitz
2014-01-08 20:34 ` [PATCH net-next V3 3/3] net: Add GRO support for vxlan traffic Or Gerlitz
2 siblings, 3 replies; 15+ messages in thread
From: Or Gerlitz @ 2014-01-08 20:34 UTC (permalink / raw)
To: hkchu, edumazet, herbert; +Cc: netdev, davem, yanb, shlomop, Or Gerlitz
Add GRO handlers for protocols that do UDP encapsulation, with the intent of
being able to coalesce packets which encapsulate packets belonging to
the same TCP session.
For GRO purposes, the destination UDP port takes the role of the ether type
field in the ethernet header or the next protocol in the IP header.
The UDP GRO handler will only attempt to coalesce packets whose destination
port is registered to have gro handler.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
include/linux/netdevice.h | 10 +++-
include/net/protocol.h | 3 +
net/core/dev.c | 1 +
net/ipv4/udp_offload.c | 129 +++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 142 insertions(+), 1 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a2a70cc..360551a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1652,7 +1652,9 @@ struct napi_gro_cb {
unsigned long age;
/* Used in ipv6_gro_receive() */
- int proto;
+ u16 proto;
+
+ u16 udp_mark;
/* used to support CHECKSUM_COMPLETE for tunneling protocols */
__wsum csum;
@@ -1691,6 +1693,12 @@ struct packet_offload {
struct list_head list;
};
+struct udp_offload {
+ __be16 port;
+ struct offload_callbacks callbacks;
+ struct list_head list;
+};
+
/* often modified stats are per cpu, other are shared (netdev->stats) */
struct pcpu_sw_netstats {
u64 rx_packets;
diff --git a/include/net/protocol.h b/include/net/protocol.h
index fbf7676..fe9af94 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -103,6 +103,9 @@ int inet_del_offload(const struct net_offload *prot, unsigned char num);
void inet_register_protosw(struct inet_protosw *p);
void inet_unregister_protosw(struct inet_protosw *p);
+void udp_add_offload(struct udp_offload *prot);
+void udp_del_offload(struct udp_offload *prot);
+
#if IS_ENABLED(CONFIG_IPV6)
int inet6_add_protocol(const struct inet6_protocol *prot, unsigned char num);
int inet6_del_protocol(const struct inet6_protocol *prot, unsigned char num);
diff --git a/net/core/dev.c b/net/core/dev.c
index ce01847..11f7acf 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3858,6 +3858,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
NAPI_GRO_CB(skb)->same_flow = 0;
NAPI_GRO_CB(skb)->flush = 0;
NAPI_GRO_CB(skb)->free = 0;
+ NAPI_GRO_CB(skb)->udp_mark = 0;
pp = ptype->callbacks.gro_receive(&napi->gro_list, skb);
break;
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 79c62bd..2846ade 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -13,6 +13,16 @@
#include <linux/skbuff.h>
#include <net/udp.h>
#include <net/protocol.h>
+/*
+struct udp_offload {
+ __be16 port;
+ struct offload_callbacks callbacks;
+ struct list_head list;
+};
+*/
+
+static DEFINE_SPINLOCK(udp_offload_lock);
+static struct list_head udp_offload_base __read_mostly;
static int udp4_ufo_send_check(struct sk_buff *skb)
{
@@ -89,14 +99,133 @@ out:
return segs;
}
+void udp_add_offload(struct udp_offload *uo)
+{
+ struct list_head *head = &udp_offload_base;
+
+ spin_lock(&udp_offload_lock);
+ list_add_rcu(&uo->list, head);
+ spin_unlock(&udp_offload_lock);
+}
+EXPORT_SYMBOL(udp_add_offload);
+
+void udp_del_offload(struct udp_offload *uo)
+{
+ struct list_head *head = &udp_offload_base;
+ struct udp_offload *uo1;
+
+ spin_lock(&udp_offload_lock);
+ list_for_each_entry(uo1, head, list) {
+ if (uo == uo1) {
+ list_del_rcu(&uo->list);
+ goto out;
+ }
+ }
+
+ pr_warn("udp_remove_offload: %p not found port %d\n", uo, htons(uo->port));
+out:
+ spin_unlock(&udp_offload_lock);
+
+ synchronize_net();
+}
+EXPORT_SYMBOL(udp_del_offload);
+
+static struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
+{
+ struct list_head *ohead = &udp_offload_base;
+ struct udp_offload *poffload;
+ struct sk_buff *p, **pp = NULL;
+ struct udphdr *uh, *uh2;
+ unsigned int hlen, off;
+ int flush = 1;
+
+ if (NAPI_GRO_CB(skb)->udp_mark ||
+ (!skb->encapsulation && skb->ip_summed != CHECKSUM_COMPLETE))
+ goto out;
+
+ /* mark that this skb passed once through the udp gro layer */
+ NAPI_GRO_CB(skb)->udp_mark = 1;
+
+ off = skb_gro_offset(skb);
+ hlen = off + sizeof(*uh);
+ uh = skb_gro_header_fast(skb, off);
+ if (skb_gro_header_hard(skb, hlen)) {
+ uh = skb_gro_header_slow(skb, hlen, off);
+ if (unlikely(!uh))
+ goto out;
+ }
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(poffload, ohead, list) {
+ if (poffload->port != uh->dest || !poffload->callbacks.gro_receive)
+ continue;
+ break;
+ }
+
+ if (&poffload->list == ohead)
+ goto out_unlock;
+
+ flush = 0;
+
+ for (p = *head; p; p = p->next) {
+ if (!NAPI_GRO_CB(p)->same_flow)
+ continue;
+
+ uh2 = (struct udphdr *)(p->data + off);
+ if ((*(u32 *)&uh->source != *(u32 *)&uh2->source)) {
+ NAPI_GRO_CB(p)->same_flow = 0;
+ continue;
+ }
+ goto found;
+ }
+
+found:
+ skb_gro_pull(skb, sizeof(struct udphdr)); /* pull encapsulating udp header */
+ pp = poffload->callbacks.gro_receive(head, skb);
+
+out_unlock:
+ rcu_read_unlock();
+out:
+ NAPI_GRO_CB(skb)->flush |= flush;
+
+ return pp;
+}
+
+static int udp_gro_complete(struct sk_buff *skb, int nhoff)
+{
+ struct list_head *ohead = &udp_offload_base;
+ struct udp_offload *poffload;
+ __be16 newlen = htons(skb->len - nhoff);
+ struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
+ int err = -ENOSYS;
+
+ uh->len = newlen;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(poffload, ohead, list) {
+ if (poffload->port != uh->dest || !poffload->callbacks.gro_complete)
+ continue;
+ break;
+ }
+
+ if (&poffload->list != ohead)
+ err = poffload->callbacks.gro_complete(skb, nhoff + sizeof(struct udphdr));
+
+ rcu_read_unlock();
+ return err;
+}
+
static const struct net_offload udpv4_offload = {
.callbacks = {
.gso_send_check = udp4_ufo_send_check,
.gso_segment = udp4_ufo_fragment,
+ .gro_receive = udp_gro_receive,
+ .gro_complete = udp_gro_complete,
},
};
int __init udpv4_offload_init(void)
{
+ INIT_LIST_HEAD(&udp_offload_base);
return inet_add_offload(&udpv4_offload, IPPROTO_UDP);
}
--
1.7.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH net-next V3 2/3] net: Export gro_find_by_type helpers
2014-01-08 20:34 [PATCH net-next V3 0/3] net: Add GRO support for UDP encapsulating protocols Or Gerlitz
2014-01-08 20:34 ` [PATCH net-next V3 1/3] " Or Gerlitz
@ 2014-01-08 20:34 ` Or Gerlitz
2014-01-08 20:34 ` [PATCH net-next V3 3/3] net: Add GRO support for vxlan traffic Or Gerlitz
2 siblings, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2014-01-08 20:34 UTC (permalink / raw)
To: hkchu, edumazet, herbert; +Cc: netdev, davem, yanb, shlomop, Or Gerlitz
Export the gro_find_receive/complete_by_type helpers to they can be invoked
by the gro callbacks of encapsulation protocols such as vxlan.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
net/core/dev.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 11f7acf..a052e54 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3936,6 +3936,7 @@ struct packet_offload *gro_find_receive_by_type(__be16 type)
}
return NULL;
}
+EXPORT_SYMBOL(gro_find_receive_by_type);
struct packet_offload *gro_find_complete_by_type(__be16 type)
{
@@ -3949,6 +3950,7 @@ struct packet_offload *gro_find_complete_by_type(__be16 type)
}
return NULL;
}
+EXPORT_SYMBOL(gro_find_complete_by_type);
static gro_result_t napi_skb_finish(gro_result_t ret, struct sk_buff *skb)
{
--
1.7.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH net-next V3 3/3] net: Add GRO support for vxlan traffic
2014-01-08 20:34 [PATCH net-next V3 0/3] net: Add GRO support for UDP encapsulating protocols Or Gerlitz
2014-01-08 20:34 ` [PATCH net-next V3 1/3] " Or Gerlitz
2014-01-08 20:34 ` [PATCH net-next V3 2/3] net: Export gro_find_by_type helpers Or Gerlitz
@ 2014-01-08 20:34 ` Or Gerlitz
2014-01-08 22:09 ` Eric Dumazet
2014-01-08 22:11 ` Eric Dumazet
2 siblings, 2 replies; 15+ messages in thread
From: Or Gerlitz @ 2014-01-08 20:34 UTC (permalink / raw)
To: hkchu, edumazet, herbert; +Cc: netdev, davem, yanb, shlomop, Or Gerlitz
Add gro handlers for vxlan using the udp gro infrastructure
On my setup, which is net-next (now with the mlx4 vxlan offloads patches) --
for single TCP session that goes through vxlan tunneling I got nice improvement
from 6.8Gbs to 11.5Gbs
--> UDP/VXLAN GRO disabled
$ netperf -H 192.168.52.147 -c -C
$ netperf -t TCP_STREAM -H 192.168.52.147 -c -C
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
87380 65536 65536 10.00 6799.75 12.54 24.79 0.604 1.195
--> UDP/VXLAN GRO enabled
$ netperf -t TCP_STREAM -H 192.168.52.147 -c -C
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
87380 65536 65536 10.00 11562.72 24.90 20.34 0.706 0.577
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
drivers/net/vxlan.c | 129 ++++++++++++++++++++++++++++++++++++++++++++++++---
include/net/vxlan.h | 1 +
2 files changed, 123 insertions(+), 7 deletions(-)
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 481f85d..e132f19 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -40,6 +40,7 @@
#include <net/net_namespace.h>
#include <net/netns/generic.h>
#include <net/vxlan.h>
+#include <net/protocol.h>
#if IS_ENABLED(CONFIG_IPV6)
#include <net/ipv6.h>
#include <net/addrconf.h>
@@ -554,10 +555,111 @@ static int vxlan_fdb_append(struct vxlan_fdb *f,
return 1;
}
+static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct sk_buff *skb)
+{
+ struct sk_buff *p, **pp = NULL;
+ struct vxlanhdr *vh, *vh2;
+ struct ethhdr *eh, *eh2;
+ unsigned int hlen, off_vx, off_eth;
+ const struct packet_offload *ptype;
+ __be16 type;
+ int flush = 1;
+
+ off_vx = skb_gro_offset(skb);
+ hlen = off_vx + sizeof(*vh);
+ vh = skb_gro_header_fast(skb, off_vx);
+ if (skb_gro_header_hard(skb, hlen)) {
+ vh = skb_gro_header_slow(skb, hlen, off_vx);
+ if (unlikely(!vh))
+ goto out;
+ }
+ skb_gro_pull(skb, sizeof(struct vxlanhdr)); /* pull vxlan header */
+
+ off_eth = skb_gro_offset(skb);
+ hlen = off_eth + sizeof(*eh);
+ eh = skb_gro_header_fast(skb, off_eth);
+ if (skb_gro_header_hard(skb, hlen)) {
+ eh = skb_gro_header_slow(skb, hlen, off_eth);
+ if (unlikely(!eh))
+ goto out;
+ }
+
+ flush = 0;
+
+ for (p = *head; p; p = p->next) {
+ if (!NAPI_GRO_CB(p)->same_flow)
+ continue;
+
+ vh2 = (struct vxlanhdr *)(p->data + off_vx);
+ eh2 = (struct ethhdr *)(p->data + off_eth);
+ if (vh->vx_vni != vh2->vx_vni || compare_ether_header(eh, eh2)) {
+ NAPI_GRO_CB(p)->same_flow = 0;
+ continue;
+ }
+ goto found;
+ }
+
+found:
+ type = eh->h_proto;
+
+ rcu_read_lock();
+ ptype = gro_find_receive_by_type(type);
+ if (ptype == NULL) {
+ flush = 1;
+ goto out_unlock;
+ }
+
+ skb_gro_pull(skb, sizeof(*eh)); /* pull inner eth header */
+ pp = ptype->callbacks.gro_receive(head, skb);
+
+out_unlock:
+ rcu_read_unlock();
+out:
+ NAPI_GRO_CB(skb)->flush |= flush;
+
+ return pp;
+}
+
+static int vxlan_gro_complete(struct sk_buff *skb, int nhoff)
+{
+ struct ethhdr *eh;
+ struct packet_offload *ptype;
+ __be16 type;
+ /* 22 = 8 bytes for the vlxan header + 14 bytes for the inner eth header */
+ int vxlan_len = 22;
+ int err = -ENOSYS;
+
+ eh = (struct ethhdr *)(skb->data + nhoff + sizeof (struct vxlanhdr));
+ type = eh->h_proto;
+
+ rcu_read_lock();
+ ptype = gro_find_complete_by_type(type);
+ if (ptype != NULL)
+ err = ptype->callbacks.gro_complete(skb, nhoff + vxlan_len);
+
+ rcu_read_unlock();
+ return err;
+}
+
+static void vxlan_add_udp_offload(struct rcu_head *head)
+{
+ struct vxlan_sock *vs = container_of(head, struct vxlan_sock, rcu);
+
+ udp_add_offload(&vs->udp_offloads);
+}
+
+static void vxlan_del_udp_offload(struct rcu_head *head)
+{
+ struct vxlan_sock *vs = container_of(head, struct vxlan_sock, rcu);
+
+ udp_del_offload(&vs->udp_offloads);
+}
+
/* Notify netdevs that UDP port started listening */
-static void vxlan_notify_add_rx_port(struct sock *sk)
+static void vxlan_notify_add_rx_port(struct vxlan_sock *vs)
{
struct net_device *dev;
+ struct sock *sk = vs->sock->sk;
struct net *net = sock_net(sk);
sa_family_t sa_family = sk->sk_family;
__be16 port = inet_sk(sk)->inet_sport;
@@ -569,12 +671,16 @@ static void vxlan_notify_add_rx_port(struct sock *sk)
port);
}
rcu_read_unlock();
+
+ if (sa_family == AF_INET)
+ call_rcu(&vs->rcu, vxlan_add_udp_offload);
}
/* Notify netdevs that UDP port is no more listening */
-static void vxlan_notify_del_rx_port(struct sock *sk)
+static void vxlan_notify_del_rx_port(struct vxlan_sock *vs)
{
struct net_device *dev;
+ struct sock *sk = vs->sock->sk;
struct net *net = sock_net(sk);
sa_family_t sa_family = sk->sk_family;
__be16 port = inet_sk(sk)->inet_sport;
@@ -586,6 +692,9 @@ static void vxlan_notify_del_rx_port(struct sock *sk)
port);
}
rcu_read_unlock();
+
+ if (sa_family == AF_INET)
+ call_rcu(&vs->rcu, vxlan_del_udp_offload);
}
/* Add new entry to forwarding table -- assumes lock held */
@@ -964,7 +1073,7 @@ void vxlan_sock_release(struct vxlan_sock *vs)
spin_lock(&vn->sock_lock);
hlist_del_rcu(&vs->hlist);
rcu_assign_sk_user_data(vs->sock->sk, NULL);
- vxlan_notify_del_rx_port(sk);
+ vxlan_notify_del_rx_port(vs);
spin_unlock(&vn->sock_lock);
queue_work(vxlan_wq, &vs->del_work);
@@ -1125,8 +1234,8 @@ static void vxlan_rcv(struct vxlan_sock *vs,
* leave the CHECKSUM_UNNECESSARY, the device checksummed it
* for us. Otherwise force the upper layers to verify it.
*/
- if (skb->ip_summed != CHECKSUM_UNNECESSARY || !skb->encapsulation ||
- !(vxlan->dev->features & NETIF_F_RXCSUM))
+ if ((skb->ip_summed != CHECKSUM_UNNECESSARY && skb->ip_summed != CHECKSUM_PARTIAL) ||
+ !skb->encapsulation || !(vxlan->dev->features & NETIF_F_RXCSUM))
skb->ip_summed = CHECKSUM_NONE;
skb->encapsulation = 0;
@@ -2304,7 +2413,7 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
struct sock *sk;
unsigned int h;
- vs = kmalloc(sizeof(*vs), GFP_KERNEL);
+ vs = kzalloc(sizeof(*vs), GFP_KERNEL);
if (!vs)
return ERR_PTR(-ENOMEM);
@@ -2329,9 +2438,15 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
vs->data = data;
rcu_assign_sk_user_data(vs->sock->sk, vs);
+ /* Initialize the vxlan udp offloads structure */
+ vs->udp_offloads.port = port;
+ vs->udp_offloads.callbacks.gro_receive = vxlan_gro_receive;
+ vs->udp_offloads.callbacks.gro_complete = vxlan_gro_complete;
+ INIT_LIST_HEAD(&vs->udp_offloads.list);
+
spin_lock(&vn->sock_lock);
hlist_add_head_rcu(&vs->hlist, vs_head(net, port));
- vxlan_notify_add_rx_port(sk);
+ vxlan_notify_add_rx_port(vs);
spin_unlock(&vn->sock_lock);
/* Mark socket as an encapsulation socket. */
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 6b6d180..5deef1a 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -21,6 +21,7 @@ struct vxlan_sock {
struct rcu_head rcu;
struct hlist_head vni_list[VNI_HASH_SIZE];
atomic_t refcnt;
+ struct udp_offload udp_offloads;
};
struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
--
1.7.1
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH net-next V3 1/3] net: Add GRO support for UDP encapsulating protocols
2014-01-08 20:34 ` [PATCH net-next V3 1/3] " Or Gerlitz
@ 2014-01-08 20:39 ` Or Gerlitz
2014-01-08 21:58 ` Tom Herbert
2014-01-09 7:09 ` Tom Herbert
2 siblings, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2014-01-08 20:39 UTC (permalink / raw)
To: hkchu, edumazet, herbert; +Cc: netdev, davem, yanb, shlomop, Or Gerlitz
On 08/01/2014 22:34, Or Gerlitz wrote:
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -13,6 +13,16 @@
> #include <linux/skbuff.h>
> #include <net/udp.h>
> #include <net/protocol.h>
> +/*
> +struct udp_offload {
> + __be16 port;
> + struct offload_callbacks callbacks;
> + struct list_head list;
> +};
> +*/
sorry for this small mess, will clean it up
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next V3 1/3] net: Add GRO support for UDP encapsulating protocols
2014-01-08 20:34 ` [PATCH net-next V3 1/3] " Or Gerlitz
2014-01-08 20:39 ` Or Gerlitz
@ 2014-01-08 21:58 ` Tom Herbert
2014-01-09 6:25 ` Or Gerlitz
2014-01-09 7:09 ` Tom Herbert
2 siblings, 1 reply; 15+ messages in thread
From: Tom Herbert @ 2014-01-08 21:58 UTC (permalink / raw)
To: Or Gerlitz
Cc: Jerry Chu, Eric Dumazet, Herbert Xu, Linux Netdev List,
David Miller, Yan Burman, Shlomo Pongratz
On Wed, Jan 8, 2014 at 12:34 PM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
> Add GRO handlers for protocols that do UDP encapsulation, with the intent of
> being able to coalesce packets which encapsulate packets belonging to
> the same TCP session.
>
> For GRO purposes, the destination UDP port takes the role of the ether type
> field in the ethernet header or the next protocol in the IP header.
>
> The UDP GRO handler will only attempt to coalesce packets whose destination
> port is registered to have gro handler.
>
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
> ---
> include/linux/netdevice.h | 10 +++-
> include/net/protocol.h | 3 +
> net/core/dev.c | 1 +
> net/ipv4/udp_offload.c | 129 +++++++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 142 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index a2a70cc..360551a 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1652,7 +1652,9 @@ struct napi_gro_cb {
> unsigned long age;
>
> /* Used in ipv6_gro_receive() */
> - int proto;
> + u16 proto;
> +
> + u16 udp_mark;
>
> /* used to support CHECKSUM_COMPLETE for tunneling protocols */
> __wsum csum;
> @@ -1691,6 +1693,12 @@ struct packet_offload {
> struct list_head list;
> };
>
> +struct udp_offload {
> + __be16 port;
> + struct offload_callbacks callbacks;
> + struct list_head list;
> +};
> +
> /* often modified stats are per cpu, other are shared (netdev->stats) */
> struct pcpu_sw_netstats {
> u64 rx_packets;
> diff --git a/include/net/protocol.h b/include/net/protocol.h
> index fbf7676..fe9af94 100644
> --- a/include/net/protocol.h
> +++ b/include/net/protocol.h
> @@ -103,6 +103,9 @@ int inet_del_offload(const struct net_offload *prot, unsigned char num);
> void inet_register_protosw(struct inet_protosw *p);
> void inet_unregister_protosw(struct inet_protosw *p);
>
> +void udp_add_offload(struct udp_offload *prot);
> +void udp_del_offload(struct udp_offload *prot);
> +
> #if IS_ENABLED(CONFIG_IPV6)
> int inet6_add_protocol(const struct inet6_protocol *prot, unsigned char num);
> int inet6_del_protocol(const struct inet6_protocol *prot, unsigned char num);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index ce01847..11f7acf 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3858,6 +3858,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
> NAPI_GRO_CB(skb)->same_flow = 0;
> NAPI_GRO_CB(skb)->flush = 0;
> NAPI_GRO_CB(skb)->free = 0;
> + NAPI_GRO_CB(skb)->udp_mark = 0;
>
> pp = ptype->callbacks.gro_receive(&napi->gro_list, skb);
> break;
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 79c62bd..2846ade 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -13,6 +13,16 @@
> #include <linux/skbuff.h>
> #include <net/udp.h>
> #include <net/protocol.h>
> +/*
> +struct udp_offload {
> + __be16 port;
> + struct offload_callbacks callbacks;
> + struct list_head list;
> +};
> +*/
> +
> +static DEFINE_SPINLOCK(udp_offload_lock);
> +static struct list_head udp_offload_base __read_mostly;
>
> static int udp4_ufo_send_check(struct sk_buff *skb)
> {
> @@ -89,14 +99,133 @@ out:
> return segs;
> }
>
> +void udp_add_offload(struct udp_offload *uo)
> +{
> + struct list_head *head = &udp_offload_base;
> +
> + spin_lock(&udp_offload_lock);
> + list_add_rcu(&uo->list, head);
> + spin_unlock(&udp_offload_lock);
> +}
> +EXPORT_SYMBOL(udp_add_offload);
> +
> +void udp_del_offload(struct udp_offload *uo)
> +{
> + struct list_head *head = &udp_offload_base;
> + struct udp_offload *uo1;
> +
> + spin_lock(&udp_offload_lock);
> + list_for_each_entry(uo1, head, list) {
> + if (uo == uo1) {
> + list_del_rcu(&uo->list);
> + goto out;
> + }
> + }
> +
> + pr_warn("udp_remove_offload: %p not found port %d\n", uo, htons(uo->port));
> +out:
> + spin_unlock(&udp_offload_lock);
> +
> + synchronize_net();
> +}
> +EXPORT_SYMBOL(udp_del_offload);
> +
> +static struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
> +{
> + struct list_head *ohead = &udp_offload_base;
> + struct udp_offload *poffload;
> + struct sk_buff *p, **pp = NULL;
> + struct udphdr *uh, *uh2;
> + unsigned int hlen, off;
> + int flush = 1;
> +
> + if (NAPI_GRO_CB(skb)->udp_mark ||
> + (!skb->encapsulation && skb->ip_summed != CHECKSUM_COMPLETE))
> + goto out;
> +
> + /* mark that this skb passed once through the udp gro layer */
> + NAPI_GRO_CB(skb)->udp_mark = 1;
> +
> + off = skb_gro_offset(skb);
> + hlen = off + sizeof(*uh);
> + uh = skb_gro_header_fast(skb, off);
> + if (skb_gro_header_hard(skb, hlen)) {
> + uh = skb_gro_header_slow(skb, hlen, off);
> + if (unlikely(!uh))
> + goto out;
> + }
> +
> + rcu_read_lock();
> + list_for_each_entry_rcu(poffload, ohead, list) {
> + if (poffload->port != uh->dest || !poffload->callbacks.gro_receive)
Is gro_receive == NULL ever valid? Maybe we can assert on registration
instead of checking on every packet.
Maybe make this poffload->port == uh->dest and goto "flush = 0".
Check below that list end was reached becomes unnecessary.
> + continue;
> + break;
> + }
> +
> + if (&poffload->list == ohead)
> + goto out_unlock;
> +
> + flush = 0;
> +
> + for (p = *head; p; p = p->next) {
> + if (!NAPI_GRO_CB(p)->same_flow)
> + continue;
> +
> + uh2 = (struct udphdr *)(p->data + off);
> + if ((*(u32 *)&uh->source != *(u32 *)&uh2->source)) {
> + NAPI_GRO_CB(p)->same_flow = 0;
> + continue;
> + }
> + goto found;
> + }
> +
> +found:
> + skb_gro_pull(skb, sizeof(struct udphdr)); /* pull encapsulating udp header */
> + pp = poffload->callbacks.gro_receive(head, skb);
> +
> +out_unlock:
> + rcu_read_unlock();
> +out:
> + NAPI_GRO_CB(skb)->flush |= flush;
> +
> + return pp;
> +}
> +
> +static int udp_gro_complete(struct sk_buff *skb, int nhoff)
> +{
> + struct list_head *ohead = &udp_offload_base;
> + struct udp_offload *poffload;
> + __be16 newlen = htons(skb->len - nhoff);
> + struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
> + int err = -ENOSYS;
> +
> + uh->len = newlen;
> +
> + rcu_read_lock();
> + list_for_each_entry_rcu(poffload, ohead, list) {
> + if (poffload->port != uh->dest || !poffload->callbacks.gro_complete)
> + continue;
> + break;
> + }
> +
> + if (&poffload->list != ohead)
> + err = poffload->callbacks.gro_complete(skb, nhoff + sizeof(struct udphdr));
> +
> + rcu_read_unlock();
> + return err;
> +}
> +
> static const struct net_offload udpv4_offload = {
> .callbacks = {
> .gso_send_check = udp4_ufo_send_check,
> .gso_segment = udp4_ufo_fragment,
> + .gro_receive = udp_gro_receive,
> + .gro_complete = udp_gro_complete,
> },
> };
>
> int __init udpv4_offload_init(void)
> {
> + INIT_LIST_HEAD(&udp_offload_base);
> return inet_add_offload(&udpv4_offload, IPPROTO_UDP);
> }
> --
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next V3 3/3] net: Add GRO support for vxlan traffic
2014-01-08 20:34 ` [PATCH net-next V3 3/3] net: Add GRO support for vxlan traffic Or Gerlitz
@ 2014-01-08 22:09 ` Eric Dumazet
2014-01-09 6:28 ` Or Gerlitz
2014-01-08 22:11 ` Eric Dumazet
1 sibling, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2014-01-08 22:09 UTC (permalink / raw)
To: Or Gerlitz; +Cc: hkchu, edumazet, herbert, netdev, davem, yanb, shlomop
On Wed, 2014-01-08 at 22:34 +0200, Or Gerlitz wrote:
> +
> +static int vxlan_gro_complete(struct sk_buff *skb, int nhoff)
> +{
> + struct ethhdr *eh;
> + struct packet_offload *ptype;
> + __be16 type;
> + /* 22 = 8 bytes for the vlxan header + 14 bytes for the inner eth header */
> + int vxlan_len = 22;
I am pretty sure this can use existing macros or sizeof(...)
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next V3 3/3] net: Add GRO support for vxlan traffic
2014-01-08 20:34 ` [PATCH net-next V3 3/3] net: Add GRO support for vxlan traffic Or Gerlitz
2014-01-08 22:09 ` Eric Dumazet
@ 2014-01-08 22:11 ` Eric Dumazet
2014-01-09 6:32 ` Or Gerlitz
1 sibling, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2014-01-08 22:11 UTC (permalink / raw)
To: Or Gerlitz; +Cc: hkchu, edumazet, herbert, netdev, davem, yanb, shlomop
On Wed, 2014-01-08 at 22:34 +0200, Or Gerlitz wrote:
> +
> /* Notify netdevs that UDP port started listening */
> -static void vxlan_notify_add_rx_port(struct sock *sk)
> +static void vxlan_notify_add_rx_port(struct vxlan_sock *vs)
> {
> struct net_device *dev;
> + struct sock *sk = vs->sock->sk;
> struct net *net = sock_net(sk);
> sa_family_t sa_family = sk->sk_family;
> __be16 port = inet_sk(sk)->inet_sport;
> @@ -569,12 +671,16 @@ static void vxlan_notify_add_rx_port(struct sock *sk)
> port);
> }
> rcu_read_unlock();
> +
> + if (sa_family == AF_INET)
> + call_rcu(&vs->rcu, vxlan_add_udp_offload);
Why waiting RCU grace period here ?
> }
>
> /* Notify netdevs that UDP port is no more listening */
> -static void vxlan_notify_del_rx_port(struct sock *sk)
> +static void vxlan_notify_del_rx_port(struct vxlan_sock *vs)
> {
> struct net_device *dev;
> + struct sock *sk = vs->sock->sk;
> struct net *net = sock_net(sk);
> sa_family_t sa_family = sk->sk_family;
> __be16 port = inet_sk(sk)->inet_sport;
> @@ -586,6 +692,9 @@ static void vxlan_notify_del_rx_port(struct sock *sk)
> port);
> }
> rcu_read_unlock();
> +
> + if (sa_family == AF_INET)
> + call_rcu(&vs->rcu, vxlan_del_udp_offload);
> }
This looks buggy.
You need to :
1) remove the offload structure from list
2) Then wait rcu grace period, and finally free the memory
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next V3 1/3] net: Add GRO support for UDP encapsulating protocols
2014-01-08 21:58 ` Tom Herbert
@ 2014-01-09 6:25 ` Or Gerlitz
0 siblings, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2014-01-09 6:25 UTC (permalink / raw)
To: Tom Herbert
Cc: Jerry Chu, Eric Dumazet, Herbert Xu, Linux Netdev List,
David Miller, Yan Burman, Shlomo Pongratz
On 08/01/2014 23:58, Tom Herbert wrote:
>> +static struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
>> >+{
>> >+ struct list_head *ohead = &udp_offload_base;
>> >+ struct udp_offload *poffload;
>> >+ struct sk_buff *p, **pp = NULL;
>> >+ struct udphdr *uh, *uh2;
>> >+ unsigned int hlen, off;
>> >+ int flush = 1;
>> >+
>> >+ if (NAPI_GRO_CB(skb)->udp_mark ||
>> >+ (!skb->encapsulation && skb->ip_summed != CHECKSUM_COMPLETE))
>> >+ goto out;
>> >+
>> >+ /* mark that this skb passed once through the udp gro layer */
>> >+ NAPI_GRO_CB(skb)->udp_mark = 1;
>> >+
>> >+ off = skb_gro_offset(skb);
>> >+ hlen = off + sizeof(*uh);
>> >+ uh = skb_gro_header_fast(skb, off);
>> >+ if (skb_gro_header_hard(skb, hlen)) {
>> >+ uh = skb_gro_header_slow(skb, hlen, off);
>> >+ if (unlikely(!uh))
>> >+ goto out;
>> >+ }
>> >+
>> >+ rcu_read_lock();
>> >+ list_for_each_entry_rcu(poffload, ohead, list) {
>> >+ if (poffload->port != uh->dest || !poffload->callbacks.gro_receive)
> Is gro_receive == NULL ever valid? Maybe we can assert on registration instead of checking on every packet.
I see your point, however, the offload structure contains entries for
both gro and gso, asserting on registration could somehow limit the use
cases, isn't that?
> Maybe make this poffload->port == uh->dest and goto "flush = 0". Check below that list end was reached becomes unnecessary.
Sure, will use goto "flush = 0" and if we didn't go there we'll go to
out_unlock
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next V3 3/3] net: Add GRO support for vxlan traffic
2014-01-08 22:09 ` Eric Dumazet
@ 2014-01-09 6:28 ` Or Gerlitz
0 siblings, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2014-01-09 6:28 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
On 09/01/2014 00:09, Eric Dumazet wrote:
> On Wed, 2014-01-08 at 22:34 +0200, Or Gerlitz wrote:
>> +
>> +static int vxlan_gro_complete(struct sk_buff *skb, int nhoff)
>> +{
>> + struct ethhdr *eh;
>> + struct packet_offload *ptype;
>> + __be16 type;
>> + /* 22 = 8 bytes for the vlxan header + 14 bytes for the inner eth header */
>> + int vxlan_len = 22;
>
>
> I am pretty sure this can use existing macros or sizeof(...)
sure, will fix
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next V3 3/3] net: Add GRO support for vxlan traffic
2014-01-08 22:11 ` Eric Dumazet
@ 2014-01-09 6:32 ` Or Gerlitz
0 siblings, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2014-01-09 6:32 UTC (permalink / raw)
To: Eric Dumazet; +Cc: hkchu, edumazet, herbert, netdev, davem, yanb, shlomop
On 09/01/2014 00:11, Eric Dumazet wrote:
> On Wed, 2014-01-08 at 22:34 +0200, Or Gerlitz wrote:
>
>> +
>> /* Notify netdevs that UDP port started listening */
>> -static void vxlan_notify_add_rx_port(struct sock *sk)
>> +static void vxlan_notify_add_rx_port(struct vxlan_sock *vs)
>> {
>> struct net_device *dev;
>> + struct sock *sk = vs->sock->sk;
>> struct net *net = sock_net(sk);
>> sa_family_t sa_family = sk->sk_family;
>> __be16 port = inet_sk(sk)->inet_sport;
>> @@ -569,12 +671,16 @@ static void vxlan_notify_add_rx_port(struct sock *sk)
>> port);
>> }
>> rcu_read_unlock();
>> +
>> + if (sa_family == AF_INET)
>> + call_rcu(&vs->rcu, vxlan_add_udp_offload);
> Why waiting RCU grace period here?
Basically the add operation can be done right away, however, since the
delete operation can't be done
instantly when we want it, I wanted to protect against a series of
add/del/add in times T1 < T2 < T3
T1 add(X)
T2 del(X)
T3 add(X)
where the delete is deferred and as a result the 2nd add is done before
the delete and @ the end offload X is not added in the 2nd time.From
your other comment below I conclude that I probably miss something about
the rcu usage here, so will give it further thought.
>
>> }
>>
>> /* Notify netdevs that UDP port is no more listening */
>> -static void vxlan_notify_del_rx_port(struct sock *sk)
>> +static void vxlan_notify_del_rx_port(struct vxlan_sock *vs)
>> {
>> struct net_device *dev;
>> + struct sock *sk = vs->sock->sk;
>> struct net *net = sock_net(sk);
>> sa_family_t sa_family = sk->sk_family;
>> __be16 port = inet_sk(sk)->inet_sport;
>> @@ -586,6 +692,9 @@ static void vxlan_notify_del_rx_port(struct sock *sk)
>> port);
>> }
>> rcu_read_unlock();
>> +
>> + if (sa_family == AF_INET)
>> + call_rcu(&vs->rcu, vxlan_del_udp_offload);
>> }
> This looks buggy.
>
> You need to :
>
> 1) remove the offload structure from list
> 2) Then wait rcu grace period, and finally free the memory
>
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next V3 1/3] net: Add GRO support for UDP encapsulating protocols
2014-01-08 20:34 ` [PATCH net-next V3 1/3] " Or Gerlitz
2014-01-08 20:39 ` Or Gerlitz
2014-01-08 21:58 ` Tom Herbert
@ 2014-01-09 7:09 ` Tom Herbert
2014-01-09 7:14 ` Or Gerlitz
2 siblings, 1 reply; 15+ messages in thread
From: Tom Herbert @ 2014-01-09 7:09 UTC (permalink / raw)
To: Or Gerlitz
Cc: Jerry Chu, Eric Dumazet, Herbert Xu, Linux Netdev List,
David Miller, Yan Burman, Shlomo Pongratz
On Wed, Jan 8, 2014 at 12:34 PM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
> Add GRO handlers for protocols that do UDP encapsulation, with the intent of
> being able to coalesce packets which encapsulate packets belonging to
> the same TCP session.
>
> For GRO purposes, the destination UDP port takes the role of the ether type
> field in the ethernet header or the next protocol in the IP header.
>
> The UDP GRO handler will only attempt to coalesce packets whose destination
> port is registered to have gro handler.
>
> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
> ---
> include/linux/netdevice.h | 10 +++-
> include/net/protocol.h | 3 +
> net/core/dev.c | 1 +
> net/ipv4/udp_offload.c | 129 +++++++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 142 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index a2a70cc..360551a 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1652,7 +1652,9 @@ struct napi_gro_cb {
> unsigned long age;
>
> /* Used in ipv6_gro_receive() */
> - int proto;
> + u16 proto;
> +
> + u16 udp_mark;
>
> /* used to support CHECKSUM_COMPLETE for tunneling protocols */
> __wsum csum;
> @@ -1691,6 +1693,12 @@ struct packet_offload {
> struct list_head list;
> };
>
> +struct udp_offload {
> + __be16 port;
> + struct offload_callbacks callbacks;
> + struct list_head list;
> +};
> +
> /* often modified stats are per cpu, other are shared (netdev->stats) */
> struct pcpu_sw_netstats {
> u64 rx_packets;
> diff --git a/include/net/protocol.h b/include/net/protocol.h
> index fbf7676..fe9af94 100644
> --- a/include/net/protocol.h
> +++ b/include/net/protocol.h
> @@ -103,6 +103,9 @@ int inet_del_offload(const struct net_offload *prot, unsigned char num);
> void inet_register_protosw(struct inet_protosw *p);
> void inet_unregister_protosw(struct inet_protosw *p);
>
> +void udp_add_offload(struct udp_offload *prot);
> +void udp_del_offload(struct udp_offload *prot);
> +
> #if IS_ENABLED(CONFIG_IPV6)
> int inet6_add_protocol(const struct inet6_protocol *prot, unsigned char num);
> int inet6_del_protocol(const struct inet6_protocol *prot, unsigned char num);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index ce01847..11f7acf 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3858,6 +3858,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
> NAPI_GRO_CB(skb)->same_flow = 0;
> NAPI_GRO_CB(skb)->flush = 0;
> NAPI_GRO_CB(skb)->free = 0;
> + NAPI_GRO_CB(skb)->udp_mark = 0;
>
> pp = ptype->callbacks.gro_receive(&napi->gro_list, skb);
> break;
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 79c62bd..2846ade 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -13,6 +13,16 @@
> #include <linux/skbuff.h>
> #include <net/udp.h>
> #include <net/protocol.h>
> +/*
> +struct udp_offload {
> + __be16 port;
> + struct offload_callbacks callbacks;
> + struct list_head list;
> +};
> +*/
> +
> +static DEFINE_SPINLOCK(udp_offload_lock);
> +static struct list_head udp_offload_base __read_mostly;
>
> static int udp4_ufo_send_check(struct sk_buff *skb)
> {
> @@ -89,14 +99,133 @@ out:
> return segs;
> }
>
> +void udp_add_offload(struct udp_offload *uo)
> +{
> + struct list_head *head = &udp_offload_base;
> +
> + spin_lock(&udp_offload_lock);
> + list_add_rcu(&uo->list, head);
> + spin_unlock(&udp_offload_lock);
> +}
> +EXPORT_SYMBOL(udp_add_offload);
> +
> +void udp_del_offload(struct udp_offload *uo)
> +{
> + struct list_head *head = &udp_offload_base;
> + struct udp_offload *uo1;
> +
> + spin_lock(&udp_offload_lock);
> + list_for_each_entry(uo1, head, list) {
> + if (uo == uo1) {
> + list_del_rcu(&uo->list);
> + goto out;
> + }
> + }
> +
> + pr_warn("udp_remove_offload: %p not found port %d\n", uo, htons(uo->port));
> +out:
> + spin_unlock(&udp_offload_lock);
> +
> + synchronize_net();
> +}
> +EXPORT_SYMBOL(udp_del_offload);
> +
> +static struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
> +{
> + struct list_head *ohead = &udp_offload_base;
> + struct udp_offload *poffload;
> + struct sk_buff *p, **pp = NULL;
> + struct udphdr *uh, *uh2;
> + unsigned int hlen, off;
> + int flush = 1;
> +
> + if (NAPI_GRO_CB(skb)->udp_mark ||
> + (!skb->encapsulation && skb->ip_summed != CHECKSUM_COMPLETE))
> + goto out;
> +
> + /* mark that this skb passed once through the udp gro layer */
> + NAPI_GRO_CB(skb)->udp_mark = 1;
> +
> + off = skb_gro_offset(skb);
> + hlen = off + sizeof(*uh);
> + uh = skb_gro_header_fast(skb, off);
> + if (skb_gro_header_hard(skb, hlen)) {
> + uh = skb_gro_header_slow(skb, hlen, off);
> + if (unlikely(!uh))
> + goto out;
> + }
> +
> + rcu_read_lock();
> + list_for_each_entry_rcu(poffload, ohead, list) {
> + if (poffload->port != uh->dest || !poffload->callbacks.gro_receive)
> + continue;
> + break;
> + }
> +
> + if (&poffload->list == ohead)
> + goto out_unlock;
> +
> + flush = 0;
> +
> + for (p = *head; p; p = p->next) {
> + if (!NAPI_GRO_CB(p)->same_flow)
> + continue;
> +
> + uh2 = (struct udphdr *)(p->data + off);
> + if ((*(u32 *)&uh->source != *(u32 *)&uh2->source)) {
> + NAPI_GRO_CB(p)->same_flow = 0;
> + continue;
> + }
> + goto found;
I don't believe this is correct. If you exit on the first match, skb's
that follow in the list can still be marked as same_flow. You need to
walk the whole list I believe (just get rid of the goto).
> + }
> +
> +found:
> + skb_gro_pull(skb, sizeof(struct udphdr)); /* pull encapsulating udp header */
> + pp = poffload->callbacks.gro_receive(head, skb);
> +
> +out_unlock:
> + rcu_read_unlock();
> +out:
> + NAPI_GRO_CB(skb)->flush |= flush;
> +
> + return pp;
> +}
> +
> +static int udp_gro_complete(struct sk_buff *skb, int nhoff)
> +{
> + struct list_head *ohead = &udp_offload_base;
> + struct udp_offload *poffload;
> + __be16 newlen = htons(skb->len - nhoff);
> + struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
> + int err = -ENOSYS;
> +
> + uh->len = newlen;
> +
> + rcu_read_lock();
> + list_for_each_entry_rcu(poffload, ohead, list) {
> + if (poffload->port != uh->dest || !poffload->callbacks.gro_complete)
> + continue;
> + break;
> + }
> +
> + if (&poffload->list != ohead)
> + err = poffload->callbacks.gro_complete(skb, nhoff + sizeof(struct udphdr));
> +
> + rcu_read_unlock();
> + return err;
> +}
> +
> static const struct net_offload udpv4_offload = {
> .callbacks = {
> .gso_send_check = udp4_ufo_send_check,
> .gso_segment = udp4_ufo_fragment,
> + .gro_receive = udp_gro_receive,
> + .gro_complete = udp_gro_complete,
> },
> };
>
> int __init udpv4_offload_init(void)
> {
> + INIT_LIST_HEAD(&udp_offload_base);
> return inet_add_offload(&udpv4_offload, IPPROTO_UDP);
> }
> --
> 1.7.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next V3 1/3] net: Add GRO support for UDP encapsulating protocols
2014-01-09 7:09 ` Tom Herbert
@ 2014-01-09 7:14 ` Or Gerlitz
0 siblings, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2014-01-09 7:14 UTC (permalink / raw)
To: Tom Herbert
Cc: Jerry Chu, Eric Dumazet, Herbert Xu, Linux Netdev List,
David Miller, Yan Burman, Shlomo Pongratz
On 09/01/2014 09:09, Tom Herbert wrote:
>> + for (p = *head; p; p = p->next) {
>> >+ if (!NAPI_GRO_CB(p)->same_flow)
>> >+ continue;
>> >+
>> >+ uh2 = (struct udphdr *)(p->data + off);
>> >+ if ((*(u32 *)&uh->source != *(u32 *)&uh2->source)) {
>> >+ NAPI_GRO_CB(p)->same_flow = 0;
>> >+ continue;
>> >+ }
>> >+ goto found;
> I don't believe this is correct. If you exit on the first match, skb's
> that follow in the list can still be marked as same_flow. You need to
> walk the whole list I believe (just get rid of the goto).
>
Good catch, will fix.
Looking on the ipv4 and gre gro_receive callbacks they indeed go over
all the list where the tcp gro_receive code does allow itself do goto
found, probably b/c they are last in the chain?
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next V3 0/3] net: Add GRO support for UDP encapsulating protocols
2014-01-14 16:00 [PATCH net-next V3 0/3] net: Add GRO support for UDP encapsulating protocols Or Gerlitz
@ 2014-01-14 16:06 ` Or Gerlitz
0 siblings, 0 replies; 15+ messages in thread
From: Or Gerlitz @ 2014-01-14 16:06 UTC (permalink / raw)
To: David Miller; +Cc: netdev
On 14/01/2014 18:00, Or Gerlitz wrote:
> This series adds GRO handlers for protocols that do UDP encapsulation, with the
> intent of being able to coalesce packets which encapsulate packets belonging to
> the same TCP session.
>
> For GRO purposes, the destination UDP port takes the role of the ether type
> field in the ethernet header or the next protocol in the IP header.
>
> The UDP GRO handler will only attempt to coalesce packets whose destination
> port is registered to have gro handler.
>
> The patches done against net-next ae237b3ede64 "net: 3com: fix
> warning for incorrect type in argument"
>
> Or.
>
>
> v3 --> v4 changes:
>
> - applied feedback from Tom on some micro-optimizations that save
> branches and goto directives in the udp gro logic
>
> - applied feedback from Eric on correct RCU programming for the
> add/remove flow of the upper protocols udp gro handlers
>
>
> v2 --> v3 changes:
>
> - moved to use linked list to store the udp gro handlers, this solves the
> problem of consuming 512KB of memory for the handlers.
>
> - use a mark on the skb GRO CB data to disallow running the udp gro_receive twice
> on a packet, this solves the problem of udp encapsulated packets whose inner VM
> packet is udp and happen to carry a port which has registered offloads - and flush it.
>
> - invoke the udp offload protocol registration and de-registration from the vxlan driver
> in a sleepable context
>
> For unclear some reason I got this warning when the vxlan driver deletes the
> udp offload structure
> *** BLURB HERE ***
Sorry for the spam, the above three lines are leftovers from the V3
cover letter, same for the subject line of this
cover-letter which carries "V3" this *is* V4, will make sure to avoid
such flushes (....) in the future.
Or.
>
> Or Gerlitz (3):
> net: Add GRO support for UDP encapsulating protocols
> net: Export gro_find_by_type helpers
> net: Add GRO support for vxlan traffic
>
> drivers/net/vxlan.c | 117 +++++++++++++++++++++++++++++++--
> include/linux/netdevice.h | 10 +++-
> include/net/protocol.h | 3 +
> include/net/vxlan.h | 1 +
> net/core/dev.c | 3 +
> net/ipv4/udp_offload.c | 157 +++++++++++++++++++++++++++++++++++++++++++++
> 6 files changed, 283 insertions(+), 8 deletions(-)
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH net-next V3 0/3] net: Add GRO support for UDP encapsulating protocols
@ 2014-01-14 16:00 Or Gerlitz
2014-01-14 16:06 ` Or Gerlitz
0 siblings, 1 reply; 15+ messages in thread
From: Or Gerlitz @ 2014-01-14 16:00 UTC (permalink / raw)
To: davem
Cc: netdev, hkchu, edumazet, herbert, yanb, shlomop, therbert, Or Gerlitz
This series adds GRO handlers for protocols that do UDP encapsulation, with the
intent of being able to coalesce packets which encapsulate packets belonging to
the same TCP session.
For GRO purposes, the destination UDP port takes the role of the ether type
field in the ethernet header or the next protocol in the IP header.
The UDP GRO handler will only attempt to coalesce packets whose destination
port is registered to have gro handler.
The patches done against net-next ae237b3ede64 "net: 3com: fix
warning for incorrect type in argument"
Or.
v3 --> v4 changes:
- applied feedback from Tom on some micro-optimizations that save
branches and goto directives in the udp gro logic
- applied feedback from Eric on correct RCU programming for the
add/remove flow of the upper protocols udp gro handlers
v2 --> v3 changes:
- moved to use linked list to store the udp gro handlers, this solves the
problem of consuming 512KB of memory for the handlers.
- use a mark on the skb GRO CB data to disallow running the udp gro_receive twice
on a packet, this solves the problem of udp encapsulated packets whose inner VM
packet is udp and happen to carry a port which has registered offloads - and flush it.
- invoke the udp offload protocol registration and de-registration from the vxlan driver
in a sleepable context
For unclear some reason I got this warning when the vxlan driver deletes the
udp offload structure
*** BLURB HERE ***
Or Gerlitz (3):
net: Add GRO support for UDP encapsulating protocols
net: Export gro_find_by_type helpers
net: Add GRO support for vxlan traffic
drivers/net/vxlan.c | 117 +++++++++++++++++++++++++++++++--
include/linux/netdevice.h | 10 +++-
include/net/protocol.h | 3 +
include/net/vxlan.h | 1 +
net/core/dev.c | 3 +
net/ipv4/udp_offload.c | 157 +++++++++++++++++++++++++++++++++++++++++++++
6 files changed, 283 insertions(+), 8 deletions(-)
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2014-01-14 16:10 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-08 20:34 [PATCH net-next V3 0/3] net: Add GRO support for UDP encapsulating protocols Or Gerlitz
2014-01-08 20:34 ` [PATCH net-next V3 1/3] " Or Gerlitz
2014-01-08 20:39 ` Or Gerlitz
2014-01-08 21:58 ` Tom Herbert
2014-01-09 6:25 ` Or Gerlitz
2014-01-09 7:09 ` Tom Herbert
2014-01-09 7:14 ` Or Gerlitz
2014-01-08 20:34 ` [PATCH net-next V3 2/3] net: Export gro_find_by_type helpers Or Gerlitz
2014-01-08 20:34 ` [PATCH net-next V3 3/3] net: Add GRO support for vxlan traffic Or Gerlitz
2014-01-08 22:09 ` Eric Dumazet
2014-01-09 6:28 ` Or Gerlitz
2014-01-08 22:11 ` Eric Dumazet
2014-01-09 6:32 ` Or Gerlitz
2014-01-14 16:00 [PATCH net-next V3 0/3] net: Add GRO support for UDP encapsulating protocols Or Gerlitz
2014-01-14 16:06 ` Or Gerlitz
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).