All of lore.kernel.org
* [PATCH RFC] IPsec performance improvements (discussion base for the IPsec performance BoF)
@ 2016-02-04  6:36 Steffen Klassert
  2016-02-04  6:36 ` [PATCH RFC 01/13] net: allow to leave the buffer fragmented in skb_cow_data() Steffen Klassert
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: Steffen Klassert @ 2016-02-04  6:36 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, sowmini.varadhan

This patchset adds some performance improvements for IPsec. It is an
early-stage RFC version and still contains bugs. I am posting it now
to provide a discussion base for the IPsec performance BoF at the
netdev conference next week.

The patchset adds a GRO/GSO codepath for IPsec and tries to avoid the
linearization of the buffers whenever possible.

The GRO part seems to work well. GSO and avoiding linearization still
have problems, in particular with async crypto operations.

Below are some performance numbers.

Transport mode (measured by Sowmini Varadhan):

Baseline:

2.6 Gbps ESP-NULL
2.17 Gbps AES-GCM-256

Avoid frame copy + GSO/GRO:

8 Gbps ESP-NULL
4.2 Gbps AES-GCM-256

Forwarding with tunnel mode (measured by myself):

Baseline:

3.63 Gbps pcrypt(echainiv(authenc(hmac(sha1-ssse3),cbc-aes-aesni)))

Avoid frame copy + GSO/GRO:

4.33 Gbps pcrypt(echainiv(authenc(hmac(sha1-ssse3),cbc-aes-aesni)))

I used the following cpu bindings for the pcrypt setup:

----------  TCP   ----------  ESP Tunnel     ---------   TCP   ----------
|iperf -c|------->|IPsec TX|---------------->|IPsec RX|------->|iperf -s|
----------        ----------                 ----------        ----------

cpu0           RX |        |              RX |        |
cpu1              |        | TX              |        | TX
cpu2              | crypto |                 | crypto |
cpu3              | crypto |                 | crypto |
cpu4              | crypto |                 | crypto |
cpu5              | crypto |                 | crypto |

Packet forwarding is done with four machines.
The crypto operations are isolated from the networking path.
Packets traverse the stack as follows:

- Packet is received on cpu0 (irqs pinned).
- cpu0 enqueues the crypto request to a pcrypt parallelization queue.
- Crypto is done on cpus 2-5; crypto requests are distributed
  round-robin across these cpus. pcrypt ensures the requests
  stay in the right order.
- After crypto is done, the requests are queued to a pcrypt
  serialization queue.
- cpu1 gets a callback from the crypto layer and does the final TX
  path.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH RFC 01/13] net: allow to leave the buffer fragmented in skb_cow_data()
  2016-02-04  6:36 [PATCH RFC] IPsec performance improvements (discussion base for the IPsec performance BoF) Steffen Klassert
@ 2016-02-04  6:36 ` Steffen Klassert
  2016-02-04  6:36 ` [PATCH RFC 02/13] gro: Partly revert "net: gro: allow to build full sized skb" Steffen Klassert
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Steffen Klassert @ 2016-02-04  6:36 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, Mathias Krause, sowmini.varadhan

From: Mathias Krause <mathias.krause@secunet.com>

Do not linearize the buffer unconditionally; do so only if we are
expected to expand the tail. All callers can handle fragmented buffers
and even expect them!

Not linearizing the buffer leads to a small performance improvement for
the IPsec receive path in case the network driver passed us a fragmented
buffer.

With this patch applied I was able to increase the throughput of an
IPsec gateway from 7.12 Gbit/s to 7.28 Gbit/s.

Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/core/skbuff.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b2df375..120add40 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3445,7 +3445,7 @@ EXPORT_SYMBOL_GPL(skb_to_sgvec);
  *
  *	If @tailbits is given, make sure that there is space to write @tailbits
  *	bytes of data beyond current end of socket buffer.  @trailer will be
- *	set to point to the skb in which this space begins.
+ *	linearized and set to point to the skb in which this space begins.
  *
  *	The number of scatterlist elements required to completely map the
  *	COW'd and extended socket buffer will be returned.
@@ -3456,11 +3456,10 @@ int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer)
 	int elt;
 	struct sk_buff *skb1, **skb_p;
 
-	/* If skb is cloned or its head is paged, reallocate
-	 * head pulling out all the pages (pages are considered not writable
-	 * at the moment even if they are anonymous).
+	/* If skb is cloned reallocate head pulling out all the pages (pages are
+	 * considered not writable at the moment even if they are anonymous).
 	 */
-	if ((skb_cloned(skb) || skb_shinfo(skb)->nr_frags) &&
+	if (skb_cloned(skb) &&
 	    __pskb_pull_tail(skb, skb_pagelen(skb)-skb_headlen(skb)) == NULL)
 		return -ENOMEM;
 
@@ -3471,18 +3470,26 @@ int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer)
 		 * good frames. OK, on miss we reallocate and reserve even more
 		 * space, 128 bytes is fair. */
 
-		if (skb_tailroom(skb) < tailbits &&
-		    pskb_expand_head(skb, 0, tailbits-skb_tailroom(skb)+128, GFP_ATOMIC))
-			return -ENOMEM;
+		if (tailbits) {
+			if (skb_linearize(skb))
+				return -ENOMEM;
+
+			if (skb_tailroom(skb) < tailbits) {
+				int ntail = tailbits - skb_tailroom(skb) + 128;
+
+				if (pskb_expand_head(skb, 0, ntail, GFP_ATOMIC))
+					return -ENOMEM;
+			}
+		}
 
 		/* Voila! */
 		*trailer = skb;
-		return 1;
+		return skb_shinfo(skb)->nr_frags + 1;
 	}
 
 	/* Misery. We are in troubles, going to mincer fragments... */
 
-	elt = 1;
+	elt = skb_shinfo(skb)->nr_frags + 1;
 	skb_p = &skb_shinfo(skb)->frag_list;
 	copyflag = 0;
 
@@ -3534,7 +3541,7 @@ int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer)
 			kfree_skb(skb1);
 			skb1 = skb2;
 		}
-		elt++;
+		elt += skb_shinfo(skb1)->nr_frags + 1;
 		*trailer = skb1;
 		skb_p = &skb1->next;
 	}
-- 
1.9.1


* [PATCH RFC 02/13] gro: Partly revert "net: gro: allow to build full sized skb"
  2016-02-04  6:36 [PATCH RFC] IPsec performance improvements (discussion base for the IPsec performance BoF) Steffen Klassert
  2016-02-04  6:36 ` [PATCH RFC 01/13] net: allow to leave the buffer fragmented in skb_cow_data() Steffen Klassert
@ 2016-02-04  6:36 ` Steffen Klassert
  2016-02-04  6:36 ` [PATCH RFC 03/13] esp: Add a software GRO codepath Steffen Klassert
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Steffen Klassert @ 2016-02-04  6:36 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, sowmini.varadhan

This partly reverts the patch mentioned below because, on
forwarding, such skbs can't be offloaded to a NIC.

This is just a hack to get IPsec GRO for forwarding to work.
A real fix should consider the solutions proposed in the
original patch, quoted below.

-------------------------------------------------------------------------
commit 8a29111c7ca68d928dfab58636f3f6acf0ac04f7
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Oct 8 09:02:23 2013 -0700

    net: gro: allow to build full sized skb

    skb_gro_receive() is currently limited to 16 or 17 MSS per GRO skb,
    typically 24616 bytes, because it fills up to MAX_SKB_FRAGS frags.

    It's relatively easy to extend the skb using frag_list to allow
    more frags to be appended into the last sk_buff.

    This still builds very efficient skbs, and allows reaching 45 MSS per
    skb.

    (45 MSS GRO packet uses one skb plus a frag_list containing 2 additional
    sk_buff)

    High speed TCP flows benefit from this extension by lowering TCP stack
    cpu usage (less packets stored in receive queue, less ACK packets
    processed)

    Forwarding setups could be hurt, as such skbs will need to be
    linearized, although its not a new problem, as GRO could already
    provide skbs with a frag_list.

    We could make the 65536 bytes threshold a tunable to mitigate this.

    (First time we need to linearize skb in skb_needs_linearize(), we could
    lower the tunable to ~16*1460 so that following skb_gro_receive() calls
    build smaller skbs)

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
---------------------------------------------------------------------------

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/core/skbuff.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 120add40..336a3e9 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3224,7 +3224,7 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 		int nr_frags = pinfo->nr_frags + i;
 
 		if (nr_frags > MAX_SKB_FRAGS)
-			goto merge;
+			return -E2BIG;
 
 		offset -= headlen;
 		pinfo->nr_frags = nr_frags;
@@ -3257,7 +3257,7 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 		unsigned int first_offset;
 
 		if (nr_frags + 1 + skbinfo->nr_frags > MAX_SKB_FRAGS)
-			goto merge;
+			return -E2BIG;
 
 		first_offset = skb->data -
 			       (unsigned char *)page_address(page) +
@@ -3277,7 +3277,6 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 		goto done;
 	}
 
-merge:
 	delta_truesize = skb->truesize;
 	if (offset > headlen) {
 		unsigned int eat = offset - headlen;
-- 
1.9.1


* [PATCH RFC 03/13] esp: Add a software GRO codepath
  2016-02-04  6:36 [PATCH RFC] IPsec performance improvements (discussion base for the IPsec performance BoF) Steffen Klassert
  2016-02-04  6:36 ` [PATCH RFC 01/13] net: allow to leave the buffer fragmented in skb_cow_data() Steffen Klassert
  2016-02-04  6:36 ` [PATCH RFC 02/13] gro: Partly revert "net: gro: allow to build full sized skb" Steffen Klassert
@ 2016-02-04  6:36 ` Steffen Klassert
  2016-02-04  6:36 ` [PATCH RFC 04/13] xfrm: Move device notifications to a separate file Steffen Klassert
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Steffen Klassert @ 2016-02-04  6:36 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, sowmini.varadhan

This patch adds GRO callbacks for ESP on ipv4 and ipv6.

In case the GRO layer detects an ESP packet, the
esp4_gro_receive() function calls the xfrm input layer,
which decapsulates the packet and reinjects it into
layer 2 by calling netif_rx(). We use one bit of the
sk_buff to flag xfrm_gro. If this bit is set, the
process_backlog() function calls napi_gro_receive()
instead of __netif_receive_skb(). We could avoid the
use of xfrm_gro if we could call napi_gro_receive()
unconditionally from process_backlog().

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/linux/netdevice.h       |  1 +
 include/linux/skbuff.h          |  3 +-
 net/core/dev.c                  | 14 ++++++-
 net/ipv4/Makefile               |  2 +-
 net/ipv4/esp4_offload.c         | 77 +++++++++++++++++++++++++++++++++++++++
 net/ipv4/xfrm4_input.c          |  3 ++
 net/ipv4/xfrm4_mode_transport.c |  3 +-
 net/ipv6/Makefile               |  2 +-
 net/ipv6/esp6_offload.c         | 81 +++++++++++++++++++++++++++++++++++++++++
 net/ipv6/xfrm6_input.c          |  3 ++
 net/xfrm/xfrm_input.c           |  8 +++-
 11 files changed, 191 insertions(+), 6 deletions(-)
 create mode 100644 net/ipv4/esp4_offload.c
 create mode 100644 net/ipv6/esp6_offload.c

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 289c231..6fd1f1d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -338,6 +338,7 @@ enum gro_result {
 	GRO_HELD,
 	GRO_NORMAL,
 	GRO_DROP,
+	GRO_CONSUMED,
 };
 typedef enum gro_result gro_result_t;
 
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 11f935c..b84245f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -721,7 +721,8 @@ struct sk_buff {
 	__u8			ipvs_property:1;
 	__u8			inner_protocol_type:1;
 	__u8			remcsum_offload:1;
-	/* 3 or 5 bit hole */
+	__u8			xfrm_gro:1;
+	/* 2 or 4 bit hole */
 
 #ifdef CONFIG_NET_SCHED
 	__u16			tc_index;	/* traffic control index */
diff --git a/net/core/dev.c b/net/core/dev.c
index 8cba3d8..1a456ea 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4453,6 +4453,11 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 	}
 	rcu_read_unlock();
 
+	if (PTR_ERR(pp) == -EINPROGRESS) {
+		ret = GRO_CONSUMED;
+		goto ok;
+	}
+
 	if (&ptype->list == head)
 		goto normal;
 
@@ -4559,6 +4564,7 @@ static gro_result_t napi_skb_finish(gro_result_t ret, struct sk_buff *skb)
 
 	case GRO_HELD:
 	case GRO_MERGED:
+	case GRO_CONSUMED:
 		break;
 	}
 
@@ -4629,6 +4635,7 @@ static gro_result_t napi_frags_finish(struct napi_struct *napi,
 		break;
 
 	case GRO_MERGED:
+	case GRO_CONSUMED:
 		break;
 	}
 
@@ -4770,7 +4777,12 @@ static int process_backlog(struct napi_struct *napi, int quota)
 		while ((skb = __skb_dequeue(&sd->process_queue))) {
 			rcu_read_lock();
 			local_irq_enable();
-			__netif_receive_skb(skb);
+
+			if (skb->xfrm_gro)
+				napi_gro_receive(napi, skb);
+			else
+				__netif_receive_skb(skb);
+
 			rcu_read_unlock();
 			local_irq_disable();
 			input_queue_head_incr(sd);
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index 62c049b..48d3390 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -28,7 +28,7 @@ obj-$(CONFIG_NET_UDP_TUNNEL) += udp_tunnel.o
 obj-$(CONFIG_NET_IPVTI) += ip_vti.o
 obj-$(CONFIG_SYN_COOKIES) += syncookies.o
 obj-$(CONFIG_INET_AH) += ah4.o
-obj-$(CONFIG_INET_ESP) += esp4.o
+obj-$(CONFIG_INET_ESP) += esp4.o esp4_offload.o
 obj-$(CONFIG_INET_IPCOMP) += ipcomp.o
 obj-$(CONFIG_INET_XFRM_TUNNEL) += xfrm4_tunnel.o
 obj-$(CONFIG_INET_XFRM_MODE_BEET) += xfrm4_mode_beet.o
diff --git a/net/ipv4/esp4_offload.c b/net/ipv4/esp4_offload.c
new file mode 100644
index 0000000..f2b0d6d
--- /dev/null
+++ b/net/ipv4/esp4_offload.c
@@ -0,0 +1,77 @@
+/*
+ * IPV4 GSO/GRO offload support
+ * Linux INET implementation
+ *
+ * Copyright (C) 2015 secunet Security Networks AG
+ * Author: Steffen Klassert <steffen.klassert@secunet.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * ESP GRO support
+ */
+
+#include <linux/skbuff.h>
+#include <linux/init.h>
+#include <net/protocol.h>
+#include <crypto/aead.h>
+#include <crypto/authenc.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <net/ip.h>
+#include <net/xfrm.h>
+#include <net/esp.h>
+#include <linux/scatterlist.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <net/udp.h>
+
+static struct sk_buff **esp4_gro_receive(struct sk_buff **head,
+					 struct sk_buff *skb)
+{
+	if (NAPI_GRO_CB(skb)->flush)
+		goto out;
+
+	skb_pull(skb, skb_gro_offset(skb));
+	skb->xfrm_gro = 1;
+
+	xfrm4_rcv_encap(skb, IPPROTO_ESP, 0, 0);
+
+	return ERR_PTR(-EINPROGRESS);
+out:
+	return NULL;
+}
+
+static int esp4_gro_complete(struct sk_buff *skb, int nhoff)
+{
+	struct xfrm_state *x = xfrm_input_state(skb);
+	struct crypto_aead *aead = x->data;
+	struct ip_esp_hdr *esph = (struct ip_esp_hdr *)(skb->data + nhoff);
+	struct packet_offload *ptype;
+	int err = -ENOENT;
+	__be16 type = skb->protocol;
+
+	rcu_read_lock();
+	ptype = gro_find_complete_by_type(type);
+	if (ptype != NULL)
+		err = ptype->callbacks.gro_complete(skb, nhoff + sizeof(*esph) + crypto_aead_ivsize(aead));
+
+	rcu_read_unlock();
+
+	return err;
+}
+
+static const struct net_offload esp4_offload = {
+	.callbacks = {
+		.gro_receive = esp4_gro_receive,
+		.gro_complete = esp4_gro_complete,
+	},
+};
+
+static int __init esp4_offload_init(void)
+{
+	return inet_add_offload(&esp4_offload, IPPROTO_ESP);
+}
+device_initcall(esp4_offload_init);
diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
index 62e1e72..0fbc40f 100644
--- a/net/ipv4/xfrm4_input.c
+++ b/net/ipv4/xfrm4_input.c
@@ -53,6 +53,9 @@ int xfrm4_transport_finish(struct sk_buff *skb, int async)
 	iph->tot_len = htons(skb->len);
 	ip_send_check(iph);
 
+	if (skb->xfrm_gro)
+		return 0;
+
 	NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
 		dev_net(skb->dev), NULL, skb, skb->dev, NULL,
 		xfrm4_rcv_encap_finish);
diff --git a/net/ipv4/xfrm4_mode_transport.c b/net/ipv4/xfrm4_mode_transport.c
index fd840c7..ce59c34 100644
--- a/net/ipv4/xfrm4_mode_transport.c
+++ b/net/ipv4/xfrm4_mode_transport.c
@@ -50,7 +50,8 @@ static int xfrm4_transport_input(struct xfrm_state *x, struct sk_buff *skb)
 		skb->network_header = skb->transport_header;
 	}
 	ip_hdr(skb)->tot_len = htons(skb->len + ihl);
-	skb_reset_transport_header(skb);
+	if (!skb->xfrm_gro)
+		skb_reset_transport_header(skb);
 	return 0;
 }
 
diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
index 2fbd90b..64e3c4c 100644
--- a/net/ipv6/Makefile
+++ b/net/ipv6/Makefile
@@ -25,7 +25,7 @@ ipv6-$(CONFIG_SYN_COOKIES) += syncookies.o
 ipv6-objs += $(ipv6-y)
 
 obj-$(CONFIG_INET6_AH) += ah6.o
-obj-$(CONFIG_INET6_ESP) += esp6.o
+obj-$(CONFIG_INET6_ESP) += esp6.o esp6_offload.o
 obj-$(CONFIG_INET6_IPCOMP) += ipcomp6.o
 obj-$(CONFIG_INET6_XFRM_TUNNEL) += xfrm6_tunnel.o
 obj-$(CONFIG_INET6_TUNNEL) += tunnel6.o
diff --git a/net/ipv6/esp6_offload.c b/net/ipv6/esp6_offload.c
new file mode 100644
index 0000000..54bcd61
--- /dev/null
+++ b/net/ipv6/esp6_offload.c
@@ -0,0 +1,81 @@
+/*
+ * IPV6 GSO/GRO offload support
+ * Linux INET implementation
+ *
+ * Copyright (C) 2015 secunet Security Networks AG
+ * Author: Steffen Klassert <steffen.klassert@secunet.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * ESP GRO support
+ */
+
+#include <linux/skbuff.h>
+#include <linux/init.h>
+#include <net/protocol.h>
+#include <crypto/aead.h>
+#include <crypto/authenc.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <net/ip.h>
+#include <net/xfrm.h>
+#include <net/esp.h>
+#include <linux/scatterlist.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <net/ip6_route.h>
+#include <net/ipv6.h>
+#include <linux/icmpv6.h>
+
+static struct sk_buff **esp6_gro_receive(struct sk_buff **head,
+					 struct sk_buff *skb)
+{
+	if (NAPI_GRO_CB(skb)->flush)
+		goto out;
+
+	skb_pull(skb, skb_gro_offset(skb));
+	skb->xfrm_gro = 1;
+
+	XFRM_SPI_SKB_CB(skb)->family = AF_INET6;
+	XFRM_SPI_SKB_CB(skb)->daddroff = offsetof(struct ipv6hdr, daddr);
+	xfrm_input(skb, IPPROTO_ESP, 0, 0);
+
+	return ERR_PTR(-EINPROGRESS);
+out:
+	return NULL;
+}
+
+static int esp6_gro_complete(struct sk_buff *skb, int nhoff)
+{
+	struct xfrm_state *x = xfrm_input_state(skb);
+	struct crypto_aead *aead = x->data;
+	struct ip_esp_hdr *esph = (struct ip_esp_hdr *)(skb->data + nhoff);
+	struct packet_offload *ptype;
+	int err = -ENOENT;
+	__be16 type = skb->protocol;
+
+	rcu_read_lock();
+	ptype = gro_find_complete_by_type(type);
+	if (ptype != NULL)
+		err = ptype->callbacks.gro_complete(skb, nhoff + sizeof(*esph) + crypto_aead_ivsize(aead));
+
+	rcu_read_unlock();
+
+	return err;
+}
+
+static const struct net_offload esp6_offload = {
+	.callbacks = {
+		.gro_receive = esp6_gro_receive,
+		.gro_complete = esp6_gro_complete,
+	},
+};
+
+static int __init esp6_offload_init(void)
+{
+	return inet6_add_offload(&esp6_offload, IPPROTO_ESP);
+}
+device_initcall(esp6_offload_init);
diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index 0eaab1f..d0c73f0 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -42,6 +42,9 @@ int xfrm6_transport_finish(struct sk_buff *skb, int async)
 	ipv6_hdr(skb)->payload_len = htons(skb->len);
 	__skb_push(skb, skb->data - skb_network_header(skb));
 
+	if (skb->xfrm_gro)
+		return -1;
+
 	NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING,
 		dev_net(skb->dev), NULL, skb, skb->dev, NULL,
 		ip6_rcv_finish);
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index ad7f5b3..aa56252 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -371,7 +371,13 @@ resume:
 		netif_rx(skb);
 		return 0;
 	} else {
-		return x->inner_mode->afinfo->transport_finish(skb, async);
+		err = x->inner_mode->afinfo->transport_finish(skb, async);
+		if (skb->xfrm_gro) {
+			netif_rx(skb);
+			return 0;
+		}
+
+		return err;
 	}
 
 drop_unlock:
-- 
1.9.1


* [PATCH RFC 04/13] xfrm: Move device notifications to a separate file
  2016-02-04  6:36 [PATCH RFC] IPsec performance improvements (discussion base for the IPsec performance BoF) Steffen Klassert
                   ` (2 preceding siblings ...)
  2016-02-04  6:36 ` [PATCH RFC 03/13] esp: Add a software GRO codepath Steffen Klassert
@ 2016-02-04  6:36 ` Steffen Klassert
  2016-02-04  6:36 ` [PATCH RFC 05/13] xfrm: Add callbacks for IPsec GSO offloading Steffen Klassert
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Steffen Klassert @ 2016-02-04  6:36 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, sowmini.varadhan

This is needed for the upcoming IPsec device offloading.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/xfrm.h     |  1 +
 net/xfrm/Makefile      |  2 +-
 net/xfrm/xfrm_device.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 net/xfrm/xfrm_policy.c | 17 +----------------
 4 files changed, 46 insertions(+), 17 deletions(-)
 create mode 100644 net/xfrm/xfrm_device.c

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index d6f6e50..aed7153 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1398,6 +1398,7 @@ struct xfrm6_tunnel {
 void xfrm_init(void);
 void xfrm4_init(void);
 int xfrm_state_init(struct net *net);
+void xfrm_dev_init(void);
 void xfrm_state_fini(struct net *net);
 void xfrm4_state_init(void);
 void xfrm4_protocol_init(void);
diff --git a/net/xfrm/Makefile b/net/xfrm/Makefile
index c0e9619..55b2ac3 100644
--- a/net/xfrm/Makefile
+++ b/net/xfrm/Makefile
@@ -4,7 +4,7 @@
 
 obj-$(CONFIG_XFRM) := xfrm_policy.o xfrm_state.o xfrm_hash.o \
 		      xfrm_input.o xfrm_output.o \
-		      xfrm_sysctl.o xfrm_replay.o
+		      xfrm_sysctl.o xfrm_replay.o xfrm_device.o
 obj-$(CONFIG_XFRM_STATISTICS) += xfrm_proc.o
 obj-$(CONFIG_XFRM_ALGO) += xfrm_algo.o
 obj-$(CONFIG_XFRM_USER) += xfrm_user.o
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
new file mode 100644
index 0000000..34a260a
--- /dev/null
+++ b/net/xfrm/xfrm_device.c
@@ -0,0 +1,43 @@
+/*
+ * xfrm_device.c - IPsec device offloading code.
+ *
+ * Copyright (c) 2015 secunet Security Networks AG
+ *
+ * Author:
+ * Steffen Klassert <steffen.klassert@secunet.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/errno.h>
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <net/dst.h>
+#include <net/xfrm.h>
+#include <linux/notifier.h>
+
+static int xfrm_dev_event(struct notifier_block *this, unsigned long event, void *ptr)
+{
+	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+
+	switch (event) {
+	case NETDEV_DOWN:
+		xfrm_garbage_collect(dev_net(dev));
+	}
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block xfrm_dev_notifier = {
+	.notifier_call	= xfrm_dev_event,
+};
+
+void __net_init xfrm_dev_init(void)
+{
+	register_netdevice_notifier(&xfrm_dev_notifier);
+}
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index b5e665b..b157f7c 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2891,21 +2891,6 @@ int xfrm_policy_unregister_afinfo(struct xfrm_policy_afinfo *afinfo)
 }
 EXPORT_SYMBOL(xfrm_policy_unregister_afinfo);
 
-static int xfrm_dev_event(struct notifier_block *this, unsigned long event, void *ptr)
-{
-	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
-
-	switch (event) {
-	case NETDEV_DOWN:
-		xfrm_garbage_collect(dev_net(dev));
-	}
-	return NOTIFY_DONE;
-}
-
-static struct notifier_block xfrm_dev_notifier = {
-	.notifier_call	= xfrm_dev_event,
-};
-
 #ifdef CONFIG_XFRM_STATISTICS
 static int __net_init xfrm_statistics_init(struct net *net)
 {
@@ -2982,7 +2967,7 @@ static int __net_init xfrm_policy_init(struct net *net)
 	INIT_WORK(&net->xfrm.policy_hash_work, xfrm_hash_resize);
 	INIT_WORK(&net->xfrm.policy_hthresh.work, xfrm_hash_rebuild);
 	if (net_eq(net, &init_net))
-		register_netdevice_notifier(&xfrm_dev_notifier);
+		xfrm_dev_init();
 	return 0;
 
 out_bydst:
-- 
1.9.1


* [PATCH RFC 05/13] xfrm: Add callbacks for IPsec GSO offloading
  2016-02-04  6:36 [PATCH RFC] IPsec performance improvements (discussion base for the IPsec performance BoF) Steffen Klassert
                   ` (3 preceding siblings ...)
  2016-02-04  6:36 ` [PATCH RFC 04/13] xfrm: Move device notifications to a separate file Steffen Klassert
@ 2016-02-04  6:36 ` Steffen Klassert
  2016-02-04  6:36 ` [PATCH RFC 06/13] net: Add xfrm offload callbacks to struct net_device Steffen Klassert
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Steffen Klassert @ 2016-02-04  6:36 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, sowmini.varadhan

This patch prepares struct xfrm_type for IPsec GSO offloading by
adding an encap() callback for encapsulation and an output_tail()
callback to do the crypto operations after the return from the
GSO layer. We need the output_tail() callback to handle async
crypto operations.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/xfrm.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index aed7153..a33ceb7 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -375,6 +375,8 @@ struct xfrm_type {
 	void			(*destructor)(struct xfrm_state *);
 	int			(*input)(struct xfrm_state *, struct sk_buff *skb);
 	int			(*output)(struct xfrm_state *, struct sk_buff *pskb);
+	int			(*output_tail)(struct xfrm_state *x, struct sk_buff *skb);
+	void			(*encap)(struct xfrm_state *x, struct sk_buff *skb);
 	int			(*reject)(struct xfrm_state *, struct sk_buff *,
 					  const struct flowi *);
 	int			(*hdr_offset)(struct xfrm_state *, struct sk_buff *, u8 **);
-- 
1.9.1


* [PATCH RFC 06/13] net: Add xfrm offload callbacks to struct net_device
  2016-02-04  6:36 [PATCH RFC] IPsec performance improvements (discussion base for the IPsec performance BoF) Steffen Klassert
                   ` (4 preceding siblings ...)
  2016-02-04  6:36 ` [PATCH RFC 05/13] xfrm: Add callbacks for IPsec GSO offloading Steffen Klassert
@ 2016-02-04  6:36 ` Steffen Klassert
  2016-02-04  6:37 ` [PATCH RFC 07/13] net: Add ESP offload features Steffen Klassert
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Steffen Klassert @ 2016-02-04  6:36 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, sowmini.varadhan

This patch adds the callbacks we need for IPsec GSO
and maybe also for IPsec hardware offload.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/linux/netdevice.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 6fd1f1d..6936e96f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -778,6 +778,12 @@ static inline bool netdev_phys_item_id_same(struct netdev_phys_item_id *a,
 
 typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
 				       struct sk_buff *skb);
+struct xfrmdev_ops {
+	int			(*xdo_dev_encap) (struct sk_buff *skb);
+	int			(*xdo_dev_prepare) (struct sk_buff *skb);
+	int			(*xdo_dev_validate) (struct sk_buff *skb);
+	void			(*xdo_dev_resume) (struct sk_buff *skb, int err);
+};
 
 /*
  * This structure defines the management hooks for network devices.
@@ -1625,6 +1631,9 @@ struct net_device {
 #ifdef CONFIG_NET_L3_MASTER_DEV
 	const struct l3mdev_ops	*l3mdev_ops;
 #endif
+#ifdef CONFIG_XFRM
+	const struct xfrmdev_ops *xfrmdev_ops;
+#endif
 
 	const struct header_ops *header_ops;
 
-- 
1.9.1


* [PATCH RFC 07/13] net: Add ESP offload features
  2016-02-04  6:36 [PATCH RFC] IPsec performance improvements (discussion base for the IPsec performance BoF) Steffen Klassert
                   ` (5 preceding siblings ...)
  2016-02-04  6:36 ` [PATCH RFC 06/13] net: Add xfrm offload callbacks to struct net_device Steffen Klassert
@ 2016-02-04  6:37 ` Steffen Klassert
  2016-02-04  6:37 ` [PATCH RFC 08/13] esp4: Add a software GSO codepath Steffen Klassert
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Steffen Klassert @ 2016-02-04  6:37 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, sowmini.varadhan

This patch adds netdev features to configure IPsec offloads.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/linux/netdev_features.h | 6 +++++-
 include/linux/netdevice.h       | 1 +
 include/linux/skbuff.h          | 2 ++
 net/core/ethtool.c              | 2 ++
 4 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index d9654f0e..4a62acb 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -48,8 +48,9 @@ enum {
 	NETIF_F_GSO_UDP_TUNNEL_BIT,	/* ... UDP TUNNEL with TSO */
 	NETIF_F_GSO_UDP_TUNNEL_CSUM_BIT,/* ... UDP TUNNEL with TSO & CSUM */
 	NETIF_F_GSO_TUNNEL_REMCSUM_BIT, /* ... TUNNEL with TSO & REMCSUM */
+	NETIF_F_GSO_ESP_BIT,		/* ... ESP with TSO */
 	/**/NETIF_F_GSO_LAST =		/* last bit, see GSO_MASK */
-		NETIF_F_GSO_TUNNEL_REMCSUM_BIT,
+		NETIF_F_GSO_ESP_BIT,
 
 	NETIF_F_FCOE_CRC_BIT,		/* FCoE CRC32 */
 	NETIF_F_SCTP_CRC_BIT,		/* SCTP checksum offload */
@@ -66,6 +67,7 @@ enum {
 	NETIF_F_HW_VLAN_STAG_FILTER_BIT,/* Receive filtering on VLAN STAGs */
 	NETIF_F_HW_L2FW_DOFFLOAD_BIT,	/* Allow L2 Forwarding in Hardware */
 	NETIF_F_BUSY_POLL_BIT,		/* Busy poll */
+	NETIF_F_ESP_OFFLOAD_BIT,	/* ESP transformation offload */
 
 	/*
 	 * Add your fresh new feature above and remember to update
@@ -119,11 +121,13 @@ enum {
 #define NETIF_F_GSO_UDP_TUNNEL	__NETIF_F(GSO_UDP_TUNNEL)
 #define NETIF_F_GSO_UDP_TUNNEL_CSUM __NETIF_F(GSO_UDP_TUNNEL_CSUM)
 #define NETIF_F_GSO_TUNNEL_REMCSUM __NETIF_F(GSO_TUNNEL_REMCSUM)
+#define NETIF_F_GSO_ESP		__NETIF_F(GSO_ESP)
 #define NETIF_F_HW_VLAN_STAG_FILTER __NETIF_F(HW_VLAN_STAG_FILTER)
 #define NETIF_F_HW_VLAN_STAG_RX	__NETIF_F(HW_VLAN_STAG_RX)
 #define NETIF_F_HW_VLAN_STAG_TX	__NETIF_F(HW_VLAN_STAG_TX)
 #define NETIF_F_HW_L2FW_DOFFLOAD	__NETIF_F(HW_L2FW_DOFFLOAD)
 #define NETIF_F_BUSY_POLL	__NETIF_F(BUSY_POLL)
+#define NETIF_F_ESP_OFFLOAD	__NETIF_F(ESP_OFFLOAD)
 
 #define for_each_netdev_feature(mask_addr, bit)	\
 	for_each_set_bit(bit, (unsigned long *)mask_addr, NETDEV_FEATURE_COUNT)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 6936e96f..adbca16 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3938,6 +3938,7 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type)
 	BUILD_BUG_ON(SKB_GSO_UDP_TUNNEL != (NETIF_F_GSO_UDP_TUNNEL >> NETIF_F_GSO_SHIFT));
 	BUILD_BUG_ON(SKB_GSO_UDP_TUNNEL_CSUM != (NETIF_F_GSO_UDP_TUNNEL_CSUM >> NETIF_F_GSO_SHIFT));
 	BUILD_BUG_ON(SKB_GSO_TUNNEL_REMCSUM != (NETIF_F_GSO_TUNNEL_REMCSUM >> NETIF_F_GSO_SHIFT));
+	BUILD_BUG_ON(SKB_GSO_ESP != (NETIF_F_GSO_ESP >> NETIF_F_GSO_SHIFT));
 
 	return (features & feature) == feature;
 }
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b84245f..4652f2c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -481,6 +481,8 @@ enum {
 	SKB_GSO_UDP_TUNNEL_CSUM = 1 << 11,
 
 	SKB_GSO_TUNNEL_REMCSUM = 1 << 12,
+
+	SKB_GSO_ESP = 1 << 13,
 };
 
 #if BITS_PER_LONG > 32
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index daf0470..d9baba1 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -85,6 +85,7 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
 	[NETIF_F_GSO_IPIP_BIT] =	 "tx-ipip-segmentation",
 	[NETIF_F_GSO_SIT_BIT] =		 "tx-sit-segmentation",
 	[NETIF_F_GSO_UDP_TUNNEL_BIT] =	 "tx-udp_tnl-segmentation",
+	[NETIF_F_GSO_ESP_BIT] =		 "tx-esp-segmentation",
 
 	[NETIF_F_FCOE_CRC_BIT] =         "tx-checksum-fcoe-crc",
 	[NETIF_F_SCTP_CRC_BIT] =        "tx-checksum-sctp",
@@ -98,6 +99,7 @@ static const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN]
 	[NETIF_F_RXALL_BIT] =            "rx-all",
 	[NETIF_F_HW_L2FW_DOFFLOAD_BIT] = "l2-fwd-offload",
 	[NETIF_F_BUSY_POLL_BIT] =        "busy-poll",
+	[NETIF_F_ESP_OFFLOAD_BIT] =       "tx-esp-offload",
 };
 
 static const char
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH RFC 08/13] esp4: Add a software GSO codepath.
  2016-02-04  6:36 [PATCH RFC] IPsec performance improvements (discussion base for the IPsec performance BoF) Steffen Klassert
                   ` (6 preceding siblings ...)
  2016-02-04  6:37 ` [PATCH RFC 07/13] net: Add ESP offload features Steffen Klassert
@ 2016-02-04  6:37 ` Steffen Klassert
  2016-02-04  6:37 ` [PATCH RFC 09/13] esp: Avoid skb_cow_data whenever possible Steffen Klassert
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Steffen Klassert @ 2016-02-04  6:37 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, sowmini.varadhan

This patch adds an esp4_gso_segment() callback and registers
functions for the new ESP encapsulation and crypto callbacks.

The work to get transport mode ready was done by
Sowmini Varadhan <sowmini.varadhan@oracle.com>.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/linux/skbuff.h  |  3 +-
 net/ipv4/af_inet.c      |  1 +
 net/ipv4/esp4.c         | 92 +++++++++++++++++++++++++++++++++++++++++++++++--
 net/ipv4/esp4_offload.c | 85 +++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/tcp_offload.c  |  1 +
 5 files changed, 178 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4652f2c..dcc6c85 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -724,7 +724,8 @@ struct sk_buff {
 	__u8			inner_protocol_type:1;
 	__u8			remcsum_offload:1;
 	__u8			xfrm_gro:1;
-	/* 2 or 4 bit hole */
+	__u8			hw_xfrm:1;
+	/* 1 or 3 bit hole */
 
 #ifdef CONFIG_NET_SCHED
 	__u16			tc_index;	/* traffic control index */
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 5c5db66..ac6c1aa 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1220,6 +1220,7 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 		       SKB_GSO_UDP_TUNNEL |
 		       SKB_GSO_UDP_TUNNEL_CSUM |
 		       SKB_GSO_TUNNEL_REMCSUM |
+		       SKB_GSO_ESP |
 		       0)))
 		goto out;
 
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 4779374..550323d 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -86,6 +86,15 @@ static inline struct scatterlist *esp_req_sg(struct crypto_aead *aead,
 			     __alignof__(struct scatterlist));
 }
 
+static void esp_output_done2(struct crypto_async_request *base, int err)
+{
+	struct sk_buff *skb = base->data;
+
+	kfree(ESP_SKB_CB(skb)->tmp);
+
+	skb_dst(skb)->dev->xfrmdev_ops->xdo_dev_resume(skb, err);
+}
+
 static void esp_output_done(struct crypto_async_request *base, int err)
 {
 	struct sk_buff *skb = base->data;
@@ -118,6 +127,69 @@ static void esp_output_done_esn(struct crypto_async_request *base, int err)
 	esp_output_done(base, err);
 }
 
+static void esp4_gso_encap(struct xfrm_state *x, struct sk_buff *skb)
+{
+	struct ip_esp_hdr *esph;
+	struct iphdr *iph = ip_hdr(skb);
+	int proto = iph->protocol;
+
+	skb_push(skb, -skb_network_offset(skb));
+	esph = ip_esp_hdr(skb);
+	*skb_mac_header(skb) = IPPROTO_ESP;
+
+	esph->spi = x->id.spi;
+
+	/* save off the next_proto in seq_no to be used later in
+	 * esp4_gso_segment() and esp_output() for invoking protocol
+	 * specific segmentation offload.
+	 */
+	esph->seq_no = proto;
+}
+
+static int esp_output_tail(struct xfrm_state *x, struct sk_buff *skb)
+{
+	int err;
+	__be32 *seqhi;
+	int seqhilen;
+	u8 *iv;
+	struct crypto_aead *aead;
+	struct aead_request *req;
+	void *tmp;
+
+	aead = x->data;
+	tmp = ESP_SKB_CB(skb)->tmp;
+
+	seqhilen = 0;
+	if (x->props.flags & XFRM_STATE_ESN)
+		seqhilen += sizeof(__be32);
+
+	seqhi = esp_tmp_seqhi(tmp);
+	iv = esp_tmp_iv(aead, tmp, seqhilen);
+	req = esp_tmp_req(aead, iv);
+
+	aead_request_set_callback(req, 0, esp_output_done2, skb);
+
+	err = crypto_aead_encrypt(req);
+
+	switch (err) {
+	case -EINPROGRESS:
+		goto error;
+
+	case -EBUSY:
+		err = NET_XMIT_DROP;
+		break;
+
+	case 0:
+		if ((x->props.flags & XFRM_STATE_ESN))
+			esp_output_restore_header(skb);
+	}
+
+	kfree(tmp);
+
+error:
+	return err;
+}
+
 static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 {
 	int err;
@@ -140,6 +212,7 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 	int seqhilen;
 	__be32 *seqhi;
 	__be64 seqno;
+	int proto;
 
 	/* skb is pure payload to encrypt */
 
@@ -167,6 +240,7 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 
 	assoclen = sizeof(*esph);
 	seqhilen = 0;
+	proto = ip_esp_hdr(skb)->seq_no;
 
 	if (x->props.flags & XFRM_STATE_ESN) {
 		seqhilen += sizeof(__be32);
@@ -196,12 +270,18 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 			tail[i] = i + 1;
 	} while (0);
 	tail[plen - 2] = plen - 2;
-	tail[plen - 1] = *skb_mac_header(skb);
+	if (!skb->hw_xfrm)
+		tail[plen - 1] = *skb_mac_header(skb);
+	else
+		tail[plen - 1] = proto;
+
 	pskb_put(skb, trailer, clen - skb->len + alen);
 
 	skb_push(skb, -skb_network_offset(skb));
 	esph = ip_esp_hdr(skb);
-	*skb_mac_header(skb) = IPPROTO_ESP;
+
+	if (!skb->hw_xfrm)
+		*skb_mac_header(skb) = IPPROTO_ESP;
 
 	/* this is non-NULL only with UDP Encapsulation */
 	if (x->encap) {
@@ -271,6 +351,10 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 	       min(ivlen, 8));
 
 	ESP_SKB_CB(skb)->tmp = tmp;
+
+	if (skb->hw_xfrm)
+		return 0;
+
 	err = crypto_aead_encrypt(req);
 
 	switch (err) {
@@ -735,7 +819,9 @@ static const struct xfrm_type esp_type =
 	.destructor	= esp_destroy,
 	.get_mtu	= esp4_get_mtu,
 	.input		= esp_input,
-	.output		= esp_output
+	.output		= esp_output,
+	.output_tail	= esp_output_tail,
+	.encap		= esp4_gso_encap,
 };
 
 static struct xfrm4_protocol esp4_protocol = {
diff --git a/net/ipv4/esp4_offload.c b/net/ipv4/esp4_offload.c
index f2b0d6d..7c44c09 100644
--- a/net/ipv4/esp4_offload.c
+++ b/net/ipv4/esp4_offload.c
@@ -63,10 +63,95 @@ static int esp4_gro_complete(struct sk_buff *skb, int nhoff)
 	return err;
 }
 
+static struct sk_buff *esp4_gso_segment(struct sk_buff *skb,
+				        netdev_features_t features)
+{
+	struct ip_esp_hdr *esph;
+	struct sk_buff *skb2;
+	struct sk_buff *segs = ERR_PTR(-EINVAL);
+	struct dst_entry *dst = skb_dst(skb);
+	struct xfrm_state *x;
+	struct crypto_aead *aead;
+	int err = 0;
+	const struct net_offload *ops;
+	int proto;
+
+	if (!dst || !dst->xfrm)
+		goto out;
+
+	x = dst->xfrm;
+	aead = x->data;
+	esph = ip_esp_hdr(skb);
+
+	proto = esph->seq_no;
+	if (esph->spi != x->id.spi)
+		goto out;
+
+	if (!pskb_may_pull(skb, sizeof(*esph) + crypto_aead_ivsize(aead)))
+		goto out;
+
+	__skb_pull(skb, sizeof(*esph) + crypto_aead_ivsize(aead));
+
+	skb->encap_hdr_csum = 1;
+
+	if (proto == IPPROTO_IPIP) {
+		__skb_push(skb, skb->mac_len);
+		segs = skb_mac_gso_segment(skb, features);
+	} else {
+		skb->transport_header += x->props.header_len;
+		ops = rcu_dereference(inet_offloads[proto]);
+		if (likely(ops && ops->callbacks.gso_segment))
+			segs = ops->callbacks.gso_segment(skb, features);
+	}
+	if (IS_ERR(segs))
+		goto out;
+	if (segs == NULL)
+		return ERR_PTR(-EINVAL);
+	__skb_pull(skb, skb->data - skb_mac_header(skb));
+
+	skb2 = segs;
+	do {
+		struct sk_buff *nskb = skb2->next;
+
+		if (proto == IPPROTO_IPIP) {
+			skb2->network_header = skb2->network_header - x->props.header_len;
+			skb2->transport_header = skb2->network_header + sizeof(struct iphdr);
+			skb_reset_mac_len(skb2);
+			skb_pull(skb2, skb2->mac_len + x->props.header_len);
+		} else {
+			/* skb2 mac and data are pointing at the start of
+			 * mac address. Pull data forward to point to tcp hdr
+			 */
+			 __skb_pull(skb2, skb2->transport_header - skb2->mac_header);
+
+			 /* move transport_header to point to esp header */
+			 skb2->transport_header -= x->props.header_len;
+		}
+
+	/* Set up esph->seq_no to be used by esp_output()
+		 * for initializing trailer.
+		 */
+		ip_esp_hdr(skb2)->seq_no = proto;
+
+		err = dst->dev->xfrmdev_ops->xdo_dev_prepare(skb2);
+		if (err) {
+			kfree_skb_list(segs);
+			return ERR_PTR(err);
+		}
+
+		skb_push(skb2, skb2->mac_len);
+		skb2 = nskb;
+	} while (skb2);
+
+out:
+	return segs;
+}
+
 static const struct net_offload esp4_offload = {
 	.callbacks = {
 		.gro_receive = esp4_gro_receive,
 		.gro_complete = esp4_gro_complete,
+		.gso_segment = esp4_gso_segment,
 	},
 };
 
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 9864a2d..e67981e 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -97,6 +97,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 			       SKB_GSO_UDP_TUNNEL |
 			       SKB_GSO_UDP_TUNNEL_CSUM |
 			       SKB_GSO_TUNNEL_REMCSUM |
+			       SKB_GSO_ESP |
 			       0) ||
 			     !(type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))))
 			goto out;
-- 
1.9.1


* [PATCH RFC 09/13] esp: Avoid skb_cow_data whenever possible
  2016-02-04  6:36 [PATCH RFC] IPsec performance improvements (discussion base for the IPsec performance BoF) Steffen Klassert
                   ` (7 preceding siblings ...)
  2016-02-04  6:37 ` [PATCH RFC 08/13] esp4: Add a software GSO codepath Steffen Klassert
@ 2016-02-04  6:37 ` Steffen Klassert
  2016-02-04  6:37 ` [PATCH RFC 10/13] xfrm: Add basic infrastructure for IPsec device offloading Steffen Klassert
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Steffen Klassert @ 2016-02-04  6:37 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, sowmini.varadhan

If we are allowed to write to the buffer and have enough free
space in the linear part of the buffer, we add the IPsec
tailbits there. If there is no space in the linear part
but we are still allowed to write, we add a page fragment
with the tailbits to the buffer.

With this, we can avoid linearizing the buffer whenever
we are allowed to write to it.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/xfrm.h |   2 +
 net/ipv4/esp4.c    | 125 ++++++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 111 insertions(+), 16 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index a33ceb7..7939c39 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -217,6 +217,8 @@ struct xfrm_state {
 	/* Last used time */
 	unsigned long		lastused;
 
+	struct page_frag xfrag;
+
 	/* Reference to data common to all the instances of this
 	 * transformer. */
 	const struct xfrm_type	*type;
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 550323d..b702467 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -18,6 +18,8 @@
 #include <net/protocol.h>
 #include <net/udp.h>
 
+#include <linux/highmem.h>
+
 struct esp_skb_cb {
 	struct xfrm_skb_cb xfrm;
 	void *tmp;
@@ -61,6 +63,7 @@ static inline __be32 *esp_tmp_seqhi(void *tmp)
 {
 	return PTR_ALIGN((__be32 *)tmp, __alignof__(__be32));
 }
+
 static inline u8 *esp_tmp_iv(struct crypto_aead *aead, void *tmp, int seqhilen)
 {
 	return crypto_aead_ivsize(aead) ?
@@ -192,15 +195,17 @@ error:
 
 static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 {
-	int err;
+	int err = -ENOMEM;
 	struct ip_esp_hdr *esph;
 	struct crypto_aead *aead;
 	struct aead_request *req;
 	struct scatterlist *sg;
 	struct sk_buff *trailer;
+	struct page *page;
 	void *tmp;
 	u8 *iv;
 	u8 *tail;
+	u8 *vaddr;
 	int blksize;
 	int clen;
 	int alen;
@@ -232,12 +237,6 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 	blksize = ALIGN(crypto_aead_blocksize(aead), 4);
 	clen = ALIGN(skb->len + 2 + tfclen, blksize);
 	plen = clen - skb->len - tfclen;
-
-	err = skb_cow_data(skb, tfclen + plen + alen, &trailer);
-	if (err < 0)
-		goto error;
-	nfrags = err;
-
 	assoclen = sizeof(*esph);
 	seqhilen = 0;
 	proto = ip_esp_hdr(skb)->seq_no;
@@ -247,19 +246,100 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 		assoclen += seqhilen;
 	}
 
-	tmp = esp_alloc_tmp(aead, nfrags, seqhilen);
-	if (!tmp) {
-		err = -ENOMEM;
-		goto error;
+	if (!(skb_shinfo(skb)->tx_flags & SKBTX_SHARED_FRAG) && !skb_cloned(skb)) {
+		if (tfclen + plen + alen <= skb_availroom(skb)) {
+			nfrags = 1;
+			trailer = skb;
+			tail = skb_tail_pointer(trailer);
+
+			/* Fill padding... */
+			if (tfclen) {
+				memset(tail, 0, tfclen);
+				tail += tfclen;
+			}
+			do {
+				int i;
+				for (i = 0; i < plen - 2; i++)
+					tail[i] = i + 1;
+			} while (0);
+			tail[plen - 2] = plen - 2;
+			if (!skb->hw_xfrm)
+				tail[plen - 1] = *skb_mac_header(skb);
+			else
+				tail[plen - 1] = proto;
+
+			pskb_put(skb, trailer, clen - skb->len + alen);
+
+			goto skip_cow;
+
+		} else if ((skb_shinfo(skb)->nr_frags < MAX_SKB_FRAGS)
+			   && !skb_has_frag_list(skb)) {
+			int allocsize;
+			struct sock *sk = skb->sk;
+			struct page_frag *pfrag = &x->xfrag;
+
+			allocsize = ALIGN(tfclen + plen + alen, L1_CACHE_BYTES);
+
+			spin_lock_bh(&x->lock);
+
+			if (unlikely(!skb_page_frag_refill(allocsize, pfrag, GFP_ATOMIC))) {
+				spin_unlock_bh(&x->lock);
+				goto cow;
+			}
+
+			page = pfrag->page;
+			get_page(page);
+
+			vaddr = kmap_atomic(page);
+
+			tail = vaddr + pfrag->offset;
+
+			/* Fill padding... */
+			if (tfclen) {
+				memset(tail, 0, tfclen);
+				tail += tfclen;
+			}
+			do {
+				int i;
+				for (i = 0; i < plen - 2; i++)
+					tail[i] = i + 1;
+			} while (0);
+			tail[plen - 2] = plen - 2;
+			if (!skb->hw_xfrm)
+				tail[plen - 1] = *skb_mac_header(skb);
+			else
+				tail[plen - 1] = proto;
+
+			kunmap_atomic(vaddr);
+
+			nfrags = skb_shinfo(skb)->nr_frags;
+
+			__skb_fill_page_desc(skb, nfrags, page, pfrag->offset, tfclen + plen + alen);
+			skb_shinfo(skb)->nr_frags = ++nfrags;
+
+			pfrag->offset = pfrag->offset + allocsize;
+			nfrags++;
+
+			skb->len += tfclen + plen + alen;
+			skb->data_len += tfclen + plen + alen;
+			skb->truesize += tfclen + plen + alen;
+			if (sk)
+				atomic_add(tfclen + plen + alen, &sk->sk_wmem_alloc);
+
+			spin_unlock_bh(&x->lock);
+
+			goto skip_cow;
+		}
 	}
 
-	seqhi = esp_tmp_seqhi(tmp);
-	iv = esp_tmp_iv(aead, tmp, seqhilen);
-	req = esp_tmp_req(aead, iv);
-	sg = esp_req_sg(aead, req);
+cow:
+	err = skb_cow_data(skb, tfclen + plen + alen, &trailer);
+	if (err < 0)
+		goto error;
+	nfrags = err;
+	tail = skb_tail_pointer(trailer);
 
 	/* Fill padding... */
-	tail = skb_tail_pointer(trailer);
 	if (tfclen) {
 		memset(tail, 0, tfclen);
 		tail += tfclen;
@@ -277,6 +357,19 @@ static int esp_output(struct xfrm_state *x, struct sk_buff *skb)
 
 	pskb_put(skb, trailer, clen - skb->len + alen);
 
+skip_cow:
+	tmp = esp_alloc_tmp(aead, nfrags, seqhilen);
+	if (!tmp) {
+		err = -ENOMEM;
+		goto error;
+	}
+
+	seqhi = esp_tmp_seqhi(tmp);
+	iv = esp_tmp_iv(aead, tmp, seqhilen);
+	req = esp_tmp_req(aead, iv);
+	sg = esp_req_sg(aead, req);
+
+
 	skb_push(skb, -skb_network_offset(skb));
 	esph = ip_esp_hdr(skb);
 
-- 
1.9.1


* [PATCH RFC 10/13] xfrm: Add basic infrastructure for IPsec device offloading
  2016-02-04  6:36 [PATCH RFC] IPsec performance improvements (discussion base for the IPsec performance BoF) Steffen Klassert
                   ` (8 preceding siblings ...)
  2016-02-04  6:37 ` [PATCH RFC 09/13] esp: Avoid skb_cow_data whenever possible Steffen Klassert
@ 2016-02-04  6:37 ` Steffen Klassert
  2016-02-04  6:37 ` [PATCH RFC 11/13] net: Enable IPsec software GSO Steffen Klassert
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: Steffen Klassert @ 2016-02-04  6:37 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, sowmini.varadhan

This patch fills the IPsec device offloading callbacks for
software GSO.

We handle async crypto with the xfrm_dev_resume() function.
It tries to do a direct call to dev_hard_start_xmit(). If the
netdevice is busy, we defer the transmit to the NET_TX_SOFTIRQ
softirq.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/linux/netdevice.h |   4 +
 net/core/dev.c            |   1 +
 net/xfrm/xfrm_device.c    | 207 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 212 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index adbca16..d049c02 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2679,6 +2679,10 @@ struct softnet_data {
 	struct Qdisc		**output_queue_tailp;
 	struct sk_buff		*completion_queue;
 
+#ifdef CONFIG_XFRM
+	struct sk_buff_head	xfrm_backlog;
+#endif
+
 #ifdef CONFIG_RPS
 	/* Elements below can be accessed between CPUs for RPS */
 	struct call_single_data	csd ____cacheline_aligned_in_smp;
diff --git a/net/core/dev.c b/net/core/dev.c
index 1a456ea..f083cbb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8078,6 +8078,7 @@ static int __init net_dev_init(void)
 
 		skb_queue_head_init(&sd->input_pkt_queue);
 		skb_queue_head_init(&sd->process_queue);
+		skb_queue_head_init(&sd->xfrm_backlog);
 		INIT_LIST_HEAD(&sd->poll_list);
 		sd->output_queue_tailp = &sd->output_queue;
 #ifdef CONFIG_RPS
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 34a260a..2c68502 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -22,11 +22,218 @@
 #include <net/xfrm.h>
 #include <linux/notifier.h>
 
+static void xfrm_dev_resume(struct sk_buff *skb, int err)
+{
+	int ret = NETDEV_TX_BUSY;
+	unsigned long flags;
+	struct netdev_queue *txq;
+	struct softnet_data *sd;
+	struct xfrm_state *x = skb_dst(skb)->xfrm;
+	struct net_device *dev = skb->dev;
+
+	if (err) {
+		XFRM_INC_STATS(xs_net(x), LINUX_MIB_XFRMOUTSTATEPROTOERROR);
+		return;
+	}
+
+	txq = netdev_pick_tx(dev, skb, NULL);
+
+	HARD_TX_LOCK(dev, txq, smp_processor_id());
+	if (!netif_xmit_frozen_or_stopped(txq))
+		skb = dev_hard_start_xmit(skb, dev, txq, &ret);
+	HARD_TX_UNLOCK(dev, txq);
+
+	if (!dev_xmit_complete(ret)) {
+		local_irq_save(flags);
+		sd = this_cpu_ptr(&softnet_data);
+		skb_queue_tail(&sd->xfrm_backlog, skb);
+		raise_softirq_irqoff(NET_TX_SOFTIRQ);
+		local_irq_restore(flags);
+	}
+}
+
+void xfrm_dev_backlog(struct sk_buff_head *xfrm_backlog)
+{
+	struct sk_buff *skb;
+	struct sk_buff_head list;
+
+	__skb_queue_head_init(&list);
+
+	spin_lock(&xfrm_backlog->lock);
+	skb_queue_splice_init(xfrm_backlog, &list);
+	spin_unlock(&xfrm_backlog->lock);
+
+	while (!skb_queue_empty(&list)) {
+		skb = __skb_dequeue(&list);
+		xfrm_dev_resume(skb, 0);
+	}
+
+}
+
+static int xfrm_dev_validate(struct sk_buff *skb)
+{
+	struct xfrm_state *x = skb_dst(skb)->xfrm;
+
+	return x->type->output_tail(x, skb);
+}
+
+static int xfrm_skb_check_space(struct sk_buff *skb, struct dst_entry *dst)
+{
+	int nhead = dst->header_len + LL_RESERVED_SPACE(dst->dev)
+		- skb_headroom(skb);
+	int ntail =  0;
+
+	if (!(skb_shinfo(skb)->gso_type & SKB_GSO_ESP))
+		ntail = dst->dev->needed_tailroom - skb_tailroom(skb);
+
+	if (nhead <= 0) {
+		if (ntail <= 0)
+			return 0;
+		nhead = 0;
+	} else if (ntail < 0)
+		ntail = 0;
+
+	return pskb_expand_head(skb, nhead, ntail, GFP_ATOMIC);
+}
+
+static int xfrm_dev_prepare(struct sk_buff *skb)
+{
+	int err;
+	struct dst_entry *dst = skb_dst(skb);
+	struct xfrm_state *x = dst->xfrm;
+	struct net *net = xs_net(x);
+
+	do {
+		spin_lock_bh(&x->lock);
+
+		if (unlikely(x->km.state != XFRM_STATE_VALID)) {
+			XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTSTATEINVALID);
+			err = -EINVAL;
+			goto error;
+		}
+
+		err = xfrm_state_check_expire(x);
+		if (err) {
+			XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTSTATEEXPIRED);
+			goto error;
+		}
+
+		err = x->repl->overflow(x, skb);
+		if (err) {
+			XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTSTATESEQERROR);
+			goto error;
+		}
+
+		x->curlft.bytes += skb->len;
+		x->curlft.packets++;
+
+		spin_unlock_bh(&x->lock);
+
+		skb_dst_force(skb);
+
+		skb->hw_xfrm = 1;
+
+		err = x->type->output(x, skb);
+		if (err == -EINPROGRESS)
+			goto out;
+
+		if (err) {
+			XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTSTATEPROTOERROR);
+			goto error_nolock;
+		}
+
+		dst = dst->child;
+		if (!dst) {
+			XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
+			err = -EHOSTUNREACH;
+			goto error_nolock;
+		}
+		x = dst->xfrm;
+	} while (x && !(x->outer_mode->flags & XFRM_MODE_FLAG_TUNNEL));
+
+	return 0;
+
+error:
+	spin_unlock_bh(&x->lock);
+error_nolock:
+	kfree_skb(skb);
+out:
+	return err;
+}
+
+static int xfrm_dev_encap(struct sk_buff *skb)
+{
+	int err;
+	struct dst_entry *dst = skb_dst(skb);
+	struct dst_entry *path = dst->path;
+	struct xfrm_state *x = dst->xfrm;
+	struct net *net = xs_net(x);
+
+	err = xfrm_skb_check_space(skb, dst);
+	if (err) {
+		XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
+		return err;
+	}
+
+	err = x->outer_mode->output(x, skb);
+	if (err) {
+		XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTSTATEMODEERROR);
+		return err;
+	}
+
+	x->type->encap(x, skb);
+
+	return path->output(net, skb->sk, skb);
+}
+
+static const struct xfrmdev_ops xfrmdev_soft_ops = {
+	.xdo_dev_encap	= xfrm_dev_encap,
+	.xdo_dev_prepare = xfrm_dev_prepare,
+	.xdo_dev_validate = xfrm_dev_validate,
+	.xdo_dev_resume = xfrm_dev_resume,
+};
+
+static int xfrm_dev_register(struct net_device *dev)
+{
+	if (dev->hw_features & NETIF_F_ESP_OFFLOAD)
+		goto out;
+
+	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
+
+	dev->xfrmdev_ops = &xfrmdev_soft_ops;
+out:
+	return NOTIFY_DONE;
+}
+
+static int xfrm_dev_unregister(struct net_device *dev)
+{
+
+	return NOTIFY_DONE;
+}
+
+static int xfrm_dev_feat_change(struct net_device *dev)
+{
+	if (!(dev->hw_features & NETIF_F_ESP_OFFLOAD) &&
+	    dev->features & NETIF_F_ESP_OFFLOAD)
+		dev->xfrmdev_ops = &xfrmdev_soft_ops;
+
+	return NOTIFY_DONE;
+}
+
 static int xfrm_dev_event(struct notifier_block *this, unsigned long event, void *ptr)
 {
 	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
 
 	switch (event) {
+	case NETDEV_REGISTER:
+		return xfrm_dev_register(dev);
+
+	case NETDEV_UNREGISTER:
+		return xfrm_dev_unregister(dev);
+
+	case NETDEV_FEAT_CHANGE:
+		return xfrm_dev_feat_change(dev);
+
 	case NETDEV_DOWN:
 		xfrm_garbage_collect(dev_net(dev));
 	}
-- 
1.9.1


* [PATCH RFC 11/13] net: Enable IPsec software GSO.
  2016-02-04  6:36 [PATCH RFC] IPsec performance improvements (discussion base for the IPsec performance BoF) Steffen Klassert
                   ` (9 preceding siblings ...)
  2016-02-04  6:37 ` [PATCH RFC 10/13] xfrm: Add basic infrastructure for IPsec device offloading Steffen Klassert
@ 2016-02-04  6:37 ` Steffen Klassert
  2016-02-04  6:37 ` [PATCH RFC 12/13] crypto: Make the page handling of hash walk compatible to networking Steffen Klassert
  2016-02-04  6:37 ` [PATCH RFC 13/13] net: Allow IPsec GSO for locally sent traffic Steffen Klassert
  12 siblings, 0 replies; 14+ messages in thread
From: Steffen Klassert @ 2016-02-04  6:37 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, sowmini.varadhan

This patch hooks the IPsec GSO code into the generic
network stack.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/linux/netdevice.h |  2 +-
 include/net/xfrm.h        |  1 +
 net/core/dev.c            | 40 +++++++++++++++++++++++++++++++++++-----
 net/ipv4/ip_output.c      |  8 +++++---
 net/ipv4/xfrm4_output.c   |  2 +-
 net/sched/sch_generic.c   |  2 +-
 net/xfrm/xfrm_output.c    | 14 +++++++++++++-
 7 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d049c02..659eeec 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3192,7 +3192,7 @@ int dev_get_phys_port_id(struct net_device *dev,
 int dev_get_phys_port_name(struct net_device *dev,
 			   char *name, size_t len);
 int dev_change_proto_down(struct net_device *dev, bool proto_down);
-struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *dev);
+struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *dev, int *ret);
 struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 				    struct netdev_queue *txq, int *ret);
 int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 7939c39..3a69883 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1505,6 +1505,7 @@ struct xfrmk_spdinfo {
 	u32 spdhmcnt;
 };
 
+void xfrm_dev_backlog(struct sk_buff_head *xfrm_backlog);
 struct xfrm_state *xfrm_find_acq_byseq(struct net *net, u32 mark, u32 seq);
 int xfrm_state_delete(struct xfrm_state *x);
 int xfrm_state_flush(struct net *net, u8 proto, bool task_valid);
diff --git a/net/core/dev.c b/net/core/dev.c
index f083cbb..611e93c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2911,9 +2911,10 @@ static struct sk_buff *validate_xmit_vlan(struct sk_buff *skb,
 	return skb;
 }
 
-static struct sk_buff *validate_xmit_skb(struct sk_buff *skb, struct net_device *dev)
+static struct sk_buff *validate_xmit_skb(struct sk_buff *skb, struct net_device *dev, int *ret)
 {
 	netdev_features_t features;
+	int err = 0;
 
 	if (skb->next)
 		return skb;
@@ -2925,6 +2926,7 @@ static struct sk_buff *validate_xmit_skb(struct sk_buff *skb, struct net_device
 
 	if (netif_needs_gso(skb, features)) {
 		struct sk_buff *segs;
+		struct sk_buff *skb2;
 
 		segs = skb_gso_segment(skb, features);
 		if (IS_ERR(segs)) {
@@ -2932,7 +2934,25 @@ static struct sk_buff *validate_xmit_skb(struct sk_buff *skb, struct net_device
 		} else if (segs) {
 			consume_skb(skb);
 			skb = segs;
+
+			if  (skb->hw_xfrm) {
+				do {
+					skb2 = segs->next;
+					segs->next = NULL;
+
+					err = dev->xfrmdev_ops->xdo_dev_validate(segs);
+					if (!err)
+						segs->next = skb2;
+					else if (err != -EINPROGRESS)
+						kfree_skb(segs);
+					else if (skb == segs)
+						skb = skb2;
+
+					segs = skb2;
+				} while (segs);
+			}
 		}
+
 	} else {
 		if (skb_needs_linearize(skb, features) &&
 		    __skb_linearize(skb))
@@ -2955,6 +2975,9 @@ static struct sk_buff *validate_xmit_skb(struct sk_buff *skb, struct net_device
 		}
 	}
 
+	if ((err == -EINPROGRESS) && !skb)
+		*ret = NETDEV_TX_OK;
+
 	return skb;
 
 out_kfree_skb:
@@ -2963,7 +2986,7 @@ out_null:
 	return NULL;
 }
 
-struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *dev)
+struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *dev, int *ret)
 {
 	struct sk_buff *next, *head = NULL, *tail;
 
@@ -2974,7 +2997,7 @@ struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *d
 		/* in case skb wont be segmented, point to itself */
 		skb->prev = skb;
 
-		skb = validate_xmit_skb(skb, dev);
+		skb = validate_xmit_skb(skb, dev, ret);
 		if (!skb)
 			continue;
 
@@ -3347,8 +3370,10 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
 			if (__this_cpu_read(xmit_recursion) > RECURSION_LIMIT)
 				goto recursion_alert;
 
-			skb = validate_xmit_skb(skb, dev);
-			if (!skb)
+			skb = validate_xmit_skb(skb, dev, &rc);
+			if (!skb && rc == NETDEV_TX_OK)
+				goto out;
+			else if (!skb)
 				goto drop;
 
 			HARD_TX_LOCK(dev, txq, cpu);
@@ -3867,6 +3892,11 @@ static void net_tx_action(struct softirq_action *h)
 			}
 		}
 	}
+
+#ifdef CONFIG_XFRM
+	if (!skb_queue_empty(&sd->xfrm_backlog))
+			xfrm_dev_backlog(&sd->xfrm_backlog);
+#endif
 }
 
 #if (defined(CONFIG_BRIDGE) || defined(CONFIG_BRIDGE_MODULE)) && \
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 64878ef..0d75161 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -173,7 +173,7 @@ EXPORT_SYMBOL_GPL(ip_build_and_send_pkt);
 
 static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
-	struct dst_entry *dst = skb_dst(skb);
+	struct dst_entry *dst = skb_dst(skb)->path;
 	struct rtable *rt = (struct rtable *)dst;
 	struct net_device *dev = dst->dev;
 	unsigned int hh_len = LL_RESERVED_SPACE(dev);
@@ -269,7 +269,9 @@ static int ip_finish_output(struct net *net, struct sock *sk, struct sk_buff *sk
 
 #if defined(CONFIG_NETFILTER) && defined(CONFIG_XFRM)
 	/* Policy lookup after SNAT yielded a new policy */
-	if (skb_dst(skb)->xfrm) {
+	if (skb_dst(skb)->xfrm &&
+	    !((skb_dst(skb)->dev->features & NETIF_F_ESP_OFFLOAD) ||
+	      (skb_shinfo(skb)->gso_type & SKB_GSO_ESP))) {
 		IPCB(skb)->flags |= IPSKB_REROUTED;
 		return dst_output(net, sk, skb);
 	}
@@ -348,7 +350,7 @@ int ip_mc_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 
 int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
-	struct net_device *dev = skb_dst(skb)->dev;
+	struct net_device *dev = skb_dst(skb)->path->dev;
 
 	IP_UPD_PO_STATS(net, IPSTATS_MIB_OUT, skb->len);
 
diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
index 7ee6518..14e42ba 100644
--- a/net/ipv4/xfrm4_output.c
+++ b/net/ipv4/xfrm4_output.c
@@ -29,7 +29,7 @@ static int xfrm4_tunnel_check_size(struct sk_buff *skb)
 		goto out;
 
 	mtu = dst_mtu(skb_dst(skb));
-	if (skb->len > mtu) {
+	if ((!skb_is_gso(skb) && skb->len > mtu) || (skb_is_gso(skb) && skb_gso_network_seglen(skb) > ip_skb_dst_mtu(skb))) {
 		skb->protocol = htons(ETH_P_IP);
 
 		if (skb->sk)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 16bc83b..5b11424 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -157,7 +157,7 @@ int sch_direct_xmit(struct sk_buff *skb, struct Qdisc *q,
 
 	/* Note that we validate skb (GSO, checksum, ...) outside of locks */
 	if (validate)
-		skb = validate_xmit_skb_list(skb, dev);
+		skb = validate_xmit_skb_list(skb, dev, &ret);
 
 	if (skb) {
 		HARD_TX_LOCK(dev, txq, smp_processor_id());
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index ff4a91f..ad452e0 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -196,9 +196,21 @@ static int xfrm_output_gso(struct net *net, struct sock *sk, struct sk_buff *skb
 
 int xfrm_output(struct sock *sk, struct sk_buff *skb)
 {
-	struct net *net = dev_net(skb_dst(skb)->dev);
+	struct net_device *dev = skb_dst(skb)->dev;
+	struct net *net = dev_net(dev);
 	int err;
 
+	if ((dev->features & NETIF_F_ESP_OFFLOAD) || skb_is_gso(skb)) {
+		err = skb_dst(skb)->ops->local_out(net, skb->sk, skb);
+		if (unlikely(err != 1))
+			return err;
+
+		if (skb_is_gso(skb))
+			skb_shinfo(skb)->gso_type |= SKB_GSO_ESP;
+
+		return dev->xfrmdev_ops->xdo_dev_encap(skb);
+	}
+
 	if (skb_is_gso(skb))
 		return xfrm_output_gso(net, sk, skb);
 
-- 
1.9.1

* [PATCH RFC 12/13] crypto: Make the page handling of hash walk compatible to networking.
  2016-02-04  6:36 [PATCH RFC] IPsec performance improvements (discussion base for the IPsec performance BoF) Steffen Klassert
                   ` (10 preceding siblings ...)
  2016-02-04  6:37 ` [PATCH RFC 11/13] net: Enable IPsec software GSO Steffen Klassert
@ 2016-02-04  6:37 ` Steffen Klassert
  2016-02-04  6:37 ` [PATCH RFC 13/13] net: Allow IPsec GSO for locally sent traffic Steffen Klassert
  12 siblings, 0 replies; 14+ messages in thread
From: Steffen Klassert @ 2016-02-04  6:37 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, sowmini.varadhan

The network layer tries to allocate high-order pages for sk_buff
fragments. This leads to problems if we pass such a buffer to
crypto, because crypto assumes that the scatterlists always contain
order-0 pages.

This was not a problem so far, because the network stack linearized
all buffers before passing them to crypto. Since the recent IPsec
change "esp: Avoid skb_cow_data whenever possible" avoids the
linearization in most cases, this incompatibility became visible.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 crypto/ahash.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/crypto/ahash.c b/crypto/ahash.c
index d19b523..a0eb4e6 100644
--- a/crypto/ahash.c
+++ b/crypto/ahash.c
@@ -44,8 +44,7 @@ static int hash_walk_next(struct crypto_hash_walk *walk)
 {
 	unsigned int alignmask = walk->alignmask;
 	unsigned int offset = walk->offset;
-	unsigned int nbytes = min(walk->entrylen,
-				  ((unsigned int)(PAGE_SIZE)) - offset);
+	unsigned int nbytes = walk->entrylen;
 
 	if (walk->flags & CRYPTO_ALG_ASYNC)
 		walk->data = kmap(walk->pg);
@@ -91,8 +90,6 @@ int crypto_hash_walk_done(struct crypto_hash_walk *walk, int err)
 		walk->offset = ALIGN(walk->offset, alignmask + 1);
 		walk->data += walk->offset;
 
-		nbytes = min(nbytes,
-			     ((unsigned int)(PAGE_SIZE)) - walk->offset);
 		walk->entrylen -= nbytes;
 
 		return nbytes;
-- 
1.9.1

* [PATCH RFC 13/13] net: Allow IPsec GSO for locally sent traffic.
  2016-02-04  6:36 [PATCH RFC] IPsec performance improvements (discussion base for the IPsec performance BoF) Steffen Klassert
                   ` (11 preceding siblings ...)
  2016-02-04  6:37 ` [PATCH RFC 12/13] crypto: Make the page handling of hash walk compatible to networking Steffen Klassert
@ 2016-02-04  6:37 ` Steffen Klassert
  12 siblings, 0 replies; 14+ messages in thread
From: Steffen Klassert @ 2016-02-04  6:37 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, sowmini.varadhan

This patch finally allows locally sent IPsec packets to
use the GSO codepath.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/core/sock.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 6c1c8bc..8fca8b0 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1583,13 +1583,9 @@ void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
 		sk->sk_route_caps |= NETIF_F_GSO_SOFTWARE;
 	sk->sk_route_caps &= ~sk->sk_route_nocaps;
 	if (sk_can_gso(sk)) {
-		if (dst->header_len) {
-			sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
-		} else {
-			sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
-			sk->sk_gso_max_size = dst->dev->gso_max_size;
-			max_segs = max_t(u32, dst->dev->gso_max_segs, 1);
-		}
+		sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
+		sk->sk_gso_max_size = dst->dev->gso_max_size;
+		max_segs = max_t(u32, dst->dev->gso_max_segs, 1);
 	}
 	sk->sk_gso_max_segs = max_segs;
 }
-- 
1.9.1

Thread overview: 14+ messages
2016-02-04  6:36 [PATCH RFC] IPsec performance improvements (discussion base for the IPsec performance BoF) Steffen Klassert
2016-02-04  6:36 ` [PATCH RFC 01/13] net: allow to leave the buffer fragmented in skb_cow_data() Steffen Klassert
2016-02-04  6:36 ` [PATCH RFC 02/13] gro: Partly revert "net: gro: allow to build full sized skb" Steffen Klassert
2016-02-04  6:36 ` [PATCH RFC 03/13] esp: Add a software GRO codepath Steffen Klassert
2016-02-04  6:36 ` [PATCH RFC 04/13] xfrm: Move device notifications to a separate file Steffen Klassert
2016-02-04  6:36 ` [PATCH RFC 05/13] xfrm: Add callbacks for IPsec GSO offloading Steffen Klassert
2016-02-04  6:36 ` [PATCH RFC 06/13] net: Add xfrm offload callbacks to struct net_device Steffen Klassert
2016-02-04  6:37 ` [PATCH RFC 07/13] net: Add ESP offload features Steffen Klassert
2016-02-04  6:37 ` [PATCH RFC 08/13] esp4: Add a software GSO codepath Steffen Klassert
2016-02-04  6:37 ` [PATCH RFC 09/13] esp: Avoid skb_cow_data whenever possible Steffen Klassert
2016-02-04  6:37 ` [PATCH RFC 10/13] xfrm: Add basic infrastructure for IPsec device offloading Steffen Klassert
2016-02-04  6:37 ` [PATCH RFC 11/13] net: Enable IPsec software GSO Steffen Klassert
2016-02-04  6:37 ` [PATCH RFC 12/13] crypto: Make the page handling of hash walk compatible to networking Steffen Klassert
2016-02-04  6:37 ` [PATCH RFC 13/13] net: Allow IPsec GSO for locally sent traffic Steffen Klassert
