All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/3] IPsec improvements
@ 2013-11-05 13:54 Mathias Krause
  2013-11-05 13:54 ` [PATCH net-next 1/3] net: move pskb_put() to core code Mathias Krause
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Mathias Krause @ 2013-11-05 13:54 UTC (permalink / raw)
  To: David S. Miller, Steffen Klassert, Herbert Xu
  Cc: Dmitry Tarnyagin, netdev, Mathias Krause

This series moves pskb_put() to the core code -- making the code
duplication in caif obsolete (patches 1 and 2).

Additionally does this series optimize the IPsec receive path in patch 3
by allowing skb_cow_data() to leave the buffer fragmented. I noticed the
linearization to be a bottleneck when doing some VPN gateway benchmarks.
Linearization of the buffer isn't needed in the receive path as the
crypto API (and all other users of skb_cow_data) can handle sg.

With patch 3 applied I was able to increase the throughput of an IPsec
gateway from 7.12 Gbit/s to 7.28 Gbit/s.


Please apply!

Mathias Krause (3):
  net: move pskb_put() to core code
  caif: use pskb_put() instead of reimplementing its functionality
  net: allow to leave the buffer fragmented in skb_cow_data()

 include/linux/skbuff.h  |    1 +
 include/net/esp.h       |    2 -
 net/caif/cfpkt_skbuff.c |   12 +----------
 net/core/skbuff.c       |   52 +++++++++++++++++++++++++++++++++++++----------
 net/xfrm/xfrm_algo.c    |   13 -----------
 5 files changed, 43 insertions(+), 37 deletions(-)

-- 
1.7.2.5

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH net-next 1/3] net: move pskb_put() to core code
  2013-11-05 13:54 [PATCH net-next 0/3] IPsec improvements Mathias Krause
@ 2013-11-05 13:54 ` Mathias Krause
  2013-11-05 18:33   ` Ben Hutchings
  2013-11-05 13:54 ` [PATCH net-next 2/3] caif: use pskb_put() instead of reimplementing its functionality Mathias Krause
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: Mathias Krause @ 2013-11-05 13:54 UTC (permalink / raw)
  To: David S. Miller, Steffen Klassert, Herbert Xu
  Cc: Dmitry Tarnyagin, netdev, Mathias Krause, Steffen Klassert,
	David S. Miller, Herbert Xu

This function has usage beside IPsec so move it to the core skbuff code.
While doing so, give it some documentation and change its return type to
'unsigned char *' to be in line with skb_put().

Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
---
 include/linux/skbuff.h |    1 +
 include/net/esp.h      |    2 --
 net/core/skbuff.c      |   23 +++++++++++++++++++++++
 net/xfrm/xfrm_algo.c   |   13 -------------
 4 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c2d8933..9d785f4d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1418,6 +1418,7 @@ static inline void skb_set_tail_pointer(struct sk_buff *skb, const int offset)
 /*
  *	Add data to an sk_buff
  */
+unsigned char *pskb_put(struct sk_buff *skb, struct sk_buff *tail, int len);
 unsigned char *skb_put(struct sk_buff *skb, unsigned int len);
 static inline unsigned char *__skb_put(struct sk_buff *skb, unsigned int len)
 {
diff --git a/include/net/esp.h b/include/net/esp.h
index d584513..5b4fdcc 100644
--- a/include/net/esp.h
+++ b/include/net/esp.h
@@ -13,8 +13,6 @@ struct esp_data {
 
 #include <linux/skbuff.h>
 
-void *pskb_put(struct sk_buff *skb, struct sk_buff *tail, int len);
-
 struct ip_esp_hdr;
 
 static inline struct ip_esp_hdr *ip_esp_hdr(const struct sk_buff *skb)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d81cff1..247023d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1257,6 +1257,29 @@ free_skb:
 EXPORT_SYMBOL(skb_pad);
 
 /**
+ *	pskb_put - add data to the tail of a potentially fragmented buffer
+ *	@skb: start of the buffer to use
+ *	@tail: tail fragment of the buffer to use
+ *	@len: amount of data to add
+ *
+ *	This function extends the used data area of the potentially
+ *	fragmented buffer. &tail must be the last fragment of &skb -- or
+ *	&skb itself. If this would exceed the total buffer size the kernel
+ *	will panic. A pointer to the first byte of the extra data is
+ *	returned.
+ */
+
+unsigned char *pskb_put(struct sk_buff *skb, struct sk_buff *tail, int len)
+{
+	if (tail != skb) {
+		skb->data_len += len;
+		skb->len += len;
+	}
+	return skb_put(tail, len);
+}
+EXPORT_SYMBOL_GPL(pskb_put);
+
+/**
  *	skb_put - add data to a buffer
  *	@skb: buffer to use
  *	@len: amount of data to add
diff --git a/net/xfrm/xfrm_algo.c b/net/xfrm/xfrm_algo.c
index ab4ef72..debe733 100644
--- a/net/xfrm/xfrm_algo.c
+++ b/net/xfrm/xfrm_algo.c
@@ -802,17 +802,4 @@ int xfrm_count_pfkey_enc_supported(void)
 }
 EXPORT_SYMBOL_GPL(xfrm_count_pfkey_enc_supported);
 
-#if defined(CONFIG_INET_ESP) || defined(CONFIG_INET_ESP_MODULE) || defined(CONFIG_INET6_ESP) || defined(CONFIG_INET6_ESP_MODULE)
-
-void *pskb_put(struct sk_buff *skb, struct sk_buff *tail, int len)
-{
-	if (tail != skb) {
-		skb->data_len += len;
-		skb->len += len;
-	}
-	return skb_put(tail, len);
-}
-EXPORT_SYMBOL_GPL(pskb_put);
-#endif
-
 MODULE_LICENSE("GPL");
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 2/3] caif: use pskb_put() instead of reimplementing its functionality
  2013-11-05 13:54 [PATCH net-next 0/3] IPsec improvements Mathias Krause
  2013-11-05 13:54 ` [PATCH net-next 1/3] net: move pskb_put() to core code Mathias Krause
@ 2013-11-05 13:54 ` Mathias Krause
  2013-11-05 13:54 ` [PATCH net-next 3/3] net: allow to leave the buffer fragmented in skb_cow_data() Mathias Krause
  2013-11-06  4:48 ` [PATCH net-next 0/3] IPsec improvements David Miller
  3 siblings, 0 replies; 20+ messages in thread
From: Mathias Krause @ 2013-11-05 13:54 UTC (permalink / raw)
  To: David S. Miller, Steffen Klassert, Herbert Xu
  Cc: Dmitry Tarnyagin, netdev, Mathias Krause, David S. Miller

Also remove the warning for fragmented packets -- skb_cow_data() will
linearize the buffer, removing all fragments.

Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no>
Cc: "David S. Miller" <davem@davemloft.net>
---
 net/caif/cfpkt_skbuff.c |   12 +-----------
 1 files changed, 1 insertions(+), 11 deletions(-)

diff --git a/net/caif/cfpkt_skbuff.c b/net/caif/cfpkt_skbuff.c
index 6493351..1be0b52 100644
--- a/net/caif/cfpkt_skbuff.c
+++ b/net/caif/cfpkt_skbuff.c
@@ -203,20 +203,10 @@ int cfpkt_add_body(struct cfpkt *pkt, const void *data, u16 len)
 			PKT_ERROR(pkt, "cow failed\n");
 			return -EPROTO;
 		}
-		/*
-		 * Is the SKB non-linear after skb_cow_data()? If so, we are
-		 * going to add data to the last SKB, so we need to adjust
-		 * lengths of the top SKB.
-		 */
-		if (lastskb != skb) {
-			pr_warn("Packet is non-linear\n");
-			skb->len += len;
-			skb->data_len += len;
-		}
 	}
 
 	/* All set to put the last SKB and optionally write data there. */
-	to = skb_put(lastskb, len);
+	to = pskb_put(skb, lastskb, len);
 	if (likely(data))
 		memcpy(to, data, len);
 	return 0;
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 3/3] net: allow to leave the buffer fragmented in skb_cow_data()
  2013-11-05 13:54 [PATCH net-next 0/3] IPsec improvements Mathias Krause
  2013-11-05 13:54 ` [PATCH net-next 1/3] net: move pskb_put() to core code Mathias Krause
  2013-11-05 13:54 ` [PATCH net-next 2/3] caif: use pskb_put() instead of reimplementing its functionality Mathias Krause
@ 2013-11-05 13:54 ` Mathias Krause
  2013-11-05 14:33   ` Eric Dumazet
  2013-11-06  9:30   ` Herbert Xu
  2013-11-06  4:48 ` [PATCH net-next 0/3] IPsec improvements David Miller
  3 siblings, 2 replies; 20+ messages in thread
From: Mathias Krause @ 2013-11-05 13:54 UTC (permalink / raw)
  To: David S. Miller, Steffen Klassert, Herbert Xu
  Cc: Dmitry Tarnyagin, netdev, Mathias Krause, David S. Miller, Herbert Xu

Do not linearize the buffer per se but only if we're expected to expand
the tail. All callers can handle fragmented buffers and even expect
them!

Not linearizing the buffer leads to a small performance improvement for
the IPsec receive path in case the network driver passed us a fragmented
buffer.

With this patch applied I was able to increase the throughput of an
IPsec gateway from 7.12 Gbit/s to 7.28 Gbit/s.

Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
---
 net/core/skbuff.c |   29 ++++++++++++++++++-----------
 1 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 247023d..5eec1b9 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3207,7 +3207,7 @@ EXPORT_SYMBOL_GPL(skb_to_sgvec);
  *
  *	If @tailbits is given, make sure that there is space to write @tailbits
  *	bytes of data beyond current end of socket buffer.  @trailer will be
- *	set to point to the skb in which this space begins.
+ *	linearized and set to point to the skb in which this space begins.
  *
  *	The number of scatterlist elements required to completely map the
  *	COW'd and extended socket buffer will be returned.
@@ -3218,11 +3218,10 @@ int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer)
 	int elt;
 	struct sk_buff *skb1, **skb_p;
 
-	/* If skb is cloned or its head is paged, reallocate
-	 * head pulling out all the pages (pages are considered not writable
-	 * at the moment even if they are anonymous).
+	/* If skb is cloned reallocate head pulling out all the pages (pages are
+	 * considered not writable at the moment even if they are anonymous).
 	 */
-	if ((skb_cloned(skb) || skb_shinfo(skb)->nr_frags) &&
+	if (skb_cloned(skb) &&
 	    __pskb_pull_tail(skb, skb_pagelen(skb)-skb_headlen(skb)) == NULL)
 		return -ENOMEM;
 
@@ -3233,18 +3232,26 @@ int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer)
 		 * good frames. OK, on miss we reallocate and reserve even more
 		 * space, 128 bytes is fair. */
 
-		if (skb_tailroom(skb) < tailbits &&
-		    pskb_expand_head(skb, 0, tailbits-skb_tailroom(skb)+128, GFP_ATOMIC))
-			return -ENOMEM;
+		if (tailbits) {
+			if (skb_linearize(skb))
+				return -ENOMEM;
+
+			if (skb_tailroom(skb) < tailbits) {
+				int ntail = tailbits - skb_tailroom(skb) + 128;
+
+				if (pskb_expand_head(skb, 0, ntail, GFP_ATOMIC))
+					return -ENOMEM;
+			}
+		}
 
 		/* Voila! */
 		*trailer = skb;
-		return 1;
+		return skb_shinfo(skb)->nr_frags + 1;
 	}
 
 	/* Misery. We are in troubles, going to mincer fragments... */
 
-	elt = 1;
+	elt = skb_shinfo(skb)->nr_frags + 1;
 	skb_p = &skb_shinfo(skb)->frag_list;
 	copyflag = 0;
 
@@ -3296,7 +3303,7 @@ int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer)
 			kfree_skb(skb1);
 			skb1 = skb2;
 		}
-		elt++;
+		elt += skb_shinfo(skb1)->nr_frags + 1;
 		*trailer = skb1;
 		skb_p = &skb1->next;
 	}
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 3/3] net: allow to leave the buffer fragmented in skb_cow_data()
  2013-11-05 13:54 ` [PATCH net-next 3/3] net: allow to leave the buffer fragmented in skb_cow_data() Mathias Krause
@ 2013-11-05 14:33   ` Eric Dumazet
  2013-11-05 14:43     ` Mathias Krause
  2013-11-06  9:30   ` Herbert Xu
  1 sibling, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2013-11-05 14:33 UTC (permalink / raw)
  To: Mathias Krause
  Cc: David S. Miller, Steffen Klassert, Herbert Xu, Dmitry Tarnyagin, netdev

On Tue, 2013-11-05 at 14:54 +0100, Mathias Krause wrote:
> Do not linearize the buffer per se but only if we're expected to expand
> the tail. All callers can handle fragmented buffers and even expect
> them!
> 
> Not linearizing the buffer leads to a small performance improvement for
> the IPsec receive path in case the network driver passed us a fragmented
> buffer.
> 
> With this patch applied I was able to increase the throughput of an
> IPsec gateway from 7.12 Gbit/s to 7.28 Gbit/s.

What is the driver you used ?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 3/3] net: allow to leave the buffer fragmented in skb_cow_data()
  2013-11-05 14:33   ` Eric Dumazet
@ 2013-11-05 14:43     ` Mathias Krause
  0 siblings, 0 replies; 20+ messages in thread
From: Mathias Krause @ 2013-11-05 14:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Steffen Klassert, Herbert Xu, Dmitry Tarnyagin, netdev

On 05.11.2013 15:33, Eric Dumazet wrote:
> On Tue, 2013-11-05 at 14:54 +0100, Mathias Krause wrote:
>> Do not linearize the buffer per se but only if we're expected to expand
>> the tail. All callers can handle fragmented buffers and even expect
>> them!
>>
>> Not linearizing the buffer leads to a small performance improvement for
>> the IPsec receive path in case the network driver passed us a fragmented
>> buffer.
>>
>> With this patch applied I was able to increase the throughput of an
>> IPsec gateway from 7.12 Gbit/s to 7.28 Gbit/s.
> 
> What is the driver you used ?

The device driver is ixgbe driving an Intel X520-T2. Why you're asking?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 1/3] net: move pskb_put() to core code
  2013-11-05 13:54 ` [PATCH net-next 1/3] net: move pskb_put() to core code Mathias Krause
@ 2013-11-05 18:33   ` Ben Hutchings
  2013-11-06  9:01     ` Mathias Krause
  0 siblings, 1 reply; 20+ messages in thread
From: Ben Hutchings @ 2013-11-05 18:33 UTC (permalink / raw)
  To: Mathias Krause
  Cc: David S. Miller, Steffen Klassert, Herbert Xu, Dmitry Tarnyagin, netdev

On Tue, 2013-11-05 at 14:54 +0100, Mathias Krause wrote:
> This function has usage beside IPsec so move it to the core skbuff code.
> While doing so, give it some documentation and change its return type to
> 'unsigned char *' to be in line with skb_put().
[...]
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1257,6 +1257,29 @@ free_skb:
>  EXPORT_SYMBOL(skb_pad);
>  
>  /**
> + *	pskb_put - add data to the tail of a potentially fragmented buffer
> + *	@skb: start of the buffer to use
> + *	@tail: tail fragment of the buffer to use
> + *	@len: amount of data to add
> + *
> + *	This function extends the used data area of the potentially
> + *	fragmented buffer. &tail must be the last fragment of &skb -- or
> + *	&skb itself. If this would exceed the total buffer size the kernel

Keep using @ to refer to the parameters in this description.

Ben.

> + *	will panic. A pointer to the first byte of the extra data is
> + *	returned.
> + */
[...]

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 0/3] IPsec improvements
  2013-11-05 13:54 [PATCH net-next 0/3] IPsec improvements Mathias Krause
                   ` (2 preceding siblings ...)
  2013-11-05 13:54 ` [PATCH net-next 3/3] net: allow to leave the buffer fragmented in skb_cow_data() Mathias Krause
@ 2013-11-06  4:48 ` David Miller
  2013-11-06  9:02   ` Mathias Krause
  3 siblings, 1 reply; 20+ messages in thread
From: David Miller @ 2013-11-06  4:48 UTC (permalink / raw)
  To: mathias.krause; +Cc: steffen.klassert, herbert, dmitry.tarnyagin, netdev

From: Mathias Krause <mathias.krause@secunet.com>
Date: Tue,  5 Nov 2013 14:54:08 +0100

> This series moves pskb_put() to the core code -- making the code
> duplication in caif obsolete (patches 1 and 2).
> 
> Additionally does this series optimize the IPsec receive path in patch 3
> by allowing skb_cow_data() to leave the buffer fragmented. I noticed the
> linearization to be a bottleneck when doing some VPN gateway benchmarks.
> Linearization of the buffer isn't needed in the receive path as the
> crypto API (and all other users of skb_cow_data) can handle sg.
> 
> With patch 3 applied I was able to increase the throughput of an IPsec
> gateway from 7.12 Gbit/s to 7.28 Gbit/s.

Please deal with the feedback given to you by Ben about referring to
variables with the '@' prefix consistently in comments, then resend
this entire patch series.

Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 1/3] net: move pskb_put() to core code
  2013-11-05 18:33   ` Ben Hutchings
@ 2013-11-06  9:01     ` Mathias Krause
  0 siblings, 0 replies; 20+ messages in thread
From: Mathias Krause @ 2013-11-06  9:01 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David S. Miller, Steffen Klassert, Herbert Xu, Dmitry Tarnyagin, netdev

On 05.11.2013 19:33, Ben Hutchings wrote:
> On Tue, 2013-11-05 at 14:54 +0100, Mathias Krause wrote:
>> This function has usage beside IPsec so move it to the core skbuff code.
>> While doing so, give it some documentation and change its return type to
>> 'unsigned char *' to be in line with skb_put().
> [...]
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -1257,6 +1257,29 @@ free_skb:
>>  EXPORT_SYMBOL(skb_pad);
>>  
>>  /**
>> + *	pskb_put - add data to the tail of a potentially fragmented buffer
>> + *	@skb: start of the buffer to use
>> + *	@tail: tail fragment of the buffer to use
>> + *	@len: amount of data to add
>> + *
>> + *	This function extends the used data area of the potentially
>> + *	fragmented buffer. &tail must be the last fragment of &skb -- or
>> + *	&skb itself. If this would exceed the total buffer size the kernel
> 
> Keep using @ to refer to the parameters in this description.
> 
> Ben.

Thanks Ben. Looks like I looked at the wrong examples on how to do it.
Will fix those as well.

> 
>> + *	will panic. A pointer to the first byte of the extra data is
>> + *	returned.
>> + */
> [...]
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 0/3] IPsec improvements
  2013-11-06  4:48 ` [PATCH net-next 0/3] IPsec improvements David Miller
@ 2013-11-06  9:02   ` Mathias Krause
  0 siblings, 0 replies; 20+ messages in thread
From: Mathias Krause @ 2013-11-06  9:02 UTC (permalink / raw)
  To: David Miller; +Cc: steffen.klassert, herbert, dmitry.tarnyagin, netdev

On 06.11.2013 05:48, David Miller wrote:
> From: Mathias Krause <mathias.krause@secunet.com>
> Date: Tue,  5 Nov 2013 14:54:08 +0100
> 
>> This series moves pskb_put() to the core code -- making the code
>> duplication in caif obsolete (patches 1 and 2).
>>
>> Additionally does this series optimize the IPsec receive path in patch 3
>> by allowing skb_cow_data() to leave the buffer fragmented. I noticed the
>> linearization to be a bottleneck when doing some VPN gateway benchmarks.
>> Linearization of the buffer isn't needed in the receive path as the
>> crypto API (and all other users of skb_cow_data) can handle sg.
>>
>> With patch 3 applied I was able to increase the throughput of an IPsec
>> gateway from 7.12 Gbit/s to 7.28 Gbit/s.
> 
> Please deal with the feedback given to you by Ben about referring to
> variables with the '@' prefix consistently in comments, then resend
> this entire patch series.

Will do. Thanks, Dave.

> 
> Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 3/3] net: allow to leave the buffer fragmented in skb_cow_data()
  2013-11-05 13:54 ` [PATCH net-next 3/3] net: allow to leave the buffer fragmented in skb_cow_data() Mathias Krause
  2013-11-05 14:33   ` Eric Dumazet
@ 2013-11-06  9:30   ` Herbert Xu
  2013-11-06  9:49     ` Mathias Krause
  1 sibling, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2013-11-06  9:30 UTC (permalink / raw)
  To: Mathias Krause
  Cc: David S. Miller, Steffen Klassert, Dmitry Tarnyagin, netdev

On Tue, Nov 05, 2013 at 02:54:11PM +0100, Mathias Krause wrote:
>
> -	/* If skb is cloned or its head is paged, reallocate
> -	 * head pulling out all the pages (pages are considered not writable
> -	 * at the moment even if they are anonymous).
> +	/* If skb is cloned reallocate head pulling out all the pages (pages are
> +	 * considered not writable at the moment even if they are anonymous).
>  	 */

Hang on, you haven't explained why it is OK to write to pages.
What if said page is owned by the virt host or some app?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 3/3] net: allow to leave the buffer fragmented in skb_cow_data()
  2013-11-06  9:30   ` Herbert Xu
@ 2013-11-06  9:49     ` Mathias Krause
  2013-11-06  9:52       ` Herbert Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Mathias Krause @ 2013-11-06  9:49 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, Steffen Klassert, Dmitry Tarnyagin, netdev

On 06.11.2013 10:30, Herbert Xu wrote:
> On Tue, Nov 05, 2013 at 02:54:11PM +0100, Mathias Krause wrote:
>> -	/* If skb is cloned or its head is paged, reallocate
>> -	 * head pulling out all the pages (pages are considered not writable
>> -	 * at the moment even if they are anonymous).
>> +	/* If skb is cloned reallocate head pulling out all the pages (pages are
>> +	 * considered not writable at the moment even if they are anonymous).
>>  	 */
> 
> Hang on, you haven't explained why it is OK to write to pages.

Why wouldn't it if the skb isn't cloned?

> What if said page is owned by the virt host or some app?

How would one detect such a case. I could image not by testing
skb_shinfo(skb)->nr_frags as it is right now?


Regards,
Mathias

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 3/3] net: allow to leave the buffer fragmented in skb_cow_data()
  2013-11-06  9:49     ` Mathias Krause
@ 2013-11-06  9:52       ` Herbert Xu
  2013-11-06 12:42         ` Mathias Krause
  0 siblings, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2013-11-06  9:52 UTC (permalink / raw)
  To: Mathias Krause
  Cc: David S. Miller, Steffen Klassert, Dmitry Tarnyagin, netdev

On Wed, Nov 06, 2013 at 10:49:18AM +0100, Mathias Krause wrote:
> On 06.11.2013 10:30, Herbert Xu wrote:
>
> > Hang on, you haven't explained why it is OK to write to pages.
> 
> Why wouldn't it if the skb isn't cloned?

Because if it's owned by an entity outside our stack (e.g., virt
host or app) then the skb itself won't be cloned.

> > What if said page is owned by the virt host or some app?
> 
> How would one detect such a case. I could image not by testing
> skb_shinfo(skb)->nr_frags as it is right now?

You can't.  That's why we always copy.  If you want to do this
properly then we'll need to add at least a bit to indicate whether
the page originated from within the network stack and we have
full ownership.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 3/3] net: allow to leave the buffer fragmented in skb_cow_data()
  2013-11-06  9:52       ` Herbert Xu
@ 2013-11-06 12:42         ` Mathias Krause
  2013-11-06 12:48           ` Herbert Xu
  0 siblings, 1 reply; 20+ messages in thread
From: Mathias Krause @ 2013-11-06 12:42 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, Steffen Klassert, Dmitry Tarnyagin, netdev

On 06.11.2013 10:52, Herbert Xu wrote:
> On Wed, Nov 06, 2013 at 10:49:18AM +0100, Mathias Krause wrote:
>> On 06.11.2013 10:30, Herbert Xu wrote:
>>
>>> Hang on, you haven't explained why it is OK to write to pages.
>> Why wouldn't it if the skb isn't cloned?
> 
> Because if it's owned by an entity outside our stack (e.g., virt
> host or app) then the skb itself won't be cloned.

Ok.

>>> What if said page is owned by the virt host or some app?
>> How would one detect such a case. I could image not by testing
>> skb_shinfo(skb)->nr_frags as it is right now?
> 
> You can't.  That's why we always copy.

Well, skb_cow_data() will only copy, i.e. call __pskb_pull_tail(), in
case the skb is either cloned or fragmented. As you already said it
won't be cloned in your case. Does it contain fragments, i.e. is
skb_shinfo(skb)->nr_frags != 0? If not, we won't copy with the current
code either.

We *will* copy, though, if we're expected to expand the tailroom. That
won't change in my patch as we'll linearize the skb in that case -- even
if there would enough room in the skb. That's needed to not hit the
SKB_LINEAR_ASSERT(skb) in skb_put().

> If you want to do this
> properly then we'll need to add at least a bit to indicate whether
> the page originated from within the network stack and we have
> full ownership.

Can you please explain why this would be needed? I still don't get the
reasoning behind "pages are considered not writable at the moment even
if they are anonymous".

skb_cow_data() is only used in those places:

- net/caif/cfpkt_skbuff.c
- net/ipv4/ah4.c
- net/ipv4/esp4.c
- net/ipv6/ah6.c
- net/ipv6/esp6.c
- net/rxrpc/rxkad.c

Can you explain how this change would be a problem for them?


Thanks,
Mathias

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 3/3] net: allow to leave the buffer fragmented in skb_cow_data()
  2013-11-06 12:42         ` Mathias Krause
@ 2013-11-06 12:48           ` Herbert Xu
  2013-11-06 16:39             ` Eric Dumazet
  2013-11-07  8:55             ` Mathias Krause
  0 siblings, 2 replies; 20+ messages in thread
From: Herbert Xu @ 2013-11-06 12:48 UTC (permalink / raw)
  To: Mathias Krause
  Cc: David S. Miller, Steffen Klassert, Dmitry Tarnyagin, netdev

On Wed, Nov 06, 2013 at 01:42:03PM +0100, Mathias Krause wrote:
>
> Well, skb_cow_data() will only copy, i.e. call __pskb_pull_tail(), in
> case the skb is either cloned or fragmented. As you already said it
> won't be cloned in your case. Does it contain fragments, i.e. is
> skb_shinfo(skb)->nr_frags != 0? If not, we won't copy with the current
> code either.

Whenever we say page it means nr_frags != 0.  So currently as
long as we have pages in our skb we will copy.  With your patch
we will no longer copy in the case where we have pages but the
skb isn't cloned.  In fact that is the whole point of your patch.

> Can you please explain why this would be needed? I still don't get the
> reasoning behind "pages are considered not writable at the moment even
> if they are anonymous".

As I said you don't know where the page in the skb came from.  It
may point to read-only memory or memory that's shared with another
task that isn't expecting things to change underneath it.

It may well turn out to most if not all cases of pages are safe to
be written to if skb_cloned == 0.  However, we'd need to do a full
audit of every source of page frags to be sure.  For example, you'd
need to look at net drivers and splice.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 3/3] net: allow to leave the buffer fragmented in skb_cow_data()
  2013-11-06 12:48           ` Herbert Xu
@ 2013-11-06 16:39             ` Eric Dumazet
  2013-11-07  8:56               ` Mathias Krause
  2013-11-07  8:55             ` Mathias Krause
  1 sibling, 1 reply; 20+ messages in thread
From: Eric Dumazet @ 2013-11-06 16:39 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Mathias Krause, David S. Miller, Steffen Klassert,
	Dmitry Tarnyagin, netdev

On Wed, 2013-11-06 at 20:48 +0800, Herbert Xu wrote:
> On Wed, Nov 06, 2013 at 01:42:03PM +0100, Mathias Krause wrote:
> >
> > Well, skb_cow_data() will only copy, i.e. call __pskb_pull_tail(), in
> > case the skb is either cloned or fragmented. As you already said it
> > won't be cloned in your case. Does it contain fragments, i.e. is
> > skb_shinfo(skb)->nr_frags != 0? If not, we won't copy with the current
> > code either.
> 
> Whenever we say page it means nr_frags != 0.  So currently as
> long as we have pages in our skb we will copy.  With your patch
> we will no longer copy in the case where we have pages but the
> skb isn't cloned.  In fact that is the whole point of your patch.
> 
> > Can you please explain why this would be needed? I still don't get the
> > reasoning behind "pages are considered not writable at the moment even
> > if they are anonymous".
> 
> As I said you don't know where the page in the skb came from.  It
> may point to read-only memory or memory that's shared with another
> task that isn't expecting things to change underneath it.

Note that we might have now a per skb flag telling is all page frags are
owned.

This is SKBTX_SHARED_FRAG

If SKBTX_SHARED_FRAG is set on shared_info->tx_flags, then at least
one frag is not safe and we must copy all frags.


For example, this flag is set in TCP sendfile() path (vmsplice()...),
and zero copy paths in general (macvtap, tun)

I am not 100% sure, but this could be a hint.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 3/3] net: allow to leave the buffer fragmented in skb_cow_data()
  2013-11-06 12:48           ` Herbert Xu
  2013-11-06 16:39             ` Eric Dumazet
@ 2013-11-07  8:55             ` Mathias Krause
  2013-11-07  9:01               ` Herbert Xu
  1 sibling, 1 reply; 20+ messages in thread
From: Mathias Krause @ 2013-11-07  8:55 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, Steffen Klassert, Dmitry Tarnyagin, netdev

On 06.11.2013 13:48, Herbert Xu wrote:
> On Wed, Nov 06, 2013 at 01:42:03PM +0100, Mathias Krause wrote:
>> Well, skb_cow_data() will only copy, i.e. call __pskb_pull_tail(), in
>> case the skb is either cloned or fragmented. As you already said it
>> won't be cloned in your case. Does it contain fragments, i.e. is
>> skb_shinfo(skb)->nr_frags != 0? If not, we won't copy with the current
>> code either.
> 
> Whenever we say page it means nr_frags != 0.  So currently as
> long as we have pages in our skb we will copy.  With your patch
> we will no longer copy in the case where we have pages but the
> skb isn't cloned.  In fact that is the whole point of your patch.

Indeed. I want to avoid the costly memcpy() on the CPU serving the NIC
interrupt, as that is a bottleneck in my setup. The packet processing --
encrypting/decrypting of ESP packets -- gets mostly parallelized via
pcrypt, so that's fine. But the initial network processing, i.e. getting
to pcrypt, is what's throttling the throughput currently. (RPS only
partly solves this problem as for the ESP receive path most traffic ends
up on the same flow).

>> Can you please explain why this would be needed? I still don't get the
>> reasoning behind "pages are considered not writable at the moment even
>> if they are anonymous".
> 
> As I said you don't know where the page in the skb came from.  It
> may point to read-only memory or memory that's shared with another
> task that isn't expecting things to change underneath it.
> 
> It may well turn out to most if not all cases of pages are safe to
> be written to if skb_cloned == 0.  However, we'd need to do a full
> audit of every source of page frags to be sure.  For example, you'd
> need to look at net drivers and splice.

Ah, okay. Now that makes sense. I'll look into it.

Thanks,
Mathias

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 3/3] net: allow to leave the buffer fragmented in skb_cow_data()
  2013-11-06 16:39             ` Eric Dumazet
@ 2013-11-07  8:56               ` Mathias Krause
  0 siblings, 0 replies; 20+ messages in thread
From: Mathias Krause @ 2013-11-07  8:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Herbert Xu, David S. Miller, Steffen Klassert, Dmitry Tarnyagin, netdev

On 06.11.2013 17:39, Eric Dumazet wrote:
> On Wed, 2013-11-06 at 20:48 +0800, Herbert Xu wrote:
>> On Wed, Nov 06, 2013 at 01:42:03PM +0100, Mathias Krause wrote:
>>> Well, skb_cow_data() will only copy, i.e. call __pskb_pull_tail(), in
>>> case the skb is either cloned or fragmented. As you already said it
>>> won't be cloned in your case. Does it contain fragments, i.e. is
>>> skb_shinfo(skb)->nr_frags != 0? If not, we won't copy with the current
>>> code either.
>> Whenever we say page it means nr_frags != 0.  So currently as
>> long as we have pages in our skb we will copy.  With your patch
>> we will no longer copy in the case where we have pages but the
>> skb isn't cloned.  In fact that is the whole point of your patch.
>>
>>> Can you please explain why this would be needed? I still don't get the
>>> reasoning behind "pages are considered not writable at the moment even
>>> if they are anonymous".
>> As I said you don't know where the page in the skb came from.  It
>> may point to read-only memory or memory that's shared with another
>> task that isn't expecting things to change underneath it.
> 
> Note that we might have now a per skb flag telling is all page frags are
> owned.
> 
> This is SKBTX_SHARED_FRAG
> 
> If SKBTX_SHARED_FRAG is set on shared_info->tx_flags, then at least
> one frag is not safe and we must copy all frags.
> 
> 
> For example, this flag is set in TCP sendfile() path (vmsplice()...),
> and zero copy paths in general (macvtap, tun)
> 
> I am not 100% sure, but this could be a hint.

Thanks for the hint, Eric!


Mathias

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 3/3] net: allow to leave the buffer fragmented in skb_cow_data()
  2013-11-07  8:55             ` Mathias Krause
@ 2013-11-07  9:01               ` Herbert Xu
  2013-11-07 10:01                 ` Mathias Krause
  0 siblings, 1 reply; 20+ messages in thread
From: Herbert Xu @ 2013-11-07  9:01 UTC (permalink / raw)
  To: Mathias Krause
  Cc: David S. Miller, Steffen Klassert, Dmitry Tarnyagin, netdev

On Thu, Nov 07, 2013 at 09:55:34AM +0100, Mathias Krause wrote:
>
> Indeed. I want to avoid the costly memcpy() on the CPU serving the NIC
> interrupt, as that is a bottleneck in my setup. The packet processing --
> encrypting/decrypting of ESP packets -- gets mostly parallelized via
> pcrypt, so that's fine. But the initial network processing, i.e. getting
> to pcrypt, is what's throttling the throughput currently. (RPS only
> partly solves this problem as for the ESP receive path most traffic ends
> up on the same flow).

How about this: instead of doing in-place encryption/decryption
let's allocate a new destination buffer and encrypt/decrypt into
it.  Then we essentially get the copy for free.

So instead of modifying skb_cow_data you can change esp4.c, etc.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 3/3] net: allow to leave the buffer fragmented in skb_cow_data()
  2013-11-07  9:01               ` Herbert Xu
@ 2013-11-07 10:01                 ` Mathias Krause
  0 siblings, 0 replies; 20+ messages in thread
From: Mathias Krause @ 2013-11-07 10:01 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, Steffen Klassert, Dmitry Tarnyagin, netdev

On 07.11.2013 10:01, Herbert Xu wrote:
> On Thu, Nov 07, 2013 at 09:55:34AM +0100, Mathias Krause wrote:
>> Indeed. I want to avoid the costly memcpy() on the CPU serving the NIC
>> interrupt, as that is a bottleneck in my setup. The packet processing --
>> encrypting/decrypting of ESP packets -- gets mostly parallelized via
>> pcrypt, so that's fine. But the initial network processing, i.e. getting
>> to pcrypt, is what's throttling the throughput currently. (RPS only
>> partly solves this problem as for the ESP receive path most traffic ends
>> up on the same flow).
> 
> How about this: instead of doing in-place encryption/decryption
> let's allocate a new destination buffer and encrypt/decrypt into
> it.  Then we essentially get the copy for free.
> 
> So instead of modifying skb_cow_data you can change esp4.c, etc.

That might be another option. But I fear it might affect performance due
to caching or NUMA effects. :/

I'll think about it and will try to investigate both approaches.


Thanks,
Mathias

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2013-11-07 10:01 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-05 13:54 [PATCH net-next 0/3] IPsec improvements Mathias Krause
2013-11-05 13:54 ` [PATCH net-next 1/3] net: move pskb_put() to core code Mathias Krause
2013-11-05 18:33   ` Ben Hutchings
2013-11-06  9:01     ` Mathias Krause
2013-11-05 13:54 ` [PATCH net-next 2/3] caif: use pskb_put() instead of reimplementing its functionality Mathias Krause
2013-11-05 13:54 ` [PATCH net-next 3/3] net: allow to leave the buffer fragmented in skb_cow_data() Mathias Krause
2013-11-05 14:33   ` Eric Dumazet
2013-11-05 14:43     ` Mathias Krause
2013-11-06  9:30   ` Herbert Xu
2013-11-06  9:49     ` Mathias Krause
2013-11-06  9:52       ` Herbert Xu
2013-11-06 12:42         ` Mathias Krause
2013-11-06 12:48           ` Herbert Xu
2013-11-06 16:39             ` Eric Dumazet
2013-11-07  8:56               ` Mathias Krause
2013-11-07  8:55             ` Mathias Krause
2013-11-07  9:01               ` Herbert Xu
2013-11-07 10:01                 ` Mathias Krause
2013-11-06  4:48 ` [PATCH net-next 0/3] IPsec improvements David Miller
2013-11-06  9:02   ` Mathias Krause

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.