* [0/14] GRO: Lots of microoptimisations
@ 2009-05-27  4:45 Herbert Xu
  2009-05-27  4:50 ` [PATCH 1/14] gro: Open-code frags copy in skb_gro_receive Herbert Xu
                   ` (15 more replies)
  0 siblings, 16 replies; 32+ messages in thread
From: Herbert Xu @ 2009-05-27  4:45 UTC (permalink / raw)
  To: David S. Miller, netdev

Hi Dave:

This series of patches brings GRO performance to within 1% of LRO
on the slow machine that I was testing.  I can overtake LRO by
deleting the IP checksum test and the Ethernet header test from
GRO, which LRO doesn't do anyway, but that's not very nice :)

I'm still looking at other optimisations as time allows.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* [PATCH 1/14] gro: Open-code frags copy in skb_gro_receive
From: Herbert Xu @ 2009-05-27  4:50 UTC (permalink / raw)
  To: David S. Miller, netdev

gro: Open-code frags copy in skb_gro_receive

gcc does a poor job at generating code for the memcpy of the frags
array in skb_gro_receive, which is the primary purpose of that
function when merging frags.  In particular, it can't utilise the
alignment information of the source and destination.  This patch
open-codes the copy so we process words instead of bytes.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/core/skbuff.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d429c41..c88426b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2673,6 +2673,9 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 	if (skb_shinfo(p)->frag_list)
 		goto merge;
 	else if (skb_headlen(skb) <= skb_gro_offset(skb)) {
+		skb_frag_t *frag;
+		int i;
+
 		if (skb_shinfo(p)->nr_frags + skb_shinfo(skb)->nr_frags >
 		    MAX_SKB_FRAGS)
 			return -E2BIG;
@@ -2682,9 +2685,9 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 		skb_shinfo(skb)->frags[0].size -=
 			skb_gro_offset(skb) - skb_headlen(skb);
 
-		memcpy(skb_shinfo(p)->frags + skb_shinfo(p)->nr_frags,
-		       skb_shinfo(skb)->frags,
-		       skb_shinfo(skb)->nr_frags * sizeof(skb_frag_t));
+		frag = skb_shinfo(p)->frags + skb_shinfo(p)->nr_frags;
+		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
+			*frag++ = skb_shinfo(skb)->frags[i];
 
 		skb_shinfo(p)->nr_frags += skb_shinfo(skb)->nr_frags;
 		skb_shinfo(skb)->nr_frags = 0;

* [PATCH 2/14] gro: Inline skb_gro_header and cache frag0 virtual address
From: Herbert Xu @ 2009-05-27  4:50 UTC (permalink / raw)
  To: David S. Miller, netdev

gro: Inline skb_gro_header and cache frag0 virtual address

The function skb_gro_header is called four times per packet which
quickly adds up at 10Gb/s.  This patch inlines it to allow better
optimisations.

Some architectures perform a multiplication to compute page_address,
which is then repeated by each skb_gro_header invocation.  This patch
caches the computed address in skb->cb to avoid the unnecessary
multiplications.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/netdevice.h |   22 +++++++++++++++-------
 net/core/dev.c            |   27 ++++++++++++---------------
 2 files changed, 27 insertions(+), 22 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ae3c209..fcb1cc9 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1008,6 +1008,9 @@ void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 void netif_napi_del(struct napi_struct *napi);
 
 struct napi_gro_cb {
+	/* Virtual address of skb_shinfo(skb)->frags[0].page + offset. */
+	void *frag0;
+
 	/* This indicates where we are processing relative to skb->data. */
 	int data_offset;
 
@@ -1107,9 +1110,9 @@ extern int		dev_restart(struct net_device *dev);
 #ifdef CONFIG_NETPOLL_TRAP
 extern int		netpoll_trap(void);
 #endif
-extern void	      *skb_gro_header(struct sk_buff *skb, unsigned int hlen);
 extern int	       skb_gro_receive(struct sk_buff **head,
 				       struct sk_buff *skb);
+extern void	       skb_gro_reset_offset(struct sk_buff *skb);
 
 static inline unsigned int skb_gro_offset(const struct sk_buff *skb)
 {
@@ -1126,23 +1129,28 @@ static inline void skb_gro_pull(struct sk_buff *skb, unsigned int len)
 	NAPI_GRO_CB(skb)->data_offset += len;
 }
 
-static inline void skb_gro_reset_offset(struct sk_buff *skb)
+static inline void *skb_gro_header(struct sk_buff *skb, unsigned int hlen)
 {
-	NAPI_GRO_CB(skb)->data_offset = 0;
+	unsigned int offset = skb_gro_offset(skb);
+
+	hlen += offset;
+	if (!NAPI_GRO_CB(skb)->frag0 ||
+	    unlikely(skb_shinfo(skb)->frags[0].size + skb_headlen(skb) < hlen))
+		return pskb_may_pull(skb, hlen) ? skb->data + offset : NULL;
+
+	return NAPI_GRO_CB(skb)->frag0 + offset;
 }
 
 static inline void *skb_gro_mac_header(struct sk_buff *skb)
 {
 	return skb_headlen(skb) ? skb_mac_header(skb) :
-	       page_address(skb_shinfo(skb)->frags[0].page) +
-	       skb_shinfo(skb)->frags[0].page_offset;
+	       NAPI_GRO_CB(skb)->frag0;
 }
 
 static inline void *skb_gro_network_header(struct sk_buff *skb)
 {
 	return skb_headlen(skb) ? skb_network_header(skb) :
-	       page_address(skb_shinfo(skb)->frags[0].page) +
-	       skb_shinfo(skb)->frags[0].page_offset + skb_network_offset(skb);
+	       NAPI_GRO_CB(skb)->frag0 + skb_network_offset(skb);
 }
 
 static inline int dev_hard_header(struct sk_buff *skb, struct net_device *dev,
diff --git a/net/core/dev.c b/net/core/dev.c
index 241613f..9b4c8da 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2387,21 +2387,6 @@ void napi_gro_flush(struct napi_struct *napi)
 }
 EXPORT_SYMBOL(napi_gro_flush);
 
-void *skb_gro_header(struct sk_buff *skb, unsigned int hlen)
-{
-	unsigned int offset = skb_gro_offset(skb);
-
-	hlen += offset;
-	if (unlikely(skb_headlen(skb) ||
-		     skb_shinfo(skb)->frags[0].size < hlen ||
-		     PageHighMem(skb_shinfo(skb)->frags[0].page)))
-		return pskb_may_pull(skb, hlen) ? skb->data + offset : NULL;
-
-	return page_address(skb_shinfo(skb)->frags[0].page) +
-	       skb_shinfo(skb)->frags[0].page_offset + offset;
-}
-EXPORT_SYMBOL(skb_gro_header);
-
 int dev_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
 {
 	struct sk_buff **pp = NULL;
@@ -2517,6 +2502,18 @@ int napi_skb_finish(int ret, struct sk_buff *skb)
 }
 EXPORT_SYMBOL(napi_skb_finish);
 
+void skb_gro_reset_offset(struct sk_buff *skb)
+{
+	NAPI_GRO_CB(skb)->data_offset = 0;
+	NAPI_GRO_CB(skb)->frag0 = NULL;
+
+	if (!skb_headlen(skb) && !PageHighMem(skb_shinfo(skb)->frags[0].page))
+		NAPI_GRO_CB(skb)->frag0 =
+			page_address(skb_shinfo(skb)->frags[0].page) +
+			skb_shinfo(skb)->frags[0].page_offset;
+}
+EXPORT_SYMBOL(skb_gro_reset_offset);
+
 int napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
 {
 	skb_gro_reset_offset(skb);

* [PATCH 3/14] gro: Localise offset/headlen in skb_gro_receive
From: Herbert Xu @ 2009-05-27  4:50 UTC (permalink / raw)
  To: David S. Miller, netdev

gro: Localise offset/headlen in skb_gro_receive

This patch stores the offset/headlen in local variables as they're
used repeatedly in skb_gro_receive.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/core/skbuff.c |   23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c88426b..168e949 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2666,13 +2666,15 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 	struct sk_buff *nskb;
 	unsigned int headroom;
 	unsigned int len = skb_gro_len(skb);
+	unsigned int offset = skb_gro_offset(skb);
+	unsigned int headlen = skb_headlen(skb);
 
 	if (p->len + len >= 65536)
 		return -E2BIG;
 
 	if (skb_shinfo(p)->frag_list)
 		goto merge;
-	else if (skb_headlen(skb) <= skb_gro_offset(skb)) {
+	else if (headlen <= offset) {
 		skb_frag_t *frag;
 		int i;
 
@@ -2680,10 +2682,8 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 		    MAX_SKB_FRAGS)
 			return -E2BIG;
 
-		skb_shinfo(skb)->frags[0].page_offset +=
-			skb_gro_offset(skb) - skb_headlen(skb);
-		skb_shinfo(skb)->frags[0].size -=
-			skb_gro_offset(skb) - skb_headlen(skb);
+		skb_shinfo(skb)->frags[0].page_offset += offset - headlen;
+		skb_shinfo(skb)->frags[0].size -= offset - headlen;
 
 		frag = skb_shinfo(p)->frags + skb_shinfo(p)->nr_frags;
 		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
@@ -2736,16 +2736,13 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 	p = nskb;
 
 merge:
-	if (skb_gro_offset(skb) > skb_headlen(skb)) {
-		skb_shinfo(skb)->frags[0].page_offset +=
-			skb_gro_offset(skb) - skb_headlen(skb);
-		skb_shinfo(skb)->frags[0].size -=
-			skb_gro_offset(skb) - skb_headlen(skb);
-		skb_gro_reset_offset(skb);
-		skb_gro_pull(skb, skb_headlen(skb));
+	if (offset > headlen) {
+		skb_shinfo(skb)->frags[0].page_offset += offset - headlen;
+		skb_shinfo(skb)->frags[0].size -= offset - headlen;
+		offset = headlen;
 	}
 
-	__skb_pull(skb, skb_gro_offset(skb));
+	__skb_pull(skb, offset);
 
 	p->prev->next = skb;
 	p->prev = skb;

* [PATCH 4/14] gro: Only use skb_gro_header for completely non-linear packets
From: Herbert Xu @ 2009-05-27  4:50 UTC (permalink / raw)
  To: David S. Miller, netdev

gro: Only use skb_gro_header for completely non-linear packets

Currently skb_gro_header is used for packets which put the hardware
header in skb->data with the rest in frags.  Since the drivers that
need this optimisation all provide completely non-linear packets,
we can gain extra optimisations by only performing the frag0
optimisation for completely non-linear packets.

In particular, we can simply test frag0 (instead of skb_headlen)
to see whether the optimisation is in force.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/netdevice.h |   11 ++++++-----
 net/core/dev.c            |    3 ++-
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fcb1cc9..38678bc 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1135,22 +1135,23 @@ static inline void *skb_gro_header(struct sk_buff *skb, unsigned int hlen)
 
 	hlen += offset;
 	if (!NAPI_GRO_CB(skb)->frag0 ||
-	    unlikely(skb_shinfo(skb)->frags[0].size + skb_headlen(skb) < hlen))
+	    unlikely(skb_shinfo(skb)->frags[0].size < hlen)) {
+		NAPI_GRO_CB(skb)->frag0 = NULL;
 		return pskb_may_pull(skb, hlen) ? skb->data + offset : NULL;
+	}
 
 	return NAPI_GRO_CB(skb)->frag0 + offset;
 }
 
 static inline void *skb_gro_mac_header(struct sk_buff *skb)
 {
-	return skb_headlen(skb) ? skb_mac_header(skb) :
-	       NAPI_GRO_CB(skb)->frag0;
+	return NAPI_GRO_CB(skb)->frag0 ?: skb_mac_header(skb);
 }
 
 static inline void *skb_gro_network_header(struct sk_buff *skb)
 {
-	return skb_headlen(skb) ? skb_network_header(skb) :
-	       NAPI_GRO_CB(skb)->frag0 + skb_network_offset(skb);
+	return (NAPI_GRO_CB(skb)->frag0 ?: skb->data) +
+	       skb_network_offset(skb);
 }
 
 static inline int dev_hard_header(struct sk_buff *skb, struct net_device *dev,
diff --git a/net/core/dev.c b/net/core/dev.c
index 9b4c8da..07ad237 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2507,7 +2507,8 @@ void skb_gro_reset_offset(struct sk_buff *skb)
 	NAPI_GRO_CB(skb)->data_offset = 0;
 	NAPI_GRO_CB(skb)->frag0 = NULL;
 
-	if (!skb_headlen(skb) && !PageHighMem(skb_shinfo(skb)->frags[0].page))
+	if (skb->mac_header == skb->tail &&
+	    !PageHighMem(skb_shinfo(skb)->frags[0].page))
 		NAPI_GRO_CB(skb)->frag0 =
 			page_address(skb_shinfo(skb)->frags[0].page) +
 			skb_shinfo(skb)->frags[0].page_offset;

* [PATCH 5/14] tcp: Optimise GRO port comparisons
From: Herbert Xu @ 2009-05-27  4:50 UTC (permalink / raw)
  To: David S. Miller, netdev

tcp: Optimise GRO port comparisons

Instead of doing two 16-bit operations for the source/destination
ports, we can do one 32-bit operation to take care of both.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/ipv4/tcp.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 7a0f0b2..ff6adec 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2544,7 +2544,7 @@ struct sk_buff **tcp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 
 		th2 = tcp_hdr(p);
 
-		if ((th->source ^ th2->source) | (th->dest ^ th2->dest)) {
+		if (*(u32 *)&th->source ^ *(u32 *)&th2->source) {
 			NAPI_GRO_CB(p)->same_flow = 0;
 			continue;
 		}

* [PATCH 6/14] tcp: Remove unnecessary window comparisons for GRO
From: Herbert Xu @ 2009-05-27  4:50 UTC (permalink / raw)
  To: David S. Miller, netdev

tcp: Remove unnecessary window comparisons for GRO

The window has already been checked as part of the flag word
so there is no need to check it explicitly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/ipv4/tcp.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index ff6adec..313960e 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2559,7 +2559,7 @@ found:
 	flush |= flags & TCP_FLAG_CWR;
 	flush |= (flags ^ tcp_flag_word(th2)) &
 		  ~(TCP_FLAG_CWR | TCP_FLAG_FIN | TCP_FLAG_PSH);
-	flush |= (th->ack_seq ^ th2->ack_seq) | (th->window ^ th2->window);
+	flush |= th->ack_seq ^ th2->ack_seq;
 	for (i = sizeof(*th); !flush && i < thlen; i += 4)
 		flush |= *(u32 *)((u8 *)th + i) ^
 			 *(u32 *)((u8 *)th2 + i);

* [PATCH 7/14] tcp: Optimise len/mss comparison
From: Herbert Xu @ 2009-05-27  4:50 UTC (permalink / raw)
  To: David S. Miller, netdev

tcp: Optimise len/mss comparison

Instead of checking len > mss || len == 0, we can accomplish
both by checking (len - 1) >= mss using unsigned wraparound.
At nearly a million times a second, this might just help.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/ipv4/tcp.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 313960e..68342d4 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2566,7 +2566,7 @@ found:
 
 	mss = skb_shinfo(p)->gso_size;
 
-	flush |= (len > mss) | !len;
+	flush |= (len - 1) >= mss;
 	flush |= (ntohl(th2->seq) + skb_gro_len(p)) ^ ntohl(th->seq);
 
 	if (flush || skb_gro_receive(head, skb)) {

* [PATCH 8/14] gro: Optimise length comparison in skb_gro_header
From: Herbert Xu @ 2009-05-27  4:50 UTC (permalink / raw)
  To: David S. Miller, netdev

gro: Optimise length comparison in skb_gro_header

By caching frag0_len, we can avoid checking both frag0 and the
length separately in skb_gro_header.  This helps as skb_gro_header
is called four times per packet which amounts to a few million
times at 10Gb/s.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/netdevice.h |    7 +++++--
 net/core/dev.c            |    5 ++++-
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 38678bc..966c413 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1011,6 +1011,9 @@ struct napi_gro_cb {
 	/* Virtual address of skb_shinfo(skb)->frags[0].page + offset. */
 	void *frag0;
 
+	/* Length of frag0. */
+	unsigned int frag0_len;
+
 	/* This indicates where we are processing relative to skb->data. */
 	int data_offset;
 
@@ -1134,9 +1137,9 @@ static inline void *skb_gro_header(struct sk_buff *skb, unsigned int hlen)
 	unsigned int offset = skb_gro_offset(skb);
 
 	hlen += offset;
-	if (!NAPI_GRO_CB(skb)->frag0 ||
-	    unlikely(skb_shinfo(skb)->frags[0].size < hlen)) {
+	if (NAPI_GRO_CB(skb)->frag0_len < hlen) {
 		NAPI_GRO_CB(skb)->frag0 = NULL;
+		NAPI_GRO_CB(skb)->frag0_len = 0;
 		return pskb_may_pull(skb, hlen) ? skb->data + offset : NULL;
 	}
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 07ad237..e634c8a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2506,12 +2506,15 @@ void skb_gro_reset_offset(struct sk_buff *skb)
 {
 	NAPI_GRO_CB(skb)->data_offset = 0;
 	NAPI_GRO_CB(skb)->frag0 = NULL;
+	NAPI_GRO_CB(skb)->frag0_len = 0;
 
 	if (skb->mac_header == skb->tail &&
-	    !PageHighMem(skb_shinfo(skb)->frags[0].page))
+	    !PageHighMem(skb_shinfo(skb)->frags[0].page)) {
 		NAPI_GRO_CB(skb)->frag0 =
 			page_address(skb_shinfo(skb)->frags[0].page) +
 			skb_shinfo(skb)->frags[0].page_offset;
+		NAPI_GRO_CB(skb)->frag0_len = skb_shinfo(skb)->frags[0].size;
+	}
 }
 EXPORT_SYMBOL(skb_gro_reset_offset);
 

* [PATCH 9/14] gro: Avoid unnecessary comparison after skb_gro_header
From: Herbert Xu @ 2009-05-27  4:50 UTC (permalink / raw)
  To: David S. Miller, netdev

gro: Avoid unnecessary comparison after skb_gro_header

For the overwhelming majority of cases, skb_gro_header's return
value cannot be NULL.  Yet we must check it because of its current
form.  This patch splits it up into multiple functions in order
to avoid this.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/netdevice.h |   23 ++++++++++++++---------
 net/core/dev.c            |   17 ++++++++++++-----
 net/ipv4/af_inet.c        |   13 ++++++++++---
 net/ipv4/tcp.c            |   22 ++++++++++++++++------
 net/ipv6/af_inet6.c       |   13 ++++++++++---
 5 files changed, 62 insertions(+), 26 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 966c413..d2b1561 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1132,18 +1132,23 @@ static inline void skb_gro_pull(struct sk_buff *skb, unsigned int len)
 	NAPI_GRO_CB(skb)->data_offset += len;
 }
 
-static inline void *skb_gro_header(struct sk_buff *skb, unsigned int hlen)
+static inline void *skb_gro_header_fast(struct sk_buff *skb,
+					unsigned int offset)
 {
-	unsigned int offset = skb_gro_offset(skb);
+	return NAPI_GRO_CB(skb)->frag0 + offset;
+}
 
-	hlen += offset;
-	if (NAPI_GRO_CB(skb)->frag0_len < hlen) {
-		NAPI_GRO_CB(skb)->frag0 = NULL;
-		NAPI_GRO_CB(skb)->frag0_len = 0;
-		return pskb_may_pull(skb, hlen) ? skb->data + offset : NULL;
-	}
+static inline int skb_gro_header_hard(struct sk_buff *skb, unsigned int hlen)
+{
+	return NAPI_GRO_CB(skb)->frag0_len < hlen;
+}
 
-	return NAPI_GRO_CB(skb)->frag0 + offset;
+static inline void *skb_gro_header_slow(struct sk_buff *skb, unsigned int hlen,
+					unsigned int offset)
+{
+	NAPI_GRO_CB(skb)->frag0 = NULL;
+	NAPI_GRO_CB(skb)->frag0_len = 0;
+	return pskb_may_pull(skb, hlen) ? skb->data + offset : NULL;
 }
 
 static inline void *skb_gro_mac_header(struct sk_buff *skb)
diff --git a/net/core/dev.c b/net/core/dev.c
index e634c8a..5a81302 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2587,17 +2587,24 @@ struct sk_buff *napi_frags_skb(struct napi_struct *napi)
 {
 	struct sk_buff *skb = napi->skb;
 	struct ethhdr *eth;
+	unsigned int hlen;
+	unsigned int off;
 
 	napi->skb = NULL;
 
 	skb_reset_mac_header(skb);
 	skb_gro_reset_offset(skb);
 
-	eth = skb_gro_header(skb, sizeof(*eth));
-	if (!eth) {
-		napi_reuse_skb(napi, skb);
-		skb = NULL;
-		goto out;
+	off = skb_gro_offset(skb);
+	hlen = off + sizeof(*eth);
+	eth = skb_gro_header_fast(skb, off);
+	if (skb_gro_header_hard(skb, hlen)) {
+		eth = skb_gro_header_slow(skb, hlen, off);
+		if (unlikely(!eth)) {
+			napi_reuse_skb(napi, skb);
+			skb = NULL;
+			goto out;
+		}
 	}
 
 	skb_gro_pull(skb, sizeof(*eth));
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 1706896..644cc55 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1246,13 +1246,20 @@ static struct sk_buff **inet_gro_receive(struct sk_buff **head,
 	struct sk_buff **pp = NULL;
 	struct sk_buff *p;
 	struct iphdr *iph;
+	unsigned int hlen;
+	unsigned int off;
 	int flush = 1;
 	int proto;
 	int id;
 
-	iph = skb_gro_header(skb, sizeof(*iph));
-	if (unlikely(!iph))
-		goto out;
+	off = skb_gro_offset(skb);
+	hlen = off + sizeof(*iph);
+	iph = skb_gro_header_fast(skb, off);
+	if (skb_gro_header_hard(skb, hlen)) {
+		iph = skb_gro_header_slow(skb, hlen, off);
+		if (unlikely(!iph))
+			goto out;
+	}
 
 	proto = iph->protocol & (MAX_INET_PROTOS - 1);
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 68342d4..c3dcec5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2518,20 +2518,30 @@ struct sk_buff **tcp_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 	unsigned int thlen;
 	unsigned int flags;
 	unsigned int mss = 1;
+	unsigned int hlen;
+	unsigned int off;
 	int flush = 1;
 	int i;
 
-	th = skb_gro_header(skb, sizeof(*th));
-	if (unlikely(!th))
-		goto out;
+	off = skb_gro_offset(skb);
+	hlen = off + sizeof(*th);
+	th = skb_gro_header_fast(skb, off);
+	if (skb_gro_header_hard(skb, hlen)) {
+		th = skb_gro_header_slow(skb, hlen, off);
+		if (unlikely(!th))
+			goto out;
+	}
 
 	thlen = th->doff * 4;
 	if (thlen < sizeof(*th))
 		goto out;
 
-	th = skb_gro_header(skb, thlen);
-	if (unlikely(!th))
-		goto out;
+	hlen = off + thlen;
+	if (skb_gro_header_hard(skb, hlen)) {
+		th = skb_gro_header_slow(skb, hlen, off);
+		if (unlikely(!th))
+			goto out;
+	}
 
 	skb_gro_pull(skb, thlen);
 
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 61f5538..b6215be 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -817,13 +817,20 @@ static struct sk_buff **ipv6_gro_receive(struct sk_buff **head,
 	struct sk_buff *p;
 	struct ipv6hdr *iph;
 	unsigned int nlen;
+	unsigned int hlen;
+	unsigned int off;
 	int flush = 1;
 	int proto;
 	__wsum csum;
 
-	iph = skb_gro_header(skb, sizeof(*iph));
-	if (unlikely(!iph))
-		goto out;
+	off = skb_gro_offset(skb);
+	hlen = off + sizeof(*iph);
+	iph = skb_gro_header_fast(skb, off);
+	if (skb_gro_header_hard(skb, hlen)) {
+		iph = skb_gro_header_slow(skb, hlen, off);
+		if (unlikely(!iph))
+			goto out;
+	}
 
 	skb_gro_pull(skb, sizeof(*iph));
 	skb_set_transport_header(skb, skb_gro_offset(skb));

* [PATCH 10/14] ipv4: Use 32-bit loads for ID and length in GRO
From: Herbert Xu @ 2009-05-27  4:50 UTC (permalink / raw)
  To: David S. Miller, netdev

ipv4: Use 32-bit loads for ID and length in GRO

This patch optimises the IPv4 GRO code by using 32-bit loads
(instead of 16-bit ones) on the ID and length checks in the receive
function.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/ipv4/af_inet.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 644cc55..5abee4c 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1248,9 +1248,9 @@ static struct sk_buff **inet_gro_receive(struct sk_buff **head,
 	struct iphdr *iph;
 	unsigned int hlen;
 	unsigned int off;
+	unsigned int id;
 	int flush = 1;
 	int proto;
-	int id;
 
 	off = skb_gro_offset(skb);
 	hlen = off + sizeof(*iph);
@@ -1274,9 +1274,9 @@ static struct sk_buff **inet_gro_receive(struct sk_buff **head,
 	if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl)))
 		goto out_unlock;
 
-	flush = ntohs(iph->tot_len) != skb_gro_len(skb) ||
-		iph->frag_off != htons(IP_DF);
-	id = ntohs(iph->id);
+	id = ntohl(*(u32 *)&iph->id);
+	flush = (u16)((ntohl(*(u32 *)iph) ^ skb_gro_len(skb)) | (id ^ IP_DF));
+	id >>= 16;
 
 	for (p = *head; p; p = p->next) {
 		struct iphdr *iph2;

* [PATCH 11/14] gro: Open-code final pskb_may_pull
From: Herbert Xu @ 2009-05-27  4:50 UTC (permalink / raw)
  To: David S. Miller, netdev

gro: Open-code final pskb_may_pull

Since the only packets which need the final pskb_may_pull are
completely non-linear and have all the required bytes in frag0,
we can perform a straight memcpy instead of going through
pskb_may_pull and doing skb_copy_bits.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/core/dev.c |   23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 5a81302..ed7513c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2449,10 +2449,25 @@ int dev_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
 	ret = GRO_HELD;
 
 pull:
-	if (unlikely(!pskb_may_pull(skb, skb_gro_offset(skb)))) {
-		if (napi->gro_list == skb)
-			napi->gro_list = skb->next;
-		ret = GRO_DROP;
+	if (skb_headlen(skb) < skb_gro_offset(skb)) {
+		int grow = skb_gro_offset(skb) - skb_headlen(skb);
+
+		BUG_ON(skb->end - skb->tail < grow);
+
+		memcpy(skb_tail_pointer(skb), NAPI_GRO_CB(skb)->frag0, grow);
+
+		skb->tail += grow;
+		skb->data_len -= grow;
+
+		skb_shinfo(skb)->frags[0].page_offset += grow;
+		skb_shinfo(skb)->frags[0].size -= grow;
+
+		if (unlikely(!skb_shinfo(skb)->frags[0].size)) {
+			put_page(skb_shinfo(skb)->frags[0].page);
+			memmove(skb_shinfo(skb)->frags,
+				skb_shinfo(skb)->frags + 1,
+				--skb_shinfo(skb)->nr_frags * sizeof(skb_frag_t));
+		}
 	}
 
 ok:

* [PATCH 12/14] gro: Nasty optimisations for page frags in skb_gro_receive
From: Herbert Xu @ 2009-05-27  4:50 UTC (permalink / raw)
  To: David S. Miller, netdev

gro: Nasty optimisations for page frags in skb_gro_receive

This patch reverses the direction of the frags array copy in
skb_gro_receive in order to simplify the loop conditional.  It
also avoids touching the first element of the original frags
array.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/core/skbuff.c |   25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 168e949..19afb18 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2676,21 +2676,26 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 		goto merge;
 	else if (headlen <= offset) {
 		skb_frag_t *frag;
-		int i;
+		skb_frag_t *frag2;
+		int i = skb_shinfo(skb)->nr_frags;
+		int nr_frags = skb_shinfo(p)->nr_frags + i;
+
+		offset -= headlen;
 
-		if (skb_shinfo(p)->nr_frags + skb_shinfo(skb)->nr_frags >
-		    MAX_SKB_FRAGS)
+		if (nr_frags > MAX_SKB_FRAGS)
 			return -E2BIG;
 
-		skb_shinfo(skb)->frags[0].page_offset += offset - headlen;
-		skb_shinfo(skb)->frags[0].size -= offset - headlen;
+		skb_shinfo(p)->nr_frags = nr_frags;
+		skb_shinfo(skb)->nr_frags = 0;
 
-		frag = skb_shinfo(p)->frags + skb_shinfo(p)->nr_frags;
-		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
-			*frag++ = skb_shinfo(skb)->frags[i];
+		frag = skb_shinfo(p)->frags + nr_frags;
+		frag2 = skb_shinfo(skb)->frags + i;
+		do {
+			*--frag = *--frag2;
+		} while (--i);
 
-		skb_shinfo(p)->nr_frags += skb_shinfo(skb)->nr_frags;
-		skb_shinfo(skb)->nr_frags = 0;
+		frag->page_offset += offset;
+		frag->size -= offset;
 
 		skb->truesize -= skb->data_len;
 		skb->len -= skb->data_len;

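The reversed copy in the patch above can be sketched in isolation.  Assuming the same simplified frag struct, merging skb's i frags onto the tail of p's array by walking backwards from the end (like the kernel loop, this requires at least one source frag):

```c
#include <assert.h>

/* Simplified stand-in for skb_frag_t. */
struct frag {
	unsigned int page_offset;
	unsigned int size;
};

/* Append src[0..i) after the first dst_nr entries of dst by copying
 * backwards from the end, mirroring the *--frag = *--frag2 loop in
 * skb_gro_receive.  Requires i >= 1.  Returns the combined count. */
static int merge_frags(struct frag *dst, int dst_nr,
		       const struct frag *src, int i)
{
	int nr_frags = dst_nr + i;
	struct frag *frag = dst + nr_frags;
	const struct frag *frag2 = src + i;

	do {
		*--frag = *--frag2;
	} while (--i);

	return nr_frags;
}
```

Counting down to zero lets the loop test a single register against zero rather than comparing an index against nr_frags each pass.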

* [PATCH 13/14] gro: Store shinfo in local variable in skb_gro_receive
  2009-05-27  4:45 [0/14] GRO: Lots of microoptimisations Herbert Xu
                   ` (11 preceding siblings ...)
  2009-05-27  4:50 ` [PATCH 12/14] gro: Nasty optimisations for page frags in skb_gro_receive Herbert Xu
@ 2009-05-27  4:50 ` Herbert Xu
  2009-05-27  4:50 ` [PATCH 14/14] tcp: Do not check flush when comparing options for GRO Herbert Xu
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 32+ messages in thread
From: Herbert Xu @ 2009-05-27  4:50 UTC (permalink / raw)
  To: David S. Miller, netdev

gro: Store shinfo in local variable in skb_gro_receive

This patch stores the two shinfo pointers in local variables
because they're used over and over again in skb_gro_receive.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/core/skbuff.c |   22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 19afb18..8e815e6 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2664,6 +2664,8 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 {
 	struct sk_buff *p = *head;
 	struct sk_buff *nskb;
+	struct skb_shared_info *skbinfo = skb_shinfo(skb);
+	struct skb_shared_info *pinfo = skb_shinfo(p);
 	unsigned int headroom;
 	unsigned int len = skb_gro_len(skb);
 	unsigned int offset = skb_gro_offset(skb);
@@ -2672,24 +2674,24 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 	if (p->len + len >= 65536)
 		return -E2BIG;
 
-	if (skb_shinfo(p)->frag_list)
+	if (pinfo->frag_list)
 		goto merge;
 	else if (headlen <= offset) {
 		skb_frag_t *frag;
 		skb_frag_t *frag2;
-		int i = skb_shinfo(skb)->nr_frags;
-		int nr_frags = skb_shinfo(p)->nr_frags + i;
+		int i = skbinfo->nr_frags;
+		int nr_frags = pinfo->nr_frags + i;
 
 		offset -= headlen;
 
 		if (nr_frags > MAX_SKB_FRAGS)
 			return -E2BIG;
 
-		skb_shinfo(p)->nr_frags = nr_frags;
-		skb_shinfo(skb)->nr_frags = 0;
+		pinfo->nr_frags = nr_frags;
+		skbinfo->nr_frags = 0;
 
-		frag = skb_shinfo(p)->frags + nr_frags;
-		frag2 = skb_shinfo(skb)->frags + i;
+		frag = pinfo->frags + nr_frags;
+		frag2 = skbinfo->frags + i;
 		do {
 			*--frag = *--frag2;
 		} while (--i);
@@ -2726,7 +2728,7 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 
 	*NAPI_GRO_CB(nskb) = *NAPI_GRO_CB(p);
 	skb_shinfo(nskb)->frag_list = p;
-	skb_shinfo(nskb)->gso_size = skb_shinfo(p)->gso_size;
+	skb_shinfo(nskb)->gso_size = pinfo->gso_size;
 	skb_header_release(p);
 	nskb->prev = p;
 
@@ -2742,8 +2744,8 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 
 merge:
 	if (offset > headlen) {
-		skb_shinfo(skb)->frags[0].page_offset += offset - headlen;
-		skb_shinfo(skb)->frags[0].size -= offset - headlen;
+		skbinfo->frags[0].page_offset += offset - headlen;
+		skbinfo->frags[0].size -= offset - headlen;
 		offset = headlen;
 	}
 


* [PATCH 14/14] tcp: Do not check flush when comparing options for GRO
  2009-05-27  4:45 [0/14] GRO: Lots of microoptimisations Herbert Xu
                   ` (12 preceding siblings ...)
  2009-05-27  4:50 ` [PATCH 13/14] gro: Store shinfo in local variable " Herbert Xu
@ 2009-05-27  4:50 ` Herbert Xu
  2009-05-27 10:42 ` [0/14] GRO: Lots of microoptimisations David Miller
  2009-05-27 17:52 ` Benjamin LaHaise
  15 siblings, 0 replies; 32+ messages in thread
From: Herbert Xu @ 2009-05-27  4:50 UTC (permalink / raw)
  To: David S. Miller, netdev

tcp: Do not check flush when comparing options for GRO

There is no need to repeatedly check flush when comparing TCP
options for GRO as it will be false 99% of the time where it
matters.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/ipv4/tcp.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c3dcec5..0fb8b44 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2570,7 +2570,7 @@ found:
 	flush |= (flags ^ tcp_flag_word(th2)) &
 		  ~(TCP_FLAG_CWR | TCP_FLAG_FIN | TCP_FLAG_PSH);
 	flush |= th->ack_seq ^ th2->ack_seq;
-	for (i = sizeof(*th); !flush && i < thlen; i += 4)
+	for (i = sizeof(*th); i < thlen; i += 4)
 		flush |= *(u32 *)((u8 *)th + i) ^
 			 *(u32 *)((u8 *)th2 + i);
 

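The change above trades a per-iteration branch for unconditional accumulation: any mismatching option word makes flush non-zero, and flush is only tested once after the loop.  A standalone sketch of the idiom (hypothetical helper name):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XOR-accumulate differences between two header blocks, word by word,
 * over [off, thlen).  A non-zero return means they differ somewhere.
 * Testing the accumulator once at the end avoids a !flush check per
 * iteration, which is almost always false when it matters. */
static uint32_t options_differ(const uint8_t *th, const uint8_t *th2,
			       size_t off, size_t thlen)
{
	uint32_t flush = 0;
	size_t i;

	for (i = off; i < thlen; i += 4)
		flush |= *(const uint32_t *)(th + i) ^
			 *(const uint32_t *)(th2 + i);

	return flush;
}
```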

* Re: [0/14] GRO: Lots of microoptimisations
  2009-05-27  4:45 [0/14] GRO: Lots of microoptimisations Herbert Xu
                   ` (13 preceding siblings ...)
  2009-05-27  4:50 ` [PATCH 14/14] tcp: Do not check flush when comparing options for GRO Herbert Xu
@ 2009-05-27 10:42 ` David Miller
  2009-05-27 17:52 ` Benjamin LaHaise
  15 siblings, 0 replies; 32+ messages in thread
From: David Miller @ 2009-05-27 10:42 UTC (permalink / raw)
  To: herbert; +Cc: netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 27 May 2009 14:45:39 +1000

> This series of patches brings GRO performance to within 1% of LRO
> on the slow machine that I was testing.  I can overtake LRO by
> deleting the IP checksum test and the Ethernet header test from
> GRO, which LRO doesn't do anyway, but that's not very nice :)
> 
> I'm still looking at other optimisations as time allows.

All applied to net-next-2.6, thanks Herbert!


* Re: [0/14] GRO: Lots of microoptimisations
  2009-05-27  4:45 [0/14] GRO: Lots of microoptimisations Herbert Xu
                   ` (14 preceding siblings ...)
  2009-05-27 10:42 ` [0/14] GRO: Lots of microoptimisations David Miller
@ 2009-05-27 17:52 ` Benjamin LaHaise
  2009-05-27 23:08   ` Herbert Xu
  15 siblings, 1 reply; 32+ messages in thread
From: Benjamin LaHaise @ 2009-05-27 17:52 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev

Hi Herbert,

On Wed, May 27, 2009 at 02:45:39PM +1000, Herbert Xu wrote:
> Hi Dave:
> 
> This series of patches brings GRO performance to within 1% of LRO
> on the slow machine that I was testing.  I can overtake LRO by
> deleting the IP checksum test and the Ethernet header test from
> GRO, which LRO doesn't do anyway, but that's not very nice :)
> 
> I'm still looking at other optimisations as time allows.

A few questions for you: I've been looking a bit into potential GRO 
optimisations that are possible with the vxge driver.  At least from my 
existing testing on a P4 Xeon, it seems that doing packet rx via 
napi_gro_receive() was a bit slower.  I'll retest with these changes 
of yours.  What platform have your tests been run on?  Also, do you have 
any notes/ideas on how best to make use of the GRO functionality within 
the kernel?  I'm hoping it's possible to make use of a few of the hardware 
hints to improve fast path performance.

	-ben


* Re: [PATCH 10/14] ipv4: Use 32-bit loads for ID and length in GRO
  2009-05-27  4:50 ` [PATCH 10/14] ipv4: Use 32-bit loads for ID and length in GRO Herbert Xu
@ 2009-05-27 18:00   ` Andi Kleen
  2009-05-27 21:26     ` Herbert Xu
  0 siblings, 1 reply; 32+ messages in thread
From: Andi Kleen @ 2009-05-27 18:00 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev

Herbert Xu <herbert@gondor.apana.org.au> writes:

> ipv4: Use 32-bit loads for ID and length in GRO
>
> This patch optimises the IPv4 GRO code by using 32-bit loads
> (instead of 16-bit ones) on the ID and length checks in the receive
> function.

On what architecture is that faster?

At least on x86 they should be the same performance, except that
the 16bit one is one byte larger, but that shouldn't make a difference.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [PATCH 10/14] ipv4: Use 32-bit loads for ID and length in GRO
  2009-05-27 18:00   ` Andi Kleen
@ 2009-05-27 21:26     ` Herbert Xu
  0 siblings, 0 replies; 32+ messages in thread
From: Herbert Xu @ 2009-05-27 21:26 UTC (permalink / raw)
  To: Andi Kleen; +Cc: David S. Miller, netdev

On Wed, May 27, 2009 at 08:00:36PM +0200, Andi Kleen wrote:
>
> On what architecture is that faster?
> 
> At least on x86 they should be the same performance, except that
> the 16bit one is one byte larger, but that shouldn't make a difference.

It shrunk the code by more than a byte, at least with some versions
of gcc on x86-64.

Yes some of these patches might seem a tad trivial, but it does add
up.
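As an illustration of the kind of transformation involved (a hedged sketch, not the exact patch): two adjacent 16-bit header fields can be checked with a single aligned 32-bit load and XOR instead of two separate 16-bit loads.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Two adjacent 16-bit fields as they might sit in a header;
 * names are illustrative, not the real iphdr layout. */
struct hdr {
	uint16_t tot_len;
	uint16_t id;
};

/* Compare both fields via one 32-bit word per header.  memcpy keeps
 * the access legal under strict aliasing; compilers lower it to a
 * single word load on common targets. */
static int fields_differ(const struct hdr *a, const struct hdr *b)
{
	uint32_t wa, wb;

	memcpy(&wa, a, sizeof(wa));
	memcpy(&wb, b, sizeof(wb));
	return (wa ^ wb) != 0;
}
```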

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [0/14] GRO: Lots of microoptimisations
  2009-05-27 17:52 ` Benjamin LaHaise
@ 2009-05-27 23:08   ` Herbert Xu
  2009-05-28 15:21     ` Benjamin LaHaise
  0 siblings, 1 reply; 32+ messages in thread
From: Herbert Xu @ 2009-05-27 23:08 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: David S. Miller, netdev

On Wed, May 27, 2009 at 01:52:23PM -0400, Benjamin LaHaise wrote:
> 
> A few questions for you: I've been looking a bit into potential GRO 
> optimisations that are possible with the vxge driver.  At least from my 
> existing testing on a P4 Xeon, it seems that doing packet rx via 
> napi_gro_receive() was a bit slower.  I'll retest with these changes 

Slower compared to LRO or GRO off?

> of yours.  What platform have your tests been run on?  Also, do you have 
> any notes/ideas on how best to make use of the GRO functionality within 
> the kernel?  I'm hoping it's possible to make use of a few of the hardware 
> hints to improve fast path performance.

What sort of hints do you have?

There isn't any current support for that, but I'm happy to add
things as appropriate.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [0/14] GRO: Lots of microoptimisations
  2009-05-27 23:08   ` Herbert Xu
@ 2009-05-28 15:21     ` Benjamin LaHaise
  2009-05-29  9:28       ` Herbert Xu
  0 siblings, 1 reply; 32+ messages in thread
From: Benjamin LaHaise @ 2009-05-28 15:21 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev

On Thu, May 28, 2009 at 09:08:58AM +1000, Herbert Xu wrote:
> On Wed, May 27, 2009 at 01:52:23PM -0400, Benjamin LaHaise wrote:
> > 
> > A few questions for you: I've been looking a bit into potential GRO 
> > optimisations that are possible with the vxge driver.  At least from my 
> > existing testing on a P4 Xeon, it seems that doing packet rx via 
> > napi_gro_receive() was a bit slower.  I'll retest with these changes 
> 
> Slower compared to LRO or GRO off?

With GRO off I'm getting ~4.7-5Gbps to the receiver, which is CPU bound with 
netperf.  With GRO on, that drops to ~3.9-4.3Gbps.  The only real difference 
is the entry point into the net code being napi_gro_receive() vs 
netif_receive_skb().

> > of yours.  What platform have your tests been run on?  Also, do you have 
> > any notes/ideas on how best to make use of the GRO functionality within 
> > the kernel?  I'm hoping it's possible to make use of a few of the hardware 
> > hints to improve fast path performance.
> 
> What sort of hints do you have?

We have a few bits in the hardware descriptor which indicate if the packet 
is TCP or UDP, IPv4 or IPv6, as well as whether TCP packets are fast path 
eligible.  The hardware can also split up the headers to place the ethernet 
MAC, IP and payload in separate buffers.  I plan to run a few tests to see 
if dispatching directly from the driver into the TCP fast path makes much 
difference.
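Purely as a sketch of what consuming such hints could look like (all bit names and the layout here are hypothetical, not the actual vxge descriptor format):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical rx descriptor hint bits. */
#define HINT_IPV6	(1u << 0)	/* else IPv4 */
#define HINT_UDP	(1u << 1)	/* else TCP */
#define HINT_FASTPATH	(1u << 2)	/* TCP fast-path eligible */

/* A packet is a candidate for direct TCP fast-path dispatch only if
 * the hardware flagged it as fast-path eligible TCP. */
static int tcp_fastpath_candidate(uint32_t hints)
{
	return !(hints & HINT_UDP) && (hints & HINT_FASTPATH) != 0;
}
```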

		-ben


* Re: [0/14] GRO: Lots of microoptimisations
  2009-05-28 15:21     ` Benjamin LaHaise
@ 2009-05-29  9:28       ` Herbert Xu
  2009-05-29  9:29         ` Herbert Xu
  0 siblings, 1 reply; 32+ messages in thread
From: Herbert Xu @ 2009-05-29  9:28 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: David S. Miller, netdev

On Thu, May 28, 2009 at 11:21:43AM -0400, Benjamin LaHaise wrote:
> 
> With GRO off I'm getting ~4.7-5Gbps to the receiver which is CPU bound with 
> netperf.  With GRO on, that drops to ~3.9-4.3Gbps.  The only real difference 
> is the entry point into the net code being napi_gro_receive() vs 
> netif_receive_skb().

That doesn't sound right at all.  Can you run tcpdump on it to see
if it's actually aggregating?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [0/14] GRO: Lots of microoptimisations
  2009-05-29  9:28       ` Herbert Xu
@ 2009-05-29  9:29         ` Herbert Xu
  2009-05-29 16:23           ` Benjamin LaHaise
  0 siblings, 1 reply; 32+ messages in thread
From: Herbert Xu @ 2009-05-29  9:29 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: David S. Miller, netdev

On Fri, May 29, 2009 at 07:28:21PM +1000, Herbert Xu wrote:
>
> That doesn't sound right at all.  Can you run tcpdump on it to see
> if it's actually aggregating?

Oh and when you say GRO off, is that with it turned off using
ethtool or with the GRO patch to your driver reverted?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [0/14] GRO: Lots of microoptimisations
  2009-05-29  9:29         ` Herbert Xu
@ 2009-05-29 16:23           ` Benjamin LaHaise
  2009-06-10  5:44             ` Herbert Xu
  0 siblings, 1 reply; 32+ messages in thread
From: Benjamin LaHaise @ 2009-05-29 16:23 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev

On Fri, May 29, 2009 at 07:29:30PM +1000, Herbert Xu wrote:
> Oh and when you say GRO off, is that with it turned off using
> ethtool or with the GRO patch to your driver reverted?

It's just being turned off by setting a loadable module option.

		-ben


* Re: [0/14] GRO: Lots of microoptimisations
  2009-05-29 16:23           ` Benjamin LaHaise
@ 2009-06-10  5:44             ` Herbert Xu
  2009-06-12 16:09               ` Benjamin LaHaise
  0 siblings, 1 reply; 32+ messages in thread
From: Herbert Xu @ 2009-06-10  5:44 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: David S. Miller, netdev

On Fri, May 29, 2009 at 12:23:13PM -0400, Benjamin LaHaise wrote:
>
> It's just being turned off by setting a loadable module option.

Let me see if I can get my hands on a machine with your NIC.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [0/14] GRO: Lots of microoptimisations
  2009-06-10  5:44             ` Herbert Xu
@ 2009-06-12 16:09               ` Benjamin LaHaise
  2009-06-12 23:48                 ` David Miller
  0 siblings, 1 reply; 32+ messages in thread
From: Benjamin LaHaise @ 2009-06-12 16:09 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev

On Wed, Jun 10, 2009 at 03:44:49PM +1000, Herbert Xu wrote:
> On Fri, May 29, 2009 at 12:23:13PM -0400, Benjamin LaHaise wrote:
> >
> > It's just being turned off by setting a loadable module option.
> 
> Let me see if I can get my hands on a machine with your NIC.

I found at least one reason why: the first skb_shinfo()->frag_list touch 
in dev_gro_receive() was causing a cache miss.  Adding a prefetch in the 
driver helps that a little bit, but there's still > 500Mbps difference.  
I'd like to retest on a system with newer CPUs, but it will probably be a 
few weeks until all the hardware is in the same place.
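A sketch of the driver-side hint described above (warming the shared-info line before GRO touches it; outside the kernel the builtin stands in for prefetch() from <linux/prefetch.h>, and the struct is a stand-in):

```c
#include <assert.h>

/* Stand-in for the kernel's prefetch() helper. */
#define prefetch(p)	__builtin_prefetch(p)

/* Stand-in for the part of skb_shared_info that GRO touches first. */
struct fake_shinfo { void *frag_list; };

/* Warm the cache line that dev_gro_receive() will dereference first,
 * then let the driver keep working while the load is in flight. */
static void *rx_with_prefetch(struct fake_shinfo *shinfo)
{
	prefetch(shinfo);
	/* ... driver would continue descriptor processing here ... */
	return shinfo->frag_list;
}
```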

		-ben


* Re: [0/14] GRO: Lots of microoptimisations
  2009-06-12 16:09               ` Benjamin LaHaise
@ 2009-06-12 23:48                 ` David Miller
  2009-06-16 16:35                   ` Benjamin LaHaise
  0 siblings, 1 reply; 32+ messages in thread
From: David Miller @ 2009-06-12 23:48 UTC (permalink / raw)
  To: ben.lahaise; +Cc: herbert, netdev

From: Benjamin LaHaise <ben.lahaise@neterion.com>
Date: Fri, 12 Jun 2009 12:09:26 -0400

> I found at least one reason why: the first skb_shinfo()->frag_list touch 
> in dev_gro_receive() was causing a cache miss.  Adding a prefetch in the 
> driver helps that a little bit, but there's still > 500Mbps difference.  

I find a 500Mbps difference, due to just one single cache miss on
every packet, simply astounding and unbelievable.  But hey, it is
what you are seeing, so something has to account for it. :)


* Re: [0/14] GRO: Lots of microoptimisations
  2009-06-12 23:48                 ` David Miller
@ 2009-06-16 16:35                   ` Benjamin LaHaise
  2009-06-16 16:38                     ` Herbert Xu
  0 siblings, 1 reply; 32+ messages in thread
From: Benjamin LaHaise @ 2009-06-16 16:35 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, netdev

On Fri, Jun 12, 2009 at 04:48:33PM -0700, David Miller wrote:
> I find a 500Mbps difference, due to just one single cache miss on
> every packet, simply astounding and unbelievable.  But hey, it is
> what you are seeing, so something has to account for it. :)

The cache miss only accounts for ~50Mbps; it'd be nice if there were an 
easy way to get the whole 500Mbps back.  The rest seems to be in the 
general overhead of the GRO code vs the normal NAPI rx path.  The P4 
Xeon is substantially worse at string operations than the Core 2 / Core i7 
based Xeons, so I'm hoping to test and see if they do any better with the 
GRO code when I get access to a new machine soon.

		-ben


* Re: [0/14] GRO: Lots of microoptimisations
  2009-06-16 16:35                   ` Benjamin LaHaise
@ 2009-06-16 16:38                     ` Herbert Xu
  2009-06-17  8:07                       ` Herbert Xu
  0 siblings, 1 reply; 32+ messages in thread
From: Herbert Xu @ 2009-06-16 16:38 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: David Miller, netdev

On Tue, Jun 16, 2009 at 12:35:47PM -0400, Benjamin LaHaise wrote:
> On Fri, Jun 12, 2009 at 04:48:33PM -0700, David Miller wrote:
> > I find a 500Mbps difference, due to just one single cache miss on
> > every packet, simply astounding and unbelievable.  But hey, it is
> > what you are seeing, so something has to account for it. :)
> 
> The cache miss only accounts for ~50Mbpsi, it'd be nice if there was an 
> easy way to get the whole 500Mbps back.  The rest seems to be in the 
> general overhead of the GRO code vs the normal NAPI rx path.  The P4 
> Xeon is substantially worse at string operations than the Core 2 / Core i7 
> based Xeons, so I'm hoping to test and see if they do any better with the 
> GRO code when I get access to a new machine soon.

I'm hoping to get onto this tomorrow.  I think there's definitely
something broken because there's no way GRO should be slower than
GRO off.  Slightly slower than LRO perhaps but surely not worse than
not merging at all :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [0/14] GRO: Lots of microoptimisations
  2009-06-16 16:38                     ` Herbert Xu
@ 2009-06-17  8:07                       ` Herbert Xu
  2009-06-17  8:08                         ` Herbert Xu
  0 siblings, 1 reply; 32+ messages in thread
From: Herbert Xu @ 2009-06-17  8:07 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: David Miller, netdev

On Wed, Jun 17, 2009 at 02:38:56AM +1000, Herbert Xu wrote:
> 
> I'm hoping to get onto this tomorrow.  I think there's definitely
> something broken because there's no way GRO should be slower than
> GRO off.  Slightly slower than LRO perhaps but surely not worse than
> not merging at all :)

OK, I'm seeing the opposite :)

[root@perf21 ~]# netperf -H 172.17.10.22
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.10.22 (172.17.10.22) port 0 AF_INET : spin interval : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    9280.04
[root@perf21 ~]# netperf -H 172.17.10.22
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.17.10.22 (172.17.10.22) port 0 AF_INET : spin interval : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    6224.12
[root@perf21 ~]# 

The corresponding commands on the receiver:

[root@perf22 ~]# ./ethtool -k eth0
Offload parameters for eth0:
Cannot get device flags: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: on
generic receive offload: on
large receive offload: off
[root@perf22 ~]# ./ethtool -K eth0 gro off
[root@perf22 ~]# ethtool -i eth0
driver: vxge
version: 2.0.1.17129-k
firmware-version: 0.41.1
bus-info: 0000:04:00.0
[root@perf22 ~]# 

Last CPU on receiver

processor       : 15
vendor_id       : GenuineIntel
cpu family      : 6
model           : 26
model name      : Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
stepping        : 5
cpu MHz         : 2933.779
cache size      : 8192 KB
physical id     : 1
siblings        : 8
core id         : 3
cpu cores       : 4
apicid          : 23
initial apicid  : 23
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology tsc_reliable nonstop_tsc pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm tpr_shadow vnmi flexpriority ept vpid
bogomips        : 5865.80
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

The kernel is the Linus tree of an hour ago.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [0/14] GRO: Lots of microoptimisations
  2009-06-17  8:07                       ` Herbert Xu
@ 2009-06-17  8:08                         ` Herbert Xu
  2009-06-17 20:14                           ` Rick Jones
  0 siblings, 1 reply; 32+ messages in thread
From: Herbert Xu @ 2009-06-17  8:08 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: David Miller, netdev

On Wed, Jun 17, 2009 at 04:07:30PM +0800, Herbert Xu wrote:
> On Wed, Jun 17, 2009 at 02:38:56AM +1000, Herbert Xu wrote:
> > 
> > I'm hoping to get onto this tomorrow.  I think there's definitely
> > something broken because there's no way GRO should be slower than
> > GRO off.  Slightly slower than LRO perhaps but surely not worse than
> > not merging at all :)
> 
> OK, I'm seeing the opposite :)

Can you check your sender to see whether it's maxed out? That
can play havoc when benchmarking the receiver.

Also what are the raw numbers that you were getting?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [0/14] GRO: Lots of microoptimisations
  2009-06-17  8:08                         ` Herbert Xu
@ 2009-06-17 20:14                           ` Rick Jones
  0 siblings, 0 replies; 32+ messages in thread
From: Rick Jones @ 2009-06-17 20:14 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Benjamin LaHaise, David Miller, netdev

Herbert Xu wrote:
> On Wed, Jun 17, 2009 at 04:07:30PM +0800, Herbert Xu wrote:
> 
>>On Wed, Jun 17, 2009 at 02:38:56AM +1000, Herbert Xu wrote:
>>
>>>I'm hoping to get onto this tomorrow.  I think there's definitely
>>>something broken because there's no way GRO should be slower than
>>>GRO off.  Slightly slower than LRO perhaps but surely not worse than
>>>not merging at all :)
>>
>>OK, I'm seeing the opposite :)
> 
> 
> Can you check your sender to see whether it's maxed out? That
> can play havoc when benchmarking the receiver.

Related to that, and to deal with the inaccuracy of what classic netperf reports 
for the socket buffers (your previous message) and to learn about the CPUs 
on either side: if one grabs netperf 2.4.5 and runs ./configure --enable-omni on both 
sides, a command like:

netperf -H 172.17.10.22 -c -C -t omni -- -m 16K -O foo

where the contents of the file foo are:

LSS_SIZE_END,RSR_SIZE_END,LOCAL_SEND_SIZE,ELAPSED_TIME,THROUGHPUT,THROUGHPUT_UNITS
LOCAL_CPU_UTIL,LOCAL_SD,LOCAL_CPU_COUNT,LOCAL_CPU_PEAK_UTIL,LOCAL_CPU_PEAK_ID
REMOTE_CPU_UTIL,REMOTE_SD,REMOTE_CPU_COUNT,REMOTE_CPU_PEAK_UTIL,REMOTE_CPU_PEAK_ID
LOCAL_CPU_MODEL,LOCAL_CPU_FREQUENCY,REMOTE_CPU_MODEL,REMOTE_CPU_FREQUENCY

will result in output that looks like:

raj@tardy:~/netperf2_trunk$ src/netperf -t omni -H localhost -c -C -- -m 16K -O foo
OMNI TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost.localdomain 
(127.0.0.1) port 0 AF_INET
Local       Remote      Local  Elapsed Throughput Throughput
Send Socket Recv Socket Send   Time               Units
Size        Size        Size   (sec)
Final       Final
606960      2072576     16384  10.00   3437.47    10^6bits/s

Local   Local   Local Local   Local
CPU     Service CPU   Peak    Peak
Util    Demand  Count Per CPU Per CPU
%                     Util %  ID
100.00  2.383   1     100.00  0

Remote  Remote  Remote Remote  Remote
CPU     Service CPU    Peak    Peak
Util    Demand  Count  Per CPU Per CPU
%                      Util %  ID
100.00  2.383   1      100.00  0

Local                          Local     Remote                         Remote
CPU                            CPU       CPU                            CPU
Model                          Frequency Model                          Frequency
                                MHz                                      MHz
Intel(R) XEON(TM) CPU 2.00GHz  1995      Intel(R) XEON(TM) CPU 2.00GHz  1995

modulo some wrapping.  One can also use -o to get one set of CSV output, or -k to 
get keyvals.  The file can have as many as four lines, which will be honored by 
the -O option, with -o producing one big line regardless and -k producing one 
line per keyval regardless.  netperf -t omni -- -O \? will give a list of all 
the known output selections.

So, you can get netperf to tell you about the CPUs, and whether one of them 
pegged on either side.

happy benchmarking,

rick jones

> 
> Also what are the raw numbers that you were getting?
> 
> Cheers,


