* [RFC PATCH 0/3] net: Alloc NAPI page frags from their own pool
@ 2014-11-27  0:05 Alexander Duyck
  2014-11-27  0:05 ` [RFC PATCH 1/3] net: Split netdev_alloc_frag into __alloc_page_frag and add __napi_alloc_frag Alexander Duyck
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Alexander Duyck @ 2014-11-27  0:05 UTC (permalink / raw)
  To: netdev; +Cc: davem, brouer, jeffrey.t.kirsher, eric.dumazet, ast

This patch series implements a means of allocating page fragments without
the need for the local_irq_save/restore in __netdev_alloc_frag.  By doing
this I am able to decrease packet processing time by 11ns per packet in my
test environment.
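
The shape of the change (taken from patch 1 below): the existing netdev
cache keeps its local_irq_save/restore, since it can be refilled from
hard IRQ context, while the new NAPI cache is only ever touched from
softirq context and needs neither:

	/* netdev path: cache is shared with hard IRQ callers, so plain
	 * per-cpu access is only safe with IRQs masked
	 */
	static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
	{
		unsigned long flags;
		void *data;

		local_irq_save(flags);
		data = __alloc_page_frag(&netdev_alloc_cache, fragsz, gfp_mask);
		local_irq_restore(flags);
		return data;
	}

	/* NAPI path: cache is only touched from softirq context, so the
	 * IRQ save/restore drops out of the hot path entirely
	 */
	static void *__napi_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
	{
		return __alloc_page_frag(&napi_alloc_cache, fragsz, gfp_mask);
	}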

---

Alexander Duyck (3):
      net: Split netdev_alloc_frag into __alloc_page_frag and add __napi_alloc_frag
      net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb
      fm10k/igb/ixgbe: Use napi_alloc_skb


 drivers/net/ethernet/intel/fm10k/fm10k_main.c |    4 -
 drivers/net/ethernet/intel/igb/igb_main.c     |    3 
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    4 -
 include/linux/skbuff.h                        |   11 ++
 net/core/dev.c                                |    2 
 net/core/skbuff.c                             |  160 ++++++++++++++++++-------
 6 files changed, 133 insertions(+), 51 deletions(-)

--


* [RFC PATCH 1/3] net: Split netdev_alloc_frag into __alloc_page_frag and add __napi_alloc_frag
  2014-11-27  0:05 [RFC PATCH 0/3] net: Alloc NAPI page frags from their own pool Alexander Duyck
@ 2014-11-27  0:05 ` Alexander Duyck
  2014-11-27  5:29   ` Alexei Starovoitov
  2014-11-27  0:06 ` [RFC PATCH 2/3] net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb Alexander Duyck
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 7+ messages in thread
From: Alexander Duyck @ 2014-11-27  0:05 UTC (permalink / raw)
  To: netdev; +Cc: davem, brouer, jeffrey.t.kirsher, eric.dumazet, ast

This patch splits the netdev_alloc_frag function up so that it can be used
on one of two page frag pools instead of being fixed on the
netdev_alloc_cache.  By doing this we can add a NAPI specific function
__napi_alloc_frag that accesses a pool that is only used from softirq
context.  The advantage of this is that we do not need to call
local_irq_save/restore, which can be a significant savings.

I also took the opportunity to refactor the core bits that were placed in
__alloc_page_frag.  First I updated the allocation to do either a 32K
allocation or an order 0 page.  Then I also rewrote the logic to work from
the end of the page to the start.  By doing this the size value doesn't
have to be used unless we have run out of space for page fragments.
Finally I cleaned up the atomic bits so that we just do an
atomic_sub_return and if that returns 0 then we set the page->_count via an
atomic_set.  This way we can remove the extra conditional for the
atomic_read since it would have led to an atomic_inc in the case of success
anyway.
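
To make the recycle logic concrete, this is the heart of the new
__alloc_page_frag with the reasoning spelled out in comments (a
condensed excerpt of the patch below; the refill path is omitted):

	/* hand out fragments from the end of the page toward the start,
	 * so frag.offset doubles as the space-remaining counter
	 */
	if (nc->frag.offset < fragsz) {
		/* drop our remaining bias; a non-zero result means some
		 * fragments are still held elsewhere, so the page cannot
		 * be reused and we must allocate a fresh one
		 */
		if (atomic_sub_return(nc->pagecnt_bias, &nc->frag.page->_count))
			goto refill;

		/* count hit zero: we were the last user, recharge the
		 * bias and recycle the page in place
		 */
		atomic_set(&nc->frag.page->_count, NETDEV_PAGECNT_MAX_BIAS);
		nc->pagecnt_bias = NETDEV_PAGECNT_MAX_BIAS;
		nc->frag.offset = nc->frag.size;
	}

	nc->frag.offset -= fragsz;
	nc->pagecnt_bias--;

	return page_address(nc->frag.page) + nc->frag.offset;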

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 include/linux/skbuff.h |    2 +
 net/core/skbuff.c      |   86 +++++++++++++++++++++++++++---------------------
 2 files changed, 51 insertions(+), 37 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6333835..e596efa 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2184,6 +2184,8 @@ static inline struct sk_buff *netdev_alloc_skb_ip_align(struct net_device *dev,
 	return __netdev_alloc_skb_ip_align(dev, length, GFP_ATOMIC);
 }
 
+void *napi_alloc_frag(unsigned int fragsz);
+
 /**
  * __dev_alloc_pages - allocate page for network Rx
  * @gfp_mask: allocation priority. Set __GFP_NOMEMALLOC if not for network Rx
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 92116df..6dd2b44 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -336,59 +336,60 @@ struct netdev_alloc_cache {
 	unsigned int		pagecnt_bias;
 };
 static DEFINE_PER_CPU(struct netdev_alloc_cache, netdev_alloc_cache);
+static DEFINE_PER_CPU(struct netdev_alloc_cache, napi_alloc_cache);
 
-static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
+static void *__alloc_page_frag(struct netdev_alloc_cache __percpu *cache,
+			       unsigned int fragsz, gfp_t gfp_mask)
 {
-	struct netdev_alloc_cache *nc;
-	void *data = NULL;
-	int order;
-	unsigned long flags;
+	struct netdev_alloc_cache *nc = this_cpu_ptr(cache);
 
-	local_irq_save(flags);
-	nc = this_cpu_ptr(&netdev_alloc_cache);
 	if (unlikely(!nc->frag.page)) {
 refill:
-		for (order = NETDEV_FRAG_PAGE_MAX_ORDER; ;) {
-			gfp_t gfp = gfp_mask;
-
-			if (order)
-				gfp |= __GFP_COMP | __GFP_NOWARN;
-			nc->frag.page = alloc_pages(gfp, order);
-			if (likely(nc->frag.page))
-				break;
-			if (--order < 0)
-				goto end;
+		nc->frag.size = NETDEV_FRAG_PAGE_MAX_SIZE;
+		nc->frag.page = alloc_pages_node(NUMA_NO_NODE,
+						 gfp_mask |
+						 __GFP_COMP |
+						 __GFP_NOWARN,
+						 NETDEV_FRAG_PAGE_MAX_ORDER);
+		if (unlikely(!nc->frag.page)) {
+			nc->frag.size = PAGE_SIZE;
+			nc->frag.page = alloc_pages_node(NUMA_NO_NODE,
+							 gfp_mask, 0);
+			if (unlikely(!nc->frag.page))
+				return NULL;
 		}
-		nc->frag.size = PAGE_SIZE << order;
+
 		/* Even if we own the page, we do not use atomic_set().
 		 * This would break get_page_unless_zero() users.
 		 */
-		atomic_add(NETDEV_PAGECNT_MAX_BIAS - 1,
-			   &nc->frag.page->_count);
+		atomic_add(NETDEV_PAGECNT_MAX_BIAS - 1, &nc->frag.page->_count);
 		nc->pagecnt_bias = NETDEV_PAGECNT_MAX_BIAS;
-		nc->frag.offset = 0;
+		nc->frag.offset = nc->frag.size;
 	}
 
-	if (nc->frag.offset + fragsz > nc->frag.size) {
-		if (atomic_read(&nc->frag.page->_count) != nc->pagecnt_bias) {
-			if (!atomic_sub_and_test(nc->pagecnt_bias,
-						 &nc->frag.page->_count))
-				goto refill;
-			/* OK, page count is 0, we can safely set it */
-			atomic_set(&nc->frag.page->_count,
-				   NETDEV_PAGECNT_MAX_BIAS);
-		} else {
-			atomic_add(NETDEV_PAGECNT_MAX_BIAS - nc->pagecnt_bias,
-				   &nc->frag.page->_count);
-		}
+	if (nc->frag.offset < fragsz) {
+		if (atomic_sub_return(nc->pagecnt_bias, &nc->frag.page->_count))
+			goto refill;
+
+		/* OK, page count is 0, we can safely set it */
+		atomic_set(&nc->frag.page->_count, NETDEV_PAGECNT_MAX_BIAS);
 		nc->pagecnt_bias = NETDEV_PAGECNT_MAX_BIAS;
-		nc->frag.offset = 0;
+		nc->frag.offset = nc->frag.size;
 	}
 
-	data = page_address(nc->frag.page) + nc->frag.offset;
-	nc->frag.offset += fragsz;
+	nc->frag.offset -= fragsz;
 	nc->pagecnt_bias--;
-end:
+
+	return page_address(nc->frag.page) + nc->frag.offset;
+}
+
+static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
+{
+	unsigned long flags;
+	void *data;
+
+	local_irq_save(flags);
+	data = __alloc_page_frag(&netdev_alloc_cache, fragsz, gfp_mask);
 	local_irq_restore(flags);
 	return data;
 }
@@ -406,6 +407,17 @@ void *netdev_alloc_frag(unsigned int fragsz)
 }
 EXPORT_SYMBOL(netdev_alloc_frag);
 
+static void *__napi_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
+{
+	return __alloc_page_frag(&napi_alloc_cache, fragsz, gfp_mask);
+}
+
+void *napi_alloc_frag(unsigned int fragsz)
+{
+	return __napi_alloc_frag(fragsz, GFP_ATOMIC | __GFP_COLD);
+}
+EXPORT_SYMBOL(napi_alloc_frag);
+
 /**
  *	__netdev_alloc_skb - allocate an skbuff for rx on a specific device
  *	@dev: network device to receive on


* [RFC PATCH 2/3] net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb
  2014-11-27  0:05 [RFC PATCH 0/3] net: Alloc NAPI page frags from their own pool Alexander Duyck
  2014-11-27  0:05 ` [RFC PATCH 1/3] net: Split netdev_alloc_frag into __alloc_page_frag and add __napi_alloc_frag Alexander Duyck
@ 2014-11-27  0:06 ` Alexander Duyck
  2014-11-27  0:06 ` [RFC PATCH 3/3] fm10k/igb/ixgbe: Use napi_alloc_skb Alexander Duyck
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Alexander Duyck @ 2014-11-27  0:06 UTC (permalink / raw)
  To: netdev; +Cc: davem, brouer, jeffrey.t.kirsher, eric.dumazet, ast

This change pulls the core functionality out of __netdev_alloc_skb and
places it in a new function named __alloc_rx_skb.  The reason for doing
this is to make these bits accessible to a new function __napi_alloc_skb.
In addition, __alloc_rx_skb now takes a flags value that is used to
determine which page frag pool to allocate from.  If the SKB_ALLOC_NAPI
flag is set then the NAPI pool is used.  The advantage of this is that we
do not have to use local_irq_save/restore when accessing the NAPI pool from
NAPI context.

In my test setup I saw at least 11ns of savings using napi_alloc_skb
versus netdev_alloc_skb, most of which comes from not having to call
local_irq_save/restore.
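
For a driver, the conversion in the NAPI poll path then looks roughly
like this (a sketch; the foo_* names are hypothetical, while the
q_vector layout mirrors the Intel drivers converted in patch 3):

	/* runs in NAPI poll (softirq) context */
	static struct sk_buff *foo_fetch_rx_buffer(struct foo_ring *rx_ring)
	{
		struct sk_buff *skb;

		/* was: netdev_alloc_skb_ip_align(rx_ring->netdev, FOO_RX_HDR_LEN) */
		skb = napi_alloc_skb(&rx_ring->q_vector->napi, FOO_RX_HDR_LEN);
		if (unlikely(!skb)) {
			rx_ring->rx_stats.alloc_failed++;
			return NULL;
		}

		/* NET_SKB_PAD + NET_IP_ALIGN headroom is already reserved
		 * and skb->dev is already set to napi->dev
		 */
		return skb;
	}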

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 include/linux/skbuff.h |    9 ++++++
 net/core/dev.c         |    2 +
 net/core/skbuff.c      |   74 +++++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 77 insertions(+), 8 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index e596efa..67de1d0 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -151,6 +151,7 @@ struct net_device;
 struct scatterlist;
 struct pipe_inode_info;
 struct iov_iter;
+struct napi_struct;
 
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
 struct nf_conntrack {
@@ -674,6 +675,7 @@ struct sk_buff {
 
 #define SKB_ALLOC_FCLONE	0x01
 #define SKB_ALLOC_RX		0x02
+#define SKB_ALLOC_NAPI		0x04
 
 /* Returns true if the skb was allocated from PFMEMALLOC reserves */
 static inline bool skb_pfmemalloc(const struct sk_buff *skb)
@@ -2185,6 +2187,13 @@ static inline struct sk_buff *netdev_alloc_skb_ip_align(struct net_device *dev,
 }
 
 void *napi_alloc_frag(unsigned int fragsz);
+struct sk_buff *__napi_alloc_skb(struct napi_struct *napi,
+				 unsigned int length, gfp_t gfp_mask);
+static inline struct sk_buff *napi_alloc_skb(struct napi_struct *napi,
+					     unsigned int length)
+{
+	return __napi_alloc_skb(napi, length, GFP_ATOMIC);
+}
 
 /**
  * __dev_alloc_pages - allocate page for network Rx
diff --git a/net/core/dev.c b/net/core/dev.c
index ac48362..ff636ad 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4172,7 +4172,7 @@ struct sk_buff *napi_get_frags(struct napi_struct *napi)
 	struct sk_buff *skb = napi->skb;
 
 	if (!skb) {
-		skb = netdev_alloc_skb_ip_align(napi->dev, GRO_MAX_HEAD);
+		skb = napi_alloc_skb(napi, GRO_MAX_HEAD);
 		napi->skb = skb;
 	}
 	return skb;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6dd2b44..397efd8 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -419,10 +419,13 @@ void *napi_alloc_frag(unsigned int fragsz)
 EXPORT_SYMBOL(napi_alloc_frag);
 
 /**
- *	__netdev_alloc_skb - allocate an skbuff for rx on a specific device
- *	@dev: network device to receive on
+ *	__alloc_rx_skb - allocate an skbuff for rx
  *	@length: length to allocate
  *	@gfp_mask: get_free_pages mask, passed to alloc_skb
+ *	@flags:	If SKB_ALLOC_RX is set, __GFP_MEMALLOC will be used for
+ *		allocations in case we have to fallback to __alloc_skb()
+ *		If SKB_ALLOC_NAPI is set, page fragment will be allocated
+ *		from napi_cache instead of netdev_cache.
  *
  *	Allocate a new &sk_buff and assign it a usage count of one. The
  *	buffer has unspecified headroom built in. Users should allocate
@@ -431,11 +434,11 @@ EXPORT_SYMBOL(napi_alloc_frag);
  *
  *	%NULL is returned if there is no free memory.
  */
-struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
-				   unsigned int length, gfp_t gfp_mask)
+static struct sk_buff *__alloc_rx_skb(unsigned int length, gfp_t gfp_mask,
+				      int flags)
 {
 	struct sk_buff *skb = NULL;
-	unsigned int fragsz = SKB_DATA_ALIGN(length + NET_SKB_PAD) +
+	unsigned int fragsz = SKB_DATA_ALIGN(length) +
 			      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
 	if (fragsz <= PAGE_SIZE && !(gfp_mask & (__GFP_WAIT | GFP_DMA))) {
@@ -444,7 +447,9 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
 		if (sk_memalloc_socks())
 			gfp_mask |= __GFP_MEMALLOC;
 
-		data = __netdev_alloc_frag(fragsz, gfp_mask);
+		data = (flags & SKB_ALLOC_NAPI) ?
+			__napi_alloc_frag(fragsz, gfp_mask) :
+			__netdev_alloc_frag(fragsz, gfp_mask);
 
 		if (likely(data)) {
 			skb = build_skb(data, fragsz);
@@ -452,17 +457,72 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
 				put_page(virt_to_head_page(data));
 		}
 	} else {
-		skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask,
+		skb = __alloc_skb(length, gfp_mask,
 				  SKB_ALLOC_RX, NUMA_NO_NODE);
 	}
+	return skb;
+}
+
+/**
+ *	__netdev_alloc_skb - allocate an skbuff for rx on a specific device
+ *	@dev: network device to receive on
+ *	@length: length to allocate
+ *	@gfp_mask: get_free_pages mask, passed to alloc_skb
+ *
+ *	Allocate a new &sk_buff and assign it a usage count of one. The
+ *	buffer has NET_SKB_PAD headroom built in. Users should allocate
+ *	the headroom they think they need without accounting for the
+ *	built in space. The built in space is used for optimisations.
+ *
+ *	%NULL is returned if there is no free memory.
+ */
+struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
+				   unsigned int length, gfp_t gfp_mask)
+{
+	struct sk_buff *skb;
+
+	length += NET_SKB_PAD;
+	skb = __alloc_rx_skb(length, gfp_mask, 0);
+
 	if (likely(skb)) {
 		skb_reserve(skb, NET_SKB_PAD);
 		skb->dev = dev;
 	}
+
 	return skb;
 }
 EXPORT_SYMBOL(__netdev_alloc_skb);
 
+/**
+ *	__napi_alloc_skb - allocate skbuff for rx in a specific NAPI instance
+ *	@napi: napi instance this buffer was allocated for
+ *	@length: length to allocate
+ *	@gfp_mask: get_free_pages mask, passed to alloc_skb and alloc_pages
+ *
+ *	Allocate a new sk_buff for use in NAPI receive.  This buffer will
+ *	attempt to allocate the head from a special reserved region used
+ *	only for NAPI Rx allocation.  By doing this we can save several
+ *	CPU cycles by avoiding having to disable and re-enable IRQs.
+ *
+ *	%NULL is returned if there is no free memory.
+ */
+struct sk_buff *__napi_alloc_skb(struct napi_struct *napi,
+				 unsigned int length, gfp_t gfp_mask)
+{
+	struct sk_buff *skb;
+
+	length += NET_SKB_PAD + NET_IP_ALIGN;
+	skb = __alloc_rx_skb(length, gfp_mask, SKB_ALLOC_NAPI);
+
+	if (likely(skb)) {
+		skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+		skb->dev = napi->dev;
+	}
+
+	return skb;
+}
+EXPORT_SYMBOL(__napi_alloc_skb);
+
 void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
 		     int size, unsigned int truesize)
 {


* [RFC PATCH 3/3] fm10k/igb/ixgbe: Use napi_alloc_skb
  2014-11-27  0:05 [RFC PATCH 0/3] net: Alloc NAPI page frags from their own pool Alexander Duyck
  2014-11-27  0:05 ` [RFC PATCH 1/3] net: Split netdev_alloc_frag into __alloc_page_frag and add __napi_alloc_frag Alexander Duyck
  2014-11-27  0:06 ` [RFC PATCH 2/3] net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb Alexander Duyck
@ 2014-11-27  0:06 ` Alexander Duyck
  2014-11-27 12:00 ` [RFC PATCH 0/3] net: Alloc NAPI page frags from their own pool Jesper Dangaard Brouer
  2014-12-03  3:30 ` David Miller
  4 siblings, 0 replies; 7+ messages in thread
From: Alexander Duyck @ 2014-11-27  0:06 UTC (permalink / raw)
  To: netdev; +Cc: davem, brouer, jeffrey.t.kirsher, eric.dumazet, ast

This change replaces calls to netdev_alloc_skb_ip_align with
napi_alloc_skb.  The advantage of napi_alloc_skb is currently that the
page fragment allocation avoids any IRQ disable/enable calls.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c |    4 ++--
 drivers/net/ethernet/intel/igb/igb_main.c     |    3 +--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    4 ++--
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 3acdec3..7eff7c6 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -310,8 +310,8 @@ static struct sk_buff *fm10k_fetch_rx_buffer(struct fm10k_ring *rx_ring,
 #endif
 
 		/* allocate a skb to store the frags */
-		skb = netdev_alloc_skb_ip_align(rx_ring->netdev,
-						FM10K_RX_HDR_LEN);
+		skb = napi_alloc_skb(&rx_ring->q_vector->napi,
+				     FM10K_RX_HDR_LEN);
 		if (unlikely(!skb)) {
 			rx_ring->rx_stats.alloc_failed++;
 			return NULL;
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 536ef9d..922d78c 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6638,8 +6638,7 @@ static struct sk_buff *igb_fetch_rx_buffer(struct igb_ring *rx_ring,
 #endif
 
 		/* allocate a skb to store the frags */
-		skb = netdev_alloc_skb_ip_align(rx_ring->netdev,
-						IGB_RX_HDR_LEN);
+		skb = napi_alloc_skb(&rx_ring->q_vector->napi, IGB_RX_HDR_LEN);
 		if (unlikely(!skb)) {
 			rx_ring->rx_stats.alloc_failed++;
 			return NULL;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index a195d24..03df91f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1858,8 +1858,8 @@ static struct sk_buff *ixgbe_fetch_rx_buffer(struct ixgbe_ring *rx_ring,
 #endif
 
 		/* allocate a skb to store the frags */
-		skb = netdev_alloc_skb_ip_align(rx_ring->netdev,
-						IXGBE_RX_HDR_SIZE);
+		skb = napi_alloc_skb(&rx_ring->q_vector->napi,
+				     IXGBE_RX_HDR_SIZE);
 		if (unlikely(!skb)) {
 			rx_ring->rx_stats.alloc_rx_buff_failed++;
 			return NULL;


* Re: [RFC PATCH 1/3] net: Split netdev_alloc_frag into __alloc_page_frag and add __napi_alloc_frag
  2014-11-27  0:05 ` [RFC PATCH 1/3] net: Split netdev_alloc_frag into __alloc_page_frag and add __napi_alloc_frag Alexander Duyck
@ 2014-11-27  5:29   ` Alexei Starovoitov
  0 siblings, 0 replies; 7+ messages in thread
From: Alexei Starovoitov @ 2014-11-27  5:29 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Network Development, David S. Miller, Jesper Dangaard Brouer,
	Jeff Kirsher, Eric Dumazet

On Wed, Nov 26, 2014 at 4:05 PM, Alexander Duyck
<alexander.h.duyck@redhat.com> wrote:
> This patch splits the netdev_alloc_frag function up so that it can be used
> on one of two page frag pools instead of being fixed on the
> netdev_alloc_cache.  By doing this we can add a NAPI specific function
> __napi_alloc_frag that accesses a pool that is only used from softirq
> context.  The advantage of this is that we do not need to call
> local_irq_save/restore, which can be a significant savings.
>
> I also took the opportunity to refactor the core bits that were placed in
> __alloc_page_frag.  First I updated the allocation to do either a 32K
> allocation or an order 0 page.  Then I also rewrote the logic to work from
> the end of the page to the start.  By doing this the size value doesn't
> have to be used unless we have run out of space for page fragments.
> Finally I cleaned up the atomic bits so that we just do an
> atomic_sub_return and if that returns 0 then we set the page->_count via an
> atomic_set.  This way we can remove the extra conditional for the
> atomic_read since it would have led to an atomic_inc in the case of success
> anyway.

Nice simplification. Complicated, but looks good to me.
I think only the replacement of the loop with the 32K-or-order-0
fallback begs for a better explanation in the commit log. I'm guessing
you're killing the intermediate sizes to simplify the code?

Thank you for doing it. It's a great improvement.


* Re: [RFC PATCH 0/3] net: Alloc NAPI page frags from their own pool
  2014-11-27  0:05 [RFC PATCH 0/3] net: Alloc NAPI page frags from their own pool Alexander Duyck
                   ` (2 preceding siblings ...)
  2014-11-27  0:06 ` [RFC PATCH 3/3] fm10k/igb/ixgbe: Use napi_alloc_skb Alexander Duyck
@ 2014-11-27 12:00 ` Jesper Dangaard Brouer
  2014-12-03  3:30 ` David Miller
  4 siblings, 0 replies; 7+ messages in thread
From: Jesper Dangaard Brouer @ 2014-11-27 12:00 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: netdev, davem, jeffrey.t.kirsher, eric.dumazet, ast, brouer

On Wed, 26 Nov 2014 16:05:50 -0800
Alexander Duyck <alexander.h.duyck@redhat.com> wrote:

> This patch series implements a means of allocating page fragments without
> the need for the local_irq_save/restore in __netdev_alloc_frag.  By doing
> this I am able to decrease packet processing time by 11ns per packet in my
> test environment.

This is really good work!

I've tested the patchset (details below), with two different packet
sizes, 64 bytes and 272 bytes, due to the "copy-break" point in the
driver.

Notice, these tests are single flow, resulting in a single CPU getting
activated on the receiver.

If I drop packets very early in the iptables "raw" table, I see an
improvement of 10.51 ns to 13.22 ns (for 272 bytes, between 9.64 ns and
11.97 ns).  This corresponds with Alex's observations.

A little surprisingly, when doing full forwarding (IP routing), I see a
much larger nanosecond improvement: for 64 bytes, between 47.64 ns and
58.15 ns (for 272 bytes, between 29.08 ns and 30.14 ns).  This
improvement is larger than I expected.  One pitfall is that with full
forwarding we can only forward approx 1 Mpps (single CPU), and the
accuracy between test runs varies more.

Setup
-----
Generator: ixgbe, pktgen (3x CPUs), sending 10G wirespeed
 - Single flow pktgen, resulting in single CPU activation on target
 - pkt@64bytes:  tx:14900856 pps (wirespeed)
 - pkt@272bytes: tx: 4228696 pps (wirespeed)

Ethernet wirespeed (the extra 20 bytes per frame are the 7 byte
preamble, 1 byte SFD and 12 byte inter-frame gap):
 * (1/((64+20)*8))*(10*10^9)  = 14880952
 * (1/((272+20)*8))*(10*10^9) =  4280822

Receiver CPU E5-2695 running state-c0@2.8GHz

baseline
--------

Baseline: Full forwarding (no-netfilter):

 * pkt@64bytes: tx:977414 pps
 * pkt@64bytes: tx:974404 pps
 * test-variation@64bytes: 3010pps (1/977414*10^9)-(1/974404*10^9) = -3.16ns

 * pkt@272bytes: tx:911657 pps
 * pkt@272bytes: tx:906229 pps
 * test-variation@272bytes: 5428pps -6.57ns

Baseline: Drop in iptables RAW:

 * pkt@64bytes: rx:2801058 pps
 * pkt@64bytes: rx:2785579 pps
 * test-variation@64bytes: 15479pps -1.98 ns

 * pkt@272bytes: rx:2559718 pps
 * pkt@272bytes: rx:2544577 pps
 * test-variation@272bytes diff: 15141pps -2.32ns

With patch: alex'es napi_alloc_skb
----------------------------------

Full forwarding (no-netfilter) (pkt@64bytes):

 * pkt@64bytes: tx:1025150 pps
 * pkt@64bytes: tx:1032930 pps
 * test-variation@64bytes: -7780pps 7.34ns
 * Patchset improvements@64-fwd:
 - 977414 -> 1025150 = 47736pps -> 47.64ns
 - 974404 -> 1032930 = 58526pps -> 58.15ns

 * pkt@272bytes: tx:937416 pps
 * pkt@272bytes: tx:930761 pps
 * test-variation@272bytes: 6655pps -7.62ns
 * Patchset improvements@272-fwd:
  - 911657 -> 937416 = 25759pps -> 30.14ns
  - 906229 -> 930761 = 24532pps -> 29.08ns

Drop in iptables RAW (pkt@64bytes):

 * pkt@64bytes: rx:2885820 pps
 * pkt@64bytes: rx:2892050 pps
 * test-variation@64bytes diff: 6230pps 0.746ns
 * Patchset improvements@64-drop:
  - 2800896 -> 2885820 =  84924pps -> 10.51 ns
  - 2785579 -> 2892050 = 106471pps -> 13.22 ns

 * pkt@272bytes: rx:2624484 pps
 * pkt@272bytes: rx:2624492 pps
 * test-variation: pkt@272bytes diff: 8pps 0ns
 * Patchset improvements@272-drop:
  - 2559718 -> 2624484 = 64766 pps ->  9.64 ns
  - 2544577 -> 2624492 = 79915 pps -> 11.97 ns


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [RFC PATCH 0/3] net: Alloc NAPI page frags from their own pool
  2014-11-27  0:05 [RFC PATCH 0/3] net: Alloc NAPI page frags from their own pool Alexander Duyck
                   ` (3 preceding siblings ...)
  2014-11-27 12:00 ` [RFC PATCH 0/3] net: Alloc NAPI page frags from their own pool Jesper Dangaard Brouer
@ 2014-12-03  3:30 ` David Miller
  4 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2014-12-03  3:30 UTC (permalink / raw)
  To: alexander.h.duyck; +Cc: netdev, brouer, jeffrey.t.kirsher, eric.dumazet, ast

From: Alexander Duyck <alexander.h.duyck@redhat.com>
Date: Wed, 26 Nov 2014 16:05:50 -0800

> This patch series implements a means of allocating page fragments without
> the need for the local_irq_save/restore in __netdev_alloc_frag.  By doing
> this I am able to decrease packet processing time by 11ns per packet in my
> test environment.

No fundamental objections from me.

