* [PATCH 0/4] net: mitigating kmem_cache slowpath for network stack in NAPI context
@ 2015-10-23 12:46 Jesper Dangaard Brouer
  2015-10-23 12:46 ` [PATCH 1/4] net: bulk free infrastructure for NAPI context, use napi_consume_skb Jesper Dangaard Brouer
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Jesper Dangaard Brouer @ 2015-10-23 12:46 UTC (permalink / raw)
  To: netdev
  Cc: Alexander Duyck, linux-mm, Jesper Dangaard Brouer, Joonsoo Kim,
	Andrew Morton, Christoph Lameter

It has been a long road. Back in July 2014 I realized that the network
stack was hitting the kmem_cache/SLUB slowpath when freeing SKBs, but
I had no solution.  In Dec 2014 I implemented a solution called
qmempool[1], which showed it was possible to improve this, but it was
rejected for being a cache on top of kmem_cache.  In July 2015
improvements to kmem_cache were proposed, and recently, in Oct 2015, my
kmem_cache (SLAB+SLUB) patches for bulk alloc and free were accepted
into the AKPM quilt tree.

This patchset is the first real use-case for kmem_cache bulk alloc and
free.  It is joint work with Alexander Duyck, done while he was still
at Red Hat.

Using bulk free to avoid the SLUB slowpath shows the full potential.
In this patchset it is realized in NAPI/softirq context: (1) during
DMA TX completion, bulk free is optimal and does not introduce any
added latency; (2) SKBs whose free was delayed due to IRQ context are
bulk free'ed from the net_tx_action softirq completion queue.
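
As a caller-side sketch of context (1), with hypothetical driver names
(the real conversion for ixgbe is in patch 3/4):

/* Hypothetical TX-completion loop: napi_consume_skb() (patch 1/4)
 * only defers the skb head into a per-CPU cache here; the actual
 * kmem_cache_free_bulk() happens when net_rx_action() later calls
 * __kfree_skb_flush() at the end of the softirq cycle.
 */
static void exdrv_clean_tx_ring(struct exdrv_ring *ring, int napi_budget)
{
	struct exdrv_tx_buffer *tx_buf;

	while ((tx_buf = exdrv_next_completed(ring)) != NULL) {
		napi_consume_skb(tx_buf->skb, napi_budget);
		tx_buf->skb = NULL;
	}
}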

Using bulk alloc shows a minor improvement for SLUB (+0.9%), but a
very slight slowdown for SLAB (-0.1%).

[1] http://thread.gmane.org/gmane.linux.network/342347/focus=126138


This patchset is based on net-next (commit 26440c835), BUT I've
applied several patches from AKPM's MM tree.

Cherry-picked some commits from the MMOTM tree at branch/tag mmotm-2015-10-06-16-30
from git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
(the commit IDs below are obviously not stable)

Picked up my own MM changes:
 b0aa3e95ce82 ("slub: mark the dangling ifdef #else of CONFIG_SLUB_DEBUG")
 114a2b37847c ("slab: implement bulking for SLAB allocator")
 606397476e8b ("slub: support for bulk free with SLUB freelists")
 ee29cd6a570c ("slub: optimize bulk slowpath free by detached freelist")
 491c6e0ca89d ("slub-optimize-bulk-slowpath-free-by-detached-freelist-fix")

Picked up slab.h changes:
 d9a47e0b776b ("compiler.h: add support for function attribute assume_aligned")
 1c3a5c789b4f ("slab.h: sprinkle __assume_aligned attributes")

I also wanted Kirill A. Shutemov's changes, as they change
virt_to_head_page(); these had to be applied manually from
http://ozlabs.org/~akpm/mmotm/ (stamp-2015-10-20-16-33), as AKPM made
several small fixes.

---

Jesper Dangaard Brouer (4):
      net: bulk free infrastructure for NAPI context, use napi_consume_skb
      net: bulk free SKBs that were delay free'ed due to IRQ context
      ixgbe: bulk free SKBs during TX completion cleanup cycle
      net: bulk alloc and reuse of SKBs in NAPI context


 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    6 +
 include/linux/skbuff.h                        |    4 +
 net/core/dev.c                                |    9 ++
 net/core/skbuff.c                             |  122 +++++++++++++++++++++++--
 4 files changed, 127 insertions(+), 14 deletions(-)

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



* [PATCH 1/4] net: bulk free infrastructure for NAPI context, use napi_consume_skb
  2015-10-23 12:46 [PATCH 0/4] net: mitigating kmem_cache slowpath for network stack in NAPI context Jesper Dangaard Brouer
@ 2015-10-23 12:46 ` Jesper Dangaard Brouer
  2015-10-23 12:46 ` [PATCH 2/4] net: bulk free SKBs that were delay free'ed due to IRQ context Jesper Dangaard Brouer
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Jesper Dangaard Brouer @ 2015-10-23 12:46 UTC (permalink / raw)
  To: netdev
  Cc: Alexander Duyck, linux-mm, Jesper Dangaard Brouer, Joonsoo Kim,
	Andrew Morton, Christoph Lameter

Discovered that the network stack was hitting the kmem_cache/SLUB
slowpath when freeing SKBs.  Doing bulk free with kmem_cache_free_bulk
can speed up this slowpath.

NAPI context is a bit special; let's take advantage of that for bulk
free'ing SKBs.

In NAPI context we are running in softirq, which gives us certain
protection.  A softirq can run on several CPUs at once, BUT the
important part is that a softirq will never preempt another softirq
running on the same CPU.  This gives us the opportunity to access
per-cpu variables in softirq context.
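
To illustrate that guarantee (illustration only, not part of the patch):

/* Because a softirq never preempts another softirq on the same CPU,
 * this per-CPU counter needs no locking, as long as it is only
 * accessed from softirq (BH) context.
 */
static DEFINE_PER_CPU(unsigned int, bh_private_count);

static void some_softirq_handler(void)
{
	unsigned int *cnt = this_cpu_ptr(&bh_private_count);

	(*cnt)++;	/* safe: no concurrent softirq on this CPU */
}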

Extend napi_alloc_cache (which before only contained a page_frag_cache)
into a struct with a small array-based stack for holding SKBs.
Introduce an SKB defer and flush API for accessing this.

Introduce napi_consume_skb() as a replacement for e.g.
dev_consume_skb_any() when running in NAPI context.  A small trick to
handle/detect whether we are called from netpoll is to check if budget
is 0.  In that case, we need to invoke dev_consume_skb_irq().

Joint work with Alexander Duyck.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 include/linux/skbuff.h |    3 ++
 net/core/dev.c         |    1 +
 net/core/skbuff.c      |   83 +++++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 81 insertions(+), 6 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4398411236f1..a3dec82e0e2c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2308,6 +2308,9 @@ static inline struct sk_buff *napi_alloc_skb(struct napi_struct *napi,
 {
 	return __napi_alloc_skb(napi, length, GFP_ATOMIC);
 }
+void napi_consume_skb(struct sk_buff *skb, int budget);
+
+void __kfree_skb_flush(void);
 
 /**
  * __dev_alloc_pages - allocate page for network Rx
diff --git a/net/core/dev.c b/net/core/dev.c
index 1225b4be8ed6..204059f67154 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4841,6 +4841,7 @@ static void net_rx_action(struct softirq_action *h)
 		}
 	}
 
+	__kfree_skb_flush();
 	local_irq_disable();
 
 	list_splice_tail_init(&sd->poll_list, &list);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index fab4599ba8b2..2682ac46d640 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -347,8 +347,16 @@ struct sk_buff *build_skb(void *data, unsigned int frag_size)
 }
 EXPORT_SYMBOL(build_skb);
 
+#define NAPI_SKB_CACHE_SIZE	64
+
+struct napi_alloc_cache {
+	struct page_frag_cache page;
+	size_t skb_count;
+	void *skb_cache[NAPI_SKB_CACHE_SIZE];
+};
+
 static DEFINE_PER_CPU(struct page_frag_cache, netdev_alloc_cache);
-static DEFINE_PER_CPU(struct page_frag_cache, napi_alloc_cache);
+static DEFINE_PER_CPU(struct napi_alloc_cache, napi_alloc_cache);
 
 static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
 {
@@ -378,9 +386,9 @@ EXPORT_SYMBOL(netdev_alloc_frag);
 
 static void *__napi_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
 {
-	struct page_frag_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
 
-	return __alloc_page_frag(nc, fragsz, gfp_mask);
+	return __alloc_page_frag(&nc->page, fragsz, gfp_mask);
 }
 
 void *napi_alloc_frag(unsigned int fragsz)
@@ -474,7 +482,7 @@ EXPORT_SYMBOL(__netdev_alloc_skb);
 struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 				 gfp_t gfp_mask)
 {
-	struct page_frag_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
 	struct sk_buff *skb;
 	void *data;
 
@@ -494,7 +502,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 	if (sk_memalloc_socks())
 		gfp_mask |= __GFP_MEMALLOC;
 
-	data = __alloc_page_frag(nc, len, gfp_mask);
+	data = __alloc_page_frag(&nc->page, len, gfp_mask);
 	if (unlikely(!data))
 		return NULL;
 
@@ -505,7 +513,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 	}
 
 	/* use OR instead of assignment to avoid clearing of bits in mask */
-	if (nc->pfmemalloc)
+	if (nc->page.pfmemalloc)
 		skb->pfmemalloc = 1;
 	skb->head_frag = 1;
 
@@ -747,6 +755,69 @@ void consume_skb(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(consume_skb);
 
+void __kfree_skb_flush(void)
+{
+	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+
+	/* flush skb_cache if containing objects */
+	if (nc->skb_count) {
+		kmem_cache_free_bulk(skbuff_head_cache, nc->skb_count,
+				     nc->skb_cache);
+		nc->skb_count = 0;
+	}
+}
+
+static void __kfree_skb_defer(struct sk_buff *skb)
+{
+	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+
+	/* drop skb->head and call any destructors for packet */
+	skb_release_all(skb);
+
+	/* record skb to CPU local list */
+	nc->skb_cache[nc->skb_count++] = skb;
+
+#ifdef CONFIG_SLUB
+	/* SLUB writes into objects when freeing */
+	prefetchw(skb);
+#endif
+
+	/* flush skb_cache if it is filled */
+	if (unlikely(nc->skb_count == NAPI_SKB_CACHE_SIZE)) {
+		kmem_cache_free_bulk(skbuff_head_cache, NAPI_SKB_CACHE_SIZE,
+				     nc->skb_cache);
+		nc->skb_count = 0;
+	}
+}
+
+void napi_consume_skb(struct sk_buff *skb, int budget)
+{
+	if (unlikely(!skb))
+		return;
+
+	/* if budget is 0 assume netpoll w/ IRQs disabled */
+	if (unlikely(!budget)) {
+		dev_consume_skb_irq(skb);
+		return;
+	}
+
+	if (likely(atomic_read(&skb->users) == 1))
+		smp_rmb();
+	else if (likely(!atomic_dec_and_test(&skb->users)))
+		return;
+	/* if reaching here SKB is ready to free */
+	trace_consume_skb(skb);
+
+	/* if SKB is a clone, don't handle this case */
+	if (unlikely(skb->fclone != SKB_FCLONE_UNAVAILABLE)) {
+		__kfree_skb(skb);
+		return;
+	}
+
+	__kfree_skb_defer(skb);
+}
+EXPORT_SYMBOL(napi_consume_skb);
+
 /* Make sure a field is enclosed inside headers_start/headers_end section */
 #define CHECK_SKB_FIELD(field) \
 	BUILD_BUG_ON(offsetof(struct sk_buff, field) <		\



* [PATCH 2/4] net: bulk free SKBs that were delay free'ed due to IRQ context
  2015-10-23 12:46 [PATCH 0/4] net: mitigating kmem_cache slowpath for network stack in NAPI context Jesper Dangaard Brouer
  2015-10-23 12:46 ` [PATCH 1/4] net: bulk free infrastructure for NAPI context, use napi_consume_skb Jesper Dangaard Brouer
@ 2015-10-23 12:46 ` Jesper Dangaard Brouer
  2015-10-23 12:46 ` [PATCH 3/4] ixgbe: bulk free SKBs during TX completion cleanup cycle Jesper Dangaard Brouer
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Jesper Dangaard Brouer @ 2015-10-23 12:46 UTC (permalink / raw)
  To: netdev
  Cc: Alexander Duyck, linux-mm, Jesper Dangaard Brouer, Joonsoo Kim,
	Andrew Morton, Christoph Lameter

The network stack defers freeing SKBs in case the free happens in IRQ
context or when IRQs are disabled.  This happens in
__dev_kfree_skb_irq(), which adds SKBs that were free'ed during IRQ to
the softirq completion queue (softnet_data.completion_queue).

These SKBs are naturally delayed and are cleaned up during
NET_TX_SOFTIRQ in net_tx_action().  Take advantage of this and use the
SKB defer and flush API, as we are already in softirq context.

For modern drivers this rarely happens, although most drivers do call
dev_kfree_skb_any(), which detects the situation and calls
__dev_kfree_skb_irq() when needed.  This is because netpoll can call
into drivers from IRQ context.
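
For reference, a paraphrased sketch of that detection logic (based on
net/core/dev.c of this era):

/* Defer the free when called from hard-IRQ context or with IRQs
 * disabled, where a direct free is unsafe; otherwise free immediately.
 */
void __dev_kfree_skb_any(struct sk_buff *skb, enum skb_free_reason reason)
{
	if (in_irq() || irqs_disabled())
		__dev_kfree_skb_irq(skb, reason);	/* -> completion_queue */
	else
		dev_kfree_skb(skb);
}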

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/linux/skbuff.h |    1 +
 net/core/dev.c         |    8 +++++++-
 net/core/skbuff.c      |    8 ++++++--
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a3dec82e0e2c..f314abff2cbd 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2311,6 +2311,7 @@ static inline struct sk_buff *napi_alloc_skb(struct napi_struct *napi,
 void napi_consume_skb(struct sk_buff *skb, int budget);
 
 void __kfree_skb_flush(void);
+void __kfree_skb_defer(struct sk_buff *skb);
 
 /**
  * __dev_alloc_pages - allocate page for network Rx
diff --git a/net/core/dev.c b/net/core/dev.c
index 204059f67154..0e88397db1fa 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3588,8 +3588,14 @@ static void net_tx_action(struct softirq_action *h)
 				trace_consume_skb(skb);
 			else
 				trace_kfree_skb(skb, net_tx_action);
-			__kfree_skb(skb);
+
+			if (skb->fclone != SKB_FCLONE_UNAVAILABLE)
+				__kfree_skb(skb);
+			else
+				__kfree_skb_defer(skb);
 		}
+
+		__kfree_skb_flush();
 	}
 
 	if (sd->output_queue) {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2682ac46d640..2ffcb014e00b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -767,7 +767,7 @@ void __kfree_skb_flush(void)
 	}
 }
 
-static void __kfree_skb_defer(struct sk_buff *skb)
+static inline void _kfree_skb_defer(struct sk_buff *skb)
 {
 	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
 
@@ -789,6 +789,10 @@ static void __kfree_skb_defer(struct sk_buff *skb)
 		nc->skb_count = 0;
 	}
 }
+void __kfree_skb_defer(struct sk_buff *skb)
+{
+	_kfree_skb_defer(skb);
+}
 
 void napi_consume_skb(struct sk_buff *skb, int budget)
 {
@@ -814,7 +818,7 @@ void napi_consume_skb(struct sk_buff *skb, int budget)
 		return;
 	}
 
-	__kfree_skb_defer(skb);
+	_kfree_skb_defer(skb);
 }
 EXPORT_SYMBOL(napi_consume_skb);
 



* [PATCH 3/4] ixgbe: bulk free SKBs during TX completion cleanup cycle
  2015-10-23 12:46 [PATCH 0/4] net: mitigating kmem_cache slowpath for network stack in NAPI context Jesper Dangaard Brouer
  2015-10-23 12:46 ` [PATCH 1/4] net: bulk free infrastructure for NAPI context, use napi_consume_skb Jesper Dangaard Brouer
  2015-10-23 12:46 ` [PATCH 2/4] net: bulk free SKBs that were delay free'ed due to IRQ context Jesper Dangaard Brouer
@ 2015-10-23 12:46 ` Jesper Dangaard Brouer
  2015-10-23 12:46 ` [PATCH 4/4] net: bulk alloc and reuse of SKBs in NAPI context Jesper Dangaard Brouer
  2015-10-27  1:09 ` [PATCH 0/4] net: mitigating kmem_cache slowpath for network stack " David Miller
  4 siblings, 0 replies; 6+ messages in thread
From: Jesper Dangaard Brouer @ 2015-10-23 12:46 UTC (permalink / raw)
  To: netdev
  Cc: Alexander Duyck, linux-mm, Jesper Dangaard Brouer, Joonsoo Kim,
	Andrew Morton, Christoph Lameter

There is an opportunity to bulk free SKBs during the reclaiming of
resources after the DMA transmit completes in ixgbe_clean_tx_irq.
Thus, bulk freeing at this point does not introduce any added latency.

Simply use napi_consume_skb(), which was recently introduced.  The
napi_budget parameter is needed by napi_consume_skb() to detect if it
is called from netpoll.

Benchmarking IPv4-forwarding, on CPU i7-4790K @4.2GHz (no turbo boost)
 Single CPU/flow numbers: before: 1982144 pps ->  after : 2064446 pps
 Improvement: +82302 pps, -20 nanosec, +4.1%
 (SLUB and GCC version 5.1.1 20150618 (Red Hat 5.1.1-4))

Joint work with Alexander Duyck.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 9f8a7fd7a195..4614dfb3febd 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1090,7 +1090,7 @@ static void ixgbe_tx_timeout_reset(struct ixgbe_adapter *adapter)
  * @tx_ring: tx ring to clean
  **/
 static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
-			       struct ixgbe_ring *tx_ring)
+			       struct ixgbe_ring *tx_ring, int napi_budget)
 {
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 	struct ixgbe_tx_buffer *tx_buffer;
@@ -1128,7 +1128,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 		total_packets += tx_buffer->gso_segs;
 
 		/* free the skb */
-		dev_consume_skb_any(tx_buffer->skb);
+		napi_consume_skb(tx_buffer->skb, napi_budget);
 
 		/* unmap skb header data */
 		dma_unmap_single(tx_ring->dev,
@@ -2784,7 +2784,7 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
 #endif
 
 	ixgbe_for_each_ring(ring, q_vector->tx)
-		clean_complete &= !!ixgbe_clean_tx_irq(q_vector, ring);
+		clean_complete &= !!ixgbe_clean_tx_irq(q_vector, ring, budget);
 
 	if (!ixgbe_qv_lock_napi(q_vector))
 		return budget;



* [PATCH 4/4] net: bulk alloc and reuse of SKBs in NAPI context
  2015-10-23 12:46 [PATCH 0/4] net: mitigating kmem_cache slowpath for network stack in NAPI context Jesper Dangaard Brouer
                   ` (2 preceding siblings ...)
  2015-10-23 12:46 ` [PATCH 3/4] ixgbe: bulk free SKBs during TX completion cleanup cycle Jesper Dangaard Brouer
@ 2015-10-23 12:46 ` Jesper Dangaard Brouer
  2015-10-27  1:09 ` [PATCH 0/4] net: mitigating kmem_cache slowpath for network stack " David Miller
  4 siblings, 0 replies; 6+ messages in thread
From: Jesper Dangaard Brouer @ 2015-10-23 12:46 UTC (permalink / raw)
  To: netdev
  Cc: Alexander Duyck, linux-mm, Jesper Dangaard Brouer, Joonsoo Kim,
	Andrew Morton, Christoph Lameter

Think twice before applying
 - This patch can potentially introduce added latency in some workloads

This patch introduces bulk alloc of SKBs and allows reuse of SKBs
free'ed in the same softirq cycle.  SKBs are normally free'ed during TX
completion, but most high speed drivers also clean up the TX ring
during the NAPI RX poll cycle.  Thus, when using
napi_consume_skb/__kfree_skb_defer, SKBs will be available in the
napi_alloc_cache->skb_cache.
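
A sketch of that reuse cycle, as seen from a driver's poll function
(hypothetical driver names):

/* Hypothetical NAPI poll: TX cleanup in step 1 defers skb heads into
 * the per-CPU napi_alloc_cache->skb_cache; the RX allocations in
 * step 2 (via napi_alloc_skb) pop those same heads back out instead
 * of hitting the kmem_cache slowpath.
 */
static int exdrv_poll(struct napi_struct *napi, int budget)
{
	exdrv_clean_tx_ring(napi, budget);	  /* 1) defers skb heads */
	return exdrv_clean_rx_ring(napi, budget); /* 2) reuses them for RX */
}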

If no SKBs are available for reuse, then bulk alloc only 8 SKBs, to
limit the potential overshoot of unused SKBs that would need to be
free'ed when the NAPI cycle ends (flushed in net_rx_action via
__kfree_skb_flush()).

Benchmarking IPv4-forwarding, on CPU i7-4790K @4.2GHz (no turbo boost)
(GCC version 5.1.1 20150618 (Red Hat 5.1.1-4))
 Allocator SLUB:
  Single CPU/flow numbers: before: 2064446 pps -> after: 2083031 pps
  Improvement: +18585 pps, -4.3 nanosec, +0.9%
 Allocator SLAB:
  Single CPU/flow numbers: before: 2035949 pps -> after: 2033567 pps
  Regression: -2382 pps, +0.57 nanosec, -0.1 %

Even though benchmarking does show an improvement for SLUB (+0.9%), I'm
not convinced bulk alloc will be a win in all situations:
 * I see stalls on walking the SLUB freelist (normally hidden by prefetch)
 * In case the RX queue is not full, we alloc and free more SKBs than needed

More testing is needed with more real life benchmarks.

Joint work with Alexander Duyck.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/core/skbuff.c |   35 +++++++++++++++++++++++++++++++----
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2ffcb014e00b..d0e0ccccfb11 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -483,13 +483,14 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 				 gfp_t gfp_mask)
 {
 	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	struct skb_shared_info *shinfo;
 	struct sk_buff *skb;
 	void *data;
 
 	len += NET_SKB_PAD + NET_IP_ALIGN;
 
-	if ((len > SKB_WITH_OVERHEAD(PAGE_SIZE)) ||
-	    (gfp_mask & (__GFP_WAIT | GFP_DMA))) {
+	if (unlikely((len > SKB_WITH_OVERHEAD(PAGE_SIZE)) ||
+		     (gfp_mask & (__GFP_WAIT | GFP_DMA)))) {
 		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
 		if (!skb)
 			goto skb_fail;
@@ -506,12 +507,38 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 	if (unlikely(!data))
 		return NULL;
 
-	skb = __build_skb(data, len);
-	if (unlikely(!skb)) {
+#define BULK_ALLOC_SIZE 8
+	if (!nc->skb_count &&
+	    kmem_cache_alloc_bulk(skbuff_head_cache, gfp_mask,
+				  BULK_ALLOC_SIZE, nc->skb_cache)) {
+		nc->skb_count = BULK_ALLOC_SIZE;
+	}
+	if (likely(nc->skb_count)) {
+		skb = (struct sk_buff *)nc->skb_cache[--nc->skb_count];
+	} else {
+		/* alloc bulk failed */
 		skb_free_frag(data);
 		return NULL;
 	}
 
+	len -= SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+
+	memset(skb, 0, offsetof(struct sk_buff, tail));
+	skb->truesize = SKB_TRUESIZE(len);
+	atomic_set(&skb->users, 1);
+	skb->head = data;
+	skb->data = data;
+	skb_reset_tail_pointer(skb);
+	skb->end = skb->tail + len;
+	skb->mac_header = (typeof(skb->mac_header))~0U;
+	skb->transport_header = (typeof(skb->transport_header))~0U;
+
+	/* make sure we initialize shinfo sequentially */
+	shinfo = skb_shinfo(skb);
+	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
+	atomic_set(&shinfo->dataref, 1);
+	kmemcheck_annotate_variable(shinfo->destructor_arg);
+
 	/* use OR instead of assignment to avoid clearing of bits in mask */
 	if (nc->page.pfmemalloc)
 		skb->pfmemalloc = 1;



* Re: [PATCH 0/4] net: mitigating kmem_cache slowpath for network stack in NAPI context
  2015-10-23 12:46 [PATCH 0/4] net: mitigating kmem_cache slowpath for network stack in NAPI context Jesper Dangaard Brouer
                   ` (3 preceding siblings ...)
  2015-10-23 12:46 ` [PATCH 4/4] net: bulk alloc and reuse of SKBs in NAPI context Jesper Dangaard Brouer
@ 2015-10-27  1:09 ` David Miller
  4 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2015-10-27  1:09 UTC (permalink / raw)
  To: brouer; +Cc: netdev, alexander.duyck, linux-mm, iamjoonsoo.kim, akpm, cl

From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Fri, 23 Oct 2015 14:46:01 +0200

> It has been a long road. Back in July 2014 I realized that the network
> stack was hitting the kmem_cache/SLUB slowpath when freeing SKBs, but
> I had no solution.  In Dec 2014 I implemented a solution called
> qmempool[1], which showed it was possible to improve this, but it was
> rejected for being a cache on top of kmem_cache.  In July 2015
> improvements to kmem_cache were proposed, and recently, in Oct 2015, my
> kmem_cache (SLAB+SLUB) patches for bulk alloc and free were accepted
> into the AKPM quilt tree.
> 
> This patchset is the first real use-case for kmem_cache bulk alloc and
> free.  It is joint work with Alexander Duyck, done while he was still
> at Red Hat.
> 
> Using bulk free to avoid the SLUB slowpath shows the full potential.
> In this patchset it is realized in NAPI/softirq context: (1) during
> DMA TX completion, bulk free is optimal and does not introduce any
> added latency; (2) SKBs whose free was delayed due to IRQ context are
> bulk free'ed from the net_tx_action softirq completion queue.
> 
> Using bulk alloc shows a minor improvement for SLUB (+0.9%), but a
> very slight slowdown for SLAB (-0.1%).
> 
> [1] http://thread.gmane.org/gmane.linux.network/342347/focus=126138
> 
> 
> This patchset is based on net-next (commit 26440c835), BUT I've
> applied several patches from AKPM's MM tree.
> 
> Cherry-picked some commits from the MMOTM tree at branch/tag mmotm-2015-10-06-16-30
> from git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
> (the commit IDs below are obviously not stable)

Logically I'm fine with this series, but as you mention there are
dependencies that need to hit upstream before I can merge any of
this stuff into my tree.

I also think that patch #4 is a net win, and it will also expose the
bulking code to more testing since it will be used more often.


