netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/10] Refactor netdev page frags and move them into mm/
@ 2015-05-07  4:11 Alexander Duyck
  2015-05-07  4:11 ` [PATCH 01/10] net: Use cached copy of pfmemalloc to avoid accessing page Alexander Duyck
                   ` (10 more replies)
  0 siblings, 11 replies; 18+ messages in thread
From: Alexander Duyck @ 2015-05-07  4:11 UTC (permalink / raw)
  To: netdev, linux-mm; +Cc: akpm, davem, eric.dumazet

This patch series addresses several things.

First I found an issue in the performance of the pfmemalloc check from
build_skb.  To work around it I have provided a cached copy of pfmemalloc
to be used in __netdev_alloc_skb and __napi_alloc_skb.

Second I moved the page fragment allocation logic into the mm tree and
added functionality for freeing page fragments.  I had to fix igb before I
could do this as it was using a reference to NETDEV_FRAG_PAGE_MAX_SIZE
incorrectly.

Finally I went through and replaced all of the duplicate code that was
calling put_page and replaced it with calls to skb_free_frag.

With these changes in place a simple receive and drop test increased from a
packet rate of 8.9Mpps to 9.8Mpps.  The gains breakdown as follows:

8.9Mpps	Before			9.8Mpps	After
------------------------	------------------------
7.8%	put_compound_page	9.1%	__free_page_frag
3.9%	skb_free_head
1.1%	put_page

4.9%	build_skb		3.8%	__napi_alloc_skb
2.5%	__alloc_rx_skb
1.9%	__napi_alloc_skb

---

Alexander Duyck (10):
      net: Use cached copy of pfmemalloc to avoid accessing page
      igb: Don't use NETDEV_FRAG_PAGE_MAX_SIZE in descriptor calculation
      net: Store virtual address instead of page in netdev_alloc_cache
      mm/net: Rename and move page fragment handling from net/ to mm/
      net: Add skb_free_frag to replace use of put_page in freeing skb->head
      netcp: Replace put_page(virt_to_head_page(ptr)) w/ skb_free_frag
      mvneta: Replace put_page(virt_to_head_page(ptr)) w/ skb_free_frag
      e1000: Replace e1000_free_frag with skb_free_frag
      hisilicon: Replace put_page(virt_to_head_page()) with skb_free_frag()
      bnx2x, tg3: Replace put_page(virt_to_head_page()) with skb_free_frag()


 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c |    2 
 drivers/net/ethernet/broadcom/tg3.c             |    2 
 drivers/net/ethernet/hisilicon/hip04_eth.c      |    2 
 drivers/net/ethernet/intel/e1000/e1000_main.c   |   19 +-
 drivers/net/ethernet/intel/igb/igb_main.c       |   11 -
 drivers/net/ethernet/marvell/mvneta.c           |    2 
 drivers/net/ethernet/ti/netcp_core.c            |    2 
 include/linux/gfp.h                             |    5 +
 include/linux/mm_types.h                        |   18 ++
 include/linux/skbuff.h                          |    9 +
 mm/page_alloc.c                                 |   98 ++++++++++
 net/core/skbuff.c                               |  224 ++++++++---------------
 12 files changed, 223 insertions(+), 171 deletions(-)

--

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 01/10] net: Use cached copy of pfmemalloc to avoid accessing page
  2015-05-07  4:11 [PATCH 00/10] Refactor netdev page frags and move them into mm/ Alexander Duyck
@ 2015-05-07  4:11 ` Alexander Duyck
  2015-05-10 23:18   ` David Miller
  2015-05-07  4:11 ` [PATCH 02/10] igb: Don't use NETDEV_FRAG_PAGE_MAX_SIZE in descriptor calculation Alexander Duyck
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: Alexander Duyck @ 2015-05-07  4:11 UTC (permalink / raw)
  To: netdev, linux-mm; +Cc: akpm, davem, eric.dumazet

While testing I found that the testing for pfmemalloc in build_skb was
rather expensive.  I found the issue to be two-fold.  First we have to get
from the virtual address to the head page and that comes at the cost of
something like 11 cycles.  Then there is the cost for reading pfmemalloc out
of the head page which can be cache cold due to the fact that
put_page_testzero is likely invalidating the cache-line on one or more
CPUs as the fragments can be shared.

To avoid this extra expense I have added a pfmemalloc member to the
netdev_alloc_cache.  I then pushed pieces of __alloc_rx_skb into
__napi_alloc_skb and __netdev_alloc_skb so that I could rewrite them to
make use of the cached pfmemalloc value.  The result is that my perf traces
show a reduction from 9.28% overhead to 3.7% for the code covered by
build_skb, __alloc_rx_skb, and __napi_alloc_skb when performing a test with
the packet being dropped instead of being handed to napi_gro_receive.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 net/core/skbuff.c |  141 ++++++++++++++++++++++++++++++-----------------------
 1 file changed, 79 insertions(+), 62 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b9eb90b39ac7..d6851ca32598 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -353,6 +353,7 @@ struct netdev_alloc_cache {
 	 * containing page->_count every time we allocate a fragment.
 	 */
 	unsigned int		pagecnt_bias;
+	bool pfmemalloc;
 };
 static DEFINE_PER_CPU(struct netdev_alloc_cache, netdev_alloc_cache);
 static DEFINE_PER_CPU(struct netdev_alloc_cache, napi_alloc_cache);
@@ -379,10 +380,9 @@ static struct page *__page_frag_refill(struct netdev_alloc_cache *nc,
 	return page;
 }
 
-static void *__alloc_page_frag(struct netdev_alloc_cache __percpu *cache,
+static void *__alloc_page_frag(struct netdev_alloc_cache *nc,
 			       unsigned int fragsz, gfp_t gfp_mask)
 {
-	struct netdev_alloc_cache *nc = this_cpu_ptr(cache);
 	struct page *page = nc->frag.page;
 	unsigned int size;
 	int offset;
@@ -402,6 +402,7 @@ refill:
 		atomic_add(size - 1, &page->_count);
 
 		/* reset page count bias and offset to start of new frag */
+		nc->pfmemalloc = page->pfmemalloc;
 		nc->pagecnt_bias = size;
 		nc->frag.offset = size;
 	}
@@ -430,11 +431,13 @@ refill:
 
 static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
 {
+	struct netdev_alloc_cache *nc;
 	unsigned long flags;
 	void *data;
 
 	local_irq_save(flags);
-	data = __alloc_page_frag(&netdev_alloc_cache, fragsz, gfp_mask);
+	nc = this_cpu_ptr(&netdev_alloc_cache);
+	data = __alloc_page_frag(nc, fragsz, gfp_mask);
 	local_irq_restore(flags);
 	return data;
 }
@@ -454,7 +457,9 @@ EXPORT_SYMBOL(netdev_alloc_frag);
 
 static void *__napi_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
 {
-	return __alloc_page_frag(&napi_alloc_cache, fragsz, gfp_mask);
+	struct netdev_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+
+	return __alloc_page_frag(nc, fragsz, gfp_mask);
 }
 
 void *napi_alloc_frag(unsigned int fragsz)
@@ -464,76 +469,64 @@ void *napi_alloc_frag(unsigned int fragsz)
 EXPORT_SYMBOL(napi_alloc_frag);
 
 /**
- *	__alloc_rx_skb - allocate an skbuff for rx
+ *	__netdev_alloc_skb - allocate an skbuff for rx on a specific device
+ *	@dev: network device to receive on
  *	@length: length to allocate
  *	@gfp_mask: get_free_pages mask, passed to alloc_skb
- *	@flags:	If SKB_ALLOC_RX is set, __GFP_MEMALLOC will be used for
- *		allocations in case we have to fallback to __alloc_skb()
- *		If SKB_ALLOC_NAPI is set, page fragment will be allocated
- *		from napi_cache instead of netdev_cache.
  *
  *	Allocate a new &sk_buff and assign it a usage count of one. The
- *	buffer has unspecified headroom built in. Users should allocate
+ *	buffer has NET_SKB_PAD headroom built in. Users should allocate
  *	the headroom they think they need without accounting for the
  *	built in space. The built in space is used for optimisations.
  *
  *	%NULL is returned if there is no free memory.
  */
-static struct sk_buff *__alloc_rx_skb(unsigned int length, gfp_t gfp_mask,
-				      int flags)
+struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
+				   gfp_t gfp_mask)
 {
-	struct sk_buff *skb = NULL;
-	unsigned int fragsz = SKB_DATA_ALIGN(length) +
-			      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	struct netdev_alloc_cache *nc;
+	unsigned long flags;
+	struct sk_buff *skb;
+	bool pfmemalloc;
+	void *data;
 
-	if (fragsz <= PAGE_SIZE && !(gfp_mask & (__GFP_WAIT | GFP_DMA))) {
-		void *data;
+	len += NET_SKB_PAD;
 
-		if (sk_memalloc_socks())
-			gfp_mask |= __GFP_MEMALLOC;
+	if ((len > SKB_WITH_OVERHEAD(PAGE_SIZE)) ||
+	    (gfp_mask & (__GFP_WAIT | GFP_DMA)))
+		return __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
 
-		data = (flags & SKB_ALLOC_NAPI) ?
-			__napi_alloc_frag(fragsz, gfp_mask) :
-			__netdev_alloc_frag(fragsz, gfp_mask);
+	len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	len = SKB_DATA_ALIGN(len);
 
-		if (likely(data)) {
-			skb = build_skb(data, fragsz);
-			if (unlikely(!skb))
-				put_page(virt_to_head_page(data));
-		}
-	} else {
-		skb = __alloc_skb(length, gfp_mask,
-				  SKB_ALLOC_RX, NUMA_NO_NODE);
-	}
-	return skb;
-}
+	if (sk_memalloc_socks())
+		gfp_mask |= __GFP_MEMALLOC;
 
-/**
- *	__netdev_alloc_skb - allocate an skbuff for rx on a specific device
- *	@dev: network device to receive on
- *	@length: length to allocate
- *	@gfp_mask: get_free_pages mask, passed to alloc_skb
- *
- *	Allocate a new &sk_buff and assign it a usage count of one. The
- *	buffer has NET_SKB_PAD headroom built in. Users should allocate
- *	the headroom they think they need without accounting for the
- *	built in space. The built in space is used for optimisations.
- *
- *	%NULL is returned if there is no free memory.
- */
-struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
-				   unsigned int length, gfp_t gfp_mask)
-{
-	struct sk_buff *skb;
+	local_irq_save(flags);
 
-	length += NET_SKB_PAD;
-	skb = __alloc_rx_skb(length, gfp_mask, 0);
+	nc = this_cpu_ptr(&netdev_alloc_cache);
+	data = __alloc_page_frag(nc, len, gfp_mask);
+	pfmemalloc = nc->pfmemalloc;
 
-	if (likely(skb)) {
-		skb_reserve(skb, NET_SKB_PAD);
-		skb->dev = dev;
+	local_irq_restore(flags);
+
+	if (unlikely(!data))
+		return NULL;
+
+	skb = __build_skb(data, len);
+	if (unlikely(!skb)) {
+		put_page(virt_to_head_page(data));
+		return NULL;
 	}
 
+	/* use OR instead of assignment to avoid clearing of bits in mask */
+	if (pfmemalloc)
+		skb->pfmemalloc = 1;
+	skb->head_frag = 1;
+
+	skb_reserve(skb, NET_SKB_PAD);
+	skb->dev = dev;
+
 	return skb;
 }
 EXPORT_SYMBOL(__netdev_alloc_skb);
@@ -551,19 +544,43 @@ EXPORT_SYMBOL(__netdev_alloc_skb);
  *
  *	%NULL is returned if there is no free memory.
  */
-struct sk_buff *__napi_alloc_skb(struct napi_struct *napi,
-				 unsigned int length, gfp_t gfp_mask)
+struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
+				 gfp_t gfp_mask)
 {
+	struct netdev_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
 	struct sk_buff *skb;
+	void *data;
+
+	len += NET_SKB_PAD + NET_IP_ALIGN;
+
+	if ((len > SKB_WITH_OVERHEAD(PAGE_SIZE)) ||
+	    (gfp_mask & (__GFP_WAIT | GFP_DMA)))
+		return __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
+
+	len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	len = SKB_DATA_ALIGN(len);
 
-	length += NET_SKB_PAD + NET_IP_ALIGN;
-	skb = __alloc_rx_skb(length, gfp_mask, SKB_ALLOC_NAPI);
+	if (sk_memalloc_socks())
+		gfp_mask |= __GFP_MEMALLOC;
+
+	data = __alloc_page_frag(nc, len, gfp_mask);
+	if (unlikely(!data))
+		return NULL;
 
-	if (likely(skb)) {
-		skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
-		skb->dev = napi->dev;
+	skb = __build_skb(data, len);
+	if (unlikely(!skb)) {
+		put_page(virt_to_head_page(data));
+		return NULL;
 	}
 
+	/* use OR instead of assignment to avoid clearing of bits in mask */
+	if (nc->pfmemalloc)
+		skb->pfmemalloc = 1;
+	skb->head_frag = 1;
+
+	skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+	skb->dev = napi->dev;
+
 	return skb;
 }
 EXPORT_SYMBOL(__napi_alloc_skb);

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 02/10] igb: Don't use NETDEV_FRAG_PAGE_MAX_SIZE in descriptor calculation
  2015-05-07  4:11 [PATCH 00/10] Refactor netdev page frags and move them into mm/ Alexander Duyck
  2015-05-07  4:11 ` [PATCH 01/10] net: Use cached copy of pfmemalloc to avoid accessing page Alexander Duyck
@ 2015-05-07  4:11 ` Alexander Duyck
  2015-05-07  4:11 ` [PATCH 03/10] net: Store virtual address instead of page in netdev_alloc_cache Alexander Duyck
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Alexander Duyck @ 2015-05-07  4:11 UTC (permalink / raw)
  To: netdev, linux-mm; +Cc: akpm, davem, eric.dumazet

This change updates igb so that it will correctly perform the descriptor
count calculation.  Previously it was taking NETDEV_FRAG_PAGE_MAX_SIZE
into account with isn't really correct since a different value is used to
determine the size of the pages used for TCP.  That is actually determined
by SKB_FRAG_PAGE_ORDER.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c |   11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index d3b4098fce48..1394642e4859 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -4974,6 +4974,7 @@ netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb,
 	struct igb_tx_buffer *first;
 	int tso;
 	u32 tx_flags = 0;
+	unsigned short f;
 	u16 count = TXD_USE_COUNT(skb_headlen(skb));
 	__be16 protocol = vlan_get_protocol(skb);
 	u8 hdr_len = 0;
@@ -4984,14 +4985,8 @@ netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb,
 	 *       + 1 desc for context descriptor,
 	 * otherwise try next time
 	 */
-	if (NETDEV_FRAG_PAGE_MAX_SIZE > IGB_MAX_DATA_PER_TXD) {
-		unsigned short f;
-
-		for (f = 0; f < skb_shinfo(skb)->nr_frags; f++)
-			count += TXD_USE_COUNT(skb_shinfo(skb)->frags[f].size);
-	} else {
-		count += skb_shinfo(skb)->nr_frags;
-	}
+	for (f = 0; f < skb_shinfo(skb)->nr_frags; f++)
+		count += TXD_USE_COUNT(skb_shinfo(skb)->frags[f].size);
 
 	if (igb_maybe_stop_tx(tx_ring, count + 3)) {
 		/* this is a hard error */

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 03/10] net: Store virtual address instead of page in netdev_alloc_cache
  2015-05-07  4:11 [PATCH 00/10] Refactor netdev page frags and move them into mm/ Alexander Duyck
  2015-05-07  4:11 ` [PATCH 01/10] net: Use cached copy of pfmemalloc to avoid accessing page Alexander Duyck
  2015-05-07  4:11 ` [PATCH 02/10] igb: Don't use NETDEV_FRAG_PAGE_MAX_SIZE in descriptor calculation Alexander Duyck
@ 2015-05-07  4:11 ` Alexander Duyck
  2015-05-07  4:11 ` [PATCH 04/10] mm/net: Rename and move page fragment handling from net/ to mm/ Alexander Duyck
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Alexander Duyck @ 2015-05-07  4:11 UTC (permalink / raw)
  To: netdev, linux-mm; +Cc: akpm, davem, eric.dumazet

This change makes it so that we store the virtual address of the page
in the netdev_alloc_cache instead of the page pointer.  The idea behind
this is to avoid multiple calls to page_address since the virtual address
is required for every access, but the page pointer is only needed at
allocation or reset of the page.

While I was at it I also reordered the netdev_alloc_cache structure a bit
so that the size is always 16 bytes by dropping size in the case where
PAGE_SIZE is greater than or equal to 32KB.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 include/linux/skbuff.h |    5 ++--
 net/core/skbuff.c      |   55 ++++++++++++++++++++++++++++--------------------
 2 files changed, 34 insertions(+), 26 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 9c2f793573fa..8b9a2c35a9d7 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2128,9 +2128,8 @@ static inline void __skb_queue_purge(struct sk_buff_head *list)
 		kfree_skb(skb);
 }
 
-#define NETDEV_FRAG_PAGE_MAX_ORDER get_order(32768)
-#define NETDEV_FRAG_PAGE_MAX_SIZE  (PAGE_SIZE << NETDEV_FRAG_PAGE_MAX_ORDER)
-#define NETDEV_PAGECNT_MAX_BIAS	   NETDEV_FRAG_PAGE_MAX_SIZE
+#define NETDEV_FRAG_PAGE_MAX_SIZE	__ALIGN_MASK(32768, ~PAGE_MASK)
+#define NETDEV_FRAG_PAGE_MAX_ORDER	get_order(NETDEV_FRAG_PAGE_MAX_SIZE)
 
 void *netdev_alloc_frag(unsigned int fragsz);
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d6851ca32598..a3062ec341c3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -348,7 +348,13 @@ struct sk_buff *build_skb(void *data, unsigned int frag_size)
 EXPORT_SYMBOL(build_skb);
 
 struct netdev_alloc_cache {
-	struct page_frag	frag;
+	void * va;
+#if (PAGE_SIZE < NETDEV_FRAG_PAGE_MAX_SIZE)
+	__u16 offset;
+	__u16 size;
+#else
+	__u32 offset;
+#endif
 	/* we maintain a pagecount bias, so that we dont dirty cache line
 	 * containing page->_count every time we allocate a fragment.
 	 */
@@ -361,21 +367,20 @@ static DEFINE_PER_CPU(struct netdev_alloc_cache, napi_alloc_cache);
 static struct page *__page_frag_refill(struct netdev_alloc_cache *nc,
 				       gfp_t gfp_mask)
 {
-	const unsigned int order = NETDEV_FRAG_PAGE_MAX_ORDER;
 	struct page *page = NULL;
 	gfp_t gfp = gfp_mask;
 
-	if (order) {
-		gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY |
-			    __GFP_NOMEMALLOC;
-		page = alloc_pages_node(NUMA_NO_NODE, gfp_mask, order);
-		nc->frag.size = PAGE_SIZE << (page ? order : 0);
-	}
-
+#if (PAGE_SIZE < NETDEV_FRAG_PAGE_MAX_SIZE)
+	gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY |
+		    __GFP_NOMEMALLOC;
+	page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
+				NETDEV_FRAG_PAGE_MAX_ORDER);
+	nc->size = page ? NETDEV_FRAG_PAGE_MAX_SIZE : PAGE_SIZE;
+#endif
 	if (unlikely(!page))
 		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
 
-	nc->frag.page = page;
+	nc->va = page ? page_address(page) : NULL;
 
 	return page;
 }
@@ -383,19 +388,20 @@ static struct page *__page_frag_refill(struct netdev_alloc_cache *nc,
 static void *__alloc_page_frag(struct netdev_alloc_cache *nc,
 			       unsigned int fragsz, gfp_t gfp_mask)
 {
-	struct page *page = nc->frag.page;
-	unsigned int size;
+	unsigned int size = PAGE_SIZE;
+	struct page *page;
 	int offset;
 
-	if (unlikely(!page)) {
+	if (unlikely(!nc->va)) {
 refill:
 		page = __page_frag_refill(nc, gfp_mask);
 		if (!page)
 			return NULL;
 
-		/* if size can vary use frag.size else just use PAGE_SIZE */
-		size = NETDEV_FRAG_PAGE_MAX_ORDER ? nc->frag.size : PAGE_SIZE;
-
+#if (PAGE_SIZE < NETDEV_FRAG_PAGE_MAX_SIZE)
+		/* if size can vary use size else just use PAGE_SIZE */
+		size = nc->size;
+#endif
 		/* Even if we own the page, we do not use atomic_set().
 		 * This would break get_page_unless_zero() users.
 		 */
@@ -404,17 +410,20 @@ refill:
 		/* reset page count bias and offset to start of new frag */
 		nc->pfmemalloc = page->pfmemalloc;
 		nc->pagecnt_bias = size;
-		nc->frag.offset = size;
+		nc->offset = size;
 	}
 
-	offset = nc->frag.offset - fragsz;
+	offset = nc->offset - fragsz;
 	if (unlikely(offset < 0)) {
+		page = virt_to_page(nc->va);
+
 		if (!atomic_sub_and_test(nc->pagecnt_bias, &page->_count))
 			goto refill;
 
-		/* if size can vary use frag.size else just use PAGE_SIZE */
-		size = NETDEV_FRAG_PAGE_MAX_ORDER ? nc->frag.size : PAGE_SIZE;
-
+#if (PAGE_SIZE < NETDEV_FRAG_PAGE_MAX_SIZE)
+		/* if size can vary use size else just use PAGE_SIZE */
+		size = nc->size;
+#endif
 		/* OK, page count is 0, we can safely set it */
 		atomic_set(&page->_count, size);
 
@@ -424,9 +433,9 @@ refill:
 	}
 
 	nc->pagecnt_bias--;
-	nc->frag.offset = offset;
+	nc->offset = offset;
 
-	return page_address(page) + offset;
+	return nc->va + offset;
 }
 
 static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 04/10] mm/net: Rename and move page fragment handling from net/ to mm/
  2015-05-07  4:11 [PATCH 00/10] Refactor netdev page frags and move them into mm/ Alexander Duyck
                   ` (2 preceding siblings ...)
  2015-05-07  4:11 ` [PATCH 03/10] net: Store virtual address instead of page in netdev_alloc_cache Alexander Duyck
@ 2015-05-07  4:11 ` Alexander Duyck
  2015-05-07  4:12 ` [PATCH 05/10] net: Add skb_free_frag to replace use of put_page in freeing skb->head Alexander Duyck
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Alexander Duyck @ 2015-05-07  4:11 UTC (permalink / raw)
  To: netdev, linux-mm; +Cc: akpm, davem, eric.dumazet

This change moves the __alloc_page_frag functionality out of the networking
stack and into the page allocation portion of mm.  The idea it so help make
this maintainable by placing it with other page allocation functions.

Since we are moving it from skbuff.c to page_alloc.c I have also renamed
the basic defines and structure from netdev_alloc_cache to page_frag_cache
to reflect that this is now part of a different kernel subsystem.

I have also added a simple __free_page_frag function which can handle
freeing the frags based on the skb->head pointer.  The model for this is
based off of __free_pages since we don't actually need to deal with all of
the cases that put_page handles.  I incorporated the virt_to_head_page call
and compound_order into the function as it actually allows for a signficant
size reduction by reducing code duplication.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 include/linux/gfp.h      |    5 ++
 include/linux/mm_types.h |   18 ++++++++
 include/linux/skbuff.h   |    3 -
 mm/page_alloc.c          |   98 +++++++++++++++++++++++++++++++++++++++++++++
 net/core/skbuff.c        |  100 +++-------------------------------------------
 5 files changed, 127 insertions(+), 97 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 97a9373e61e8..70a7fee1efb3 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -366,6 +366,11 @@ extern void free_pages(unsigned long addr, unsigned int order);
 extern void free_hot_cold_page(struct page *page, bool cold);
 extern void free_hot_cold_page_list(struct list_head *list, bool cold);
 
+struct page_frag_cache;
+extern void *__alloc_page_frag(struct page_frag_cache *nc,
+			       unsigned int fragsz, gfp_t gfp_mask);
+extern void __free_page_frag(void *addr);
+
 extern void __free_kmem_pages(struct page *page, unsigned int order);
 extern void free_kmem_pages(unsigned long addr, unsigned int order);
 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8d37e26a1007..0038ac7466fd 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -226,6 +226,24 @@ struct page_frag {
 #endif
 };
 
+#define PAGE_FRAG_CACHE_MAX_SIZE	__ALIGN_MASK(32768, ~PAGE_MASK)
+#define PAGE_FRAG_CACHE_MAX_ORDER	get_order(PAGE_FRAG_CACHE_MAX_SIZE)
+
+struct page_frag_cache {
+	void * va;
+#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
+	__u16 offset;
+	__u16 size;
+#else
+	__u32 offset;
+#endif
+	/* we maintain a pagecount bias, so that we dont dirty cache line
+	 * containing page->_count every time we allocate a fragment.
+	 */
+	unsigned int		pagecnt_bias;
+	bool pfmemalloc;
+};
+
 typedef unsigned long __nocast vm_flags_t;
 
 /*
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 8b9a2c35a9d7..0039fcc45b3b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2128,9 +2128,6 @@ static inline void __skb_queue_purge(struct sk_buff_head *list)
 		kfree_skb(skb);
 }
 
-#define NETDEV_FRAG_PAGE_MAX_SIZE	__ALIGN_MASK(32768, ~PAGE_MASK)
-#define NETDEV_FRAG_PAGE_MAX_ORDER	get_order(NETDEV_FRAG_PAGE_MAX_SIZE)
-
 void *netdev_alloc_frag(unsigned int fragsz);
 
 struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int length,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ebffa0e4a9c0..2fd31aebef30 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2967,6 +2967,104 @@ void free_pages(unsigned long addr, unsigned int order)
 EXPORT_SYMBOL(free_pages);
 
 /*
+ * Page Fragment:
+ *  An arbitrary-length arbitrary-offset area of memory which resides
+ *  within a 0 or higher order page.  Multiple fragments within that page
+ *  are individually refcounted, in the page's reference counter.
+ *
+ * The page_frag functions below provide a simple allocation framework for
+ * page fragments.  This is used by the network stack and network device
+ * drivers to provide a backing region of memory for use as either an
+ * sk_buff->head, or to be used in the "frags" portion of skb_shared_info.
+ */
+static struct page *__page_frag_refill(struct page_frag_cache *nc,
+				       gfp_t gfp_mask)
+{
+	struct page *page = NULL;
+	gfp_t gfp = gfp_mask;
+
+#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
+	gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY |
+		    __GFP_NOMEMALLOC;
+	page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
+				PAGE_FRAG_CACHE_MAX_ORDER);
+	nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
+#endif
+	if (unlikely(!page))
+		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
+
+	nc->va = page ? page_address(page) : NULL;
+
+	return page;
+}
+
+void *__alloc_page_frag(struct page_frag_cache *nc,
+			unsigned int fragsz, gfp_t gfp_mask)
+{
+	unsigned int size = PAGE_SIZE;
+	struct page *page;
+	int offset;
+
+	if (unlikely(!nc->va)) {
+refill:
+		page = __page_frag_refill(nc, gfp_mask);
+		if (!page)
+			return NULL;
+
+#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
+		/* if size can vary use size else just use PAGE_SIZE */
+		size = nc->size;
+#endif
+		/* Even if we own the page, we do not use atomic_set().
+		 * This would break get_page_unless_zero() users.
+		 */
+		atomic_add(size - 1, &page->_count);
+
+		/* reset page count bias and offset to start of new frag */
+		nc->pfmemalloc = page->pfmemalloc;
+		nc->pagecnt_bias = size;
+		nc->offset = size;
+	}
+
+	offset = nc->offset - fragsz;
+	if (unlikely(offset < 0)) {
+		page = virt_to_page(nc->va);
+
+		if (!atomic_sub_and_test(nc->pagecnt_bias, &page->_count))
+			goto refill;
+
+#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
+		/* if size can vary use size else just use PAGE_SIZE */
+		size = nc->size;
+#endif
+		/* OK, page count is 0, we can safely set it */
+		atomic_set(&page->_count, size);
+
+		/* reset page count bias and offset to start of new frag */
+		nc->pagecnt_bias = size;
+		offset = size - fragsz;
+	}
+
+	nc->pagecnt_bias--;
+	nc->offset = offset;
+
+	return nc->va + offset;
+}
+EXPORT_SYMBOL(__alloc_page_frag);
+
+/*
+ * Frees a page fragment allocated out of either a compound or order 0 page.
+ */
+void __free_page_frag(void *addr)
+{
+	struct page *page = virt_to_head_page(addr);
+
+	if (unlikely(put_page_testzero(page)))
+		__free_pages_ok(page, compound_order(page));
+}
+EXPORT_SYMBOL(__free_page_frag);
+
+/*
  * alloc_kmem_pages charges newly allocated pages to the kmem resource counter
  * of the current memory cgroup.
  *
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a3062ec341c3..dcc0e07abf47 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -347,100 +347,12 @@ struct sk_buff *build_skb(void *data, unsigned int frag_size)
 }
 EXPORT_SYMBOL(build_skb);
 
-struct netdev_alloc_cache {
-	void * va;
-#if (PAGE_SIZE < NETDEV_FRAG_PAGE_MAX_SIZE)
-	__u16 offset;
-	__u16 size;
-#else
-	__u32 offset;
-#endif
-	/* we maintain a pagecount bias, so that we dont dirty cache line
-	 * containing page->_count every time we allocate a fragment.
-	 */
-	unsigned int		pagecnt_bias;
-	bool pfmemalloc;
-};
-static DEFINE_PER_CPU(struct netdev_alloc_cache, netdev_alloc_cache);
-static DEFINE_PER_CPU(struct netdev_alloc_cache, napi_alloc_cache);
-
-static struct page *__page_frag_refill(struct netdev_alloc_cache *nc,
-				       gfp_t gfp_mask)
-{
-	struct page *page = NULL;
-	gfp_t gfp = gfp_mask;
-
-#if (PAGE_SIZE < NETDEV_FRAG_PAGE_MAX_SIZE)
-	gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY |
-		    __GFP_NOMEMALLOC;
-	page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
-				NETDEV_FRAG_PAGE_MAX_ORDER);
-	nc->size = page ? NETDEV_FRAG_PAGE_MAX_SIZE : PAGE_SIZE;
-#endif
-	if (unlikely(!page))
-		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
-
-	nc->va = page ? page_address(page) : NULL;
-
-	return page;
-}
-
-static void *__alloc_page_frag(struct netdev_alloc_cache *nc,
-			       unsigned int fragsz, gfp_t gfp_mask)
-{
-	unsigned int size = PAGE_SIZE;
-	struct page *page;
-	int offset;
-
-	if (unlikely(!nc->va)) {
-refill:
-		page = __page_frag_refill(nc, gfp_mask);
-		if (!page)
-			return NULL;
-
-#if (PAGE_SIZE < NETDEV_FRAG_PAGE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
-#endif
-		/* Even if we own the page, we do not use atomic_set().
-		 * This would break get_page_unless_zero() users.
-		 */
-		atomic_add(size - 1, &page->_count);
-
-		/* reset page count bias and offset to start of new frag */
-		nc->pfmemalloc = page->pfmemalloc;
-		nc->pagecnt_bias = size;
-		nc->offset = size;
-	}
-
-	offset = nc->offset - fragsz;
-	if (unlikely(offset < 0)) {
-		page = virt_to_page(nc->va);
-
-		if (!atomic_sub_and_test(nc->pagecnt_bias, &page->_count))
-			goto refill;
-
-#if (PAGE_SIZE < NETDEV_FRAG_PAGE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
-#endif
-		/* OK, page count is 0, we can safely set it */
-		atomic_set(&page->_count, size);
-
-		/* reset page count bias and offset to start of new frag */
-		nc->pagecnt_bias = size;
-		offset = size - fragsz;
-	}
-
-	nc->pagecnt_bias--;
-	nc->offset = offset;
-
-	return nc->va + offset;
-}
+static DEFINE_PER_CPU(struct page_frag_cache, netdev_alloc_cache);
+static DEFINE_PER_CPU(struct page_frag_cache, napi_alloc_cache);
 
 static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
 {
-	struct netdev_alloc_cache *nc;
+	struct page_frag_cache *nc;
 	unsigned long flags;
 	void *data;
 
@@ -466,7 +378,7 @@ EXPORT_SYMBOL(netdev_alloc_frag);
 
 static void *__napi_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
 {
-	struct netdev_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	struct page_frag_cache *nc = this_cpu_ptr(&napi_alloc_cache);
 
 	return __alloc_page_frag(nc, fragsz, gfp_mask);
 }
@@ -493,7 +405,7 @@ EXPORT_SYMBOL(napi_alloc_frag);
 struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
 				   gfp_t gfp_mask)
 {
-	struct netdev_alloc_cache *nc;
+	struct page_frag_cache *nc;
 	unsigned long flags;
 	struct sk_buff *skb;
 	bool pfmemalloc;
@@ -556,7 +468,7 @@ EXPORT_SYMBOL(__netdev_alloc_skb);
 struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 				 gfp_t gfp_mask)
 {
-	struct netdev_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	struct page_frag_cache *nc = this_cpu_ptr(&napi_alloc_cache);
 	struct sk_buff *skb;
 	void *data;
 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 05/10] net: Add skb_free_frag to replace use of put_page in freeing skb->head
  2015-05-07  4:11 [PATCH 00/10] Refactor netdev page frags and move them into mm/ Alexander Duyck
                   ` (3 preceding siblings ...)
  2015-05-07  4:11 ` [PATCH 04/10] mm/net: Rename and move page fragment handling from net/ to mm/ Alexander Duyck
@ 2015-05-07  4:12 ` Alexander Duyck
  2015-05-07  4:12 ` [PATCH 06/10] netcp: Replace put_page(virt_to_head_page(ptr)) w/ skb_free_frag Alexander Duyck
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Alexander Duyck @ 2015-05-07  4:12 UTC (permalink / raw)
  To: netdev, linux-mm; +Cc: akpm, davem, eric.dumazet

This change adds a function called skb_free_frag which is meant to
compliment the function netdev_alloc_frag.  The general idea is to enable a
more lightweight version of page freeing since we don't actually need all
the overhead of a put_page, and we don't quite fit the model of __free_pages.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 include/linux/skbuff.h |    5 +++++
 net/core/skbuff.c      |   10 ++++++----
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 0039fcc45b3b..c0b574a414e7 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2182,6 +2182,11 @@ static inline struct sk_buff *netdev_alloc_skb_ip_align(struct net_device *dev,
 	return __netdev_alloc_skb_ip_align(dev, length, GFP_ATOMIC);
 }
 
+static inline void skb_free_frag(void *addr)
+{
+	__free_page_frag(addr);
+}
+
 void *napi_alloc_frag(unsigned int fragsz);
 struct sk_buff *__napi_alloc_skb(struct napi_struct *napi,
 				 unsigned int length, gfp_t gfp_mask);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index dcc0e07abf47..d67e612bf0ef 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -436,7 +436,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
 
 	skb = __build_skb(data, len);
 	if (unlikely(!skb)) {
-		put_page(virt_to_head_page(data));
+		skb_free_frag(data);
 		return NULL;
 	}
 
@@ -490,7 +490,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 
 	skb = __build_skb(data, len);
 	if (unlikely(!skb)) {
-		put_page(virt_to_head_page(data));
+		skb_free_frag(data);
 		return NULL;
 	}
 
@@ -549,10 +549,12 @@ static void skb_clone_fraglist(struct sk_buff *skb)
 
 static void skb_free_head(struct sk_buff *skb)
 {
+	unsigned char *head = skb->head;
+
 	if (skb->head_frag)
-		put_page(virt_to_head_page(skb->head));
+		skb_free_frag(head);
 	else
-		kfree(skb->head);
+		kfree(head);
 }
 
 static void skb_release_data(struct sk_buff *skb)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 06/10] netcp: Replace put_page(virt_to_head_page(ptr)) w/ skb_free_frag
  2015-05-07  4:11 [PATCH 00/10] Refactor netdev page frags and move them into mm/ Alexander Duyck
                   ` (4 preceding siblings ...)
  2015-05-07  4:12 ` [PATCH 05/10] net: Add skb_free_frag to replace use of put_page in freeing skb->head Alexander Duyck
@ 2015-05-07  4:12 ` Alexander Duyck
  2015-05-07  4:12 ` [PATCH 07/10] mvneta: " Alexander Duyck
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Alexander Duyck @ 2015-05-07  4:12 UTC (permalink / raw)
  To: netdev, linux-mm; +Cc: akpm, davem, eric.dumazet

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 drivers/net/ethernet/ti/netcp_core.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c
index 43efc3a0cda5..0a28c07361cf 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -537,7 +537,7 @@ int netcp_unregister_rxhook(struct netcp_intf *netcp_priv, int order,
 static void netcp_frag_free(bool is_frag, void *ptr)
 {
 	if (is_frag)
-		put_page(virt_to_head_page(ptr));
+		skb_free_frag(ptr);
 	else
 		kfree(ptr);
 }

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 07/10] mvneta: Replace put_page(virt_to_head_page(ptr)) w/ skb_free_frag
  2015-05-07  4:11 [PATCH 00/10] Refactor netdev page frags and move them into mm/ Alexander Duyck
                   ` (5 preceding siblings ...)
  2015-05-07  4:12 ` [PATCH 06/10] netcp: Replace put_page(virt_to_head_page(ptr)) w/ skb_free_frag Alexander Duyck
@ 2015-05-07  4:12 ` Alexander Duyck
  2015-05-07  4:12 ` [PATCH 08/10] e1000: Replace e1000_free_frag with skb_free_frag Alexander Duyck
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Alexander Duyck @ 2015-05-07  4:12 UTC (permalink / raw)
  To: netdev, linux-mm; +Cc: akpm, davem, eric.dumazet

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 drivers/net/ethernet/marvell/mvneta.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index ce5f7f9cff06..ecce8261ce3b 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1359,7 +1359,7 @@ static void *mvneta_frag_alloc(const struct mvneta_port *pp)
 static void mvneta_frag_free(const struct mvneta_port *pp, void *data)
 {
 	if (likely(pp->frag_size <= PAGE_SIZE))
-		put_page(virt_to_head_page(data));
+		skb_free_frag(data);
 	else
 		kfree(data);
 }

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 08/10] e1000: Replace e1000_free_frag with skb_free_frag
  2015-05-07  4:11 [PATCH 00/10] Refactor netdev page frags and move them into mm/ Alexander Duyck
                   ` (6 preceding siblings ...)
  2015-05-07  4:12 ` [PATCH 07/10] mvneta: " Alexander Duyck
@ 2015-05-07  4:12 ` Alexander Duyck
  2015-05-07  8:46   ` Jeff Kirsher
  2015-05-07  4:12 ` [PATCH 09/10] hisilicon: Replace put_page(virt_to_head_page()) with skb_free_frag() Alexander Duyck
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: Alexander Duyck @ 2015-05-07  4:12 UTC (permalink / raw)
  To: netdev, linux-mm; +Cc: akpm, davem, eric.dumazet

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 drivers/net/ethernet/intel/e1000/e1000_main.c |   19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 983eb4e6f7aa..74dc15055971 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -2079,11 +2079,6 @@ static void *e1000_alloc_frag(const struct e1000_adapter *a)
 	return data;
 }
 
-static void e1000_free_frag(const void *data)
-{
-	put_page(virt_to_head_page(data));
-}
-
 /**
  * e1000_clean_rx_ring - Free Rx Buffers per Queue
  * @adapter: board private structure
@@ -2107,7 +2102,7 @@ static void e1000_clean_rx_ring(struct e1000_adapter *adapter,
 						 adapter->rx_buffer_len,
 						 DMA_FROM_DEVICE);
 			if (buffer_info->rxbuf.data) {
-				e1000_free_frag(buffer_info->rxbuf.data);
+				skb_free_frag(buffer_info->rxbuf.data);
 				buffer_info->rxbuf.data = NULL;
 			}
 		} else if (adapter->clean_rx == e1000_clean_jumbo_rx_irq) {
@@ -4594,28 +4589,28 @@ static void e1000_alloc_rx_buffers(struct e1000_adapter *adapter,
 			data = e1000_alloc_frag(adapter);
 			/* Failed allocation, critical failure */
 			if (!data) {
-				e1000_free_frag(olddata);
+				skb_free_frag(olddata);
 				adapter->alloc_rx_buff_failed++;
 				break;
 			}
 
 			if (!e1000_check_64k_bound(adapter, data, bufsz)) {
 				/* give up */
-				e1000_free_frag(data);
-				e1000_free_frag(olddata);
+				skb_free_frag(data);
+				skb_free_frag(olddata);
 				adapter->alloc_rx_buff_failed++;
 				break;
 			}
 
 			/* Use new allocation */
-			e1000_free_frag(olddata);
+			skb_free_frag(olddata);
 		}
 		buffer_info->dma = dma_map_single(&pdev->dev,
 						  data,
 						  adapter->rx_buffer_len,
 						  DMA_FROM_DEVICE);
 		if (dma_mapping_error(&pdev->dev, buffer_info->dma)) {
-			e1000_free_frag(data);
+			skb_free_frag(data);
 			buffer_info->dma = 0;
 			adapter->alloc_rx_buff_failed++;
 			break;
@@ -4637,7 +4632,7 @@ static void e1000_alloc_rx_buffers(struct e1000_adapter *adapter,
 					 adapter->rx_buffer_len,
 					 DMA_FROM_DEVICE);
 
-			e1000_free_frag(data);
+			skb_free_frag(data);
 			buffer_info->rxbuf.data = NULL;
 			buffer_info->dma = 0;
 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 09/10] hisilicon: Replace put_page(virt_to_head_page()) with skb_free_frag()
  2015-05-07  4:11 [PATCH 00/10] Refactor netdev page frags and move them into mm/ Alexander Duyck
                   ` (7 preceding siblings ...)
  2015-05-07  4:12 ` [PATCH 08/10] e1000: Replace e1000_free_frag with skb_free_frag Alexander Duyck
@ 2015-05-07  4:12 ` Alexander Duyck
  2015-05-07  4:12 ` [PATCH 10/10] bnx2x, tg3: " Alexander Duyck
  2015-05-10 23:17 ` [PATCH 00/10] Refactor netdev page frags and move them into mm/ David Miller
  10 siblings, 0 replies; 18+ messages in thread
From: Alexander Duyck @ 2015-05-07  4:12 UTC (permalink / raw)
  To: netdev, linux-mm; +Cc: akpm, davem, eric.dumazet

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 drivers/net/ethernet/hisilicon/hip04_eth.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
index 3b39fdddeb57..d49bee38cd31 100644
--- a/drivers/net/ethernet/hisilicon/hip04_eth.c
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -798,7 +798,7 @@ static void hip04_free_ring(struct net_device *ndev, struct device *d)
 
 	for (i = 0; i < RX_DESC_NUM; i++)
 		if (priv->rx_buf[i])
-			put_page(virt_to_head_page(priv->rx_buf[i]));
+			skb_free_frag(priv->rx_buf[i]);
 
 	for (i = 0; i < TX_DESC_NUM; i++)
 		if (priv->tx_skb[i])

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 10/10] bnx2x, tg3: Replace put_page(virt_to_head_page()) with skb_free_frag()
  2015-05-07  4:11 [PATCH 00/10] Refactor netdev page frags and move them into mm/ Alexander Duyck
                   ` (8 preceding siblings ...)
  2015-05-07  4:12 ` [PATCH 09/10] hisilicon: Replace put_page(virt_to_head_page()) with skb_free_frag() Alexander Duyck
@ 2015-05-07  4:12 ` Alexander Duyck
  2015-05-10 23:17 ` [PATCH 00/10] Refactor netdev page frags and move them into mm/ David Miller
  10 siblings, 0 replies; 18+ messages in thread
From: Alexander Duyck @ 2015-05-07  4:12 UTC (permalink / raw)
  To: netdev, linux-mm; +Cc: akpm, davem, eric.dumazet

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c |    2 +-
 drivers/net/ethernet/broadcom/tg3.c             |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index a8bb8f664d3d..b10d1744e5ae 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -662,7 +662,7 @@ static int bnx2x_fill_frag_skb(struct bnx2x *bp, struct bnx2x_fastpath *fp,
 static void bnx2x_frag_free(const struct bnx2x_fastpath *fp, void *data)
 {
 	if (fp->rx_frag_size)
-		put_page(virt_to_head_page(data));
+		skb_free_frag(data);
 	else
 		kfree(data);
 }
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 069952fa5d64..73c934cf6c61 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -6618,7 +6618,7 @@ static void tg3_tx(struct tg3_napi *tnapi)
 static void tg3_frag_free(bool is_frag, void *data)
 {
 	if (is_frag)
-		put_page(virt_to_head_page(data));
+		skb_free_frag(data);
 	else
 		kfree(data);
 }

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 08/10] e1000: Replace e1000_free_frag with skb_free_frag
  2015-05-07  4:12 ` [PATCH 08/10] e1000: Replace e1000_free_frag with skb_free_frag Alexander Duyck
@ 2015-05-07  8:46   ` Jeff Kirsher
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff Kirsher @ 2015-05-07  8:46 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: netdev, linux-mm, Andrew Morton, David Miller, Eric Dumazet

On Wed, May 6, 2015 at 9:12 PM, Alexander Duyck
<alexander.h.duyck@redhat.com> wrote:
> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
> ---
>  drivers/net/ethernet/intel/e1000/e1000_main.c |   19 +++++++------------
>  1 file changed, 7 insertions(+), 12 deletions(-)

Was my ACK of the first patch not good enough? :-)

Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 00/10] Refactor netdev page frags and move them into mm/
  2015-05-07  4:11 [PATCH 00/10] Refactor netdev page frags and move them into mm/ Alexander Duyck
                   ` (9 preceding siblings ...)
  2015-05-07  4:12 ` [PATCH 10/10] bnx2x, tg3: " Alexander Duyck
@ 2015-05-10 23:17 ` David Miller
  2015-05-11 20:36   ` Andrew Morton
  10 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2015-05-10 23:17 UTC (permalink / raw)
  To: alexander.h.duyck; +Cc: netdev, linux-mm, akpm, eric.dumazet

From: Alexander Duyck <alexander.h.duyck@redhat.com>
Date: Wed, 06 May 2015 21:11:34 -0700

> This patch series addresses several things.
> 
> First I found an issue in the performance of the pfmemalloc check from
> build_skb.  To work around it I have provided a cached copy of pfmemalloc
> to be used in __netdev_alloc_skb and __napi_alloc_skb.
> 
> Second I moved the page fragment allocation logic into the mm tree and
> added functionality for freeing page fragments.  I had to fix igb before I
> could do this as it was using a reference to NETDEV_FRAG_PAGE_MAX_SIZE
> incorrectly.
> 
> Finally I went through and replaced all of the duplicate code that was
> calling put_page and replaced it with calls to skb_free_frag.
> 
> With these changes in place a simple receive and drop test increased from a
> packet rate of 8.9Mpps to 9.8Mpps.  The gains breakdown as follows:
> 
> 8.9Mpps	Before			9.8Mpps	After
> ------------------------	------------------------
> 7.8%	put_compound_page	9.1%	__free_page_frag
> 3.9%	skb_free_head
> 1.1%	put_page
> 
> 4.9%	build_skb		3.8%	__napi_alloc_skb
> 2.5%	__alloc_rx_skb
> 1.9%	__napi_alloc_skb

I like this series, but again I need to see feedback from some
mm folks before I can consider applying it.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 01/10] net: Use cached copy of pfmemalloc to avoid accessing page
  2015-05-07  4:11 ` [PATCH 01/10] net: Use cached copy of pfmemalloc to avoid accessing page Alexander Duyck
@ 2015-05-10 23:18   ` David Miller
  2015-05-11  0:01     ` Alexander Duyck
  0 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2015-05-10 23:18 UTC (permalink / raw)
  To: alexander.h.duyck; +Cc: netdev, linux-mm, akpm, eric.dumazet

From: Alexander Duyck <alexander.h.duyck@redhat.com>
Date: Wed, 06 May 2015 21:11:40 -0700

> +	/* use OR instead of assignment to avoid clearing of bits in mask */
> +	if (pfmemalloc)
> +		skb->pfmemalloc = 1;
> +	skb->head_frag = 1;
 ...
> +	/* use OR instead of assignment to avoid clearing of bits in mask */
> +	if (nc->pfmemalloc)
> +		skb->pfmemalloc = 1;
> +	skb->head_frag = 1;

Maybe make these two cases more consistent by either accessing
nc->pfmemalloc or using a local variable in both cases.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 01/10] net: Use cached copy of pfmemalloc to avoid accessing page
  2015-05-10 23:18   ` David Miller
@ 2015-05-11  0:01     ` Alexander Duyck
  2015-05-11  0:52       ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: Alexander Duyck @ 2015-05-11  0:01 UTC (permalink / raw)
  To: David Miller, alexander.h.duyck; +Cc: netdev, linux-mm, akpm, eric.dumazet

On 05/10/2015 04:18 PM, David Miller wrote:
> From: Alexander Duyck <alexander.h.duyck@redhat.com>
> Date: Wed, 06 May 2015 21:11:40 -0700
>
>> +	/* use OR instead of assignment to avoid clearing of bits in mask */
>> +	if (pfmemalloc)
>> +		skb->pfmemalloc = 1;
>> +	skb->head_frag = 1;
>   ...
>> +	/* use OR instead of assignment to avoid clearing of bits in mask */
>> +	if (nc->pfmemalloc)
>> +		skb->pfmemalloc = 1;
>> +	skb->head_frag = 1;
> Maybe make these two cases more consistent by either accessing
> nc->pfmemalloc or using a local variable in both cases.

The only option would be to use a local variable in both cases, but then 
I am still stuck with the differences in when I can access the caches.

The reason for the difference between the two is that in the case of 
netdev_alloc_skb/frag the netdev_alloc_cache can only be accessed with 
IRQs disabled, whereas in the napi_alloc_skb case we can access the 
napi_alloc_cache at any point in the function.  Either way I am going to 
be stuck with differences because of the local_irq_save/restore that 
must be called when accessing the page frag cache that doesn't exist in 
the napi case.

- Alex

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 01/10] net: Use cached copy of pfmemalloc to avoid accessing page
  2015-05-11  0:01     ` Alexander Duyck
@ 2015-05-11  0:52       ` David Miller
  0 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2015-05-11  0:52 UTC (permalink / raw)
  To: alexander.duyck; +Cc: alexander.h.duyck, netdev, linux-mm, akpm, eric.dumazet

From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Sun, 10 May 2015 17:01:15 -0700

> The reason for the difference between the two is that in the case of
> netdev_alloc_skb/frag the netdev_alloc_cache can only be accessed with
> IRQs disabled, whereas in the napi_alloc_skb case we can access the
> napi_alloc_cache at any point in the function.  Either way I am going
> to be stuck with differences because of the local_irq_save/restore
> that must be called when accessing the page frag cache that doesn't
> exist in the napi case.

I see, thanks for explaining.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 00/10] Refactor netdev page frags and move them into mm/
  2015-05-10 23:17 ` [PATCH 00/10] Refactor netdev page frags and move them into mm/ David Miller
@ 2015-05-11 20:36   ` Andrew Morton
  2015-05-12 14:39     ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2015-05-11 20:36 UTC (permalink / raw)
  To: David Miller; +Cc: alexander.h.duyck, netdev, linux-mm, eric.dumazet

On Sun, 10 May 2015 19:17:58 -0400 (EDT) David Miller <davem@davemloft.net> wrote:

> > 4.9%	build_skb		3.8%	__napi_alloc_skb
> > 2.5%	__alloc_rx_skb
> > 1.9%	__napi_alloc_skb
> 
> I like this series, but again I need to see feedback from some
> mm folks before I can consider applying it.

The MM part looks OK to me - it's largely moving code out of net/ into
mm/.  It's a bit weird and it's unclear whether the code will gain
other callers, but putting it in mm/ increase the likelihood that some other
subsystem will use it.

Please merge it via a net tree when ready.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 00/10] Refactor netdev page frags and move them into mm/
  2015-05-11 20:36   ` Andrew Morton
@ 2015-05-12 14:39     ` David Miller
  0 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2015-05-12 14:39 UTC (permalink / raw)
  To: akpm; +Cc: alexander.h.duyck, netdev, linux-mm, eric.dumazet

From: Andrew Morton <akpm@linux-foundation.org>
Date: Mon, 11 May 2015 13:36:52 -0700

> On Sun, 10 May 2015 19:17:58 -0400 (EDT) David Miller <davem@davemloft.net> wrote:
> 
>> > 4.9%	build_skb		3.8%	__napi_alloc_skb
>> > 2.5%	__alloc_rx_skb
>> > 1.9%	__napi_alloc_skb
>> 
>> I like this series, but again I need to see feedback from some
>> mm folks before I can consider applying it.
> 
> The MM part looks OK to me - it's largely moving code out of net/ into
> mm/.  It's a bit weird and it's unclear whether the code will gain
> other callers, but putting it in mm/ increase the likelihood that some other
> subsystem will use it.
> 
> Please merge it via a net tree when ready.

Ok, will do that now, thanks Andrew!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2015-05-12 14:39 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-07  4:11 [PATCH 00/10] Refactor netdev page frags and move them into mm/ Alexander Duyck
2015-05-07  4:11 ` [PATCH 01/10] net: Use cached copy of pfmemalloc to avoid accessing page Alexander Duyck
2015-05-10 23:18   ` David Miller
2015-05-11  0:01     ` Alexander Duyck
2015-05-11  0:52       ` David Miller
2015-05-07  4:11 ` [PATCH 02/10] igb: Don't use NETDEV_FRAG_PAGE_MAX_SIZE in descriptor calculation Alexander Duyck
2015-05-07  4:11 ` [PATCH 03/10] net: Store virtual address instead of page in netdev_alloc_cache Alexander Duyck
2015-05-07  4:11 ` [PATCH 04/10] mm/net: Rename and move page fragment handling from net/ to mm/ Alexander Duyck
2015-05-07  4:12 ` [PATCH 05/10] net: Add skb_free_frag to replace use of put_page in freeing skb->head Alexander Duyck
2015-05-07  4:12 ` [PATCH 06/10] netcp: Replace put_page(virt_to_head_page(ptr)) w/ skb_free_frag Alexander Duyck
2015-05-07  4:12 ` [PATCH 07/10] mvneta: " Alexander Duyck
2015-05-07  4:12 ` [PATCH 08/10] e1000: Replace e1000_free_frag with skb_free_frag Alexander Duyck
2015-05-07  8:46   ` Jeff Kirsher
2015-05-07  4:12 ` [PATCH 09/10] hisilicon: Replace put_page(virt_to_head_page()) with skb_free_frag() Alexander Duyck
2015-05-07  4:12 ` [PATCH 10/10] bnx2x, tg3: " Alexander Duyck
2015-05-10 23:17 ` [PATCH 00/10] Refactor netdev page frags and move them into mm/ David Miller
2015-05-11 20:36   ` Andrew Morton
2015-05-12 14:39     ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).