netdev.vger.kernel.org archive mirror
* [PATCH v2 0/8] page_frag_cache improvements
@ 2018-03-22 15:31 Matthew Wilcox
  2018-03-22 15:31 ` [PATCH v2 1/8] page_frag_cache: Remove pfmemalloc bool Matthew Wilcox
                   ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: Matthew Wilcox @ 2018-03-22 15:31 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Matthew Wilcox, netdev, linux-mm, Jesper Dangaard Brouer, Eric Dumazet

From: Matthew Wilcox <mawilcox@microsoft.com>

Version 1 was completely wrong-headed and I have repented of the error
of my ways.  Thanks for educating me.

I still think it's possible to improve on the current state of the
page_frag allocator, and here are eight patches, each of which I think
represents an improvement.  They're not all that interlinked, although
there will be textual conflicts, so I'll be happy to revise and drop
any that are not actual improvements.

I have discovered (today), much to my chagrin, that testing using trinity
in KVM doesn't actually test the page_frag allocator.  I don't understand
why not.  So, this turns out to only be compile tested.  Sorry.

The net effect of all these patches is a reduction of four instructions
in the fastpath of the allocator on x86.  The page_frag_cache structure
also shrinks, to as small as 8 bytes on 32-bit with CONFIG_BASE_SMALL.

The last patch is probably wrong.  It'll definitely be inaccurate
because the call to page_frag_free() may not be the call which frees
a page; there's a really unlikely race where the page cache finds a
stale RCU pointer, bumps its refcount, discovers it's not the page it
was looking for and calls put_page(), which might end up dropping the
last reference.  We can do something about that inaccuracy, but I don't
even know if this is the best approach to accounting these pages.
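
For reference, here is the free path as it ends up after the last patch;
the NR_PAGE_FRAG decrement only happens when this particular put drops
the final reference, which is where the inaccuracy comes from:

	void page_frag_free(void *addr)
	{
		struct page *page = virt_to_head_page(addr);

		/* If somebody else (e.g. a speculative page cache lookup
		 * holding a transient reference) ends up doing the final
		 * put_page(), this test fails and the counter is never
		 * decremented.
		 */
		if (unlikely(put_page_testzero(page))) {
			dec_node_page_state(page, NR_PAGE_FRAG);
			__free_pages_ok(page, compound_order(page));
		}
	}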

Matthew Wilcox (8):
  page_frag_cache: Remove pfmemalloc bool
  page_frag_cache: Move slowpath code from page_frag_alloc
  page_frag_cache: Rename 'nc' to 'pfc'
  page_frag_cache: Rename fragsz to size
  page_frag_cache: Save memory on small machines
  page_frag_cache: Use a mask instead of offset
  page_frag: Update documentation
  page_frag: Account allocations

 Documentation/vm/page_frags     |  42 -----------
 Documentation/vm/page_frags.rst |  24 +++++++
 include/linux/mm_types.h        |  20 ++++--
 include/linux/mmzone.h          |   3 +-
 mm/page_alloc.c                 | 155 ++++++++++++++++++++++++----------------
 net/core/skbuff.c               |   5 +-
 6 files changed, 135 insertions(+), 114 deletions(-)
 delete mode 100644 Documentation/vm/page_frags
 create mode 100644 Documentation/vm/page_frags.rst

-- 
2.16.2

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/8] page_frag_cache: Remove pfmemalloc bool
  2018-03-22 15:31 [PATCH v2 0/8] page_frag_cache improvements Matthew Wilcox
@ 2018-03-22 15:31 ` Matthew Wilcox
  2018-03-22 16:39   ` Alexander Duyck
  2018-03-22 15:31 ` [PATCH v2 2/8] page_frag_cache: Move slowpath code from page_frag_alloc Matthew Wilcox
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Matthew Wilcox @ 2018-03-22 15:31 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Matthew Wilcox, netdev, linux-mm, Jesper Dangaard Brouer, Eric Dumazet

From: Matthew Wilcox <mawilcox@microsoft.com>

Save 4/8 bytes by moving the pfmemalloc indicator from its own bool
to the top bit of pagecnt_bias.  This has no effect on the fastpath
of the allocator since the pagecnt_bias cannot go negative.  It's
a couple of extra instructions in the slowpath.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/mm_types.h | 4 +++-
 mm/page_alloc.c          | 8 +++++---
 net/core/skbuff.c        | 5 ++---
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index fd1af6b9591d..a63b138ad1a4 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -218,6 +218,7 @@ struct page {
 
 #define PAGE_FRAG_CACHE_MAX_SIZE	__ALIGN_MASK(32768, ~PAGE_MASK)
 #define PAGE_FRAG_CACHE_MAX_ORDER	get_order(PAGE_FRAG_CACHE_MAX_SIZE)
+#define PFC_MEMALLOC			(1U << 31)
 
 struct page_frag_cache {
 	void * va;
@@ -231,9 +232,10 @@ struct page_frag_cache {
 	 * containing page->_refcount every time we allocate a fragment.
 	 */
 	unsigned int		pagecnt_bias;
-	bool pfmemalloc;
 };
 
+#define page_frag_cache_pfmemalloc(pfc)	((pfc)->pagecnt_bias & PFC_MEMALLOC)
+
 typedef unsigned long vm_flags_t;
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 635d7dd29d7f..61366f23e8c8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4395,16 +4395,18 @@ void *page_frag_alloc(struct page_frag_cache *nc,
 		page_ref_add(page, size - 1);
 
 		/* reset page count bias and offset to start of new frag */
-		nc->pfmemalloc = page_is_pfmemalloc(page);
 		nc->pagecnt_bias = size;
+		if (page_is_pfmemalloc(page))
+			nc->pagecnt_bias |= PFC_MEMALLOC;
 		nc->offset = size;
 	}
 
 	offset = nc->offset - fragsz;
 	if (unlikely(offset < 0)) {
+		unsigned int pagecnt_bias = nc->pagecnt_bias & ~PFC_MEMALLOC;
 		page = virt_to_page(nc->va);
 
-		if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
+		if (!page_ref_sub_and_test(page, pagecnt_bias))
 			goto refill;
 
 #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
@@ -4415,7 +4417,7 @@ void *page_frag_alloc(struct page_frag_cache *nc,
 		set_page_count(page, size);
 
 		/* reset page count bias and offset to start of new frag */
-		nc->pagecnt_bias = size;
+		nc->pagecnt_bias = size | (nc->pagecnt_bias - pagecnt_bias);
 		offset = size - fragsz;
 	}
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 0bb0d8877954..54bbde8f7541 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -412,7 +412,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
 
 	nc = this_cpu_ptr(&netdev_alloc_cache);
 	data = page_frag_alloc(nc, len, gfp_mask);
-	pfmemalloc = nc->pfmemalloc;
+	pfmemalloc = page_frag_cache_pfmemalloc(nc);
 
 	local_irq_restore(flags);
 
@@ -485,8 +485,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 		return NULL;
 	}
 
-	/* use OR instead of assignment to avoid clearing of bits in mask */
-	if (nc->page.pfmemalloc)
+	if (page_frag_cache_pfmemalloc(&nc->page))
 		skb->pfmemalloc = 1;
 	skb->head_frag = 1;
 
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/8] page_frag_cache: Move slowpath code from page_frag_alloc
  2018-03-22 15:31 [PATCH v2 0/8] page_frag_cache improvements Matthew Wilcox
  2018-03-22 15:31 ` [PATCH v2 1/8] page_frag_cache: Remove pfmemalloc bool Matthew Wilcox
@ 2018-03-22 15:31 ` Matthew Wilcox
  2018-03-22 15:31 ` [PATCH v2 3/8] page_frag_cache: Rename 'nc' to 'pfc' Matthew Wilcox
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Matthew Wilcox @ 2018-03-22 15:31 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Matthew Wilcox, netdev, linux-mm, Jesper Dangaard Brouer, Eric Dumazet

From: Matthew Wilcox <mawilcox@microsoft.com>

Put all the unlikely code in __page_frag_cache_refill to make the
fastpath code more obvious.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 mm/page_alloc.c | 70 ++++++++++++++++++++++++++++-----------------------------
 1 file changed, 34 insertions(+), 36 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 61366f23e8c8..6d2c106f4e5d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4339,20 +4339,50 @@ EXPORT_SYMBOL(free_pages);
 static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
 					     gfp_t gfp_mask)
 {
+	unsigned int size = PAGE_SIZE;
 	struct page *page = NULL;
+	struct page *old = nc->va ? virt_to_page(nc->va) : NULL;
 	gfp_t gfp = gfp_mask;
+	unsigned int pagecnt_bias = nc->pagecnt_bias & ~PFC_MEMALLOC;
+
+	/* If all allocations have been freed, we can reuse this page */
+	if (old && page_ref_sub_and_test(old, pagecnt_bias)) {
+		page = old;
+#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
+		/* if size can vary use size else just use PAGE_SIZE */
+		size = nc->size;
+#endif
+		/* Page count is 0, we can safely set it */
+		set_page_count(page, size);
+		goto reset;
+	}
 
 #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
 	gfp_mask |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY |
 		    __GFP_NOMEMALLOC;
 	page = alloc_pages_node(NUMA_NO_NODE, gfp_mask,
 				PAGE_FRAG_CACHE_MAX_ORDER);
-	nc->size = page ? PAGE_FRAG_CACHE_MAX_SIZE : PAGE_SIZE;
+	if (page)
+		size = PAGE_FRAG_CACHE_MAX_SIZE;
+	nc->size = size;
 #endif
 	if (unlikely(!page))
 		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
+	if (!page) {
+		nc->va = NULL;
+		return NULL;
+	}
+
+	nc->va = page_address(page);
 
-	nc->va = page ? page_address(page) : NULL;
+	/* Using atomic_set() would break get_page_unless_zero() users. */
+	page_ref_add(page, size - 1);
+reset:
+	/* reset page count bias and offset to start of new frag */
+	nc->pagecnt_bias = size;
+	if (page_is_pfmemalloc(page))
+		nc->pagecnt_bias |= PFC_MEMALLOC;
+	nc->offset = size;
 
 	return page;
 }
@@ -4375,7 +4405,6 @@ EXPORT_SYMBOL(__page_frag_cache_drain);
 void *page_frag_alloc(struct page_frag_cache *nc,
 		      unsigned int fragsz, gfp_t gfp_mask)
 {
-	unsigned int size = PAGE_SIZE;
 	struct page *page;
 	int offset;
 
@@ -4384,42 +4413,11 @@ void *page_frag_alloc(struct page_frag_cache *nc,
 		page = __page_frag_cache_refill(nc, gfp_mask);
 		if (!page)
 			return NULL;
-
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
-#endif
-		/* Even if we own the page, we do not use atomic_set().
-		 * This would break get_page_unless_zero() users.
-		 */
-		page_ref_add(page, size - 1);
-
-		/* reset page count bias and offset to start of new frag */
-		nc->pagecnt_bias = size;
-		if (page_is_pfmemalloc(page))
-			nc->pagecnt_bias |= PFC_MEMALLOC;
-		nc->offset = size;
 	}
 
 	offset = nc->offset - fragsz;
-	if (unlikely(offset < 0)) {
-		unsigned int pagecnt_bias = nc->pagecnt_bias & ~PFC_MEMALLOC;
-		page = virt_to_page(nc->va);
-
-		if (!page_ref_sub_and_test(page, pagecnt_bias))
-			goto refill;
-
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
-#endif
-		/* OK, page count is 0, we can safely set it */
-		set_page_count(page, size);
-
-		/* reset page count bias and offset to start of new frag */
-		nc->pagecnt_bias = size | (nc->pagecnt_bias - pagecnt_bias);
-		offset = size - fragsz;
-	}
+	if (unlikely(offset < 0))
+		goto refill;
 
 	nc->pagecnt_bias--;
 	nc->offset = offset;
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/8] page_frag_cache: Rename 'nc' to 'pfc'
  2018-03-22 15:31 [PATCH v2 0/8] page_frag_cache improvements Matthew Wilcox
  2018-03-22 15:31 ` [PATCH v2 1/8] page_frag_cache: Remove pfmemalloc bool Matthew Wilcox
  2018-03-22 15:31 ` [PATCH v2 2/8] page_frag_cache: Move slowpath code from page_frag_alloc Matthew Wilcox
@ 2018-03-22 15:31 ` Matthew Wilcox
  2018-03-22 15:31 ` [PATCH v2 4/8] page_frag_cache: Rename fragsz to size Matthew Wilcox
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Matthew Wilcox @ 2018-03-22 15:31 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Matthew Wilcox, netdev, linux-mm, Jesper Dangaard Brouer, Eric Dumazet

From: Matthew Wilcox <mawilcox@microsoft.com>

This name was a legacy from the 'netdev_alloc_cache' days.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 mm/page_alloc.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6d2c106f4e5d..c9fc76135dd8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4336,21 +4336,21 @@ EXPORT_SYMBOL(free_pages);
  * drivers to provide a backing region of memory for use as either an
  * sk_buff->head, or to be used in the "frags" portion of skb_shared_info.
  */
-static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
+static struct page *__page_frag_cache_refill(struct page_frag_cache *pfc,
 					     gfp_t gfp_mask)
 {
 	unsigned int size = PAGE_SIZE;
 	struct page *page = NULL;
-	struct page *old = nc->va ? virt_to_page(nc->va) : NULL;
+	struct page *old = pfc->va ? virt_to_page(pfc->va) : NULL;
 	gfp_t gfp = gfp_mask;
-	unsigned int pagecnt_bias = nc->pagecnt_bias & ~PFC_MEMALLOC;
+	unsigned int pagecnt_bias = pfc->pagecnt_bias & ~PFC_MEMALLOC;
 
 	/* If all allocations have been freed, we can reuse this page */
 	if (old && page_ref_sub_and_test(old, pagecnt_bias)) {
 		page = old;
 #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
 		/* if size can vary use size else just use PAGE_SIZE */
-		size = nc->size;
+		size = pfc->size;
 #endif
 		/* Page count is 0, we can safely set it */
 		set_page_count(page, size);
@@ -4364,25 +4364,25 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
 				PAGE_FRAG_CACHE_MAX_ORDER);
 	if (page)
 		size = PAGE_FRAG_CACHE_MAX_SIZE;
-	nc->size = size;
+	pfc->size = size;
 #endif
 	if (unlikely(!page))
 		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
 	if (!page) {
-		nc->va = NULL;
+		pfc->va = NULL;
 		return NULL;
 	}
 
-	nc->va = page_address(page);
+	pfc->va = page_address(page);
 
 	/* Using atomic_set() would break get_page_unless_zero() users. */
 	page_ref_add(page, size - 1);
 reset:
 	/* reset page count bias and offset to start of new frag */
-	nc->pagecnt_bias = size;
+	pfc->pagecnt_bias = size;
 	if (page_is_pfmemalloc(page))
-		nc->pagecnt_bias |= PFC_MEMALLOC;
-	nc->offset = size;
+		pfc->pagecnt_bias |= PFC_MEMALLOC;
+	pfc->offset = size;
 
 	return page;
 }
@@ -4402,27 +4402,27 @@ void __page_frag_cache_drain(struct page *page, unsigned int count)
 }
 EXPORT_SYMBOL(__page_frag_cache_drain);
 
-void *page_frag_alloc(struct page_frag_cache *nc,
+void *page_frag_alloc(struct page_frag_cache *pfc,
 		      unsigned int fragsz, gfp_t gfp_mask)
 {
 	struct page *page;
 	int offset;
 
-	if (unlikely(!nc->va)) {
+	if (unlikely(!pfc->va)) {
 refill:
-		page = __page_frag_cache_refill(nc, gfp_mask);
+		page = __page_frag_cache_refill(pfc, gfp_mask);
 		if (!page)
 			return NULL;
 	}
 
-	offset = nc->offset - fragsz;
+	offset = pfc->offset - fragsz;
 	if (unlikely(offset < 0))
 		goto refill;
 
-	nc->pagecnt_bias--;
-	nc->offset = offset;
+	pfc->pagecnt_bias--;
+	pfc->offset = offset;
 
-	return nc->va + offset;
+	return pfc->va + offset;
 }
 EXPORT_SYMBOL(page_frag_alloc);
 
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 4/8] page_frag_cache: Rename fragsz to size
  2018-03-22 15:31 [PATCH v2 0/8] page_frag_cache improvements Matthew Wilcox
                   ` (2 preceding siblings ...)
  2018-03-22 15:31 ` [PATCH v2 3/8] page_frag_cache: Rename 'nc' to 'pfc' Matthew Wilcox
@ 2018-03-22 15:31 ` Matthew Wilcox
  2018-03-22 15:31 ` [PATCH v2 5/8] page_frag_cache: Save memory on small machines Matthew Wilcox
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Matthew Wilcox @ 2018-03-22 15:31 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Matthew Wilcox, netdev, linux-mm, Jesper Dangaard Brouer, Eric Dumazet

From: Matthew Wilcox <mawilcox@microsoft.com>

The 'size' variable name was previously used for the page size.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 mm/page_alloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c9fc76135dd8..5a2e3e293079 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4403,7 +4403,7 @@ void __page_frag_cache_drain(struct page *page, unsigned int count)
 EXPORT_SYMBOL(__page_frag_cache_drain);
 
 void *page_frag_alloc(struct page_frag_cache *pfc,
-		      unsigned int fragsz, gfp_t gfp_mask)
+		      unsigned int size, gfp_t gfp_mask)
 {
 	struct page *page;
 	int offset;
@@ -4415,7 +4415,7 @@ void *page_frag_alloc(struct page_frag_cache *pfc,
 			return NULL;
 	}
 
-	offset = pfc->offset - fragsz;
+	offset = pfc->offset - size;
 	if (unlikely(offset < 0))
 		goto refill;
 
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 5/8] page_frag_cache: Save memory on small machines
  2018-03-22 15:31 [PATCH v2 0/8] page_frag_cache improvements Matthew Wilcox
                   ` (3 preceding siblings ...)
  2018-03-22 15:31 ` [PATCH v2 4/8] page_frag_cache: Rename fragsz to size Matthew Wilcox
@ 2018-03-22 15:31 ` Matthew Wilcox
  2018-03-22 15:31 ` [PATCH v2 6/8] page_frag_cache: Use a mask instead of offset Matthew Wilcox
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Matthew Wilcox @ 2018-03-22 15:31 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Matthew Wilcox, netdev, linux-mm, Jesper Dangaard Brouer, Eric Dumazet

From: Matthew Wilcox <mawilcox@microsoft.com>

Only allocate a single page if CONFIG_BASE_SMALL is set.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/mm_types.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index a63b138ad1a4..0defff9e3c0e 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -216,7 +216,11 @@ struct page {
 #endif
 } _struct_page_alignment;
 
+#if CONFIG_BASE_SMALL
+#define PAGE_FRAG_CACHE_MAX_SIZE	PAGE_SIZE
+#else
 #define PAGE_FRAG_CACHE_MAX_SIZE	__ALIGN_MASK(32768, ~PAGE_MASK)
+#endif
 #define PAGE_FRAG_CACHE_MAX_ORDER	get_order(PAGE_FRAG_CACHE_MAX_SIZE)
 #define PFC_MEMALLOC			(1U << 31)
 
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 6/8] page_frag_cache: Use a mask instead of offset
  2018-03-22 15:31 [PATCH v2 0/8] page_frag_cache improvements Matthew Wilcox
                   ` (4 preceding siblings ...)
  2018-03-22 15:31 ` [PATCH v2 5/8] page_frag_cache: Save memory on small machines Matthew Wilcox
@ 2018-03-22 15:31 ` Matthew Wilcox
  2018-03-22 16:22   ` Alexander Duyck
  2018-03-22 15:31 ` [PATCH v2 7/8] page_frag: Update documentation Matthew Wilcox
  2018-03-22 15:31 ` [PATCH v2 8/8] page_frag: Account allocations Matthew Wilcox
  7 siblings, 1 reply; 15+ messages in thread
From: Matthew Wilcox @ 2018-03-22 15:31 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Matthew Wilcox, netdev, linux-mm, Jesper Dangaard Brouer, Eric Dumazet

From: Matthew Wilcox <mawilcox@microsoft.com>

By combining 'va' and 'offset' into 'addr' and using a mask instead,
we can save a compare-and-branch in the fast-path of the allocator.
This removes 4 instructions on x86 (both 32 and 64 bit).

We can avoid storing the mask at all if we know that we're only allocating
a single page.  This shrinks page_frag_cache from 12 to 8 bytes on 32-bit
CONFIG_BASE_SMALL build.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/mm_types.h | 12 +++++++-----
 mm/page_alloc.c          | 40 +++++++++++++++-------------------------
 2 files changed, 22 insertions(+), 30 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 0defff9e3c0e..ebe93edec752 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -225,12 +225,9 @@ struct page {
 #define PFC_MEMALLOC			(1U << 31)
 
 struct page_frag_cache {
-	void * va;
+	void *addr;
 #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-	__u16 offset;
-	__u16 size;
-#else
-	__u32 offset;
+	unsigned int mask;
 #endif
 	/* we maintain a pagecount bias, so that we dont dirty cache line
 	 * containing page->_refcount every time we allocate a fragment.
@@ -239,6 +236,11 @@ struct page_frag_cache {
 };
 
 #define page_frag_cache_pfmemalloc(pfc)	((pfc)->pagecnt_bias & PFC_MEMALLOC)
+#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
+#define page_frag_cache_mask(pfc)	(pfc)->mask
+#else
+#define page_frag_cache_mask(pfc)	(~PAGE_MASK)
+#endif
 
 typedef unsigned long vm_flags_t;
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5a2e3e293079..d15a5348a8e4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4336,22 +4336,19 @@ EXPORT_SYMBOL(free_pages);
  * drivers to provide a backing region of memory for use as either an
  * sk_buff->head, or to be used in the "frags" portion of skb_shared_info.
  */
-static struct page *__page_frag_cache_refill(struct page_frag_cache *pfc,
+static void *__page_frag_cache_refill(struct page_frag_cache *pfc,
 					     gfp_t gfp_mask)
 {
 	unsigned int size = PAGE_SIZE;
 	struct page *page = NULL;
-	struct page *old = pfc->va ? virt_to_page(pfc->va) : NULL;
+	struct page *old = pfc->addr ? virt_to_head_page(pfc->addr) : NULL;
 	gfp_t gfp = gfp_mask;
 	unsigned int pagecnt_bias = pfc->pagecnt_bias & ~PFC_MEMALLOC;
 
 	/* If all allocations have been freed, we can reuse this page */
 	if (old && page_ref_sub_and_test(old, pagecnt_bias)) {
 		page = old;
-#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-		/* if size can vary use size else just use PAGE_SIZE */
-		size = pfc->size;
-#endif
+		size = page_frag_cache_mask(pfc) + 1;
 		/* Page count is 0, we can safely set it */
 		set_page_count(page, size);
 		goto reset;
@@ -4364,27 +4361,24 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *pfc,
 				PAGE_FRAG_CACHE_MAX_ORDER);
 	if (page)
 		size = PAGE_FRAG_CACHE_MAX_SIZE;
-	pfc->size = size;
+	pfc->mask = size - 1;
 #endif
 	if (unlikely(!page))
 		page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
 	if (!page) {
-		pfc->va = NULL;
+		pfc->addr = NULL;
 		return NULL;
 	}
 
-	pfc->va = page_address(page);
-
 	/* Using atomic_set() would break get_page_unless_zero() users. */
 	page_ref_add(page, size - 1);
 reset:
-	/* reset page count bias and offset to start of new frag */
 	pfc->pagecnt_bias = size;
 	if (page_is_pfmemalloc(page))
 		pfc->pagecnt_bias |= PFC_MEMALLOC;
-	pfc->offset = size;
+	pfc->addr = page_address(page) + size;
 
-	return page;
+	return pfc->addr;
 }
 
 void __page_frag_cache_drain(struct page *page, unsigned int count)
@@ -4405,24 +4399,20 @@ EXPORT_SYMBOL(__page_frag_cache_drain);
 void *page_frag_alloc(struct page_frag_cache *pfc,
 		      unsigned int size, gfp_t gfp_mask)
 {
-	struct page *page;
-	int offset;
+	void *addr = pfc->addr;
+	unsigned int offset = (unsigned long)addr & page_frag_cache_mask(pfc);
 
-	if (unlikely(!pfc->va)) {
-refill:
-		page = __page_frag_cache_refill(pfc, gfp_mask);
-		if (!page)
+	if (unlikely(offset < size)) {
+		addr = __page_frag_cache_refill(pfc, gfp_mask);
+		if (!addr)
 			return NULL;
 	}
 
-	offset = pfc->offset - size;
-	if (unlikely(offset < 0))
-		goto refill;
-
+	addr -= size;
+	pfc->addr = addr;
 	pfc->pagecnt_bias--;
-	pfc->offset = offset;
 
-	return pfc->va + offset;
+	return addr;
 }
 EXPORT_SYMBOL(page_frag_alloc);
 
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 7/8] page_frag: Update documentation
  2018-03-22 15:31 [PATCH v2 0/8] page_frag_cache improvements Matthew Wilcox
                   ` (5 preceding siblings ...)
  2018-03-22 15:31 ` [PATCH v2 6/8] page_frag_cache: Use a mask instead of offset Matthew Wilcox
@ 2018-03-22 15:31 ` Matthew Wilcox
  2018-03-22 15:31 ` [PATCH v2 8/8] page_frag: Account allocations Matthew Wilcox
  7 siblings, 0 replies; 15+ messages in thread
From: Matthew Wilcox @ 2018-03-22 15:31 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Matthew Wilcox, netdev, linux-mm, Jesper Dangaard Brouer, Eric Dumazet

From: Matthew Wilcox <mawilcox@microsoft.com>

 - Rename Documentation/vm/page_frags to page_frags.rst
 - Change page_frags.rst to be a user's guide rather than implementation
   detail.
 - Add kernel-doc for the page_frag allocator
 - Move implementation details to the comments in page_alloc.c

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 Documentation/vm/page_frags     | 42 ---------------------------
 Documentation/vm/page_frags.rst | 24 ++++++++++++++++
 mm/page_alloc.c                 | 63 ++++++++++++++++++++++++++++++++---------
 3 files changed, 74 insertions(+), 55 deletions(-)
 delete mode 100644 Documentation/vm/page_frags
 create mode 100644 Documentation/vm/page_frags.rst

diff --git a/Documentation/vm/page_frags b/Documentation/vm/page_frags
deleted file mode 100644
index a6714565dbf9..000000000000
--- a/Documentation/vm/page_frags
+++ /dev/null
@@ -1,42 +0,0 @@
-Page fragments
---------------
-
-A page fragment is an arbitrary-length arbitrary-offset area of memory
-which resides within a 0 or higher order compound page.  Multiple
-fragments within that page are individually refcounted, in the page's
-reference counter.
-
-The page_frag functions, page_frag_alloc and page_frag_free, provide a
-simple allocation framework for page fragments.  This is used by the
-network stack and network device drivers to provide a backing region of
-memory for use as either an sk_buff->head, or to be used in the "frags"
-portion of skb_shared_info.
-
-In order to make use of the page fragment APIs a backing page fragment
-cache is needed.  This provides a central point for the fragment allocation
-and tracks allows multiple calls to make use of a cached page.  The
-advantage to doing this is that multiple calls to get_page can be avoided
-which can be expensive at allocation time.  However due to the nature of
-this caching it is required that any calls to the cache be protected by
-either a per-cpu limitation, or a per-cpu limitation and forcing interrupts
-to be disabled when executing the fragment allocation.
-
-The network stack uses two separate caches per CPU to handle fragment
-allocation.  The netdev_alloc_cache is used by callers making use of the
-__netdev_alloc_frag and __netdev_alloc_skb calls.  The napi_alloc_cache is
-used by callers of the __napi_alloc_frag and __napi_alloc_skb calls.  The
-main difference between these two calls is the context in which they may be
-called.  The "netdev" prefixed functions are usable in any context as these
-functions will disable interrupts, while the "napi" prefixed functions are
-only usable within the softirq context.
-
-Many network device drivers use a similar methodology for allocating page
-fragments, but the page fragments are cached at the ring or descriptor
-level.  In order to enable these cases it is necessary to provide a generic
-way of tearing down a page cache.  For this reason __page_frag_cache_drain
-was implemented.  It allows for freeing multiple references from a single
-page via a single call.  The advantage to doing this is that it allows for
-cleaning up the multiple references that were added to a page in order to
-avoid calling get_page per allocation.
-
-Alexander Duyck, Nov 29, 2016.
diff --git a/Documentation/vm/page_frags.rst b/Documentation/vm/page_frags.rst
new file mode 100644
index 000000000000..e675bfad6710
--- /dev/null
+++ b/Documentation/vm/page_frags.rst
@@ -0,0 +1,24 @@
+==============
+Page fragments
+==============
+
+:Author: Alexander Duyck
+
+A page fragment is a physically contiguous area of memory that is smaller
+than a page.  It may cross a page boundary, and may be allocated at
+an arbitrary alignment.
+
+The page fragment allocator is optimised for very fast allocation
+of arbitrary-sized objects which will likely be freed soon.  It does
+not take any locks, relying on the caller to ensure that simultaneous
+allocations from the same page_frag_cache cannot occur.  The allocator
+does not support red zones or poisoning.  If the user has alignment
+requirements, rounding the size of each object allocated from the cache
+will ensure that all objects are aligned.  Do not attempt to allocate
+0 bytes; it is not checked for and will end badly.
+
+Functions
+=========
+
+.. kernel-doc:: mm/page_alloc.c
+   :functions: page_frag_alloc page_frag_free __page_frag_cache_drain
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d15a5348a8e4..b9beafa5d2a5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4326,15 +4326,27 @@ void free_pages(unsigned long addr, unsigned int order)
 EXPORT_SYMBOL(free_pages);
 
 /*
- * Page Fragment:
- *  An arbitrary-length arbitrary-offset area of memory which resides
- *  within a 0 or higher order page.  Multiple fragments within that page
- *  are individually refcounted, in the page's reference counter.
- *
- * The page_frag functions below provide a simple allocation framework for
- * page fragments.  This is used by the network stack and network device
- * drivers to provide a backing region of memory for use as either an
+ * The page_frag functions below are used by the network stack and network
+ * device drivers to provide a backing region of memory for use as either an
  * sk_buff->head, or to be used in the "frags" portion of skb_shared_info.
+ *
+ * We attempt to use a compound page (unless the machine has a large
+ * PAGE_SIZE) in order to minimise trips into the page allocator.  Allocation
+ * starts at the end of the page and proceeds towards the beginning of the
+ * page.  Once there is insufficient space in the page to satisfy the
+ * next allocation, we call into __page_frag_cache_refill() in order to
+ * either recycle the existing page or start allocation from a new page.
+ *
+ * The allocation side maintains a count of the number of allocations it
+ * has made while frees are counted in the struct page reference count.
+ * We reconcile the two when there is no space left in the page.  This
+ * minimises cache line bouncing when page frags are freed on a different
+ * CPU from the one they were allocated on.
+ *
+ * Several network drivers use a similar approach to the page_frag_cache,
+ * but specialise their allocator to return a dma_addr_t instead of a
+ * virtual address.  They can also use page_frag_free(), and will use
+ * __page_frag_cache_drain() in order to destroy their caches.
  */
 static void *__page_frag_cache_refill(struct page_frag_cache *pfc,
 					     gfp_t gfp_mask)
@@ -4381,6 +4393,18 @@ static void *__page_frag_cache_refill(struct page_frag_cache *pfc,
 	return pfc->addr;
 }
 
+/**
+ * __page_frag_cache_drain() - Stop using a page.
+ * @page: Current page in use.
+ * @count: Number of allocations remaining.
+ *
+ * When a page fragment cache is being destroyed, this function prepares
+ * the page to be freed.  It will actually be freed if there are no
+ * outstanding allocations on that page; otherwise it will be freed when
+ * the last allocation is freed.
+ *
+ * Context: Any context.
+ */
 void __page_frag_cache_drain(struct page *page, unsigned int count)
 {
 	VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
@@ -4396,14 +4420,22 @@ void __page_frag_cache_drain(struct page *page, unsigned int count)
 }
 EXPORT_SYMBOL(__page_frag_cache_drain);
 
-void *page_frag_alloc(struct page_frag_cache *pfc,
-		      unsigned int size, gfp_t gfp_mask)
+/**
+ * page_frag_alloc() - Allocate a page fragment.
+ * @pfc: page_frag cache.
+ * @size: Number of bytes to allocate.
+ * @gfp: Memory allocation flags.
+ *
+ * Context: Any context.
+ * Return: Address of allocated memory or %NULL.
+ */
+void *page_frag_alloc(struct page_frag_cache *pfc, unsigned int size, gfp_t gfp)
 {
 	void *addr = pfc->addr;
 	unsigned int offset = (unsigned long)addr & page_frag_cache_mask(pfc);
 
 	if (unlikely(offset < size)) {
-		addr = __page_frag_cache_refill(pfc, gfp_mask);
+		addr = __page_frag_cache_refill(pfc, gfp);
 		if (!addr)
 			return NULL;
 	}
@@ -4416,8 +4448,13 @@ void *page_frag_alloc(struct page_frag_cache *pfc,
 }
 EXPORT_SYMBOL(page_frag_alloc);
 
-/*
- * Frees a page fragment allocated out of either a compound or order 0 page.
+/**
+ * page_frag_free() - Free a page fragment.
+ * @addr: Address of page fragment.
+ *
+ * Free memory previously allocated by page_frag_alloc().
+ *
+ * Context: Any context.
  */
 void page_frag_free(void *addr)
 {
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 8/8] page_frag: Account allocations
  2018-03-22 15:31 [PATCH v2 0/8] page_frag_cache improvements Matthew Wilcox
                   ` (6 preceding siblings ...)
  2018-03-22 15:31 ` [PATCH v2 7/8] page_frag: Update documentation Matthew Wilcox
@ 2018-03-22 15:31 ` Matthew Wilcox
  7 siblings, 0 replies; 15+ messages in thread
From: Matthew Wilcox @ 2018-03-22 15:31 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Matthew Wilcox, netdev, linux-mm, Jesper Dangaard Brouer, Eric Dumazet

From: Matthew Wilcox <mawilcox@microsoft.com>

Account for the number of pages currently used in page_frag allocations.
This may help diagnose leaks in page_frag users.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/mmzone.h |  3 ++-
 mm/page_alloc.c        | 10 +++++++---
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7522a6987595..ed6be33dcc7a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -139,10 +139,10 @@ enum zone_stat_item {
 	NR_ZONE_ACTIVE_FILE,
 	NR_ZONE_UNEVICTABLE,
 	NR_ZONE_WRITE_PENDING,	/* Count of dirty, writeback and unstable pages */
+	/* Second 128 byte cacheline */
 	NR_MLOCK,		/* mlock()ed pages found and moved off LRU */
 	NR_PAGETABLE,		/* used for pagetables */
 	NR_KERNEL_STACK_KB,	/* measured in KiB */
-	/* Second 128 byte cacheline */
 	NR_BOUNCE,
 #if IS_ENABLED(CONFIG_ZSMALLOC)
 	NR_ZSPAGES,		/* allocated in zsmalloc */
@@ -175,6 +175,7 @@ enum node_stat_item {
 	NR_SHMEM_THPS,
 	NR_SHMEM_PMDMAPPED,
 	NR_ANON_THPS,
+	NR_PAGE_FRAG,
 	NR_UNSTABLE_NFS,	/* NFS unstable pages */
 	NR_VMSCAN_WRITE,
 	NR_VMSCAN_IMMEDIATE,	/* Prioritise for reclaim when writeback ends */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b9beafa5d2a5..5a9441b46604 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4382,6 +4382,7 @@ static void *__page_frag_cache_refill(struct page_frag_cache *pfc,
 		return NULL;
 	}
 
+	inc_node_page_state(page, NR_PAGE_FRAG);
 	/* Using atomic_set() would break get_page_unless_zero() users. */
 	page_ref_add(page, size - 1);
 reset:
@@ -4460,8 +4461,10 @@ void page_frag_free(void *addr)
 {
 	struct page *page = virt_to_head_page(addr);
 
-	if (unlikely(put_page_testzero(page)))
+	if (unlikely(put_page_testzero(page))) {
+		dec_node_page_state(page, NR_PAGE_FRAG);
 		__free_pages_ok(page, compound_order(page));
+	}
 }
 EXPORT_SYMBOL(page_frag_free);
 
@@ -4769,7 +4772,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 		" unevictable:%lu dirty:%lu writeback:%lu unstable:%lu\n"
 		" slab_reclaimable:%lu slab_unreclaimable:%lu\n"
 		" mapped:%lu shmem:%lu pagetables:%lu bounce:%lu\n"
-		" free:%lu free_pcp:%lu free_cma:%lu\n",
+		" free:%lu free_pcp:%lu free_cma:%lu page_frag:%lu\n",
 		global_node_page_state(NR_ACTIVE_ANON),
 		global_node_page_state(NR_INACTIVE_ANON),
 		global_node_page_state(NR_ISOLATED_ANON),
@@ -4788,7 +4791,8 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 		global_zone_page_state(NR_BOUNCE),
 		global_zone_page_state(NR_FREE_PAGES),
 		free_pcp,
-		global_zone_page_state(NR_FREE_CMA_PAGES));
+		global_zone_page_state(NR_FREE_CMA_PAGES),
+		global_node_page_state(NR_PAGE_FRAG));
 
 	for_each_online_pgdat(pgdat) {
 		if (show_mem_node_skip(filter, pgdat->node_id, nodemask))
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 6/8] page_frag_cache: Use a mask instead of offset
  2018-03-22 15:31 ` [PATCH v2 6/8] page_frag_cache: Use a mask instead of offset Matthew Wilcox
@ 2018-03-22 16:22   ` Alexander Duyck
  2018-03-22 16:41     ` Matthew Wilcox
  0 siblings, 1 reply; 15+ messages in thread
From: Alexander Duyck @ 2018-03-22 16:22 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Matthew Wilcox, Netdev, linux-mm, Jesper Dangaard Brouer, Eric Dumazet

On Thu, Mar 22, 2018 at 8:31 AM, Matthew Wilcox <willy@infradead.org> wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
>
> By combining 'va' and 'offset' into 'addr' and using a mask instead,
> we can save a compare-and-branch in the fast-path of the allocator.
> This removes 4 instructions on x86 (both 32 and 64 bit).

What is the point of renaming "va"? I'm seeing a lot of renaming in
these patches that doesn't really seem needed and is just making things
harder to review.

> We can avoid storing the mask at all if we know that we're only allocating
> a single page.  This shrinks page_frag_cache from 12 to 8 bytes on 32-bit
> CONFIG_BASE_SMALL build.
>
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>

So I am not really a fan of CONFIG_BASE_SMALL in general, so
advertising gains in size is just going back down the 'reduce size at
the expense of performance' train of thought.

Do we know for certain that a higher order page is always aligned to
the size of the higher order page itself? That is one thing I have
never been certain about. I know for example there are head and tail
pages so I was never certain if it was possible to create a higher
order page that is not aligned to the size of the page itself.

If we can get away with making that assumption then yes I would say
this is probably an optimization, though I think we would be better
off without the CONFIG_BASE_SMALL bits.

> ---
>  include/linux/mm_types.h | 12 +++++++-----
>  mm/page_alloc.c          | 40 +++++++++++++++-------------------------
>  2 files changed, 22 insertions(+), 30 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 0defff9e3c0e..ebe93edec752 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -225,12 +225,9 @@ struct page {
>  #define PFC_MEMALLOC                   (1U << 31)
>
>  struct page_frag_cache {
> -       void * va;
> +       void *addr;
>  #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> -       __u16 offset;
> -       __u16 size;
> -#else
> -       __u32 offset;
> +       unsigned int mask;

So this is just an awkward layout. You now have essentially:
#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
#else
    unsigned int mask;
#endif

>  #endif
>         /* we maintain a pagecount bias, so that we dont dirty cache line
>          * containing page->_refcount every time we allocate a fragment.
> @@ -239,6 +236,11 @@ struct page_frag_cache {
>  };
>
>  #define page_frag_cache_pfmemalloc(pfc)        ((pfc)->pagecnt_bias & PFC_MEMALLOC)
> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> +#define page_frag_cache_mask(pfc)      (pfc)->mask
> +#else
> +#define page_frag_cache_mask(pfc)      (~PAGE_MASK)
> +#endif
>
>  typedef unsigned long vm_flags_t;
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5a2e3e293079..d15a5348a8e4 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4336,22 +4336,19 @@ EXPORT_SYMBOL(free_pages);
>   * drivers to provide a backing region of memory for use as either an
>   * sk_buff->head, or to be used in the "frags" portion of skb_shared_info.
>   */
> -static struct page *__page_frag_cache_refill(struct page_frag_cache *pfc,
> +static void *__page_frag_cache_refill(struct page_frag_cache *pfc,
>                                              gfp_t gfp_mask)
>  {
>         unsigned int size = PAGE_SIZE;
>         struct page *page = NULL;
> -       struct page *old = pfc->va ? virt_to_page(pfc->va) : NULL;
> +       struct page *old = pfc->addr ? virt_to_head_page(pfc->addr) : NULL;
>         gfp_t gfp = gfp_mask;
>         unsigned int pagecnt_bias = pfc->pagecnt_bias & ~PFC_MEMALLOC;
>
>         /* If all allocations have been freed, we can reuse this page */
>         if (old && page_ref_sub_and_test(old, pagecnt_bias)) {
>                 page = old;
> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> -               /* if size can vary use size else just use PAGE_SIZE */
> -               size = pfc->size;
> -#endif
> +               size = page_frag_cache_mask(pfc) + 1;
>                 /* Page count is 0, we can safely set it */
>                 set_page_count(page, size);
>                 goto reset;
> @@ -4364,27 +4361,24 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *pfc,
>                                 PAGE_FRAG_CACHE_MAX_ORDER);PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>         if (page)
>                 size = PAGE_FRAG_CACHE_MAX_SIZE;
> -       pfc->size = size;
> +       pfc->mask = size - 1;
>  #endif
>         if (unlikely(!page))
>                 page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
>         if (!page) {
> -               pfc->va = NULL;
> +               pfc->addr = NULL;
>                 return NULL;
>         }
>
> -       pfc->va = page_address(page);
> -
>         /* Using atomic_set() would break get_page_unless_zero() users. */
>         page_ref_add(page, size - 1);

You could just use the pfc->mask here instead of size - 1 just to
avoid having to do the subtraction more than once assuming the
compiler doesn't optimize it.

>  reset:
> -       /* reset page count bias and offset to start of new frag */
>         pfc->pagecnt_bias = size;
>         if (page_is_pfmemalloc(page))
>                 pfc->pagecnt_bias |= PFC_MEMALLOC;
> -       pfc->offset = size;
> +       pfc->addr = page_address(page) + size;
>
> -       return page;
> +       return pfc->addr;
>  }
>
>  void __page_frag_cache_drain(struct page *page, unsigned int count)
> @@ -4405,24 +4399,20 @@ EXPORT_SYMBOL(__page_frag_cache_drain);
>  void *page_frag_alloc(struct page_frag_cache *pfc,
>                       unsigned int size, gfp_t gfp_mask)
>  {
> -       struct page *page;
> -       int offset;
> +       void *addr = pfc->addr;
> +       unsigned int offset = (unsigned long)addr & page_frag_cache_mask(pfc);
>
> -       if (unlikely(!pfc->va)) {
> -refill:
> -               page = __page_frag_cache_refill(pfc, gfp_mask);
> -               if (!page)
> +       if (unlikely(offset < size)) {
> +               addr = __page_frag_cache_refill(pfc, gfp_mask);
> +               if (!addr)
>                         return NULL;
>         }
>
> -       offset = pfc->offset - size;
> -       if (unlikely(offset < 0))
> -               goto refill;
> -
> +       addr -= size;
> +       pfc->addr = addr;
>         pfc->pagecnt_bias--;
> -       pfc->offset = offset;
>
> -       return pfc->va + offset;
> +       return addr;
>  }
>  EXPORT_SYMBOL(page_frag_alloc);
>
> --
> 2.16.2
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/8] page_frag_cache: Remove pfmemalloc bool
  2018-03-22 15:31 ` [PATCH v2 1/8] page_frag_cache: Remove pfmemalloc bool Matthew Wilcox
@ 2018-03-22 16:39   ` Alexander Duyck
  2018-03-22 17:08     ` Matthew Wilcox
  0 siblings, 1 reply; 15+ messages in thread
From: Alexander Duyck @ 2018-03-22 16:39 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Matthew Wilcox, Netdev, linux-mm, Jesper Dangaard Brouer, Eric Dumazet

On Thu, Mar 22, 2018 at 8:31 AM, Matthew Wilcox <willy@infradead.org> wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
>
> Save 4/8 bytes by moving the pfmemalloc indicator from its own bool
> to the top bit of pagecnt_bias.  This has no effect on the fastpath
> of the allocator since the pagecnt_bias cannot go negative.  It's
> a couple of extra instructions in the slowpath.
>
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>

So I was just thinking about this and it would probably make more
sense to look at addressing this after you take care of your
conversion from size/offset to a mask. One thing with the mask is that
it should never reach 64K since that is the largest page size if I
recall. With that being the case we could look at dropping mask to a
u16 value and then adding a u16 flags field where you could store things
like this. Then you could avoid the masking and math you are having to
do below.
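
Roughly something like this (field names purely illustrative):

	struct page_frag_cache {
		void *va;
		__u16 mask;	/* offsets never reach 64K, so 16 bits suffice */
		__u16 flags;	/* e.g. the pfmemalloc bit lives here instead
				 * of in pagecnt_bias */
		unsigned int pagecnt_bias;
	};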

> ---
>  include/linux/mm_types.h | 4 +++-
>  mm/page_alloc.c          | 8 +++++---
>  net/core/skbuff.c        | 5 ++---
>  3 files changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index fd1af6b9591d..a63b138ad1a4 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -218,6 +218,7 @@ struct page {
>
>  #define PAGE_FRAG_CACHE_MAX_SIZE       __ALIGN_MASK(32768, ~PAGE_MASK)
>  #define PAGE_FRAG_CACHE_MAX_ORDER      get_order(PAGE_FRAG_CACHE_MAX_SIZE)
> +#define PFC_MEMALLOC                   (1U << 31)
>
>  struct page_frag_cache {
>         void * va;
> @@ -231,9 +232,10 @@ struct page_frag_cache {
>          * containing page->_refcount every time we allocate a fragment.
>          */
>         unsigned int            pagecnt_bias;
> -       bool pfmemalloc;
>  };
>
> +#define page_frag_cache_pfmemalloc(pfc)        ((pfc)->pagecnt_bias & PFC_MEMALLOC)
> +
>  typedef unsigned long vm_flags_t;
>
>  /*
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 635d7dd29d7f..61366f23e8c8 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4395,16 +4395,18 @@ void *page_frag_alloc(struct page_frag_cache *nc,
>                 page_ref_add(page, size - 1);
>
>                 /* reset page count bias and offset to start of new frag */
> -               nc->pfmemalloc = page_is_pfmemalloc(page);
>                 nc->pagecnt_bias = size;
> +               if (page_is_pfmemalloc(page))
> +                       nc->pagecnt_bias |= PFC_MEMALLOC;
>                 nc->offset = size;
>         }
>
>         offset = nc->offset - fragsz;
>         if (unlikely(offset < 0)) {
> +               unsigned int pagecnt_bias = nc->pagecnt_bias & ~PFC_MEMALLOC;
>                 page = virt_to_page(nc->va);
>
> -               if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
> +               if (!page_ref_sub_and_test(page, pagecnt_bias))
>                         goto refill;
>
>  #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> @@ -4415,7 +4417,7 @@ void *page_frag_alloc(struct page_frag_cache *nc,
>                 set_page_count(page, size);
>
>                 /* reset page count bias and offset to start of new frag */
> -               nc->pagecnt_bias = size;
> +               nc->pagecnt_bias = size | (nc->pagecnt_bias - pagecnt_bias);
>                 offset = size - fragsz;
>         }
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 0bb0d8877954..54bbde8f7541 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -412,7 +412,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int len,
>
>         nc = this_cpu_ptr(&netdev_alloc_cache);
>         data = page_frag_alloc(nc, len, gfp_mask);
> -       pfmemalloc = nc->pfmemalloc;
> +       pfmemalloc = page_frag_cache_pfmemalloc(nc);
>
>         local_irq_restore(flags);
>
> @@ -485,8 +485,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
>                 return NULL;
>         }
>
> -       /* use OR instead of assignment to avoid clearing of bits in mask */
> -       if (nc->page.pfmemalloc)
> +       if (page_frag_cache_pfmemalloc(&nc->page))
>                 skb->pfmemalloc = 1;
>         skb->head_frag = 1;
>
> --
> 2.16.2
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 6/8] page_frag_cache: Use a mask instead of offset
  2018-03-22 16:22   ` Alexander Duyck
@ 2018-03-22 16:41     ` Matthew Wilcox
  2018-03-22 17:31       ` Alexander Duyck
  2018-03-22 17:34       ` Matthew Wilcox
  0 siblings, 2 replies; 15+ messages in thread
From: Matthew Wilcox @ 2018-03-22 16:41 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Matthew Wilcox, Netdev, linux-mm, Jesper Dangaard Brouer, Eric Dumazet

On Thu, Mar 22, 2018 at 09:22:31AM -0700, Alexander Duyck wrote:
> On Thu, Mar 22, 2018 at 8:31 AM, Matthew Wilcox <willy@infradead.org> wrote:
> > By combining 'va' and 'offset' into 'addr' and using a mask instead,
> > we can save a compare-and-branch in the fast-path of the allocator.
> > This removes 4 instructions on x86 (both 32 and 64 bit).
> 
> What is the point of renaming "va"? I'm seeing a lot of renaming in
> these patches that doesn't really seem needed and is just making things
> harder to review.

By renaming 'va', I made sure that I saw everywhere that 'va' was touched,
and reviewed it to be sure it was still correct with the new meaning.
The advantage of keeping that in the patch submission, rather than
renaming it back again, is that you can see everywhere that it's been
touched and verify that for yourself.

> > We can avoid storing the mask at all if we know that we're only allocating
> > a single page.  This shrinks page_frag_cache from 12 to 8 bytes on 32-bit
> > CONFIG_BASE_SMALL build.
> >
> > Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> 
> So I am not really a fan of CONFIG_BASE_SMALL in general, so
> advertising gains in size is just going back down the 'reduce size at
> the expense of performance' train of thought.

There's no tradeoff for performance *in this patch* with
CONFIG_BASE_SMALL.  Indeed, being able to assume that the cache contains a
single PAGE_SIZE page reduces the number of instructions by two on x86-64
(and is neutral on x86-32).  IIRC it saves a register, so there's one fewer
'push' at the beginning of the function and one fewer 'pop' at the end.
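
That's because page_frag_cache_mask() collapses to a compile-time
constant in that configuration, so the fast path computes the offset
without loading a mask field at all:

	#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
	#define page_frag_cache_mask(pfc)	(pfc)->mask
	#else
	#define page_frag_cache_mask(pfc)	(~PAGE_MASK)	/* constant */
	#endif

	unsigned int offset = (unsigned long)pfc->addr & page_frag_cache_mask(pfc);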

I think the more compelling argument for conditioning the number of pages
allocated on CONFIG_BASE_SMALL is that a machine which needs to shrink
its data structures so badly isn't going to have 32k of memory available,
nor want to spend it on a networking allocation.  Eric's commit which
introduced NETDEV_FRAG_PAGE_MAX_ORDER back in 2012 (69b08f62e174) didn't
mention small machines as a consideration.

> Do we know for certain that a higher order page is always aligned to
> the size of the higher order page itself? That is one thing I have
> never been certain about. I know for example there are head and tail
> pages so I was never certain if it was possible to create a higher
> order page that is not aligned to the size of the page itself.

It's intrinsic to the buddy allocator that pages are naturally aligned
to their order.  There's a lot of code in the kernel which relies on
it, including much of the mm (particularly THP).  I suppose one could
construct a non-aligned compound page, but it'd be really weird, and you'd
have to split it up manually before handing it back to the page allocator.
I don't see this ever changing.
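
A quick sketch of the invariant the mask trick relies on (illustrative
only, not part of the series):

	/* A buddy-allocated compound page is aligned to its own size, so
	 * the low bits of any address within it are exactly the offset
	 * into it.
	 */
	struct page *page = alloc_pages(GFP_KERNEL | __GFP_COMP,
					PAGE_FRAG_CACHE_MAX_ORDER);

	if (page)
		BUG_ON((unsigned long)page_address(page) &
					(PAGE_FRAG_CACHE_MAX_SIZE - 1));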

> >  struct page_frag_cache {
> > -       void * va;
> > +       void *addr;
> >  #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> > -       __u16 offset;
> > -       __u16 size;
> > -#else
> > -       __u32 offset;
> > +       unsigned int mask;
> 
> So this is just an awkward layout. You now have essentially:
> #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> #else
>     unsigned int mask;
> #endif

Huh?  There's a '-' in front of the '#else'.  It looks like this:

struct page_frag_cache {
        void *addr;
#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
        unsigned int mask;
#endif
        /* we maintain a pagecount bias, so that we dont dirty cache line
         * containing page->_refcount every time we allocate a fragment.
         */
        unsigned int            pagecnt_bias;
};

> > @@ -4364,27 +4361,24 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *pfc,
> >                                 PAGE_FRAG_CACHE_MAX_ORDER);
> >         if (page)
> >                 size = PAGE_FRAG_CACHE_MAX_SIZE;
> > -       pfc->size = size;
> > +       pfc->mask = size - 1;
> >  #endif
> >         if (unlikely(!page))
> >                 page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
> >         if (!page) {
> > -               pfc->va = NULL;
> > +               pfc->addr = NULL;
> >                 return NULL;
> >         }
> >
> > -       pfc->va = page_address(page);
> > -
> >         /* Using atomic_set() would break get_page_unless_zero() users. */
> >         page_ref_add(page, size - 1);
> 
> You could just use the pfc->mask here instead of size - 1 just to
> avoid having to do the subtraction more than once assuming the
> compiler doesn't optimize it.

Either way I'm assuming a compiler optimisation -- that it won't reload
from memory, or that it'll remember the subtraction.  I don't much care
which, and I'll happily use the page_frag_cache_mask() if that reads better
for you.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 1/8] page_frag_cache: Remove pfmemalloc bool
  2018-03-22 16:39   ` Alexander Duyck
@ 2018-03-22 17:08     ` Matthew Wilcox
  0 siblings, 0 replies; 15+ messages in thread
From: Matthew Wilcox @ 2018-03-22 17:08 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Matthew Wilcox, Netdev, linux-mm, Jesper Dangaard Brouer, Eric Dumazet

On Thu, Mar 22, 2018 at 09:39:40AM -0700, Alexander Duyck wrote:
> So I was just thinking about this and it would probably make more
> sense to look at addressing this after you take care of your
> conversion from size/offset to a mask. One thing with the mask is that
> it should never reach 64K since that is the largest page size if I
> recall. With that being the case we could look at dropping mask to a
> u16 value and then add a u16 flags field where you could store things
> like this. Then you could avoid having to do the masking and math you
> are having to do below.

With the pfmemalloc flag in the top bit, there's actually no maths at all
in the caller; it only looks like it in C.  Here's what GCC ends up doing:

     e66:       e8 00 00 00 00          callq  e6b <__netdev_alloc_skb+0x7b>
                        e67: R_X86_64_PC32      page_frag_alloc-0x4
     e6b:       44 8b 3d 00 00 00 00    mov    0x0(%rip),%r15d
...
     e8c:       45 85 ff                test   %r15d,%r15d
     e8f:       79 04                   jns    e95 <__netdev_alloc_skb+0xa5>
     e91:       80 48 78 08             orb    $0x8,0x78(%rax)
     e95:       80 48 76 20             orb    $0x20,0x76(%rax)

ie it's testing the top bit by looking at the sign bit.  If I move it to
the second-top bit (1 << 30), it does this instead:

     e66:       e8 00 00 00 00          callq  e6b <__netdev_alloc_skb+0x7b>
                        e67: R_X86_64_PC32      page_frag_alloc-0x4
     e6b:       44 8b 2d 00 00 00 00    mov    0x0(%rip),%r13d
...
     e75:       41 81 e5 00 00 00 40    and    $0x40000000,%r13d
...
     e93:       45 85 ed                test   %r13d,%r13d
     e96:       74 04                   je     e9c <__netdev_alloc_skb+0xac>
     e98:       80 48 78 08             orb    $0x8,0x78(%rax)
     e9c:       80 48 76 20             orb    $0x20,0x76(%rax)

Changing mask to an unsigned short and adding a bool pfmemalloc to the
struct, I get:

     e66:       e8 00 00 00 00          callq  e6b <__netdev_alloc_skb+0x7b>
                        e67: R_X86_64_PC32      page_frag_alloc-0x4
     e6b:       44 0f b6 3d 00 00 00    movzbl 0x0(%rip),%r15d
     e72:       00 
...
     e8d:       45 84 ff                test   %r15b,%r15b
     e90:       74 04                   je     e96 <__netdev_alloc_skb+0xa6>
     e92:       80 48 78 08             orb    $0x8,0x78(%rax)
     e96:       80 48 76 20             orb    $0x20,0x76(%rax)

actually one byte less efficient code due to movzbl being one byte longer.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 6/8] page_frag_cache: Use a mask instead of offset
  2018-03-22 16:41     ` Matthew Wilcox
@ 2018-03-22 17:31       ` Alexander Duyck
  2018-03-22 17:34       ` Matthew Wilcox
  1 sibling, 0 replies; 15+ messages in thread
From: Alexander Duyck @ 2018-03-22 17:31 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Matthew Wilcox, Netdev, linux-mm, Jesper Dangaard Brouer, Eric Dumazet

On Thu, Mar 22, 2018 at 9:41 AM, Matthew Wilcox <willy@infradead.org> wrote:
> On Thu, Mar 22, 2018 at 09:22:31AM -0700, Alexander Duyck wrote:
>> On Thu, Mar 22, 2018 at 8:31 AM, Matthew Wilcox <willy@infradead.org> wrote:
>> > By combining 'va' and 'offset' into 'addr' and using a mask instead,
>> > we can save a compare-and-branch in the fast-path of the allocator.
>> > This removes 4 instructions on x86 (both 32 and 64 bit).
>>
>> What is the point of renaming "va"? I'm seeing a lot of renaming in
>> these patches that doesn't really seem needed and is just making things
>> harder to review.
>
> By renaming 'va', I made sure that I saw everywhere that 'va' was touched,
> and reviewed it to be sure it was still correct with the new meaning.
> The advantage of keeping that in the patch submission, rather than
> renaming it back again, is that you can see everywhere that it's been
> touched and verify that for yourself.

Okay, I guess that makes some sense. I was just mentally lumping it in
with the fragsz -> size and nc -> pfc changes.

>> > We can avoid storing the mask at all if we know that we're only allocating
>> > a single page.  This shrinks page_frag_cache from 12 to 8 bytes on 32-bit
>> > CONFIG_BASE_SMALL build.
>> >
>> > Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
>>
>> So I am not really a fan of CONFIG_BASE_SMALL in general; advertising
>> size gains just heads back down the "reduce size at the expense of
>> performance" train of thought.
>
> There's no tradeoff for performance *in this patch* with
> CONFIG_BASE_SMALL.  Indeed, being able to assume that the cache contains a
> single PAGE_SIZE page reduces the number of instructions by two on x86-64
> (and is neutral on x86-32).  IIRC it saves a register, so there's one fewer
> 'push' at the beginning of the function and one fewer 'pop' at the end.

The issue is that you are now going to have to perform 8x as many
allocations and take the slow path that many more times.

> I think the more compelling argument for conditioning the number of pages
> allocated on CONFIG_BASE_SMALL is that a machine which needs to shrink
> its data structures so badly isn't going to have 32k of memory available,
> nor want to spend it on a networking allocation.  Eric's commit which
> introduced NETDEV_FRAG_PAGE_MAX_ORDER back in 2012 (69b08f62e174) didn't
> mention small machines as a consideration.

The problem is that this assumes something is doing small enough
allocations that there is an advantage to using 4K. In the case of
this API I'm not certain that is the case. More often than not it is
used when allocating an skb in the Rx path. The typical Rx skb size is
headroom + 1514 + skb_shared_info. If you take a look that is
dangerously close to 2K. With your change you now get 2 allocations
per page instead of the 16 that was seen with a 32K page. If you have
a device that cannot control the Rx along that boundary things get
worse since you are looking at something like headroom + 2K +
skb_shared_info. In such a case there wouldn't be any point in using
the API anymore, since you might as well just use the page allocator.
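
To put rough numbers on that (typical x86-64 values, assumed for
illustration rather than taken from the patches):

#include <stdio.h>

int main(void)
{
	unsigned int headroom = 64;	/* NET_SKB_PAD, typically */
	unsigned int frame    = 1514;	/* max standard Ethernet frame */
	unsigned int shinfo   = 320;	/* approx sizeof(struct skb_shared_info) */
	unsigned int need     = headroom + frame + shinfo;	/* ~1898 bytes */
	unsigned int fragsz   = 2048;	/* what that rounds up to in practice */

	printf("per-skb need ~%u bytes, rounded to %u\n", need, fragsz);
	printf("fragments per  4K page: %u\n",  4096 / fragsz);	/* 2  */
	printf("fragments per 32K page: %u\n", 32768 / fragsz);	/* 16 */
	return 0;
}

So a 4K cache has to refill roughly eight times as often as a 32K one
for the common Rx case.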

>> Do we know for certain that a higher order page is always aligned to
>> the size of the higher order page itself? That is one thing I have
>> never been certain about. I know for example there are head and tail
>> pages so I was never certain if it was possible to create a higher
>> order page that is not aligned to the size of the page itself.
>
> It's intrinsic to the buddy allocator that pages are naturally aligned
> to their order.  There's a lot of code in the kernel which relies on
> it, including much of the mm (particularly THP).  I suppose one could
> construct a non-aligned compound page, but it'd be really weird, and you'd
> have to split it up manually before handing it back to the page allocator.
> I don't see this ever changing.
>
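
As an aside, here is a minimal sketch of why that alignment is exactly
what makes the addr+mask trick work (names here are made up for
illustration, not the patch's exact code):

struct pfc_sketch {
	void *addr;		/* page base + current offset, in one pointer */
	unsigned int mask;	/* allocated page size - 1 */
};

static inline unsigned long pfc_offset(const struct pfc_sketch *pfc)
{
	/* only valid because the compound page is aligned to its size */
	return (unsigned long)pfc->addr & pfc->mask;
}

static inline void *pfc_page_base(const struct pfc_sketch *pfc)
{
	return (void *)((unsigned long)pfc->addr & ~(unsigned long)pfc->mask);
}

Without the buddy allocator's natural alignment guarantee, neither mask
operation would recover the right value.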
>> >  struct page_frag_cache {
>> > -       void * va;
>> > +       void *addr;
>> >  #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>> > -       __u16 offset;
>> > -       __u16 size;
>> > -#else
>> > -       __u32 offset;
>> > +       unsigned int mask;
>>
>> So this is just an awkward layout. You now have essentially:
>> #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>> #else
>>     unsigned int mask;
>> #endif
>
> Huh?  There's a '-' in front of the '#else'.  It looks like this:

Yeah, I might need to increase the font size on my email client. The
"-" had blended into the "#".

>
> struct page_frag_cache {
>         void *addr;
> #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
>         unsigned int mask;

We might need to check here and see if this needs to be an "unsigned
int" or if we can get away with an "unsigned short". The maximum page
size supported by most architectures is 64K or less, so for those
cases we could just use an unsigned short. There are a rare few that
are larger where this would be forced into an "unsigned int".

> #endif
>         /* we maintain a pagecount bias, so that we dont dirty cache line
>          * containing page->_refcount every time we allocate a fragment.
>          */
>         unsigned int            pagecnt_bias;

We could probably look at doing something similar for pagecnt_bias.
For real use cases the fragment size is usually forced to be aligned to
something like the L1 cache line size, which already reduces the number
of fragments a page can actually hand out. If we put a limitation where
fragsz has to be aligned to some value, we could use that to cap
pagecnt_bias at PAGE_FRAG_CACHE_MAX_SIZE / (required alignment size).

> };
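
To make that concrete, here's a rough sketch of the shrunken layout,
assuming fragsz were required to be a multiple of a 64-byte cache line
(the struct name is hypothetical; the actual patch keeps unsigned int
fields):

#include <linux/mm_types.h>	/* PAGE_FRAG_CACHE_MAX_SIZE */

struct page_frag_cache_short {		/* hypothetical */
	void *addr;
#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
	unsigned short mask;		/* any page size up to 64K fits */
#endif
	/* with fragsz forced to a 64-byte multiple, the bias can never
	 * exceed 32768 / 64 = 512, so a u16 is plenty */
	unsigned short pagecnt_bias;
};

That would bring the structure down to 12 bytes on 64-bit and 8 bytes
on 32-bit.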
>
>> > @@ -4364,27 +4361,24 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *pfc,
>> >                                 PAGE_FRAG_CACHE_MAX_ORDER);
>> >         if (page)
>> >                 size = PAGE_FRAG_CACHE_MAX_SIZE;
>> > -       pfc->size = size;
>> > +       pfc->mask = size - 1;
>> >  #endif
>> >         if (unlikely(!page))
>> >                 page = alloc_pages_node(NUMA_NO_NODE, gfp, 0);
>> >         if (!page) {
>> > -               pfc->va = NULL;
>> > +               pfc->addr = NULL;
>> >                 return NULL;
>> >         }
>> >
>> > -       pfc->va = page_address(page);
>> > -
>> >         /* Using atomic_set() would break get_page_unless_zero() users. */
>> >         page_ref_add(page, size - 1);
>>
>> You could just use the pfc->mask here instead of size - 1 just to
>> avoid having to do the subtraction more than once assuming the
>> compiler doesn't optimize it.
>
> Either way I'm assuming a compiler optimisation -- that it won't reload
> from memory, or that it'll remember the subtraction.  I don't much care
> which, and I'll happily use the page_frag_cache_mask() if that reads better
> for you.

If the compiler is doing it then you are probably fine.

Thanks.

- Alex

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 6/8] page_frag_cache: Use a mask instead of offset
  2018-03-22 16:41     ` Matthew Wilcox
  2018-03-22 17:31       ` Alexander Duyck
@ 2018-03-22 17:34       ` Matthew Wilcox
  1 sibling, 0 replies; 15+ messages in thread
From: Matthew Wilcox @ 2018-03-22 17:34 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Matthew Wilcox, Netdev, linux-mm, Jesper Dangaard Brouer, Eric Dumazet

On Thu, Mar 22, 2018 at 09:41:57AM -0700, Matthew Wilcox wrote:
> On Thu, Mar 22, 2018 at 09:22:31AM -0700, Alexander Duyck wrote:
> > You could just use the pfc->mask here instead of size - 1 just to
> > avoid having to do the subtraction more than once assuming the
> > compiler doesn't optimize it.
> 
> Either way I'm assuming a compiler optimisation -- that it won't reload
> from memory, or that it'll remember the subtraction.  I don't much care
> which, and I'll happily use the page_frag_cache_mask() if that reads better
> for you.

Looks like it does reload from memory if I make that change.  Before:

    37e7:       c7 43 08 ff 7f 00 00    movl   $0x7fff,0x8(%rbx)
    37ee:       b9 00 80 00 00          mov    $0x8000,%ecx
    37f3:       be ff 7f 00 00          mov    $0x7fff,%esi
    37f8:       ba 00 80 00 00          mov    $0x8000,%edx
...
    380b:       01 70 1c                add    %esi,0x1c(%rax)

After:

    37e7:       c7 43 08 ff 7f 00 00    movl   $0x7fff,0x8(%rbx)
    37ee:       b9 00 80 00 00          mov    $0x8000,%ecx
    37f3:       ba 00 80 00 00          mov    $0x8000,%edx
...
    3806:       8b 73 08                mov    0x8(%rbx),%esi
    3809:       01 70 1c                add    %esi,0x1c(%rax)

Of course, it's shorter because it's fewer bytes to reload from memory
than it is to put a 32-bit immediate in the instruction stream, but
it's one additional memory reference (cache-hot, of course).  I don't
really care because it's the cold path.
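
For reference, here's a self-contained sketch of the two source forms
being compared (the struct and function names are made up;
page_ref_add() is the real helper):

#include <linux/mm.h>

struct pfc_sketch { unsigned int mask; };	/* illustrative */

static void refill_immediate(struct pfc_sketch *pfc, struct page *page,
			     unsigned int size)
{
	pfc->mask = size - 1;
	page_ref_add(page, size - 1);	/* value stays as an immediate: first dump */
}

static void refill_reload(struct pfc_sketch *pfc, struct page *page,
			  unsigned int size)
{
	pfc->mask = size - 1;
	page_ref_add(page, pfc->mask);	/* rereads the field: second dump */
}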

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread

Thread overview: 15+ messages
2018-03-22 15:31 [PATCH v2 0/8] page_frag_cache improvements Matthew Wilcox
2018-03-22 15:31 ` [PATCH v2 1/8] page_frag_cache: Remove pfmemalloc bool Matthew Wilcox
2018-03-22 16:39   ` Alexander Duyck
2018-03-22 17:08     ` Matthew Wilcox
2018-03-22 15:31 ` [PATCH v2 2/8] page_frag_cache: Move slowpath code from page_frag_alloc Matthew Wilcox
2018-03-22 15:31 ` [PATCH v2 3/8] page_frag_cache: Rename 'nc' to 'pfc' Matthew Wilcox
2018-03-22 15:31 ` [PATCH v2 4/8] page_frag_cache: Rename fragsz to size Matthew Wilcox
2018-03-22 15:31 ` [PATCH v2 5/8] page_frag_cache: Save memory on small machines Matthew Wilcox
2018-03-22 15:31 ` [PATCH v2 6/8] page_frag_cache: Use a mask instead of offset Matthew Wilcox
2018-03-22 16:22   ` Alexander Duyck
2018-03-22 16:41     ` Matthew Wilcox
2018-03-22 17:31       ` Alexander Duyck
2018-03-22 17:34       ` Matthew Wilcox
2018-03-22 15:31 ` [PATCH v2 7/8] page_frag: Update documentation Matthew Wilcox
2018-03-22 15:31 ` [PATCH v2 8/8] page_frag: Account allocations Matthew Wilcox
