linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RESEND 0/2] free order-0 pages through PCP in page_frag_free() and cleanup
@ 2018-11-19 13:48 Aaron Lu
  2018-11-19 13:48 ` [PATCH v2 RESEND 1/2] mm/page_alloc: free order-0 pages through PCP in page_frag_free() Aaron Lu
  2018-11-19 13:48 ` [PATCH v3 RESEND 2/2] mm/page_alloc: use a single function to free page Aaron Lu
  0 siblings, 2 replies; 8+ messages in thread
From: Aaron Lu @ 2018-11-19 13:48 UTC (permalink / raw)
  To: linux-mm, linux-kernel, netdev
  Cc: Andrew Morton, Paweł Staszewski, Jesper Dangaard Brouer,
	Eric Dumazet, Tariq Toukan, Ilias Apalodimas, Yoel Caspersen,
	Mel Gorman, Saeed Mahameed, Michal Hocko, Vlastimil Babka,
	Dave Hansen, Alexander Duyck, Ian Kumlien

This is a resend of the two patches.

Patch 1 is the same as:
[PATCH v2 1/2] mm/page_alloc: free order-0 pages through PCP in page_frag_free()
https://lkml.kernel.org/r/20181106052833.GC6203@intel.com
With one more ack from Tariq Toukan.

Patch 2 is the same as:
[PATCH v3 2/2] mm/page_alloc: use a single function to free page
https://lkml.kernel.org/r/20181106113149.GC24198@intel.com
With some changelog rewording.

Applies on top of v4.20-rc2-mmotm-2018-11-16-14-52.

Aaron Lu (2):
  mm/page_alloc: free order-0 pages through PCP in page_frag_free()
  mm/page_alloc: use a single function to free page

 mm/page_alloc.c | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

-- 
2.17.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 RESEND 1/2] mm/page_alloc: free order-0 pages through PCP in page_frag_free()
  2018-11-19 13:48 [PATCH RESEND 0/2] free order-0 pages through PCP in page_frag_free() and cleanup Aaron Lu
@ 2018-11-19 13:48 ` Aaron Lu
  2018-11-19 15:00   ` Tariq Toukan
  2018-11-20  1:45   ` [PATCH v2 RESEND update " Aaron Lu
  2018-11-19 13:48 ` [PATCH v3 RESEND 2/2] mm/page_alloc: use a single function to free page Aaron Lu
  1 sibling, 2 replies; 8+ messages in thread
From: Aaron Lu @ 2018-11-19 13:48 UTC (permalink / raw)
  To: linux-mm, linux-kernel, netdev
  Cc: Andrew Morton, Paweł Staszewski, Jesper Dangaard Brouer,
	Eric Dumazet, Tariq Toukan, Ilias Apalodimas, Yoel Caspersen,
	Mel Gorman, Saeed Mahameed, Michal Hocko, Vlastimil Babka,
	Dave Hansen, Alexander Duyck, Ian Kumlien

page_frag_free() calls __free_pages_ok() to free the page back to
Buddy. This is OK for high order page, but for order-0 pages, it
misses the optimization opportunity of using Per-Cpu-Pages and can
cause zone lock contention when called frequently.

Paweł Staszewski recently shared his result of 'how Linux kernel
handles normal traffic'[1] and from perf data, Jesper Dangaard Brouer
found the lock contention comes from page allocator:

  mlx5e_poll_tx_cq
  |
   --16.34%--napi_consume_skb
             |
             |--12.65%--__free_pages_ok
             |          |
             |           --11.86%--free_one_page
             |                     |
             |                     |--10.10%--queued_spin_lock_slowpath
             |                     |
             |                      --0.65%--_raw_spin_lock
             |
             |--1.55%--page_frag_free
             |
              --1.44%--skb_release_data

Jesper explained how it happened: mlx5 driver RX-page recycle
mechanism is not effective in this workload and pages have to go
through the page allocator. The lock contention happens during
mlx5 DMA TX completion cycle. And the page allocator cannot keep
up at these speeds.[2]

I thought that __free_pages_ok() are mostly freeing high order
pages and thought this is an lock contention for high order pages
but Jesper explained in detail that __free_pages_ok() here are
actually freeing order-0 pages because mlx5 is using order-0 pages
to satisfy its page pool allocation request.[3]

The free path as pointed out by Jesper is:
skb_free_head()
  -> skb_free_frag()
    -> page_frag_free()
And the pages being freed on this path are order-0 pages.

Fix this by doing similar things as in __page_frag_cache_drain() -
send the being freed page to PCP if it's an order-0 page, or
directly to Buddy if it is a high order page.

With this change, Paweł hasn't noticed lock contention yet in
his workload and Jesper has noticed a 7% performance improvement
using a micro benchmark and lock contention is gone. Ilias' test
on a 'low' speed 1Gbit interface on an cortex-a53 shows ~11%
performance boost testing with 64byte packets and __free_pages_ok()
disappeared from perf top.

[1]: https://www.spinics.net/lists/netdev/msg531362.html
[2]: https://www.spinics.net/lists/netdev/msg531421.html
[3]: https://www.spinics.net/lists/netdev/msg531556.html

Reported-by: Paweł Staszewski <pstaszewski@itcare.pl>
Analysed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Tested-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 mm/page_alloc.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 421c5b652708..8f8c6b33b637 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4677,8 +4677,14 @@ void page_frag_free(void *addr)
 {
 	struct page *page = virt_to_head_page(addr);
 
-	if (unlikely(put_page_testzero(page)))
-		__free_pages_ok(page, compound_order(page));
+	if (unlikely(put_page_testzero(page))) {
+		unsigned int order = compound_order(page);
+
+		if (order == 0)
+			free_unref_page(page);
+		else
+			__free_pages_ok(page, order);
+	}
 }
 EXPORT_SYMBOL(page_frag_free);
 
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v3 RESEND 2/2] mm/page_alloc: use a single function to free page
  2018-11-19 13:48 [PATCH RESEND 0/2] free order-0 pages through PCP in page_frag_free() and cleanup Aaron Lu
  2018-11-19 13:48 ` [PATCH v2 RESEND 1/2] mm/page_alloc: free order-0 pages through PCP in page_frag_free() Aaron Lu
@ 2018-11-19 13:48 ` Aaron Lu
  1 sibling, 0 replies; 8+ messages in thread
From: Aaron Lu @ 2018-11-19 13:48 UTC (permalink / raw)
  To: linux-mm, linux-kernel, netdev
  Cc: Andrew Morton, Paweł Staszewski, Jesper Dangaard Brouer,
	Eric Dumazet, Tariq Toukan, Ilias Apalodimas, Yoel Caspersen,
	Mel Gorman, Saeed Mahameed, Michal Hocko, Vlastimil Babka,
	Dave Hansen, Alexander Duyck, Ian Kumlien

There are multiple places of freeing a page, they all do the same
things so a common function can be used to reduce code duplicate.

It also avoids bug fixed in one function but left in another.

Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 mm/page_alloc.c | 37 ++++++++++++++-----------------------
 1 file changed, 14 insertions(+), 23 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8f8c6b33b637..93cc8e686eca 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4547,16 +4547,19 @@ unsigned long get_zeroed_page(gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(get_zeroed_page);
 
-void __free_pages(struct page *page, unsigned int order)
+static inline void free_the_page(struct page *page, unsigned int order)
 {
-	if (put_page_testzero(page)) {
-		if (order == 0)
-			free_unref_page(page);
-		else
-			__free_pages_ok(page, order);
-	}
+	if (order == 0)
+		free_unref_page(page);
+	else
+		__free_pages_ok(page, order);
 }
 
+void __free_pages(struct page *page, unsigned int order)
+{
+	if (put_page_testzero(page))
+		free_the_page(page, order);
+}
 EXPORT_SYMBOL(__free_pages);
 
 void free_pages(unsigned long addr, unsigned int order)
@@ -4605,14 +4608,8 @@ void __page_frag_cache_drain(struct page *page, unsigned int count)
 {
 	VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
 
-	if (page_ref_sub_and_test(page, count)) {
-		unsigned int order = compound_order(page);
-
-		if (order == 0)
-			free_unref_page(page);
-		else
-			__free_pages_ok(page, order);
-	}
+	if (page_ref_sub_and_test(page, count))
+		free_the_page(page, compound_order(page));
 }
 EXPORT_SYMBOL(__page_frag_cache_drain);
 
@@ -4677,14 +4674,8 @@ void page_frag_free(void *addr)
 {
 	struct page *page = virt_to_head_page(addr);
 
-	if (unlikely(put_page_testzero(page))) {
-		unsigned int order = compound_order(page);
-
-		if (order == 0)
-			free_unref_page(page);
-		else
-			__free_pages_ok(page, order);
-	}
+	if (unlikely(put_page_testzero(page)))
+		free_the_page(page, compound_order(page));
 }
 EXPORT_SYMBOL(page_frag_free);
 
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 RESEND 1/2] mm/page_alloc: free order-0 pages through PCP in page_frag_free()
  2018-11-19 13:48 ` [PATCH v2 RESEND 1/2] mm/page_alloc: free order-0 pages through PCP in page_frag_free() Aaron Lu
@ 2018-11-19 15:00   ` Tariq Toukan
  2018-11-20  1:43     ` Aaron Lu
  2018-11-20  1:45   ` [PATCH v2 RESEND update " Aaron Lu
  1 sibling, 1 reply; 8+ messages in thread
From: Tariq Toukan @ 2018-11-19 15:00 UTC (permalink / raw)
  To: Aaron Lu, linux-mm, linux-kernel, netdev
  Cc: Andrew Morton, Paweł Staszewski, Jesper Dangaard Brouer,
	Eric Dumazet, Tariq Toukan, Ilias Apalodimas, Yoel Caspersen,
	Mel Gorman, Saeed Mahameed, Michal Hocko, Vlastimil Babka,
	Dave Hansen, Alexander Duyck, Ian Kumlien



On 19/11/2018 3:48 PM, Aaron Lu wrote:
> page_frag_free() calls __free_pages_ok() to free the page back to
> Buddy. This is OK for high order page, but for order-0 pages, it
> misses the optimization opportunity of using Per-Cpu-Pages and can
> cause zone lock contention when called frequently.
> 
> Paweł Staszewski recently shared his result of 'how Linux kernel
> handles normal traffic'[1] and from perf data, Jesper Dangaard Brouer
> found the lock contention comes from page allocator:
> 
>    mlx5e_poll_tx_cq
>    |
>     --16.34%--napi_consume_skb
>               |
>               |--12.65%--__free_pages_ok
>               |          |
>               |           --11.86%--free_one_page
>               |                     |
>               |                     |--10.10%--queued_spin_lock_slowpath
>               |                     |
>               |                      --0.65%--_raw_spin_lock
>               |
>               |--1.55%--page_frag_free
>               |
>                --1.44%--skb_release_data
> 
> Jesper explained how it happened: mlx5 driver RX-page recycle
> mechanism is not effective in this workload and pages have to go
> through the page allocator. The lock contention happens during
> mlx5 DMA TX completion cycle. And the page allocator cannot keep
> up at these speeds.[2]
> 
> I thought that __free_pages_ok() are mostly freeing high order
> pages and thought this is an lock contention for high order pages
> but Jesper explained in detail that __free_pages_ok() here are
> actually freeing order-0 pages because mlx5 is using order-0 pages
> to satisfy its page pool allocation request.[3]
> 
> The free path as pointed out by Jesper is:
> skb_free_head()
>    -> skb_free_frag()
>      -> page_frag_free()
> And the pages being freed on this path are order-0 pages.
> 
> Fix this by doing similar things as in __page_frag_cache_drain() -
> send the being freed page to PCP if it's an order-0 page, or
> directly to Buddy if it is a high order page.
> 
> With this change, Paweł hasn't noticed lock contention yet in
> his workload and Jesper has noticed a 7% performance improvement
> using a micro benchmark and lock contention is gone. Ilias' test
> on a 'low' speed 1Gbit interface on an cortex-a53 shows ~11%
> performance boost testing with 64byte packets and __free_pages_ok()
> disappeared from perf top.
> 
> [1]: https://www.spinics.net/lists/netdev/msg531362.html
> [2]: https://www.spinics.net/lists/netdev/msg531421.html
> [3]: https://www.spinics.net/lists/netdev/msg531556.html
> 
> Reported-by: Paweł Staszewski <pstaszewski@itcare.pl>
> Analysed-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: Mel Gorman <mgorman@techsingularity.net>
> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Tested-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Acked-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> Acked-by: Tariq Toukan <tariqt@mellanox.com

missing '>' sign in my email tag.

> Signed-off-by: Aaron Lu <aaron.lu@intel.com>
> ---

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 RESEND 1/2] mm/page_alloc: free order-0 pages through PCP in page_frag_free()
  2018-11-19 15:00   ` Tariq Toukan
@ 2018-11-20  1:43     ` Aaron Lu
  0 siblings, 0 replies; 8+ messages in thread
From: Aaron Lu @ 2018-11-20  1:43 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: linux-mm, linux-kernel, netdev, Andrew Morton,
	Paweł Staszewski, Jesper Dangaard Brouer, Eric Dumazet,
	Ilias Apalodimas, Yoel Caspersen, Mel Gorman, Saeed Mahameed,
	Michal Hocko, Vlastimil Babka, Dave Hansen, Alexander Duyck,
	Ian Kumlien

On Mon, Nov 19, 2018 at 03:00:53PM +0000, Tariq Toukan wrote:
> 
> 
> On 19/11/2018 3:48 PM, Aaron Lu wrote:
> > page_frag_free() calls __free_pages_ok() to free the page back to
> > Buddy. This is OK for high order page, but for order-0 pages, it
> > misses the optimization opportunity of using Per-Cpu-Pages and can
> > cause zone lock contention when called frequently.
> > 
> > Paweł Staszewski recently shared his result of 'how Linux kernel
> > handles normal traffic'[1] and from perf data, Jesper Dangaard Brouer
> > found the lock contention comes from page allocator:
> > 
> >    mlx5e_poll_tx_cq
> >    |
> >     --16.34%--napi_consume_skb
> >               |
> >               |--12.65%--__free_pages_ok
> >               |          |
> >               |           --11.86%--free_one_page
> >               |                     |
> >               |                     |--10.10%--queued_spin_lock_slowpath
> >               |                     |
> >               |                      --0.65%--_raw_spin_lock
> >               |
> >               |--1.55%--page_frag_free
> >               |
> >                --1.44%--skb_release_data
> > 
> > Jesper explained how it happened: mlx5 driver RX-page recycle
> > mechanism is not effective in this workload and pages have to go
> > through the page allocator. The lock contention happens during
> > mlx5 DMA TX completion cycle. And the page allocator cannot keep
> > up at these speeds.[2]
> > 
> > I thought that __free_pages_ok() are mostly freeing high order
> > pages and thought this is an lock contention for high order pages
> > but Jesper explained in detail that __free_pages_ok() here are
> > actually freeing order-0 pages because mlx5 is using order-0 pages
> > to satisfy its page pool allocation request.[3]
> > 
> > The free path as pointed out by Jesper is:
> > skb_free_head()
> >    -> skb_free_frag()
> >      -> page_frag_free()
> > And the pages being freed on this path are order-0 pages.
> > 
> > Fix this by doing similar things as in __page_frag_cache_drain() -
> > send the being freed page to PCP if it's an order-0 page, or
> > directly to Buddy if it is a high order page.
> > 
> > With this change, Paweł hasn't noticed lock contention yet in
> > his workload and Jesper has noticed a 7% performance improvement
> > using a micro benchmark and lock contention is gone. Ilias' test
> > on a 'low' speed 1Gbit interface on an cortex-a53 shows ~11%
> > performance boost testing with 64byte packets and __free_pages_ok()
> > disappeared from perf top.
> > 
> > [1]: https://www.spinics.net/lists/netdev/msg531362.html
> > [2]: https://www.spinics.net/lists/netdev/msg531421.html
> > [3]: https://www.spinics.net/lists/netdev/msg531556.html
> > 
> > Reported-by: Paweł Staszewski <pstaszewski@itcare.pl>
> > Analysed-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > Acked-by: Vlastimil Babka <vbabka@suse.cz>
> > Acked-by: Mel Gorman <mgorman@techsingularity.net>
> > Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> > Tested-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> > Acked-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > Acked-by: Tariq Toukan <tariqt@mellanox.com
> 
> missing '>' sign in my email tag.

Sorry about that, will fix this and resend.
 
> > Signed-off-by: Aaron Lu <aaron.lu@intel.com>
> > ---

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 RESEND update 1/2] mm/page_alloc: free order-0 pages through PCP in page_frag_free()
  2018-11-19 13:48 ` [PATCH v2 RESEND 1/2] mm/page_alloc: free order-0 pages through PCP in page_frag_free() Aaron Lu
  2018-11-19 15:00   ` Tariq Toukan
@ 2018-11-20  1:45   ` Aaron Lu
  2018-11-20  9:42     ` Pankaj Gupta
  2018-11-22  3:15     ` Andrew Morton
  1 sibling, 2 replies; 8+ messages in thread
From: Aaron Lu @ 2018-11-20  1:45 UTC (permalink / raw)
  To: linux-mm, linux-kernel, netdev
  Cc: Andrew Morton, Paweł Staszewski, Jesper Dangaard Brouer,
	Eric Dumazet, Tariq Toukan, Ilias Apalodimas, Yoel Caspersen,
	Mel Gorman, Saeed Mahameed, Michal Hocko, Vlastimil Babka,
	Dave Hansen, Alexander Duyck, Ian Kumlien

page_frag_free() calls __free_pages_ok() to free the page back to
Buddy. This is OK for high order page, but for order-0 pages, it
misses the optimization opportunity of using Per-Cpu-Pages and can
cause zone lock contention when called frequently.

Paweł Staszewski recently shared his result of 'how Linux kernel
handles normal traffic'[1] and from perf data, Jesper Dangaard Brouer
found the lock contention comes from page allocator:

  mlx5e_poll_tx_cq
  |
   --16.34%--napi_consume_skb
             |
             |--12.65%--__free_pages_ok
             |          |
             |           --11.86%--free_one_page
             |                     |
             |                     |--10.10%--queued_spin_lock_slowpath
             |                     |
             |                      --0.65%--_raw_spin_lock
             |
             |--1.55%--page_frag_free
             |
              --1.44%--skb_release_data

Jesper explained how it happened: mlx5 driver RX-page recycle
mechanism is not effective in this workload and pages have to go
through the page allocator. The lock contention happens during
mlx5 DMA TX completion cycle. And the page allocator cannot keep
up at these speeds.[2]

I thought that __free_pages_ok() are mostly freeing high order
pages and thought this is an lock contention for high order pages
but Jesper explained in detail that __free_pages_ok() here are
actually freeing order-0 pages because mlx5 is using order-0 pages
to satisfy its page pool allocation request.[3]

The free path as pointed out by Jesper is:
skb_free_head()
  -> skb_free_frag()
    -> page_frag_free()
And the pages being freed on this path are order-0 pages.

Fix this by doing similar things as in __page_frag_cache_drain() -
send the being freed page to PCP if it's an order-0 page, or
directly to Buddy if it is a high order page.

With this change, Paweł hasn't noticed lock contention yet in
his workload and Jesper has noticed a 7% performance improvement
using a micro benchmark and lock contention is gone. Ilias' test
on a 'low' speed 1Gbit interface on an cortex-a53 shows ~11%
performance boost testing with 64byte packets and __free_pages_ok()
disappeared from perf top.

[1]: https://www.spinics.net/lists/netdev/msg531362.html
[2]: https://www.spinics.net/lists/netdev/msg531421.html
[3]: https://www.spinics.net/lists/netdev/msg531556.html

Reported-by: Paweł Staszewski <pstaszewski@itcare.pl>
Analysed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Tested-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
update: fix Tariq's email tag.

 mm/page_alloc.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 421c5b652708..8f8c6b33b637 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4677,8 +4677,14 @@ void page_frag_free(void *addr)
 {
 	struct page *page = virt_to_head_page(addr);
 
-	if (unlikely(put_page_testzero(page)))
-		__free_pages_ok(page, compound_order(page));
+	if (unlikely(put_page_testzero(page))) {
+		unsigned int order = compound_order(page);
+
+		if (order == 0)
+			free_unref_page(page);
+		else
+			__free_pages_ok(page, order);
+	}
 }
 EXPORT_SYMBOL(page_frag_free);
 
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 RESEND update 1/2] mm/page_alloc: free order-0 pages through PCP in page_frag_free()
  2018-11-20  1:45   ` [PATCH v2 RESEND update " Aaron Lu
@ 2018-11-20  9:42     ` Pankaj Gupta
  2018-11-22  3:15     ` Andrew Morton
  1 sibling, 0 replies; 8+ messages in thread
From: Pankaj Gupta @ 2018-11-20  9:42 UTC (permalink / raw)
  To: Aaron Lu
  Cc: linux-mm, linux-kernel, netdev, Andrew Morton,
	Paweł Staszewski, Jesper Dangaard Brouer, Eric Dumazet,
	Tariq Toukan, Ilias Apalodimas, Yoel Caspersen, Mel Gorman,
	Saeed Mahameed, Michal Hocko, Vlastimil Babka, Dave Hansen,
	Alexander Duyck, Ian Kumlien


> 
> page_frag_free() calls __free_pages_ok() to free the page back to
> Buddy. This is OK for high order page, but for order-0 pages, it
> misses the optimization opportunity of using Per-Cpu-Pages and can
> cause zone lock contention when called frequently.
> 
> Paweł Staszewski recently shared his result of 'how Linux kernel
> handles normal traffic'[1] and from perf data, Jesper Dangaard Brouer
> found the lock contention comes from page allocator:
> 
>   mlx5e_poll_tx_cq
>   |
>    --16.34%--napi_consume_skb
>              |
>              |--12.65%--__free_pages_ok
>              |          |
>              |           --11.86%--free_one_page
>              |                     |
>              |                     |--10.10%--queued_spin_lock_slowpath
>              |                     |
>              |                      --0.65%--_raw_spin_lock
>              |
>              |--1.55%--page_frag_free
>              |
>               --1.44%--skb_release_data
> 
> Jesper explained how it happened: mlx5 driver RX-page recycle
> mechanism is not effective in this workload and pages have to go
> through the page allocator. The lock contention happens during
> mlx5 DMA TX completion cycle. And the page allocator cannot keep
> up at these speeds.[2]
> 
> I thought that __free_pages_ok() are mostly freeing high order
> pages and thought this is an lock contention for high order pages
> but Jesper explained in detail that __free_pages_ok() here are
> actually freeing order-0 pages because mlx5 is using order-0 pages
> to satisfy its page pool allocation request.[3]
> 
> The free path as pointed out by Jesper is:
> skb_free_head()
>   -> skb_free_frag()
>     -> page_frag_free()
> And the pages being freed on this path are order-0 pages.
> 
> Fix this by doing similar things as in __page_frag_cache_drain() -
> send the being freed page to PCP if it's an order-0 page, or
> directly to Buddy if it is a high order page.
> 
> With this change, Paweł hasn't noticed lock contention yet in
> his workload and Jesper has noticed a 7% performance improvement
> using a micro benchmark and lock contention is gone. Ilias' test
> on a 'low' speed 1Gbit interface on an cortex-a53 shows ~11%
> performance boost testing with 64byte packets and __free_pages_ok()
> disappeared from perf top.
> 
> [1]: https://www.spinics.net/lists/netdev/msg531362.html
> [2]: https://www.spinics.net/lists/netdev/msg531421.html
> [3]: https://www.spinics.net/lists/netdev/msg531556.html
> 
> Reported-by: Paweł Staszewski <pstaszewski@itcare.pl>
> Analysed-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: Mel Gorman <mgorman@techsingularity.net>
> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Tested-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Acked-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> Acked-by: Tariq Toukan <tariqt@mellanox.com>
> Signed-off-by: Aaron Lu <aaron.lu@intel.com>
> ---
> update: fix Tariq's email tag.
> 
>  mm/page_alloc.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 421c5b652708..8f8c6b33b637 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4677,8 +4677,14 @@ void page_frag_free(void *addr)
>  {
>  	struct page *page = virt_to_head_page(addr);
>  
> -	if (unlikely(put_page_testzero(page)))
> -		__free_pages_ok(page, compound_order(page));
> +	if (unlikely(put_page_testzero(page))) {
> +		unsigned int order = compound_order(page);
> +
> +		if (order == 0)
> +			free_unref_page(page);
> +		else
> +			__free_pages_ok(page, order);
> +	}
>  }
>  EXPORT_SYMBOL(page_frag_free);
>  
> --
> 2.17.2

A good optimization for zero order allocations.  
Acked-by: Pankaj gupta <pagupta@redhat.com>
 
Thanks,
Pankaj

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 RESEND update 1/2] mm/page_alloc: free order-0 pages through PCP in page_frag_free()
  2018-11-20  1:45   ` [PATCH v2 RESEND update " Aaron Lu
  2018-11-20  9:42     ` Pankaj Gupta
@ 2018-11-22  3:15     ` Andrew Morton
  1 sibling, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2018-11-22  3:15 UTC (permalink / raw)
  To: Aaron Lu
  Cc: linux-mm, linux-kernel, netdev, Paweł Staszewski,
	Jesper Dangaard Brouer, Eric Dumazet, Tariq Toukan,
	Ilias Apalodimas, Yoel Caspersen, Mel Gorman, Saeed Mahameed,
	Michal Hocko, Vlastimil Babka, Dave Hansen, Alexander Duyck,
	Ian Kumlien

On Tue, 20 Nov 2018 09:45:44 +0800 Aaron Lu <aaron.lu@intel.com> wrote:

> page_frag_free() calls __free_pages_ok() to free the page back to
> Buddy. This is OK for high order page, but for order-0 pages, it
> misses the optimization opportunity of using Per-Cpu-Pages and can
> cause zone lock contention when called frequently.
> 

Looks nice to me.  Let's tell our readers why we're doing this.

--- a/mm/page_alloc.c~mm-page_alloc-free-order-0-pages-through-pcp-in-page_frag_free-fix
+++ a/mm/page_alloc.c
@@ -4684,7 +4684,7 @@ void page_frag_free(void *addr)
 	if (unlikely(put_page_testzero(page))) {
 		unsigned int order = compound_order(page);
 
-		if (order == 0)
+		if (order == 0)		/* Via pcp? */
 			free_unref_page(page);
 		else
 			__free_pages_ok(page, order);
_


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-11-22  3:15 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-19 13:48 [PATCH RESEND 0/2] free order-0 pages through PCP in page_frag_free() and cleanup Aaron Lu
2018-11-19 13:48 ` [PATCH v2 RESEND 1/2] mm/page_alloc: free order-0 pages through PCP in page_frag_free() Aaron Lu
2018-11-19 15:00   ` Tariq Toukan
2018-11-20  1:43     ` Aaron Lu
2018-11-20  1:45   ` [PATCH v2 RESEND update " Aaron Lu
2018-11-20  9:42     ` Pankaj Gupta
2018-11-22  3:15     ` Andrew Morton
2018-11-19 13:48 ` [PATCH v3 RESEND 2/2] mm/page_alloc: use a single function to free page Aaron Lu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).