* [PATCH v3 0/3] mm: improve zone->lock scalability
@ 2018-02-26 13:53 Aaron Lu
  2018-02-26 13:53 ` [PATCH v3 1/3] mm/free_pcppages_bulk: update pcp->count inside Aaron Lu
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Aaron Lu @ 2018-02-26 13:53 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Huang Ying, Dave Hansen, Kemi Wang, Tim Chen,
	Andi Kleen, Michal Hocko, Vlastimil Babka, Mel Gorman,
	Matthew Wilcox

Patch 1/3 is a small cleanup suggested by Matthew Wilcox which doesn't
affect scalability or performance;
Patch 2/3 moves some code in free_pcppages_bulk() outside of zone->lock
and has Mel Gorman's ack;
Patch 3/3 uses prefetch in free_pcppages_bulk() outside of zone->lock to
speed up page merging under zone->lock, but Mel Gorman has some concerns.

For details, please see their changelogs.

Changes from v2:
Patch 1/3 is newly added;
Patch 2/3 is patch 1/2 in v2 and doesn't have any change except resolving
conflicts due to the newly added patch 1/3;
Patch 3/3 is patch 2/2 in v2 and only has some changelog updates to the
concerns part.

v1 and v2 were here:
https://lkml.org/lkml/2018/1/23/879

Aaron Lu (3):
  mm/free_pcppages_bulk: update pcp->count inside
  mm/free_pcppages_bulk: do not hold lock when picking pages to free
  mm/free_pcppages_bulk: prefetch buddy while not holding lock

 mm/page_alloc.c | 56 +++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 35 insertions(+), 21 deletions(-)

-- 
2.14.3

* [PATCH v3 1/3] mm/free_pcppages_bulk: update pcp->count inside
  2018-02-26 13:53 [PATCH v3 0/3] mm: improve zone->lock scalability Aaron Lu
@ 2018-02-26 13:53 ` Aaron Lu
  2018-02-26 21:48   ` David Rientjes
  2018-02-26 13:53 ` [PATCH v3 2/3] mm/free_pcppages_bulk: do not hold lock when picking pages to free Aaron Lu
  2018-02-26 13:53 ` [PATCH v3 3/3] mm/free_pcppages_bulk: prefetch buddy while not holding lock Aaron Lu
  2 siblings, 1 reply; 10+ messages in thread
From: Aaron Lu @ 2018-02-26 13:53 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Huang Ying, Dave Hansen, Kemi Wang, Tim Chen,
	Andi Kleen, Michal Hocko, Vlastimil Babka, Mel Gorman,
	Matthew Wilcox

Matthew Wilcox found that all callers of free_pcppages_bulk() currently
update pcp->count immediately after so it's natural to do it inside
free_pcppages_bulk().

No functionality or performance change is expected from this patch.

Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 mm/page_alloc.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cb416723538f..3154859cccd6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1117,6 +1117,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	int batch_free = 0;
 	bool isolated_pageblocks;
 
+	pcp->count -= count;
 	spin_lock(&zone->lock);
 	isolated_pageblocks = has_isolate_pageblock(zone);
 
@@ -2416,10 +2417,8 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
 	local_irq_save(flags);
 	batch = READ_ONCE(pcp->batch);
 	to_drain = min(pcp->count, batch);
-	if (to_drain > 0) {
+	if (to_drain > 0)
 		free_pcppages_bulk(zone, to_drain, pcp);
-		pcp->count -= to_drain;
-	}
 	local_irq_restore(flags);
 }
 #endif
@@ -2441,10 +2440,8 @@ static void drain_pages_zone(unsigned int cpu, struct zone *zone)
 	pset = per_cpu_ptr(zone->pageset, cpu);
 
 	pcp = &pset->pcp;
-	if (pcp->count) {
+	if (pcp->count)
 		free_pcppages_bulk(zone, pcp->count, pcp);
-		pcp->count = 0;
-	}
 	local_irq_restore(flags);
 }
 
@@ -2668,7 +2665,6 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn)
 	if (pcp->count >= pcp->high) {
 		unsigned long batch = READ_ONCE(pcp->batch);
 		free_pcppages_bulk(zone, batch, pcp);
-		pcp->count -= batch;
 	}
 }
 
-- 
2.14.3

* [PATCH v3 2/3] mm/free_pcppages_bulk: do not hold lock when picking pages to free
  2018-02-26 13:53 [PATCH v3 0/3] mm: improve zone->lock scalability Aaron Lu
  2018-02-26 13:53 ` [PATCH v3 1/3] mm/free_pcppages_bulk: update pcp->count inside Aaron Lu
@ 2018-02-26 13:53 ` Aaron Lu
  2018-02-26 21:53   ` David Rientjes
  2018-02-26 13:53 ` [PATCH v3 3/3] mm/free_pcppages_bulk: prefetch buddy while not holding lock Aaron Lu
  2 siblings, 1 reply; 10+ messages in thread
From: Aaron Lu @ 2018-02-26 13:53 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Huang Ying, Dave Hansen, Kemi Wang, Tim Chen,
	Andi Kleen, Michal Hocko, Vlastimil Babka, Mel Gorman,
	Matthew Wilcox

When freeing a batch of pages from the Per-CPU-Pages (PCP) lists back to
the buddy allocator, zone->lock is taken and then pages are picked off
the PCP's migratetype lists. There is actually no need to do this
picking part under the lock: these are PCP pages, the only CPU that can
touch them is the current one, and IRQs are disabled.

Moving this part outside the lock reduces lock hold time and improves
performance. Test with will-it-scale/page_fault1 at full load:

kernel      Broadwell(2S)  Skylake(2S)   Broadwell(4S)  Skylake(4S)
v4.16-rc2+  9034215        7971818       13667135       15677465
this patch  9536374 +5.6%  8314710 +4.3% 14070408 +3.0% 16675866 +6.4%

The test starts $nr_cpu processes and each repeatedly does the following
for 5 minutes (see the sketch after the link below):
1 mmap 128M of anonymous space;
2 write to that space;
3 munmap.
The score is the aggregated iteration count.

https://github.com/antonblanchard/will-it-scale/blob/master/tests/page_fault1.c
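
For reference, the core of each test process is roughly the following
loop. This is a simplified sketch, not the upstream source: MEMSIZE,
RUNTIME and the single-process reporting are made up for illustration;
the real benchmark forks one process per CPU and aggregates the counts.

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>

#define MEMSIZE	(128UL * 1024 * 1024)	/* 128M anonymous mapping */
#define RUNTIME	300			/* run for 5 minutes */

int main(void)
{
	unsigned long iterations = 0;
	time_t end = time(NULL) + RUNTIME;

	while (time(NULL) < end) {
		char *p = mmap(NULL, MEMSIZE, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			return 1;

		/* touch every page so each one is faulted in (allocated
		 * from the page allocator) and later freed on munmap */
		for (unsigned long off = 0; off < MEMSIZE; off += 4096)
			p[off] = 1;

		munmap(p, MEMSIZE);
		iterations++;	/* the reported score aggregates this count */
	}

	printf("%lu\n", iterations);
	return 0;
}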

Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 mm/page_alloc.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3154859cccd6..35576da0a6c9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1116,13 +1116,11 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	int migratetype = 0;
 	int batch_free = 0;
 	bool isolated_pageblocks;
+	struct page *page, *tmp;
+	LIST_HEAD(head);
 
 	pcp->count -= count;
-	spin_lock(&zone->lock);
-	isolated_pageblocks = has_isolate_pageblock(zone);
-
 	while (count) {
-		struct page *page;
 		struct list_head *list;
 
 		/*
@@ -1144,26 +1142,31 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			batch_free = count;
 
 		do {
-			int mt;	/* migratetype of the to-be-freed page */
-
 			page = list_last_entry(list, struct page, lru);
 			/* must delete as __free_one_page list manipulates */
 			list_del(&page->lru);
 
-			mt = get_pcppage_migratetype(page);
-			/* MIGRATE_ISOLATE page should not go to pcplists */
-			VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
-			/* Pageblock could have been isolated meanwhile */
-			if (unlikely(isolated_pageblocks))
-				mt = get_pageblock_migratetype(page);
-
 			if (bulkfree_pcp_prepare(page))
 				continue;
 
-			__free_one_page(page, page_to_pfn(page), zone, 0, mt);
-			trace_mm_page_pcpu_drain(page, 0, mt);
+			list_add_tail(&page->lru, &head);
 		} while (--count && --batch_free && !list_empty(list));
 	}
+
+	spin_lock(&zone->lock);
+	isolated_pageblocks = has_isolate_pageblock(zone);
+
+	list_for_each_entry_safe(page, tmp, &head, lru) {
+		int mt = get_pcppage_migratetype(page);
+		/* MIGRATE_ISOLATE page should not go to pcplists */
+		VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
+		/* Pageblock could have been isolated meanwhile */
+		if (unlikely(isolated_pageblocks))
+			mt = get_pageblock_migratetype(page);
+
+		__free_one_page(page, page_to_pfn(page), zone, 0, mt);
+		trace_mm_page_pcpu_drain(page, 0, mt);
+	}
 	spin_unlock(&zone->lock);
 }
 
-- 
2.14.3

* [PATCH v3 3/3] mm/free_pcppages_bulk: prefetch buddy while not holding lock
  2018-02-26 13:53 [PATCH v3 0/3] mm: improve zone->lock scalability Aaron Lu
  2018-02-26 13:53 ` [PATCH v3 1/3] mm/free_pcppages_bulk: update pcp->count inside Aaron Lu
  2018-02-26 13:53 ` [PATCH v3 2/3] mm/free_pcppages_bulk: do not hold lock when picking pages to free Aaron Lu
@ 2018-02-26 13:53 ` Aaron Lu
  2 siblings, 0 replies; 10+ messages in thread
From: Aaron Lu @ 2018-02-26 13:53 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Andrew Morton, Huang Ying, Dave Hansen, Kemi Wang, Tim Chen,
	Andi Kleen, Michal Hocko, Vlastimil Babka, Mel Gorman,
	Matthew Wilcox

When a page is freed back to the global pool, its buddy will be checked
to see if it's possible to do a merge. This requires accessing the
buddy's page structure, and that access could take a long time if it's
cache cold.

This patch adds a prefetch of the to-be-freed page's buddy outside of
zone->lock, in the hope that accessing the buddy's page structure later
under zone->lock will be faster. Since we *always* do buddy merging and
check an order-0 page's buddy to try to merge it when it goes into the
main allocator, the cacheline will always come in, i.e. the prefetched
data will never be unused.

At the same time, there are two concerns:
1 the prefetch could potentially evict existing cachelines, especially
  for L1D cache since it is not huge;
2 there is some additional instruction overhead, namely calculating
  buddy pfn twice.

For 1, it's hard to say; this microbenchmark shows a good result, but
the actual benefit of this patch will be workload/CPU dependent;
For 2, since the calculation is an XOR on two local variables, the
cycles spent are expected in many cases to be offset by the reduced
memory latency later. This is especially true for NUMA machines where
multiple CPUs are contending on zone->lock and the most time consuming
part under zone->lock is waiting for the 'struct page' cachelines of the
to-be-freed pages and their buddies.
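
For context, the buddy pfn calculation referred to above is a single
XOR; below is a sketch mirroring the upstream __find_buddy_pfn() helper
(simplified, shown for illustration only):

/*
 * The buddy of a pfn at a given order is found by flipping the bit
 * corresponding to that order (mm/internal.h, simplified).
 */
static inline unsigned long
__find_buddy_pfn(unsigned long page_pfn, unsigned int order)
{
	return page_pfn ^ (1 << order);
}

/*
 * So for the order-0 case in free_pcppages_bulk() this amounts to:
 *
 *	buddy_pfn = pfn ^ 1;
 *	buddy     = page + (buddy_pfn - pfn);
 *
 * i.e. a couple of ALU operations on values already in registers.
 */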

Test with will-it-scale/page_fault1 full load:

kernel      Broadwell(2S)  Skylake(2S)   Broadwell(4S)  Skylake(4S)
v4.16-rc2+  9034215        7971818       13667135       15677465
patch2/3    9536374 +5.6%  8314710 +4.3% 14070408 +3.0% 16675866 +6.4%
this patch 10338868 +8.4%  8544477 +2.8% 14839808 +5.5% 17155464 +2.9%
Note: this patch's performance improvement percent is against patch2/3.

[changelog stolen from Dave Hansen's and Mel Gorman's comments]
https://lkml.org/lkml/2018/1/24/551
Suggested-by: Ying Huang <ying.huang@intel.com>
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
---
 mm/page_alloc.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 35576da0a6c9..dc3b89894f2c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1142,6 +1142,9 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			batch_free = count;
 
 		do {
+			unsigned long pfn, buddy_pfn;
+			struct page *buddy;
+
 			page = list_last_entry(list, struct page, lru);
 			/* must delete as __free_one_page list manipulates */
 			list_del(&page->lru);
@@ -1150,6 +1153,18 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 				continue;
 
 			list_add_tail(&page->lru, &head);
+
+			/*
+			 * We are going to put the page back to the global
+			 * pool, prefetch its buddy to speed up later access
+			 * under zone->lock. It is believed the overhead of
+			 * calculating buddy_pfn here can be offset by reduced
+			 * memory latency later.
+			 */
+			pfn = page_to_pfn(page);
+			buddy_pfn = __find_buddy_pfn(pfn, 0);
+			buddy = page + (buddy_pfn - pfn);
+			prefetch(buddy);
 		} while (--count && --batch_free && !list_empty(list));
 	}
 
-- 
2.14.3

* Re: [PATCH v3 1/3] mm/free_pcppages_bulk: update pcp->count inside
  2018-02-26 13:53 ` [PATCH v3 1/3] mm/free_pcppages_bulk: update pcp->count inside Aaron Lu
@ 2018-02-26 21:48   ` David Rientjes
  2018-02-27  1:56     ` Aaron Lu
  0 siblings, 1 reply; 10+ messages in thread
From: David Rientjes @ 2018-02-26 21:48 UTC (permalink / raw)
  To: Aaron Lu
  Cc: linux-mm, linux-kernel, Andrew Morton, Huang Ying, Dave Hansen,
	Kemi Wang, Tim Chen, Andi Kleen, Michal Hocko, Vlastimil Babka,
	Mel Gorman, Matthew Wilcox

On Mon, 26 Feb 2018, Aaron Lu wrote:

> Matthew Wilcox found that all callers of free_pcppages_bulk() currently
> update pcp->count immediately after so it's natural to do it inside
> free_pcppages_bulk().
> 
> No functionality or performance change is expected from this patch.
> 
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Aaron Lu <aaron.lu@intel.com>
> ---
>  mm/page_alloc.c | 10 +++-------
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index cb416723538f..3154859cccd6 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1117,6 +1117,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>  	int batch_free = 0;
>  	bool isolated_pageblocks;
>  
> +	pcp->count -= count;
>  	spin_lock(&zone->lock);
>  	isolated_pageblocks = has_isolate_pageblock(zone);
>  

Why modify pcp->count before the pages have actually been freed?

I doubt that it matters too much, but at least /proc/zoneinfo uses 
zone->lock.  I think it should be done after the lock is dropped.

Otherwise, looks good.

* Re: [PATCH v3 2/3] mm/free_pcppages_bulk: do not hold lock when picking pages to free
  2018-02-26 13:53 ` [PATCH v3 2/3] mm/free_pcppages_bulk: do not hold lock when picking pages to free Aaron Lu
@ 2018-02-26 21:53   ` David Rientjes
  2018-02-27  2:00     ` Aaron Lu
  0 siblings, 1 reply; 10+ messages in thread
From: David Rientjes @ 2018-02-26 21:53 UTC (permalink / raw)
  To: Aaron Lu
  Cc: linux-mm, linux-kernel, Andrew Morton, Huang Ying, Dave Hansen,
	Kemi Wang, Tim Chen, Andi Kleen, Michal Hocko, Vlastimil Babka,
	Mel Gorman, Matthew Wilcox

On Mon, 26 Feb 2018, Aaron Lu wrote:

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3154859cccd6..35576da0a6c9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1116,13 +1116,11 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>  	int migratetype = 0;
>  	int batch_free = 0;
>  	bool isolated_pageblocks;
> +	struct page *page, *tmp;
> +	LIST_HEAD(head);
>  
>  	pcp->count -= count;
> -	spin_lock(&zone->lock);
> -	isolated_pageblocks = has_isolate_pageblock(zone);
> -
>  	while (count) {
> -		struct page *page;
>  		struct list_head *list;
>  
>  		/*
> @@ -1144,26 +1142,31 @@ static void free_pcppages_bulk(struct zone *zone, int count,
>  			batch_free = count;
>  
>  		do {
> -			int mt;	/* migratetype of the to-be-freed page */
> -
>  			page = list_last_entry(list, struct page, lru);
>  			/* must delete as __free_one_page list manipulates */

Looks good in general, but I'm not sure how I reconcile this comment with 
the new implementation that later links page->lru again.

>  			list_del(&page->lru);
>  
> -			mt = get_pcppage_migratetype(page);
> -			/* MIGRATE_ISOLATE page should not go to pcplists */
> -			VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
> -			/* Pageblock could have been isolated meanwhile */
> -			if (unlikely(isolated_pageblocks))
> -				mt = get_pageblock_migratetype(page);
> -
>  			if (bulkfree_pcp_prepare(page))
>  				continue;
>  
> -			__free_one_page(page, page_to_pfn(page), zone, 0, mt);
> -			trace_mm_page_pcpu_drain(page, 0, mt);
> +			list_add_tail(&page->lru, &head);
>  		} while (--count && --batch_free && !list_empty(list));
>  	}
> +
> +	spin_lock(&zone->lock);
> +	isolated_pageblocks = has_isolate_pageblock(zone);
> +
> +	list_for_each_entry_safe(page, tmp, &head, lru) {
> +		int mt = get_pcppage_migratetype(page);
> +		/* MIGRATE_ISOLATE page should not go to pcplists */
> +		VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
> +		/* Pageblock could have been isolated meanwhile */
> +		if (unlikely(isolated_pageblocks))
> +			mt = get_pageblock_migratetype(page);
> +
> +		__free_one_page(page, page_to_pfn(page), zone, 0, mt);
> +		trace_mm_page_pcpu_drain(page, 0, mt);
> +	}
>  	spin_unlock(&zone->lock);
>  }
>  

* Re: [PATCH v3 1/3] mm/free_pcppages_bulk: update pcp->count inside
  2018-02-26 21:48   ` David Rientjes
@ 2018-02-27  1:56     ` Aaron Lu
  2018-02-27  3:12       ` Aaron Lu
  0 siblings, 1 reply; 10+ messages in thread
From: Aaron Lu @ 2018-02-27  1:56 UTC (permalink / raw)
  To: David Rientjes
  Cc: linux-mm, linux-kernel, Andrew Morton, Huang Ying, Dave Hansen,
	Kemi Wang, Tim Chen, Andi Kleen, Michal Hocko, Vlastimil Babka,
	Mel Gorman, Matthew Wilcox

On Mon, Feb 26, 2018 at 01:48:14PM -0800, David Rientjes wrote:
> On Mon, 26 Feb 2018, Aaron Lu wrote:
> 
> > Matthew Wilcox found that all callers of free_pcppages_bulk() currently
> > update pcp->count immediately after so it's natural to do it inside
> > free_pcppages_bulk().
> > 
> > No functionality or performance change is expected from this patch.
> > 
> > Suggested-by: Matthew Wilcox <willy@infradead.org>
> > Signed-off-by: Aaron Lu <aaron.lu@intel.com>
> > ---
> >  mm/page_alloc.c | 10 +++-------
> >  1 file changed, 3 insertions(+), 7 deletions(-)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index cb416723538f..3154859cccd6 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1117,6 +1117,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> >  	int batch_free = 0;
> >  	bool isolated_pageblocks;
> >  
> > +	pcp->count -= count;
> >  	spin_lock(&zone->lock);
> >  	isolated_pageblocks = has_isolate_pageblock(zone);
> >  
> 
> Why modify pcp->count before the pages have actually been freed?

Because at that point count is still count; after the pages have
actually been freed, it has dropped to zero :-)

> 
> I doubt that it matters too much, but at least /proc/zoneinfo uses 
> zone->lock.  I think it should be done after the lock is dropped.

Agreed that it looks a bit weird to do it beforehand; I just wanted to
avoid adding one more local variable here.

pcp->count is not protected by zone->lock though, so even if we do it
after dropping the lock, it could still happen that zoneinfo shows a
wrong value of pcp->count when it should be zero (this isn't a problem
since zoneinfo doesn't need to be precise).

Anyway, I'll follow your suggestion here to avoid confusion.
 
> Otherwise, looks good.

Thanks for taking a look at this.

* Re: [PATCH v3 2/3] mm/free_pcppages_bulk: do not hold lock when picking pages to free
  2018-02-26 21:53   ` David Rientjes
@ 2018-02-27  2:00     ` Aaron Lu
  2018-02-27  3:17       ` Aaron Lu
  0 siblings, 1 reply; 10+ messages in thread
From: Aaron Lu @ 2018-02-27  2:00 UTC (permalink / raw)
  To: David Rientjes
  Cc: linux-mm, linux-kernel, Andrew Morton, Huang Ying, Dave Hansen,
	Kemi Wang, Tim Chen, Andi Kleen, Michal Hocko, Vlastimil Babka,
	Mel Gorman, Matthew Wilcox

On Mon, Feb 26, 2018 at 01:53:10PM -0800, David Rientjes wrote:
> On Mon, 26 Feb 2018, Aaron Lu wrote:
> 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 3154859cccd6..35576da0a6c9 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1116,13 +1116,11 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> >  	int migratetype = 0;
> >  	int batch_free = 0;
> >  	bool isolated_pageblocks;
> > +	struct page *page, *tmp;
> > +	LIST_HEAD(head);
> >  
> >  	pcp->count -= count;
> > -	spin_lock(&zone->lock);
> > -	isolated_pageblocks = has_isolate_pageblock(zone);
> > -
> >  	while (count) {
> > -		struct page *page;
> >  		struct list_head *list;
> >  
> >  		/*
> > @@ -1144,26 +1142,31 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> >  			batch_free = count;
> >  
> >  		do {
> > -			int mt;	/* migratetype of the to-be-freed page */
> > -
> >  			page = list_last_entry(list, struct page, lru);
> >  			/* must delete as __free_one_page list manipulates */
> 
> Looks good in general, but I'm not sure how I reconcile this comment with 
> the new implementation that later links page->lru again.

Thanks for pointing this out.

I think the comment is useless now: there is a list_add_tail() right
below, so it's obvious we need to take the page off its original list.
I'll remove the comment in an update.

> 
> >  			list_del(&page->lru);
> >  
> > -			mt = get_pcppage_migratetype(page);
> > -			/* MIGRATE_ISOLATE page should not go to pcplists */
> > -			VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
> > -			/* Pageblock could have been isolated meanwhile */
> > -			if (unlikely(isolated_pageblocks))
> > -				mt = get_pageblock_migratetype(page);
> > -
> >  			if (bulkfree_pcp_prepare(page))
> >  				continue;
> >  
> > -			__free_one_page(page, page_to_pfn(page), zone, 0, mt);
> > -			trace_mm_page_pcpu_drain(page, 0, mt);
> > +			list_add_tail(&page->lru, &head);
> >  		} while (--count && --batch_free && !list_empty(list));
> >  	}

* Re: [PATCH v3 1/3] mm/free_pcppages_bulk: update pcp->count inside
  2018-02-27  1:56     ` Aaron Lu
@ 2018-02-27  3:12       ` Aaron Lu
  0 siblings, 0 replies; 10+ messages in thread
From: Aaron Lu @ 2018-02-27  3:12 UTC (permalink / raw)
  To: David Rientjes
  Cc: linux-mm, linux-kernel, Andrew Morton, Huang Ying, Dave Hansen,
	Kemi Wang, Tim Chen, Andi Kleen, Michal Hocko, Vlastimil Babka,
	Mel Gorman, Matthew Wilcox

On Tue, Feb 27, 2018 at 09:56:13AM +0800, Aaron Lu wrote:
> On Mon, Feb 26, 2018 at 01:48:14PM -0800, David Rientjes wrote:
> > On Mon, 26 Feb 2018, Aaron Lu wrote:
> > 
> > > Matthew Wilcox found that all callers of free_pcppages_bulk() currently
> > > update pcp->count immediately after so it's natural to do it inside
> > > free_pcppages_bulk().
> > > 
> > > No functionality or performance change is expected from this patch.
> > > 
> > > Suggested-by: Matthew Wilcox <willy@infradead.org>
> > > Signed-off-by: Aaron Lu <aaron.lu@intel.com>
> > > ---
> > >  mm/page_alloc.c | 10 +++-------
> > >  1 file changed, 3 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index cb416723538f..3154859cccd6 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -1117,6 +1117,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> > >  	int batch_free = 0;
> > >  	bool isolated_pageblocks;
> > >  
> > > +	pcp->count -= count;
> > >  	spin_lock(&zone->lock);
> > >  	isolated_pageblocks = has_isolate_pageblock(zone);
> > >  
> > 
> > Why modify pcp->count before the pages have actually been freed?
> 
> When count is still count and not zero after pages have actually been
> freed :-)
> 
> > 
> > I doubt that it matters too much, but at least /proc/zoneinfo uses 
> > zone->lock.  I think it should be done after the lock is dropped.
> 
> Agree that it looks a bit weird to do it beforehand and I just want to
> avoid adding one more local variable here.
> 
> pcp->count is not protected by zone->lock though so even we do it after
> dropping the lock, it could still happen that zoneinfo shows a wrong
> value of pcp->count while it should be zero(this isn't a problem since
> zoneinfo doesn't need to be precise).
> 
> Anyway, I'll follow your suggestion here to avoid confusion.

What about this: update pcp->count when a page is dropped off the pcp list.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cb416723538f..faa33eac1635 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1148,6 +1148,7 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 			page = list_last_entry(list, struct page, lru);
 			/* must delete as __free_one_page list manipulates */
 			list_del(&page->lru);
+			pcp->count--;
 
 			mt = get_pcppage_migratetype(page);
 			/* MIGRATE_ISOLATE page should not go to pcplists */
@@ -2416,10 +2417,8 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
 	local_irq_save(flags);
 	batch = READ_ONCE(pcp->batch);
 	to_drain = min(pcp->count, batch);
-	if (to_drain > 0) {
+	if (to_drain > 0)
 		free_pcppages_bulk(zone, to_drain, pcp);
-		pcp->count -= to_drain;
-	}
 	local_irq_restore(flags);
 }
 #endif
@@ -2441,10 +2440,8 @@ static void drain_pages_zone(unsigned int cpu, struct zone *zone)
 	pset = per_cpu_ptr(zone->pageset, cpu);
 
 	pcp = &pset->pcp;
-	if (pcp->count) {
+	if (pcp->count)
 		free_pcppages_bulk(zone, pcp->count, pcp);
-		pcp->count = 0;
-	}
 	local_irq_restore(flags);
 }
 
@@ -2668,7 +2665,6 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn)
 	if (pcp->count >= pcp->high) {
 		unsigned long batch = READ_ONCE(pcp->batch);
 		free_pcppages_bulk(zone, batch, pcp);
-		pcp->count -= batch;
 	}
 }

* Re: [PATCH v3 2/3] mm/free_pcppages_bulk: do not hold lock when picking pages to free
  2018-02-27  2:00     ` Aaron Lu
@ 2018-02-27  3:17       ` Aaron Lu
  0 siblings, 0 replies; 10+ messages in thread
From: Aaron Lu @ 2018-02-27  3:17 UTC (permalink / raw)
  To: David Rientjes
  Cc: linux-mm, linux-kernel, Andrew Morton, Huang Ying, Dave Hansen,
	Kemi Wang, Tim Chen, Andi Kleen, Michal Hocko, Vlastimil Babka,
	Mel Gorman, Matthew Wilcox

On Tue, Feb 27, 2018 at 10:00:58AM +0800, Aaron Lu wrote:
> On Mon, Feb 26, 2018 at 01:53:10PM -0800, David Rientjes wrote:
> > On Mon, 26 Feb 2018, Aaron Lu wrote:
> > 
> > > @@ -1144,26 +1142,31 @@ static void free_pcppages_bulk(struct zone *zone, int count,
> > >  			batch_free = count;
> > >  
> > >  		do {
> > > -			int mt;	/* migratetype of the to-be-freed page */
> > > -
> > >  			page = list_last_entry(list, struct page, lru);
> > >  			/* must delete as __free_one_page list manipulates */
> > 
> > Looks good in general, but I'm not sure how I reconcile this comment with 
> > the new implementation that later links page->lru again.
> 
> Thanks for pointing this out.
> 
> I think the comment is useless now since there is a list_add_tail right
> below so it's obvious we need to take the page off its original list.
> I'll remove the comment in an update.

Thinking again, I think I'll change the comment to:

-			/* must delete as __free_one_page list manipulates */
+			/* must delete to avoid corrupting pcp list */
 			list_del(&page->lru);
 			pcp->count--;
 
Meanwhile, I'll add one more comment about why list_for_each_entry_safe
is used:

+	/*
+	 * Use safe version since after __free_one_page(),
+	 * page->lru.next will not point to original list.
+	 */
+	list_for_each_entry_safe(page, tmp, &head, lru) {
+		int mt = get_pcppage_migratetype(page);
+		/* MIGRATE_ISOLATE page should not go to pcplists */
+		VM_BUG_ON_PAGE(is_migrate_isolate(mt), page);
+		/* Pageblock could have been isolated meanwhile */
+		if (unlikely(isolated_pageblocks))
+			mt = get_pageblock_migratetype(page);
+
+		__free_one_page(page, page_to_pfn(page), zone, 0, mt);
+		trace_mm_page_pcpu_drain(page, 0, mt);
+	}
 	spin_unlock(&zone->lock);
 }
