linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm/memory_hotplug: Make unpopulated zones PCP structures unreachable during hot remove
@ 2021-04-09 12:09 Mel Gorman
  2021-04-09 12:42 ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: Mel Gorman @ 2021-04-09 12:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, LKML, Oscar Salvador, Michal Hocko, Michael S. Tsirkin,
	David Hildenbrand, Vlastimil Babka, Alexander Duyck, Minchan Kim

zone_pcp_reset allegedly protects against a race with drain_pages
using local_irq_save but this is bogus. local_irq_save only operates
on the local CPU. If memory hotplug is running on CPU A and drain_pages
is running on CPU B, disabling IRQs on CPU A does not affect CPU B and
offers no protection.

This patch reorders memory hotremove such that the PCP structures
relevant to the zone are no longer reachable by the time the structures
are freed.  With this reordering, no protection is required to prevent
a use-after-free and the IRQs can be left enabled. zone_pcp_reset is
renamed to zone_pcp_destroy to make it clear that the per-cpu structures
are deleted when the function returns.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/internal.h       |  2 +-
 mm/memory_hotplug.c | 10 +++++++---
 mm/page_alloc.c     | 22 ++++++++++++++++------
 3 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 09adf152a10b..cc34ce4461b7 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -203,7 +203,7 @@ extern void free_unref_page(struct page *page);
 extern void free_unref_page_list(struct list_head *list);
 
 extern void zone_pcp_update(struct zone *zone);
-extern void zone_pcp_reset(struct zone *zone);
+extern void zone_pcp_destroy(struct zone *zone);
 extern void zone_pcp_disable(struct zone *zone);
 extern void zone_pcp_enable(struct zone *zone);
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 0cdbbfbc5757..3d059c9f9c2d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1687,12 +1687,16 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 	zone->nr_isolate_pageblock -= nr_pages / pageblock_nr_pages;
 	spin_unlock_irqrestore(&zone->lock, flags);
 
-	zone_pcp_enable(zone);
-
 	/* removal success */
 	adjust_managed_page_count(pfn_to_page(start_pfn), -nr_pages);
 	zone->present_pages -= nr_pages;
 
+	/*
+	 * Restore PCP after managed pages has been updated. Unpopulated
+	 * zones PCP structures will remain unusable.
+	 */
+	zone_pcp_enable(zone);
+
 	pgdat_resize_lock(zone->zone_pgdat, &flags);
 	zone->zone_pgdat->node_present_pages -= nr_pages;
 	pgdat_resize_unlock(zone->zone_pgdat, &flags);
@@ -1700,8 +1704,8 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 	init_per_zone_wmark_min();
 
 	if (!populated_zone(zone)) {
-		zone_pcp_reset(zone);
 		build_all_zonelists(NULL);
+		zone_pcp_destroy(zone);
 	} else
 		zone_pcp_update(zone);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5e8aedb64b57..d6c3db853552 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8946,18 +8946,29 @@ void zone_pcp_disable(struct zone *zone)
 
 void zone_pcp_enable(struct zone *zone)
 {
-	__zone_set_pageset_high_and_batch(zone, zone->pageset_high, zone->pageset_batch);
+	/*
+	 * If the zone is populated, restore the high and batch counts.
+	 * If unpopulated, leave the high and batch count as 0 and 1
+	 * respectively as done by zone_pcp_disable. The per-cpu
+	 * structures will later be freed by zone_pcp_destroy.
+	 */
+	if (populated_zone(zone))
+		__zone_set_pageset_high_and_batch(zone, zone->pageset_high, zone->pageset_batch);
+
 	mutex_unlock(&pcp_batch_high_lock);
 }
 
-void zone_pcp_reset(struct zone *zone)
+/*
+ * Called when a zone has been hot-removed. At this point, the PCP has been
+ * drained, disabled and the zone is removed from the zonelists so the
+ * structures are no longer in use. PCP was disabled/drained by
+ * zone_pcp_disable. This function will drain any remaining vmstat deltas.
+ */
+void zone_pcp_destroy(struct zone *zone)
 {
-	unsigned long flags;
 	int cpu;
 	struct per_cpu_pageset *pset;
 
-	/* avoid races with drain_pages()  */
-	local_irq_save(flags);
 	if (zone->pageset != &boot_pageset) {
 		for_each_online_cpu(cpu) {
 			pset = per_cpu_ptr(zone->pageset, cpu);
@@ -8966,7 +8977,6 @@ void zone_pcp_reset(struct zone *zone)
 		free_percpu(zone->pageset);
 		zone->pageset = &boot_pageset;
 	}
-	local_irq_restore(flags);
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/memory_hotplug: Make unpopulated zones PCP structures unreachable during hot remove
  2021-04-09 12:09 [PATCH] mm/memory_hotplug: Make unpopulated zones PCP structures unreachable during hot remove Mel Gorman
@ 2021-04-09 12:42 ` Michal Hocko
  2021-04-09 12:48   ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2021-04-09 12:42 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, LKML, Oscar Salvador,
	Michael S. Tsirkin, David Hildenbrand, Vlastimil Babka,
	Alexander Duyck, Minchan Kim

On Fri 09-04-21 13:09:57, Mel Gorman wrote:
> zone_pcp_reset allegedly protects against a race with drain_pages
> using local_irq_save but this is bogus. local_irq_save only operates
> on the local CPU. If memory hotplug is running on CPU A and drain_pages
> is running on CPU B, disabling IRQs on CPU A does not affect CPU B and
> offers no protection.

Yes, the synchronization aspect is bogus indeed.

> This patch reorders memory hotremove such that the PCP structures
> relevant to the zone are no longer reachable by the time the structures
> are freed.  With this reordering, no protection is required to prevent
> a use-after-free and the IRQs can be left enabled. zone_pcp_reset is
> renamed to zone_pcp_destroy to make it clear that the per-cpu structures
> are deleted when the function returns.

Wouldn't it be much easier to simply not destroy/reset pcp of an empty
zone at all? The whole point of this exercise seems to be described in
340175b7d14d5. setup_zone_pageset can check for an already allocated pcp
and simply reinitialize it. 
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/memory_hotplug: Make unpopulated zones PCP structures unreachable during hot remove
  2021-04-09 12:42 ` Michal Hocko
@ 2021-04-09 12:48   ` Michal Hocko
  2021-04-09 13:42     ` Mel Gorman
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2021-04-09 12:48 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, LKML, Oscar Salvador,
	Michael S. Tsirkin, David Hildenbrand, Vlastimil Babka,
	Alexander Duyck, Minchan Kim

On Fri 09-04-21 14:42:58, Michal Hocko wrote:
> On Fri 09-04-21 13:09:57, Mel Gorman wrote:
> > zone_pcp_reset allegedly protects against a race with drain_pages
> > using local_irq_save but this is bogus. local_irq_save only operates
> > on the local CPU. If memory hotplug is running on CPU A and drain_pages
> > is running on CPU B, disabling IRQs on CPU A does not affect CPU B and
> > offers no protection.
> 
> Yes, the synchronization aspect is bogus indeed.
> 
> > This patch reorders memory hotremove such that the PCP structures
> > relevant to the zone are no longer reachable by the time the structures
> > are freed.  With this reordering, no protection is required to prevent
> > a use-after-free and the IRQs can be left enabled. zone_pcp_reset is
> > renamed to zone_pcp_destroy to make it clear that the per-cpu structures
> > are deleted when the function returns.
> 
> Wouldn't it be much easier to simply not destroy/reset pcp of an empty
> zone at all? The whole point of this exercise seems to be described in
> 340175b7d14d5. setup_zone_pageset can check for an already allocated pcp
> and simply reinitialize it. 

I meant this

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 754026a9164d..7169342f5474 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1810,7 +1818,6 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 	init_per_zone_wmark_min();
 
 	if (!populated_zone(zone)) {
-		zone_pcp_reset(zone);
 		build_all_zonelists(NULL);
 	} else
 		zone_pcp_update(zone);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e6a602e82860..b0fdda77e570 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6496,7 +6496,13 @@ void __meminit setup_zone_pageset(struct zone *zone)
 	struct per_cpu_pageset *p;
 	int cpu;
 
-	zone->pageset = alloc_percpu(struct per_cpu_pageset);
+	/*
+	 * zone could have gone completely offline during memory hotplug
+	 * when the pgdat is left behind for simplicity. On a next onlining
+	 * we do not need to reallocate pcp state.
+	 */
+	if (!zone->pageset)
+		zone->pageset = alloc_percpu(struct per_cpu_pageset);
 	for_each_possible_cpu(cpu) {
 		p = per_cpu_ptr(zone->pageset, cpu);
 		pageset_init(p);
@@ -8803,25 +8809,6 @@ void zone_pcp_enable(struct zone *zone)
 	mutex_unlock(&pcp_batch_high_lock);
 }
 
-void zone_pcp_reset(struct zone *zone)
-{
-	unsigned long flags;
-	int cpu;
-	struct per_cpu_pageset *pset;
-
-	/* avoid races with drain_pages()  */
-	local_irq_save(flags);
-	if (zone->pageset != &boot_pageset) {
-		for_each_online_cpu(cpu) {
-			pset = per_cpu_ptr(zone->pageset, cpu);
-			drain_zonestat(zone, pset);
-		}
-		free_percpu(zone->pageset);
-		zone->pageset = &boot_pageset;
-	}
-	local_irq_restore(flags);
-}
-
 #ifdef CONFIG_MEMORY_HOTREMOVE
 /*
  * All pages in the range must be in a single zone, must not contain holes,
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/memory_hotplug: Make unpopulated zones PCP structures unreachable during hot remove
  2021-04-09 12:48   ` Michal Hocko
@ 2021-04-09 13:42     ` Mel Gorman
  2021-04-09 14:37       ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: Mel Gorman @ 2021-04-09 13:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Linux-MM, LKML, Oscar Salvador,
	Michael S. Tsirkin, David Hildenbrand, Vlastimil Babka,
	Alexander Duyck, Minchan Kim

On Fri, Apr 09, 2021 at 02:48:12PM +0200, Michal Hocko wrote:
> On Fri 09-04-21 14:42:58, Michal Hocko wrote:
> > On Fri 09-04-21 13:09:57, Mel Gorman wrote:
> > > zone_pcp_reset allegedly protects against a race with drain_pages
> > > using local_irq_save but this is bogus. local_irq_save only operates
> > > on the local CPU. If memory hotplug is running on CPU A and drain_pages
> > > is running on CPU B, disabling IRQs on CPU A does not affect CPU B and
> > > offers no protection.
> > 
> > Yes, the synchronization aspect is bogus indeed.
> > 
> > > This patch reorders memory hotremove such that the PCP structures
> > > relevant to the zone are no longer reachable by the time the structures
> > > are freed.  With this reordering, no protection is required to prevent
> > > a use-after-free and the IRQs can be left enabled. zone_pcp_reset is
> > > renamed to zone_pcp_destroy to make it clear that the per-cpu structures
> > > are deleted when the function returns.
> > 
> > Wouldn't it be much easier to simply not destroy/reset pcp of an empty
> > zone at all? The whole point of this exercise seems to be described in
> > 340175b7d14d5. setup_zone_pageset can check for an already allocated pcp
> > and simply reinitialize it. 
> 
> I meant this
> 

It might be simplier but if the intention is to free as much memory
as possible during hot-remove, it seems wasteful to leave the per-cpu
structures behind if we do not have to. If a problem with my patch can
be spotted then I'm happy to go with an alternative fix but there are
two minor issues with your proposed fix.

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e6a602e82860..b0fdda77e570 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6496,7 +6496,13 @@ void __meminit setup_zone_pageset(struct zone *zone)
>  	struct per_cpu_pageset *p;
>  	int cpu;
>  
> -	zone->pageset = alloc_percpu(struct per_cpu_pageset);
> +	/*
> +	 * zone could have gone completely offline during memory hotplug
> +	 * when the pgdat is left behind for simplicity. On a next onlining
> +	 * we do not need to reallocate pcp state.
> +	 */
> +	if (!zone->pageset)
> +		zone->pageset = alloc_percpu(struct per_cpu_pageset);

Should be "if (zone->pageset != &boot_pageset)" ?


>  	for_each_possible_cpu(cpu) {
>  		p = per_cpu_ptr(zone->pageset, cpu);
>  		pageset_init(p);
> @@ -8803,25 +8809,6 @@ void zone_pcp_enable(struct zone *zone)
>  	mutex_unlock(&pcp_batch_high_lock);
>  }
>  
> -void zone_pcp_reset(struct zone *zone)
> -{
> -	unsigned long flags;
> -	int cpu;
> -	struct per_cpu_pageset *pset;
> -
> -	/* avoid races with drain_pages()  */
> -	local_irq_save(flags);
> -	if (zone->pageset != &boot_pageset) {
> -		for_each_online_cpu(cpu) {
> -			pset = per_cpu_ptr(zone->pageset, cpu);
> -			drain_zonestat(zone, pset);
> -		}
> -		free_percpu(zone->pageset);
> -		zone->pageset = &boot_pageset;
> -	}
> -	local_irq_restore(flags);
> -}
> -

zone_pcp_reset still needs to exist to drain the remaining vmstats or
it'll break 5a883813845a ("memory-hotplug: fix zone stat
mismatch").

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/memory_hotplug: Make unpopulated zones PCP structures unreachable during hot remove
  2021-04-09 13:42     ` Mel Gorman
@ 2021-04-09 14:37       ` Michal Hocko
  2021-04-09 15:12         ` Mel Gorman
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2021-04-09 14:37 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, LKML, Oscar Salvador,
	Michael S. Tsirkin, David Hildenbrand, Vlastimil Babka,
	Alexander Duyck, Minchan Kim

On Fri 09-04-21 14:42:21, Mel Gorman wrote:
> On Fri, Apr 09, 2021 at 02:48:12PM +0200, Michal Hocko wrote:
> > On Fri 09-04-21 14:42:58, Michal Hocko wrote:
> > > On Fri 09-04-21 13:09:57, Mel Gorman wrote:
> > > > zone_pcp_reset allegedly protects against a race with drain_pages
> > > > using local_irq_save but this is bogus. local_irq_save only operates
> > > > on the local CPU. If memory hotplug is running on CPU A and drain_pages
> > > > is running on CPU B, disabling IRQs on CPU A does not affect CPU B and
> > > > offers no protection.
> > > 
> > > Yes, the synchronization aspect is bogus indeed.
> > > 
> > > > This patch reorders memory hotremove such that the PCP structures
> > > > relevant to the zone are no longer reachable by the time the structures
> > > > are freed.  With this reordering, no protection is required to prevent
> > > > a use-after-free and the IRQs can be left enabled. zone_pcp_reset is
> > > > renamed to zone_pcp_destroy to make it clear that the per-cpu structures
> > > > are deleted when the function returns.
> > > 
> > > Wouldn't it be much easier to simply not destroy/reset pcp of an empty
> > > zone at all? The whole point of this exercise seems to be described in
> > > 340175b7d14d5. setup_zone_pageset can check for an already allocated pcp
> > > and simply reinitialize it. 
> > 
> > I meant this
> > 
> 
> It might be simplier but if the intention is to free as much memory
> as possible during hot-remove, it seems wasteful to leave the per-cpu
> structures behind if we do not have to.

We do leave the whole pgdat behind. I do not think pagesets really do
matter.

> If a problem with my patch can
> be spotted then I'm happy to go with an alternative fix but there are
> two minor issues with your proposed fix.

I will not insist but this code has proven to bitrot and I just find it
much simpler to drop it altogether rather than conserve it in some form.
Not something I would insist though.

> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index e6a602e82860..b0fdda77e570 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -6496,7 +6496,13 @@ void __meminit setup_zone_pageset(struct zone *zone)
> >  	struct per_cpu_pageset *p;
> >  	int cpu;
> >  
> > -	zone->pageset = alloc_percpu(struct per_cpu_pageset);
> > +	/*
> > +	 * zone could have gone completely offline during memory hotplug
> > +	 * when the pgdat is left behind for simplicity. On a next onlining
> > +	 * we do not need to reallocate pcp state.
> > +	 */
> > +	if (!zone->pageset)
> > +		zone->pageset = alloc_percpu(struct per_cpu_pageset);
> 
> Should be "if (zone->pageset != &boot_pageset)" ?

Memory hotplug really wants the NULL
check. it doesn't use boot_pageset (if we drop rest to boot_pageset).
But you are right that the boot time initialization first sets
boot_pageset (zone_pcp_init) and initializes real pagesets later
(setup_per_cpu_pageset). But this can be handled at the memory hotplug
layer I believe

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 754026a9164d..1cadfec323fc 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -883,7 +883,8 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
 	 */
 	if (!populated_zone(zone)) {
 		need_zonelists_rebuild = 1;
-		setup_zone_pageset(zone);
+		if (!zone->pageset)
+			setup_zone_pageset(zone);
 	}

> >  	for_each_possible_cpu(cpu) {
> >  		p = per_cpu_ptr(zone->pageset, cpu);
> >  		pageset_init(p);
> > @@ -8803,25 +8809,6 @@ void zone_pcp_enable(struct zone *zone)
> >  	mutex_unlock(&pcp_batch_high_lock);
> >  }
> >  
> > -void zone_pcp_reset(struct zone *zone)
> > -{
> > -	unsigned long flags;
> > -	int cpu;
> > -	struct per_cpu_pageset *pset;
> > -
> > -	/* avoid races with drain_pages()  */
> > -	local_irq_save(flags);
> > -	if (zone->pageset != &boot_pageset) {
> > -		for_each_online_cpu(cpu) {
> > -			pset = per_cpu_ptr(zone->pageset, cpu);
> > -			drain_zonestat(zone, pset);
> > -		}
> > -		free_percpu(zone->pageset);
> > -		zone->pageset = &boot_pageset;
> > -	}
> > -	local_irq_restore(flags);
> > -}
> > -
> 
> zone_pcp_reset still needs to exist to drain the remaining vmstats or
> it'll break 5a883813845a ("memory-hotplug: fix zone stat
> mismatch").

Are you sure we are reseting vmstats in the hotremove. I do not see
anything like that. Maybe this was needed at the time. I will double
check.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/memory_hotplug: Make unpopulated zones PCP structures unreachable during hot remove
  2021-04-09 14:37       ` Michal Hocko
@ 2021-04-09 15:12         ` Mel Gorman
  2021-04-09 19:05           ` David Hildenbrand
  2021-04-10  7:25           ` Michal Hocko
  0 siblings, 2 replies; 8+ messages in thread
From: Mel Gorman @ 2021-04-09 15:12 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Linux-MM, LKML, Oscar Salvador,
	Michael S. Tsirkin, David Hildenbrand, Vlastimil Babka,
	Alexander Duyck, Minchan Kim

On Fri, Apr 09, 2021 at 04:37:59PM +0200, Michal Hocko wrote:
> > > > Yes, the synchronization aspect is bogus indeed.
> > > > 
> > > > > This patch reorders memory hotremove such that the PCP structures
> > > > > relevant to the zone are no longer reachable by the time the structures
> > > > > are freed.  With this reordering, no protection is required to prevent
> > > > > a use-after-free and the IRQs can be left enabled. zone_pcp_reset is
> > > > > renamed to zone_pcp_destroy to make it clear that the per-cpu structures
> > > > > are deleted when the function returns.
> > > > 
> > > > Wouldn't it be much easier to simply not destroy/reset pcp of an empty
> > > > zone at all? The whole point of this exercise seems to be described in
> > > > 340175b7d14d5. setup_zone_pageset can check for an already allocated pcp
> > > > and simply reinitialize it. 
> > > 
> > > I meant this
> > > 
> > 
> > It might be simplier but if the intention is to free as much memory
> > as possible during hot-remove, it seems wasteful to leave the per-cpu
> > structures behind if we do not have to.
> 
> We do leave the whole pgdat behind. I do not think pagesets really do
> matter.
> 

Probably not given that zone and node hotplug is fragile to say the least
with a low success rate. Still, it seems wasteful if we can preserve the
freeing part and the alternative fix is more subtle than it appears.

> > If a problem with my patch can
> > be spotted then I'm happy to go with an alternative fix but there are
> > two minor issues with your proposed fix.
> 
> I will not insist but this code has proven to bitrot and I just find it
> much simpler to drop it altogether rather than conserve it in some form.
> Not something I would insist though.
> 
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index e6a602e82860..b0fdda77e570 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -6496,7 +6496,13 @@ void __meminit setup_zone_pageset(struct zone *zone)
> > >  	struct per_cpu_pageset *p;
> > >  	int cpu;
> > >  
> > > -	zone->pageset = alloc_percpu(struct per_cpu_pageset);
> > > +	/*
> > > +	 * zone could have gone completely offline during memory hotplug
> > > +	 * when the pgdat is left behind for simplicity. On a next onlining
> > > +	 * we do not need to reallocate pcp state.
> > > +	 */
> > > +	if (!zone->pageset)
> > > +		zone->pageset = alloc_percpu(struct per_cpu_pageset);
> > 
> > Should be "if (zone->pageset != &boot_pageset)" ?
> 
> Memory hotplug really wants the NULL
> check. it doesn't use boot_pageset (if we drop rest to boot_pageset).
> But you are right that the boot time initialization first sets
> boot_pageset (zone_pcp_init) and initializes real pagesets later
> (setup_per_cpu_pageset). But this can be handled at the memory hotplug
> layer I believe
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 754026a9164d..1cadfec323fc 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -883,7 +883,8 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages,
>  	 */
>  	if (!populated_zone(zone)) {
>  		need_zonelists_rebuild = 1;
> -		setup_zone_pageset(zone);
> +		if (!zone->pageset)
> +			setup_zone_pageset(zone);
>  	}
> 

I think this will be fragile because it makes assumptions on how boot
works -- specifically that zone->pageset is left alone for unpopulated
zones. This will only work for zones in unpopulated *nodes* during boot
because of this sequence

free_area_init_core(pgdat)
  for (j = 0; j < MAX_NR_ZONES; j++) {
        ...
        zone_init_internals(zone)
  }
  -> zone_init_internals(zone)
     -> zone_pcp_init(zone)
        zone->pageset = &boot_pageset;

I think a NULL check will mean that hotplugging a previously unpopulated
zone will run the risk of using boot_pageset. That will likely explode
because IRQs disabled does not protect boot_pageset, it only works early
in boot or during the initial hotplug when it's not visible yet.

> > >  	for_each_possible_cpu(cpu) {
> > >  		p = per_cpu_ptr(zone->pageset, cpu);
> > >  		pageset_init(p);
> > > @@ -8803,25 +8809,6 @@ void zone_pcp_enable(struct zone *zone)
> > >  	mutex_unlock(&pcp_batch_high_lock);
> > >  }
> > >  
> > > -void zone_pcp_reset(struct zone *zone)
> > > -{
> > > -	unsigned long flags;
> > > -	int cpu;
> > > -	struct per_cpu_pageset *pset;
> > > -
> > > -	/* avoid races with drain_pages()  */
> > > -	local_irq_save(flags);
> > > -	if (zone->pageset != &boot_pageset) {
> > > -		for_each_online_cpu(cpu) {
> > > -			pset = per_cpu_ptr(zone->pageset, cpu);
> > > -			drain_zonestat(zone, pset);
> > > -		}
> > > -		free_percpu(zone->pageset);
> > > -		zone->pageset = &boot_pageset;
> > > -	}
> > > -	local_irq_restore(flags);
> > > -}
> > > -
> > 
> > zone_pcp_reset still needs to exist to drain the remaining vmstats or
> > it'll break 5a883813845a ("memory-hotplug: fix zone stat
> > mismatch").
> 
> Are you sure we are reseting vmstats in the hotremove. I do not see
> anything like that. Maybe this was needed at the time. I will double
> check.

zone_pcp_reset calls drain_zonestat to apply the per-cpu vmstat deltas
to the atomic per-zone and global stats.

If anything, the minimal "fix" is to simply delete IRQ disable/enable on
the grounds that IRQs protect nothing and assume the existing hotplug
paths guarantees the PCP cannot be used after zone_pcp_enable(). That
should be the case already because all the pages have been freed and
there is nothing to even put into the PCPs but I worried that the PCP
structure itself might still be reachable even if it's useless which is
why I freed the structure once they could not be reached via zonelists.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/memory_hotplug: Make unpopulated zones PCP structures unreachable during hot remove
  2021-04-09 15:12         ` Mel Gorman
@ 2021-04-09 19:05           ` David Hildenbrand
  2021-04-10  7:25           ` Michal Hocko
  1 sibling, 0 replies; 8+ messages in thread
From: David Hildenbrand @ 2021-04-09 19:05 UTC (permalink / raw)
  To: Mel Gorman, Michal Hocko
  Cc: Andrew Morton, Linux-MM, LKML, Oscar Salvador,
	Michael S. Tsirkin, Vlastimil Babka, Alexander Duyck,
	Minchan Kim

>>> zone_pcp_reset still needs to exist to drain the remaining vmstats or
>>> it'll break 5a883813845a ("memory-hotplug: fix zone stat
>>> mismatch").
>>
>> Are you sure we are reseting vmstats in the hotremove. I do not see
>> anything like that. Maybe this was needed at the time. I will double
>> check.
> 
> zone_pcp_reset calls drain_zonestat to apply the per-cpu vmstat deltas
> to the atomic per-zone and global stats.
> 
> If anything, the minimal "fix" is to simply delete IRQ disable/enable on
> the grounds that IRQs protect nothing and assume the existing hotplug
> paths guarantees the PCP cannot be used after zone_pcp_enable(). That

^ that sounds sane to me

> should be the case already because all the pages have been freed and
> there is nothing to even put into the PCPs but I worried that the PCP
> structure itself might still be reachable even if it's useless which is
> why I freed the structure once they could not be reached via zonelists.
> 


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm/memory_hotplug: Make unpopulated zones PCP structures unreachable during hot remove
  2021-04-09 15:12         ` Mel Gorman
  2021-04-09 19:05           ` David Hildenbrand
@ 2021-04-10  7:25           ` Michal Hocko
  1 sibling, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2021-04-10  7:25 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, LKML, Oscar Salvador,
	Michael S. Tsirkin, David Hildenbrand, Vlastimil Babka,
	Alexander Duyck, Minchan Kim

On Fri 09-04-21 16:12:59, Mel Gorman wrote:
[...]
> If anything, the minimal "fix" is to simply delete IRQ disable/enable on
> the grounds that IRQs protect nothing and assume the existing hotplug
> paths guarantees the PCP cannot be used after zone_pcp_enable(). That
> should be the case already because all the pages have been freed and
> there is nothing to even put into the PCPs but I worried that the PCP
> structure itself might still be reachable even if it's useless which is
> why I freed the structure once they could not be reached via zonelists.

OK. Let's do that for now and I will put a follow up on my todo list.

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-04-10  7:26 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-09 12:09 [PATCH] mm/memory_hotplug: Make unpopulated zones PCP structures unreachable during hot remove Mel Gorman
2021-04-09 12:42 ` Michal Hocko
2021-04-09 12:48   ` Michal Hocko
2021-04-09 13:42     ` Mel Gorman
2021-04-09 14:37       ` Michal Hocko
2021-04-09 15:12         ` Mel Gorman
2021-04-09 19:05           ` David Hildenbrand
2021-04-10  7:25           ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).