All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm, hotplug: protect nr_zones with pgdat_resize_lock()
@ 2018-11-20  1:48 Wei Yang
  2018-11-20  7:31 ` Michal Hocko
  2018-11-22 10:12 ` [PATCH v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection Wei Yang
  0 siblings, 2 replies; 40+ messages in thread
From: Wei Yang @ 2018-11-20  1:48 UTC (permalink / raw)
  To: mhocko, osalvador; +Cc: akpm, linux-mm, Wei Yang

After memory hot-added, users could online pages through sysfs, and this
could be done in parallel.

In case two threads online pages in two different empty zones at the
same time, there would be a contention to update the nr_zones.

The patch use pgdat_resize_lock() to protect this critical section.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
---
 mm/page_alloc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e13987c2e1c4..525a5344a13b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5796,9 +5796,12 @@ void __meminit init_currently_empty_zone(struct zone *zone,
 {
 	struct pglist_data *pgdat = zone->zone_pgdat;
 	int zone_idx = zone_idx(zone) + 1;
+	unsigned long flags;
 
+	pgdat_resize_lock(pgdat, &flags);
 	if (zone_idx > pgdat->nr_zones)
 		pgdat->nr_zones = zone_idx;
+	pgdat_resize_unlock(pgdat, &flags);
 
 	zone->zone_start_pfn = zone_start_pfn;
 
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm, hotplug: protect nr_zones with pgdat_resize_lock()
  2018-11-20  1:48 [PATCH] mm, hotplug: protect nr_zones with pgdat_resize_lock() Wei Yang
@ 2018-11-20  7:31 ` Michal Hocko
  2018-11-20  7:58   ` osalvador
  2018-11-21  2:44   ` Wei Yang
  2018-11-22 10:12 ` [PATCH v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection Wei Yang
  1 sibling, 2 replies; 40+ messages in thread
From: Michal Hocko @ 2018-11-20  7:31 UTC (permalink / raw)
  To: Wei Yang; +Cc: osalvador, akpm, linux-mm

On Tue 20-11-18 09:48:22, Wei Yang wrote:
> After memory hot-added, users could online pages through sysfs, and this
> could be done in parallel.
> 
> In case two threads online pages in two different empty zones at the
> same time, there would be a contention to update the nr_zones.

No, this shouldn't be the case as I've explained in the original thread.
We use memory hotplug lock over the online phase. So there shouldn't be
any race possible.

On the other hand I would like to see the global lock to go away because
it causes scalability issues and I would like to change it to a range
lock. This would make this race possible.

That being said this is more of a preparatory work than a fix. One could
argue that pgdat resize lock is abused here but I am not convinced a
dedicated lock is much better. We do take this lock already and spanning
its scope seems reasonable. An update to the documentation is due.

> The patch use pgdat_resize_lock() to protect this critical section.
> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>

After the changelog is updated to reflect the above, feel free to add
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/page_alloc.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e13987c2e1c4..525a5344a13b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5796,9 +5796,12 @@ void __meminit init_currently_empty_zone(struct zone *zone,
>  {
>  	struct pglist_data *pgdat = zone->zone_pgdat;
>  	int zone_idx = zone_idx(zone) + 1;
> +	unsigned long flags;
>  
> +	pgdat_resize_lock(pgdat, &flags);
>  	if (zone_idx > pgdat->nr_zones)
>  		pgdat->nr_zones = zone_idx;
> +	pgdat_resize_unlock(pgdat, &flags);
>  
>  	zone->zone_start_pfn = zone_start_pfn;
>  
> -- 
> 2.15.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm, hotplug: protect nr_zones with pgdat_resize_lock()
  2018-11-20  7:31 ` Michal Hocko
@ 2018-11-20  7:58   ` osalvador
  2018-11-20  8:48     ` Michal Hocko
  2018-11-21  2:52     ` Wei Yang
  2018-11-21  2:44   ` Wei Yang
  1 sibling, 2 replies; 40+ messages in thread
From: osalvador @ 2018-11-20  7:58 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Wei Yang, akpm, linux-mm

> On the other hand I would like to see the global lock to go away 
> because
> it causes scalability issues and I would like to change it to a range
> lock. This would make this race possible.
> 
> That being said this is more of a preparatory work than a fix. One 
> could
> argue that pgdat resize lock is abused here but I am not convinced a
> dedicated lock is much better. We do take this lock already and 
> spanning
> its scope seems reasonable. An update to the documentation is due.

Would not make more sense to move it within the pgdat lock
in move_pfn_range_to_zone?
The call from free_area_init_core is safe as we are single-thread there.

And if we want to move towards a range locking, I even think it would be 
more
consistent if we move it within the zone's span lock (which is already 
wrapped with a pgdat lock).

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm, hotplug: protect nr_zones with pgdat_resize_lock()
  2018-11-20  7:58   ` osalvador
@ 2018-11-20  8:48     ` Michal Hocko
  2018-11-21  2:52     ` Wei Yang
  1 sibling, 0 replies; 40+ messages in thread
From: Michal Hocko @ 2018-11-20  8:48 UTC (permalink / raw)
  To: osalvador; +Cc: Wei Yang, akpm, linux-mm

On Tue 20-11-18 08:58:11, osalvador@suse.de wrote:
> > On the other hand I would like to see the global lock to go away because
> > it causes scalability issues and I would like to change it to a range
> > lock. This would make this race possible.
> > 
> > That being said this is more of a preparatory work than a fix. One could
> > argue that pgdat resize lock is abused here but I am not convinced a
> > dedicated lock is much better. We do take this lock already and spanning
> > its scope seems reasonable. An update to the documentation is due.
> 
> Would not make more sense to move it within the pgdat lock
> in move_pfn_range_to_zone?

yes, that was what I meant originally and I haven't really looked closer
to the diff itself because I've stopped right at the description.

> The call from free_area_init_core is safe as we are single-thread there.
> 
> And if we want to move towards a range locking, I even think it would be
> more
> consistent if we move it within the zone's span lock (which is already
> wrapped with a pgdat lock).

Agreed!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm, hotplug: protect nr_zones with pgdat_resize_lock()
  2018-11-20  7:31 ` Michal Hocko
  2018-11-20  7:58   ` osalvador
@ 2018-11-21  2:44   ` Wei Yang
  2018-11-21  7:14     ` Michal Hocko
  1 sibling, 1 reply; 40+ messages in thread
From: Wei Yang @ 2018-11-21  2:44 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Wei Yang, osalvador, akpm, linux-mm

On Tue, Nov 20, 2018 at 08:31:41AM +0100, Michal Hocko wrote:
>On Tue 20-11-18 09:48:22, Wei Yang wrote:
>> After memory hot-added, users could online pages through sysfs, and this
>> could be done in parallel.
>> 
>> In case two threads online pages in two different empty zones at the
>> same time, there would be a contention to update the nr_zones.
>
>No, this shouldn't be the case as I've explained in the original thread.
>We use memory hotplug lock over the online phase. So there shouldn't be
>any race possible.

Sorry for misunderstanding your point.

>
>On the other hand I would like to see the global lock to go away because
>it causes scalability issues and I would like to change it to a range
>lock. This would make this race possible.

The global lock you want to remove is mem_hotplug_begin() ?

Hmm... my understanding may not correct. While mem_hotplug_begin() use
percpu lock, which means if there are two threads running on two
different cpus to online pages at the same time, they could get their
own lock?

If this is the case, will we face the race condition here?

>
>That being said this is more of a preparatory work than a fix. One could
>argue that pgdat resize lock is abused here but I am not convinced a
>dedicated lock is much better. We do take this lock already and spanning
>its scope seems reasonable. An update to the documentation is due.

Agree, I will try to update the documentation in next verstion. 

>
>> The patch use pgdat_resize_lock() to protect this critical section.
>> 
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>
>After the changelog is updated to reflect the above, feel free to add
>Acked-by: Michal Hocko <mhocko@suse.com>
>
>> ---
>>  mm/page_alloc.c | 3 +++
>>  1 file changed, 3 insertions(+)
>> 
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index e13987c2e1c4..525a5344a13b 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5796,9 +5796,12 @@ void __meminit init_currently_empty_zone(struct zone *zone,
>>  {
>>  	struct pglist_data *pgdat = zone->zone_pgdat;
>>  	int zone_idx = zone_idx(zone) + 1;
>> +	unsigned long flags;
>>  
>> +	pgdat_resize_lock(pgdat, &flags);
>>  	if (zone_idx > pgdat->nr_zones)
>>  		pgdat->nr_zones = zone_idx;
>> +	pgdat_resize_unlock(pgdat, &flags);
>>  
>>  	zone->zone_start_pfn = zone_start_pfn;
>>  
>> -- 
>> 2.15.1
>
>-- 
>Michal Hocko
>SUSE Labs

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm, hotplug: protect nr_zones with pgdat_resize_lock()
  2018-11-20  7:58   ` osalvador
  2018-11-20  8:48     ` Michal Hocko
@ 2018-11-21  2:52     ` Wei Yang
  2018-11-21  7:15       ` Michal Hocko
  2018-11-21  8:24       ` osalvador
  1 sibling, 2 replies; 40+ messages in thread
From: Wei Yang @ 2018-11-21  2:52 UTC (permalink / raw)
  To: osalvador; +Cc: Michal Hocko, Wei Yang, akpm, linux-mm

On Tue, Nov 20, 2018 at 08:58:11AM +0100, osalvador@suse.de wrote:
>> On the other hand I would like to see the global lock to go away because
>> it causes scalability issues and I would like to change it to a range
>> lock. This would make this race possible.
>> 
>> That being said this is more of a preparatory work than a fix. One could
>> argue that pgdat resize lock is abused here but I am not convinced a
>> dedicated lock is much better. We do take this lock already and spanning
>> its scope seems reasonable. An update to the documentation is due.
>
>Would not make more sense to move it within the pgdat lock
>in move_pfn_range_to_zone?
>The call from free_area_init_core is safe as we are single-thread there.
>

Agree. This would be better.

>And if we want to move towards a range locking, I even think it would be more
>consistent if we move it within the zone's span lock (which is already
>wrapped with a pgdat lock).
>

I lost a little here, just want to confirm with you.

Instead of call pgdat_resize_lock() around init_currently_empty_zone()
in move_pfn_range_to_zone(), we move init_currently_empty_zone() before
resize_zone_range()?

This sounds reasonable.

>
>

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm, hotplug: protect nr_zones with pgdat_resize_lock()
  2018-11-21  2:44   ` Wei Yang
@ 2018-11-21  7:14     ` Michal Hocko
  0 siblings, 0 replies; 40+ messages in thread
From: Michal Hocko @ 2018-11-21  7:14 UTC (permalink / raw)
  To: Wei Yang; +Cc: osalvador, akpm, linux-mm

On Wed 21-11-18 02:44:35, Wei Yang wrote:
> On Tue, Nov 20, 2018 at 08:31:41AM +0100, Michal Hocko wrote:
> >On Tue 20-11-18 09:48:22, Wei Yang wrote:
> >> After memory hot-added, users could online pages through sysfs, and this
> >> could be done in parallel.
> >> 
> >> In case two threads online pages in two different empty zones at the
> >> same time, there would be a contention to update the nr_zones.
> >
> >No, this shouldn't be the case as I've explained in the original thread.
> >We use memory hotplug lock over the online phase. So there shouldn't be
> >any race possible.
> 
> Sorry for misunderstanding your point.
> 
> >
> >On the other hand I would like to see the global lock to go away because
> >it causes scalability issues and I would like to change it to a range
> >lock. This would make this race possible.
> 
> The global lock you want to remove is mem_hotplug_begin() ?

Yes

> 
> Hmm... my understanding may not correct. While mem_hotplug_begin() use
> percpu lock, which means if there are two threads running on two
> different cpus to online pages at the same time, they could get their
> own lock?

No. The per-cpu is a mere implementation detail on how the
synchronization is done. Only one path might aquire the exclusive lock.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm, hotplug: protect nr_zones with pgdat_resize_lock()
  2018-11-21  2:52     ` Wei Yang
@ 2018-11-21  7:15       ` Michal Hocko
  2018-11-22  1:52         ` Wei Yang
  2018-11-26  2:28         ` Wei Yang
  2018-11-21  8:24       ` osalvador
  1 sibling, 2 replies; 40+ messages in thread
From: Michal Hocko @ 2018-11-21  7:15 UTC (permalink / raw)
  To: Wei Yang; +Cc: osalvador, akpm, linux-mm

On Wed 21-11-18 02:52:31, Wei Yang wrote:
> On Tue, Nov 20, 2018 at 08:58:11AM +0100, osalvador@suse.de wrote:
> >> On the other hand I would like to see the global lock to go away because
> >> it causes scalability issues and I would like to change it to a range
> >> lock. This would make this race possible.
> >> 
> >> That being said this is more of a preparatory work than a fix. One could
> >> argue that pgdat resize lock is abused here but I am not convinced a
> >> dedicated lock is much better. We do take this lock already and spanning
> >> its scope seems reasonable. An update to the documentation is due.
> >
> >Would not make more sense to move it within the pgdat lock
> >in move_pfn_range_to_zone?
> >The call from free_area_init_core is safe as we are single-thread there.
> >
> 
> Agree. This would be better.
> 
> >And if we want to move towards a range locking, I even think it would be more
> >consistent if we move it within the zone's span lock (which is already
> >wrapped with a pgdat lock).
> >
> 
> I lost a little here, just want to confirm with you.
> 
> Instead of call pgdat_resize_lock() around init_currently_empty_zone()
> in move_pfn_range_to_zone(), we move init_currently_empty_zone() before
> resize_zone_range()?
> 
> This sounds reasonable.

Btw. resolving the existing TODO would be nice as well, now that you are
looking that direction...

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c6c42a7425e5..c75fca900044 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -743,13 +743,12 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 	int nid = pgdat->node_id;
 	unsigned long flags;
 
+	/* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
+	pgdat_resize_lock(pgdat, &flags);
 	if (zone_is_empty(zone))
 		init_currently_empty_zone(zone, start_pfn, nr_pages);
 
 	clear_zone_contiguous(zone);
-
-	/* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
-	pgdat_resize_lock(pgdat, &flags);
 	zone_span_writelock(zone);
 	resize_zone_range(zone, start_pfn, nr_pages);
 	zone_span_writeunlock(zone);
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm, hotplug: protect nr_zones with pgdat_resize_lock()
  2018-11-21  2:52     ` Wei Yang
  2018-11-21  7:15       ` Michal Hocko
@ 2018-11-21  8:24       ` osalvador
  1 sibling, 0 replies; 40+ messages in thread
From: osalvador @ 2018-11-21  8:24 UTC (permalink / raw)
  To: Wei Yang; +Cc: Michal Hocko, akpm, linux-mm

On Wed, 2018-11-21 at 02:52 +0000, Wei Yang wrote:
> On Tue, Nov 20, 2018 at 08:58:11AM +0100, osalvador@suse.de wrote:
> > > On the other hand I would like to see the global lock to go away
> > > because
> > > it causes scalability issues and I would like to change it to a
> > > range
> > > lock. This would make this race possible.
> > > 
> > > That being said this is more of a preparatory work than a fix.
> > > One could
> > > argue that pgdat resize lock is abused here but I am not
> > > convinced a
> > > dedicated lock is much better. We do take this lock already and
> > > spanning
> > > its scope seems reasonable. An update to the documentation is
> > > due.
> > 
> > Would not make more sense to move it within the pgdat lock
> > in move_pfn_range_to_zone?
> > The call from free_area_init_core is safe as we are single-thread
> > there.
> > 
> 
> Agree. This would be better.
> 
> > And if we want to move towards a range locking, I even think it
> > would be more
> > consistent if we move it within the zone's span lock (which is
> > already
> > wrapped with a pgdat lock).
> > 
> 
> I lost a little here, just want to confirm with you.
> 
> Instead of call pgdat_resize_lock() around
> init_currently_empty_zone()
> in move_pfn_range_to_zone(), we move init_currently_empty_zone()
> before
> resize_zone_range()?
> 
> This sounds reasonable.

Yeah.
spanned pages are being touched in:

- shrink_pgdat_span
- resize_zone_range
- init_currently_emty_zone

The first two are already protected by the span lock.

In init_currently_empty_zone, we also touch zone_start_pfn, which is
part of the spanned pages (beginning), so I think it makes sense to
also protect it with the span lock.
We just call init_currently_empty_zone in case the zone is empty, so
the race should be not existent to be honest.

But I just think it is more consistent, and since moving it under
spanlock would imply to also have it under pgdat lock, which was the
main point of this, I think we do not have anything to lose.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm, hotplug: protect nr_zones with pgdat_resize_lock()
  2018-11-21  7:15       ` Michal Hocko
@ 2018-11-22  1:52         ` Wei Yang
  2018-11-22  8:39           ` Michal Hocko
  2018-11-26  2:28         ` Wei Yang
  1 sibling, 1 reply; 40+ messages in thread
From: Wei Yang @ 2018-11-22  1:52 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Wei Yang, osalvador, akpm, linux-mm

On Wed, Nov 21, 2018 at 08:15:49AM +0100, Michal Hocko wrote:
>On Wed 21-11-18 02:52:31, Wei Yang wrote:
>> On Tue, Nov 20, 2018 at 08:58:11AM +0100, osalvador@suse.de wrote:
>> >> On the other hand I would like to see the global lock to go away because
>> >> it causes scalability issues and I would like to change it to a range
>> >> lock. This would make this race possible.
>> >> 
>> >> That being said this is more of a preparatory work than a fix. One could
>> >> argue that pgdat resize lock is abused here but I am not convinced a
>> >> dedicated lock is much better. We do take this lock already and spanning
>> >> its scope seems reasonable. An update to the documentation is due.
>> >
>> >Would not make more sense to move it within the pgdat lock
>> >in move_pfn_range_to_zone?
>> >The call from free_area_init_core is safe as we are single-thread there.
>> >
>> 
>> Agree. This would be better.
>> 
>> >And if we want to move towards a range locking, I even think it would be more
>> >consistent if we move it within the zone's span lock (which is already
>> >wrapped with a pgdat lock).
>> >
>> 
>> I lost a little here, just want to confirm with you.
>> 
>> Instead of call pgdat_resize_lock() around init_currently_empty_zone()
>> in move_pfn_range_to_zone(), we move init_currently_empty_zone() before
>> resize_zone_range()?
>> 
>> This sounds reasonable.
>
>Btw. resolving the existing TODO would be nice as well, now that you are
>looking that direction...

I took a look at that commit, seems I need some time to understand this
TODO. :-)

>
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index c6c42a7425e5..c75fca900044 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -743,13 +743,12 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
> 	int nid = pgdat->node_id;
> 	unsigned long flags;
> 
>+	/* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
>+	pgdat_resize_lock(pgdat, &flags);
> 	if (zone_is_empty(zone))
> 		init_currently_empty_zone(zone, start_pfn, nr_pages);
> 
> 	clear_zone_contiguous(zone);
>-
>-	/* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
>-	pgdat_resize_lock(pgdat, &flags);
> 	zone_span_writelock(zone);

Just want to make sure, Oscar suggests to move the code here to protect
this under zone_span_lock.

If this the correct, I would spin a v2 and try to address the TODO.

> 	resize_zone_range(zone, start_pfn, nr_pages);
> 	zone_span_writeunlock(zone);
>-- 
>Michal Hocko
>SUSE Labs

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm, hotplug: protect nr_zones with pgdat_resize_lock()
  2018-11-22  1:52         ` Wei Yang
@ 2018-11-22  8:39           ` Michal Hocko
  0 siblings, 0 replies; 40+ messages in thread
From: Michal Hocko @ 2018-11-22  8:39 UTC (permalink / raw)
  To: Wei Yang; +Cc: osalvador, akpm, linux-mm

On Thu 22-11-18 01:52:39, Wei Yang wrote:
> On Wed, Nov 21, 2018 at 08:15:49AM +0100, Michal Hocko wrote:
[...]
> >diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> >index c6c42a7425e5..c75fca900044 100644
> >--- a/mm/memory_hotplug.c
> >+++ b/mm/memory_hotplug.c
> >@@ -743,13 +743,12 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
> > 	int nid = pgdat->node_id;
> > 	unsigned long flags;
> > 
> >+	/* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
> >+	pgdat_resize_lock(pgdat, &flags);
> > 	if (zone_is_empty(zone))
> > 		init_currently_empty_zone(zone, start_pfn, nr_pages);
> > 
> > 	clear_zone_contiguous(zone);
> >-
> >-	/* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
> >-	pgdat_resize_lock(pgdat, &flags);
> > 	zone_span_writelock(zone);
> 
> Just want to make sure, Oscar suggests to move the code here to protect
> this under zone_span_lock.

Yes, both locks held is probably safer. Because there is both pgdat and
zone state updated.

Sorry to confuse you

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection
  2018-11-20  1:48 [PATCH] mm, hotplug: protect nr_zones with pgdat_resize_lock() Wei Yang
  2018-11-20  7:31 ` Michal Hocko
@ 2018-11-22 10:12 ` Wei Yang
  2018-11-22 10:15   ` Wei Yang
                     ` (3 more replies)
  1 sibling, 4 replies; 40+ messages in thread
From: Wei Yang @ 2018-11-22 10:12 UTC (permalink / raw)
  To: mhocko, osalvador; +Cc: akpm, linux-mm, Wei Yang

During online_pages phase, pgdat->nr_zones will be updated in case this
zone is empty.

Currently the online_pages phase is protected by the global lock
mem_hotplug_begin(), which ensures there is no contention during the
update of nr_zones. But this global lock introduces scalability issues.

This patch is a preparation for removing the global lock during
online_pages phase. Also this patch changes the documentation of
node_size_lock to include the protectioin of nr_zones.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
---
v2:
  * commit log changes
  * modify the code in move_pfn_range_to_zone() instead of in
    init_currently_empty_zone()
  * documentation change

---
 include/linux/mmzone.h | 7 ++++---
 mm/memory_hotplug.c    | 5 ++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 68d7b558924b..1bb749bee284 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -636,9 +636,10 @@ typedef struct pglist_data {
 #endif
 #if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_DEFERRED_STRUCT_PAGE_INIT)
 	/*
-	 * Must be held any time you expect node_start_pfn, node_present_pages
-	 * or node_spanned_pages stay constant.  Holding this will also
-	 * guarantee that any pfn_valid() stays that way.
+	 * Must be held any time you expect node_start_pfn,
+	 * node_present_pages, node_spanned_pages or nr_zones stay constant.
+	 * Holding this will also guarantee that any pfn_valid() stays that
+	 * way.
 	 *
 	 * pgdat_resize_lock() and pgdat_resize_unlock() are provided to
 	 * manipulate node_size_lock without checking for CONFIG_MEMORY_HOTPLUG
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 61972da38d93..f626e7e5f57b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -742,14 +742,13 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 	int nid = pgdat->node_id;
 	unsigned long flags;
 
-	if (zone_is_empty(zone))
-		init_currently_empty_zone(zone, start_pfn, nr_pages);
-
 	clear_zone_contiguous(zone);
 
 	/* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
 	pgdat_resize_lock(pgdat, &flags);
 	zone_span_writelock(zone);
+	if (zone_is_empty(zone))
+		init_currently_empty_zone(zone, start_pfn, nr_pages);
 	resize_zone_range(zone, start_pfn, nr_pages);
 	zone_span_writeunlock(zone);
 	resize_pgdat_range(pgdat, start_pfn, nr_pages);
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection
  2018-11-22 10:12 ` [PATCH v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection Wei Yang
@ 2018-11-22 10:15   ` Wei Yang
  2018-11-22 10:29     ` Michal Hocko
  2018-11-22 10:37   ` osalvador
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 40+ messages in thread
From: Wei Yang @ 2018-11-22 10:15 UTC (permalink / raw)
  To: Wei Yang; +Cc: mhocko, osalvador, akpm, linux-mm

On Thu, Nov 22, 2018 at 06:12:41PM +0800, Wei Yang wrote:
>During online_pages phase, pgdat->nr_zones will be updated in case this
>zone is empty.
>
>Currently the online_pages phase is protected by the global lock
>mem_hotplug_begin(), which ensures there is no contention during the
>update of nr_zones. But this global lock introduces scalability issues.
>
>This patch is a preparation for removing the global lock during
>online_pages phase. Also this patch changes the documentation of
>node_size_lock to include the protectioin of nr_zones.
>
>Signed-off-by: Wei Yang <richard.weiyang@gmail.com>

Missed this, if I am correct. :-)

Acked-by: Michal Hocko <mhocko@suse.com>

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection
  2018-11-22 10:15   ` Wei Yang
@ 2018-11-22 10:29     ` Michal Hocko
  2018-11-22 14:27       ` Wei Yang
  0 siblings, 1 reply; 40+ messages in thread
From: Michal Hocko @ 2018-11-22 10:29 UTC (permalink / raw)
  To: Wei Yang; +Cc: osalvador, akpm, linux-mm

On Thu 22-11-18 10:15:57, Wei Yang wrote:
> On Thu, Nov 22, 2018 at 06:12:41PM +0800, Wei Yang wrote:
> >During online_pages phase, pgdat->nr_zones will be updated in case this
> >zone is empty.
> >
> >Currently the online_pages phase is protected by the global lock
> >mem_hotplug_begin(), which ensures there is no contention during the
> >update of nr_zones. But this global lock introduces scalability issues.
> >
> >This patch is a preparation for removing the global lock during
> >online_pages phase. Also this patch changes the documentation of
> >node_size_lock to include the protectioin of nr_zones.

I would just add that the patch moves init_currently_empty_zone under
both zone_span_writelock and pgdat_resize_lock because both the pgdat
state is changed (nr_zones) and the zone's start_pfn

> >
> >Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> 
> Missed this, if I am correct. :-)
> 
> Acked-by: Michal Hocko <mhocko@suse.com>

Yes, thank you.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection
  2018-11-22 10:12 ` [PATCH v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection Wei Yang
  2018-11-22 10:15   ` Wei Yang
@ 2018-11-22 10:37   ` osalvador
  2018-11-22 14:28     ` Wei Yang
  2018-11-22 15:26   ` David Hildenbrand
  2018-11-30  6:58   ` [PATCH v3] " Wei Yang
  3 siblings, 1 reply; 40+ messages in thread
From: osalvador @ 2018-11-22 10:37 UTC (permalink / raw)
  To: Wei Yang, mhocko; +Cc: akpm, linux-mm

On Thu, 2018-11-22 at 18:12 +0800, Wei Yang wrote:
> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>

Thanks ;-)

Reviewed-by: Oscar Salvador <osalvador@suse.de>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection
  2018-11-22 10:29     ` Michal Hocko
@ 2018-11-22 14:27       ` Wei Yang
  0 siblings, 0 replies; 40+ messages in thread
From: Wei Yang @ 2018-11-22 14:27 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Wei Yang, osalvador, akpm, linux-mm

On Thu, Nov 22, 2018 at 11:29:22AM +0100, Michal Hocko wrote:
>On Thu 22-11-18 10:15:57, Wei Yang wrote:
>> On Thu, Nov 22, 2018 at 06:12:41PM +0800, Wei Yang wrote:
>> >During online_pages phase, pgdat->nr_zones will be updated in case this
>> >zone is empty.
>> >
>> >Currently the online_pages phase is protected by the global lock
>> >mem_hotplug_begin(), which ensures there is no contention during the
>> >update of nr_zones. But this global lock introduces scalability issues.
>> >
>> >This patch is a preparation for removing the global lock during
>> >online_pages phase. Also this patch changes the documentation of
>> >node_size_lock to include the protectioin of nr_zones.
>
>I would just add that the patch moves init_currently_empty_zone under
>both zone_span_writelock and pgdat_resize_lock because both the pgdat
>state is changed (nr_zones) and the zone's start_pfn
>
>> >
>> >Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> 
>> Missed this, if I am correct. :-)
>> 
>> Acked-by: Michal Hocko <mhocko@suse.com>
>
>Yes, thank you.

My pleasure :-)

>-- 
>Michal Hocko
>SUSE Labs

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection
  2018-11-22 10:37   ` osalvador
@ 2018-11-22 14:28     ` Wei Yang
  0 siblings, 0 replies; 40+ messages in thread
From: Wei Yang @ 2018-11-22 14:28 UTC (permalink / raw)
  To: osalvador; +Cc: Wei Yang, mhocko, akpm, linux-mm

On Thu, Nov 22, 2018 at 11:37:38AM +0100, osalvador wrote:
>On Thu, 2018-11-22 at 18:12 +0800, Wei Yang wrote:
>> 
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>
>Thanks ;-)
>

Thanks for your suggestions :-)

>Reviewed-by: Oscar Salvador <osalvador@suse.de>

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection
  2018-11-22 10:12 ` [PATCH v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection Wei Yang
  2018-11-22 10:15   ` Wei Yang
  2018-11-22 10:37   ` osalvador
@ 2018-11-22 15:26   ` David Hildenbrand
  2018-11-22 21:28     ` Wei Yang
  2018-11-23  8:42     ` Michal Hocko
  2018-11-30  6:58   ` [PATCH v3] " Wei Yang
  3 siblings, 2 replies; 40+ messages in thread
From: David Hildenbrand @ 2018-11-22 15:26 UTC (permalink / raw)
  To: Wei Yang, mhocko, osalvador; +Cc: akpm, linux-mm

On 22.11.18 11:12, Wei Yang wrote:
> During online_pages phase, pgdat->nr_zones will be updated in case this
> zone is empty.
> 
> Currently the online_pages phase is protected by the global lock
> mem_hotplug_begin(), which ensures there is no contention during the
> update of nr_zones. But this global lock introduces scalability issues.
> 
> This patch is a preparation for removing the global lock during
> online_pages phase. Also this patch changes the documentation of
> node_size_lock to include the protectioin of nr_zones.

I looked into locking recently, and there is more to it.

Please read:

commit dee6da22efac451d361f5224a60be2796d847b51
Author: David Hildenbrand <david@redhat.com>
Date:   Tue Oct 30 15:10:44 2018 -0700

    memory-hotplug.rst: add some details about locking internals
    
    Let's document the magic a bit, especially why device_hotplug_lock is
    required when adding/removing memory and how it all play together with
    requests to online/offline memory from user space.

Short summary: Onlining/offlining of memory requires the device_hotplug_lock
as of now.

mem_hotplug_begin() is just an internal optimization. (we don't want
 everybody to take the device lock)



> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> ---
> v2:
>   * commit log changes
>   * modify the code in move_pfn_range_to_zone() instead of in
>     init_currently_empty_zone()
>   * documentation change
> 
> ---
>  include/linux/mmzone.h | 7 ++++---
>  mm/memory_hotplug.c    | 5 ++---
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 68d7b558924b..1bb749bee284 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -636,9 +636,10 @@ typedef struct pglist_data {
>  #endif
>  #if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_DEFERRED_STRUCT_PAGE_INIT)
>  	/*
> -	 * Must be held any time you expect node_start_pfn, node_present_pages
> -	 * or node_spanned_pages stay constant.  Holding this will also
> -	 * guarantee that any pfn_valid() stays that way.
> +	 * Must be held any time you expect node_start_pfn,
> +	 * node_present_pages, node_spanned_pages or nr_zones stay constant.
> +	 * Holding this will also guarantee that any pfn_valid() stays that
> +	 * way.
>  	 *
>  	 * pgdat_resize_lock() and pgdat_resize_unlock() are provided to
>  	 * manipulate node_size_lock without checking for CONFIG_MEMORY_HOTPLUG
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 61972da38d93..f626e7e5f57b 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -742,14 +742,13 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
>  	int nid = pgdat->node_id;
>  	unsigned long flags;
>  
> -	if (zone_is_empty(zone))
> -		init_currently_empty_zone(zone, start_pfn, nr_pages);
> -
>  	clear_zone_contiguous(zone);
>  
>  	/* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
>  	pgdat_resize_lock(pgdat, &flags);
>  	zone_span_writelock(zone);
> +	if (zone_is_empty(zone))
> +		init_currently_empty_zone(zone, start_pfn, nr_pages);
>  	resize_zone_range(zone, start_pfn, nr_pages);
>  	zone_span_writeunlock(zone);
>  	resize_pgdat_range(pgdat, start_pfn, nr_pages);
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection
  2018-11-22 15:26   ` David Hildenbrand
@ 2018-11-22 21:28     ` Wei Yang
  2018-11-22 21:53       ` David Hildenbrand
  2018-11-23  8:42     ` Michal Hocko
  1 sibling, 1 reply; 40+ messages in thread
From: Wei Yang @ 2018-11-22 21:28 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Wei Yang, mhocko, osalvador, akpm, linux-mm

On Thu, Nov 22, 2018 at 04:26:40PM +0100, David Hildenbrand wrote:
>On 22.11.18 11:12, Wei Yang wrote:
>> During online_pages phase, pgdat->nr_zones will be updated in case this
>> zone is empty.
>> 
>> Currently the online_pages phase is protected by the global lock
>> mem_hotplug_begin(), which ensures there is no contention during the
>> update of nr_zones. But this global lock introduces scalability issues.
>> 
>> This patch is a preparation for removing the global lock during
>> online_pages phase. Also this patch changes the documentation of
>> node_size_lock to include the protectioin of nr_zones.
>
>I looked into locking recently, and there is more to it.
>
>Please read:
>
>commit dee6da22efac451d361f5224a60be2796d847b51
>Author: David Hildenbrand <david@redhat.com>
>Date:   Tue Oct 30 15:10:44 2018 -0700
>
>    memory-hotplug.rst: add some details about locking internals
>    
>    Let's document the magic a bit, especially why device_hotplug_lock is
>    required when adding/removing memory and how it all play together with
>    requests to online/offline memory from user space.
>
>Short summary: Onlining/offlining of memory requires the device_hotplug_lock
>as of now.
>
>mem_hotplug_begin() is just an internal optimization. (we don't want
> everybody to take the device lock)
>

Hi, David

Thanks for your comment.

Hmm... I didn't catch your point.

Related to memory hot-plug, there are (at least) three locks,

  * device_hotplug_lock    (global)
  * device lock            (device scope)
  * mem_hotplug_lock       (global)

But with two different hold sequence in two cases:

  * device_online()

    device_hotplug_lock
    device_lock
    mem_hotplug_lock

  * add_memory_resource()

    device_hotplug_lock
    mem_hotplug_lock
    device_lock
       ^
       |
       I don't find where this is hold in add_memory_resource(). 
       Would you mind giving me a hint?

If my understanding is correct, what is your point?

I guess your point is : just remove mem_hotplug_lock is not enough to
resolve the scalability issue?

Please correct me, if I am not. :-)

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection
  2018-11-22 21:28     ` Wei Yang
@ 2018-11-22 21:53       ` David Hildenbrand
  2018-11-22 23:53         ` Wei Yang
  0 siblings, 1 reply; 40+ messages in thread
From: David Hildenbrand @ 2018-11-22 21:53 UTC (permalink / raw)
  To: Wei Yang; +Cc: mhocko, osalvador, akpm, linux-mm

On 22.11.18 22:28, Wei Yang wrote:
> On Thu, Nov 22, 2018 at 04:26:40PM +0100, David Hildenbrand wrote:
>> On 22.11.18 11:12, Wei Yang wrote:
>>> During online_pages phase, pgdat->nr_zones will be updated in case this
>>> zone is empty.
>>>
>>> Currently the online_pages phase is protected by the global lock
>>> mem_hotplug_begin(), which ensures there is no contention during the
>>> update of nr_zones. But this global lock introduces scalability issues.
>>>
>>> This patch is a preparation for removing the global lock during
>>> online_pages phase. Also this patch changes the documentation of
>>> node_size_lock to include the protectioin of nr_zones.
>>
>> I looked into locking recently, and there is more to it.
>>
>> Please read:
>>
>> commit dee6da22efac451d361f5224a60be2796d847b51
>> Author: David Hildenbrand <david@redhat.com>
>> Date:   Tue Oct 30 15:10:44 2018 -0700
>>
>>    memory-hotplug.rst: add some details about locking internals
>>    
>>    Let's document the magic a bit, especially why device_hotplug_lock is
>>    required when adding/removing memory and how it all play together with
>>    requests to online/offline memory from user space.
>>
>> Short summary: Onlining/offlining of memory requires the device_hotplug_lock
>> as of now.
>>
>> mem_hotplug_begin() is just an internal optimization. (we don't want
>> everybody to take the device lock)
>>
> 
> Hi, David
> 
> Thanks for your comment.

My last sentence should have been "we don't want everybody to take the
device hotplug lock" :) That caused confusion.

> 
> Hmm... I didn't catch your point.
> 
> Related to memory hot-plug, there are (at least) three locks,
> 
>   * device_hotplug_lock    (global)
>   * device lock            (device scope)
>   * mem_hotplug_lock       (global)
> 
> But with two different hold sequence in two cases:
> 
>   * device_online()
> 
>     device_hotplug_lock
>     device_lock
>     mem_hotplug_lock
> 
>   * add_memory_resource()
> 
>     device_hotplug_lock
>     mem_hotplug_lock
>     device_lock
>        ^
>        |
>        I don't find where this is hold in add_memory_resource(). 
>        Would you mind giving me a hint?
> 
> If my understanding is correct, what is your point?
> 

The point I was trying to make:

Right now all onlining/offlining/adding/removing is protected by the
device_hotplug_lock (and that's a good thing, things are fragile enough
already).

mem_hotplug_lock() is used in addition for get_online_mems().

"This patch is a preparation for removing the global lock during
online_pages phase." - is more like "one global lock".

> I guess your point is : just remove mem_hotplug_lock is not enough to
> resolve the scalability issue?

Depends on which scalability issue :)

Getting rid of / removing the impact of mem_hotplug_lock is certainly a
very good idea. And improves scalability of all callers of
get_online_mems(). If that is the intention, very good :)

If the intention is to make onlining/offlining more scalable (e.g. in
parallel or such), then scalability is limited by device_hotplug_lock.


> 
> Please correct me, if I am not. :-)
> 

Guess I was just wondering which scalability issue we are trying to solve :)

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection
  2018-11-22 21:53       ` David Hildenbrand
@ 2018-11-22 23:53         ` Wei Yang
  0 siblings, 0 replies; 40+ messages in thread
From: Wei Yang @ 2018-11-22 23:53 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Wei Yang, mhocko, osalvador, akpm, linux-mm

On Thu, Nov 22, 2018 at 10:53:31PM +0100, David Hildenbrand wrote:
>On 22.11.18 22:28, Wei Yang wrote:
>> On Thu, Nov 22, 2018 at 04:26:40PM +0100, David Hildenbrand wrote:
>>> On 22.11.18 11:12, Wei Yang wrote:
>>>> During online_pages phase, pgdat->nr_zones will be updated in case this
>>>> zone is empty.
>>>>
>>>> Currently the online_pages phase is protected by the global lock
>>>> mem_hotplug_begin(), which ensures there is no contention during the
>>>> update of nr_zones. But this global lock introduces scalability issues.
>>>>
>>>> This patch is a preparation for removing the global lock during
>>>> online_pages phase. Also this patch changes the documentation of
>>>> node_size_lock to include the protectioin of nr_zones.
>>>
>>> I looked into locking recently, and there is more to it.
>>>
>>> Please read:
>>>
>>> commit dee6da22efac451d361f5224a60be2796d847b51
>>> Author: David Hildenbrand <david@redhat.com>
>>> Date:   Tue Oct 30 15:10:44 2018 -0700
>>>
>>>    memory-hotplug.rst: add some details about locking internals
>>>    
>>>    Let's document the magic a bit, especially why device_hotplug_lock is
>>>    required when adding/removing memory and how it all play together with
>>>    requests to online/offline memory from user space.
>>>
>>> Short summary: Onlining/offlining of memory requires the device_hotplug_lock
>>> as of now.
>>>
>>> mem_hotplug_begin() is just an internal optimization. (we don't want
>>> everybody to take the device lock)
>>>
>> 
>> Hi, David
>> 
>> Thanks for your comment.
>
>My last sentence should have been "we don't want everybody to take the
>device hotplug lock" :) That caused confusion.
>
>> 
>> Hmm... I didn't catch your point.
>> 
>> Related to memory hot-plug, there are (at least) three locks,
>> 
>>   * device_hotplug_lock    (global)
>>   * device lock            (device scope)
>>   * mem_hotplug_lock       (global)
>> 
>> But with two different hold sequence in two cases:
>> 
>>   * device_online()
>> 
>>     device_hotplug_lock
>>     device_lock
>>     mem_hotplug_lock
>> 
>>   * add_memory_resource()
>> 
>>     device_hotplug_lock
>>     mem_hotplug_lock
>>     device_lock
>>        ^
>>        |
>>        I don't find where this is hold in add_memory_resource(). 
>>        Would you mind giving me a hint?
>> 
>> If my understanding is correct, what is your point?
>> 
>
>The point I was trying to make:
>
>Right now all onlining/offlining/adding/removing is protected by the
>device_hotplug_lock (and that's a good thing, things are fragile enough
>already).
>
>mem_hotplug_lock() is used in addition for get_online_mems().
>
>"This patch is a preparation for removing the global lock during
>online_pages phase." - is more like "one global lock".
>

Thanks for reminding. You are right.

>> I guess your point is : just remove mem_hotplug_lock is not enough to
>> resolve the scalability issue?
>
>Depends on which scalability issue :)
>
>Getting rid of / removing the impact of mem_hotplug_lock is certainly a
>very good idea. And improves scalability of all callers of
>get_online_mems(). If that is the intention, very good :)
>

Maybe not exact.

The intention is to get rid of mem_hotplug_begin/done, if I am correct.

>If the intention is to make onlining/offlining more scalable (e.g. in
>parallel or such), then scalability is limited by device_hotplug_lock.
>

I didn't notice this lock.

While this is a step by step improvement.

>
>> 
>> Please correct me, if I am not. :-)
>> 
>
>Guess I was just wondering which scalability issue we are trying to solve :)
>
>-- 
>
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection
  2018-11-22 15:26   ` David Hildenbrand
  2018-11-22 21:28     ` Wei Yang
@ 2018-11-23  8:42     ` Michal Hocko
  2018-11-23  8:46       ` David Hildenbrand
  1 sibling, 1 reply; 40+ messages in thread
From: Michal Hocko @ 2018-11-23  8:42 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Wei Yang, osalvador, akpm, linux-mm

On Thu 22-11-18 16:26:40, David Hildenbrand wrote:
> On 22.11.18 11:12, Wei Yang wrote:
> > During online_pages phase, pgdat->nr_zones will be updated in case this
> > zone is empty.
> > 
> > Currently the online_pages phase is protected by the global lock
> > mem_hotplug_begin(), which ensures there is no contention during the
> > update of nr_zones. But this global lock introduces scalability issues.
> > 
> > This patch is a preparation for removing the global lock during
> > online_pages phase. Also this patch changes the documentation of
> > node_size_lock to include the protectioin of nr_zones.
> 
> I looked into locking recently, and there is more to it.
> 
> Please read:
> 
> commit dee6da22efac451d361f5224a60be2796d847b51
> Author: David Hildenbrand <david@redhat.com>
> Date:   Tue Oct 30 15:10:44 2018 -0700
> 
>     memory-hotplug.rst: add some details about locking internals
>     
>     Let's document the magic a bit, especially why device_hotplug_lock is
>     required when adding/removing memory and how it all play together with
>     requests to online/offline memory from user space.
> 
> Short summary: Onlining/offlining of memory requires the device_hotplug_lock
> as of now.

Well, I would tend to disagree here. You might be describing the current
state of art but the device_hotplug_lock doesn't make much sense for the
memory hotplug in principle. There is absolutely nothing in the core MM
that would require this lock. The current state just uses a BKL in some
sense and we really want to get rid of that longterm. This patch is a tiny
step in that direction and I suspect many more will need to come on the
way. We really want to end up with a clear scope of each lock being
taken. A project for a brave soul...

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection
  2018-11-23  8:42     ` Michal Hocko
@ 2018-11-23  8:46       ` David Hildenbrand
  2018-11-26  1:44         ` Wei Yang
  0 siblings, 1 reply; 40+ messages in thread
From: David Hildenbrand @ 2018-11-23  8:46 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Wei Yang, osalvador, akpm, linux-mm

On 23.11.18 09:42, Michal Hocko wrote:
> On Thu 22-11-18 16:26:40, David Hildenbrand wrote:
>> On 22.11.18 11:12, Wei Yang wrote:
>>> During online_pages phase, pgdat->nr_zones will be updated in case this
>>> zone is empty.
>>>
>>> Currently the online_pages phase is protected by the global lock
>>> mem_hotplug_begin(), which ensures there is no contention during the
>>> update of nr_zones. But this global lock introduces scalability issues.
>>>
>>> This patch is a preparation for removing the global lock during
>>> online_pages phase. Also this patch changes the documentation of
>>> node_size_lock to include the protectioin of nr_zones.
>>
>> I looked into locking recently, and there is more to it.
>>
>> Please read:
>>
>> commit dee6da22efac451d361f5224a60be2796d847b51
>> Author: David Hildenbrand <david@redhat.com>
>> Date:   Tue Oct 30 15:10:44 2018 -0700
>>
>>     memory-hotplug.rst: add some details about locking internals
>>     
>>     Let's document the magic a bit, especially why device_hotplug_lock is
>>     required when adding/removing memory and how it all play together with
>>     requests to online/offline memory from user space.
>>
>> Short summary: Onlining/offlining of memory requires the device_hotplug_lock
>> as of now.
> 
> Well, I would tend to disagree here. You might be describing the current
> state of art but the device_hotplug_lock doesn't make much sense for the
> memory hotplug in principle. There is absolutely nothing in the core MM

There are collisions with CPU hotplug that require this lock (when nodes
come and go as far as I remember). And there is the problematic lock
inversion that can happen when adding/remving memory. This all has to be
sorted out, we'll have to see if we really need it for
onlining/offlining, though, however ...

> that would require this lock. The current state just uses a BKL in some
> sense and we really want to get rid of that longterm. This patch is a tiny
> step in that direction and I suspect many more will need to come on the
> way. We really want to end up with a clear scope of each lock being
> taken. A project for a brave soul...

... for now I don't consider "optimize for parallel
onlining/offlining/adding/removing of memory and cpus" really necessary.
What is necessary indeed is to not slowdown the whole system just
because some memory is coming/going. Therefore I agree, this patch is a
step into the right direction.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection
  2018-11-23  8:46       ` David Hildenbrand
@ 2018-11-26  1:44         ` Wei Yang
  2018-11-26  9:24           ` David Hildenbrand
  0 siblings, 1 reply; 40+ messages in thread
From: Wei Yang @ 2018-11-26  1:44 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Michal Hocko, Wei Yang, osalvador, akpm, linux-mm

On Fri, Nov 23, 2018 at 09:46:52AM +0100, David Hildenbrand wrote:
>On 23.11.18 09:42, Michal Hocko wrote:
>> On Thu 22-11-18 16:26:40, David Hildenbrand wrote:
>>> On 22.11.18 11:12, Wei Yang wrote:
>>>> During online_pages phase, pgdat->nr_zones will be updated in case this
>>>> zone is empty.
>>>>
>>>> Currently the online_pages phase is protected by the global lock
>>>> mem_hotplug_begin(), which ensures there is no contention during the
>>>> update of nr_zones. But this global lock introduces scalability issues.
>>>>
>>>> This patch is a preparation for removing the global lock during
>>>> online_pages phase. Also this patch changes the documentation of
>>>> node_size_lock to include the protectioin of nr_zones.
>>>
>>> I looked into locking recently, and there is more to it.
>>>
>>> Please read:
>>>
>>> commit dee6da22efac451d361f5224a60be2796d847b51
>>> Author: David Hildenbrand <david@redhat.com>
>>> Date:   Tue Oct 30 15:10:44 2018 -0700
>>>
>>>     memory-hotplug.rst: add some details about locking internals
>>>     
>>>     Let's document the magic a bit, especially why device_hotplug_lock is
>>>     required when adding/removing memory and how it all play together with
>>>     requests to online/offline memory from user space.
>>>
>>> Short summary: Onlining/offlining of memory requires the device_hotplug_lock
>>> as of now.
>> 
>> Well, I would tend to disagree here. You might be describing the current
>> state of art but the device_hotplug_lock doesn't make much sense for the
>> memory hotplug in principle. There is absolutely nothing in the core MM
>
>There are collisions with CPU hotplug that require this lock (when nodes
>come and go as far as I remember). And there is the problematic lock
>inversion that can happen when adding/remving memory. This all has to be
>sorted out, we'll have to see if we really need it for
>onlining/offlining, though, however ...
>

Seems I get a little understanding on this part.

There are two hotplug:
   * CPU hotplug 
   * Memory hotplug.

There are two phase for Memory hotplug:
   * physical add/remove 
   * logical online/offline

All of them are protected by device_hotplug_lock now, so we need to be
careful to release this in any case. Is my understanding correct?

>> that would require this lock. The current state just uses a BKL in some
>> sense and we really want to get rid of that longterm. This patch is a tiny
>> step in that direction and I suspect many more will need to come on the
>> way. We really want to end up with a clear scope of each lock being
>> taken. A project for a brave soul...
>
>... for now I don't consider "optimize for parallel
>onlining/offlining/adding/removing of memory and cpus" really necessary.
>What is necessary indeed is to not slowdown the whole system just
>because some memory is coming/going. Therefore I agree, this patch is a
>step into the right direction.
>

Agree.

The target is to accelerate the hot-plug without slow down the normal
process. 

>-- 
>
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm, hotplug: protect nr_zones with pgdat_resize_lock()
  2018-11-21  7:15       ` Michal Hocko
  2018-11-22  1:52         ` Wei Yang
@ 2018-11-26  2:28         ` Wei Yang
  2018-11-26  8:16           ` Michal Hocko
  1 sibling, 1 reply; 40+ messages in thread
From: Wei Yang @ 2018-11-26  2:28 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Oscar Salvador, Andrew Morton, Linux-MM

On Wed, Nov 21, 2018 at 3:15 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 21-11-18 02:52:31, Wei Yang wrote:
> > On Tue, Nov 20, 2018 at 08:58:11AM +0100, osalvador@suse.de wrote:
> > >> On the other hand I would like to see the global lock to go away because
> > >> it causes scalability issues and I would like to change it to a range
> > >> lock. This would make this race possible.
> > >>
> > >> That being said this is more of a preparatory work than a fix. One could
> > >> argue that pgdat resize lock is abused here but I am not convinced a
> > >> dedicated lock is much better. We do take this lock already and spanning
> > >> its scope seems reasonable. An update to the documentation is due.
> > >
> > >Would not make more sense to move it within the pgdat lock
> > >in move_pfn_range_to_zone?
> > >The call from free_area_init_core is safe as we are single-thread there.
> > >
> >
> > Agree. This would be better.
> >
> > >And if we want to move towards a range locking, I even think it would be more
> > >consistent if we move it within the zone's span lock (which is already
> > >wrapped with a pgdat lock).
> > >
> >
> > I lost a little here, just want to confirm with you.
> >
> > Instead of call pgdat_resize_lock() around init_currently_empty_zone()
> > in move_pfn_range_to_zone(), we move init_currently_empty_zone() before
> > resize_zone_range()?
> >
> > This sounds reasonable.
>
> Btw. resolving the existing TODO would be nice as well, now that you are
> looking that direction...
>

Michal,

I took a look at commit f1dd2cd13c4b ("mm, memory_hotplug: do not
associate hotadded memory to zones until online"), and try to understand
this TODO.

The reason to acquire these lock is before this commit, the memory is
associated with zone at physical adding phase, while after this commit,
this is delayed to logical online stage.

But I get some difficulty to understand this TODO. You want to get rid of
these lock? While these locks seem necessary to protect those data of
pgdat/zone. Would you mind sharing more on this statement?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm, hotplug: protect nr_zones with pgdat_resize_lock()
  2018-11-26  2:28         ` Wei Yang
@ 2018-11-26  8:16           ` Michal Hocko
  2018-11-26  9:06             ` Wei Yang
  2018-11-27  3:12             ` Wei Yang
  0 siblings, 2 replies; 40+ messages in thread
From: Michal Hocko @ 2018-11-26  8:16 UTC (permalink / raw)
  To: Wei Yang; +Cc: Oscar Salvador, Andrew Morton, Linux-MM

On Mon 26-11-18 10:28:40, Wei Yang wrote:
[...]
> But I get some difficulty to understand this TODO. You want to get rid of
> these lock? While these locks seem necessary to protect those data of
> pgdat/zone. Would you mind sharing more on this statement?

Why do we need this lock to be irqsave? Is there any caller that uses
the lock from the IRQ context?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm, hotplug: protect nr_zones with pgdat_resize_lock()
  2018-11-26  8:16           ` Michal Hocko
@ 2018-11-26  9:06             ` Wei Yang
  2018-11-26 10:03               ` Michal Hocko
  2018-11-27  3:12             ` Wei Yang
  1 sibling, 1 reply; 40+ messages in thread
From: Wei Yang @ 2018-11-26  9:06 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Wei Yang, Oscar Salvador, Andrew Morton, Linux-MM

On Mon, Nov 26, 2018 at 09:16:08AM +0100, Michal Hocko wrote:
>On Mon 26-11-18 10:28:40, Wei Yang wrote:
>[...]
>> But I get some difficulty to understand this TODO. You want to get rid of
>> these lock? While these locks seem necessary to protect those data of
>> pgdat/zone. Would you mind sharing more on this statement?
>
>Why do we need this lock to be irqsave? Is there any caller that uses
>the lock from the IRQ context?

I see you put the comment 'irqsave' in code, I thought this is the
requirement bringing in by this commit. So this is copyed from somewhere
else?

>From my understanding, we don't access pgdat from interrupt context.

BTW, one more confirmation. One irqsave lock means we can't do something
during holding the lock, like sleep. Is my understanding correct?

>-- 
>Michal Hocko
>SUSE Labs

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection
  2018-11-26  1:44         ` Wei Yang
@ 2018-11-26  9:24           ` David Hildenbrand
  2018-11-27  0:23             ` Wei Yang
  0 siblings, 1 reply; 40+ messages in thread
From: David Hildenbrand @ 2018-11-26  9:24 UTC (permalink / raw)
  To: Wei Yang; +Cc: Michal Hocko, osalvador, akpm, linux-mm

On 26.11.18 02:44, Wei Yang wrote:
> On Fri, Nov 23, 2018 at 09:46:52AM +0100, David Hildenbrand wrote:
>> On 23.11.18 09:42, Michal Hocko wrote:
>>> On Thu 22-11-18 16:26:40, David Hildenbrand wrote:
>>>> On 22.11.18 11:12, Wei Yang wrote:
>>>>> During online_pages phase, pgdat->nr_zones will be updated in case this
>>>>> zone is empty.
>>>>>
>>>>> Currently the online_pages phase is protected by the global lock
>>>>> mem_hotplug_begin(), which ensures there is no contention during the
>>>>> update of nr_zones. But this global lock introduces scalability issues.
>>>>>
>>>>> This patch is a preparation for removing the global lock during
>>>>> online_pages phase. Also this patch changes the documentation of
>>>>> node_size_lock to include the protectioin of nr_zones.
>>>>
>>>> I looked into locking recently, and there is more to it.
>>>>
>>>> Please read:
>>>>
>>>> commit dee6da22efac451d361f5224a60be2796d847b51
>>>> Author: David Hildenbrand <david@redhat.com>
>>>> Date:   Tue Oct 30 15:10:44 2018 -0700
>>>>
>>>>     memory-hotplug.rst: add some details about locking internals
>>>>     
>>>>     Let's document the magic a bit, especially why device_hotplug_lock is
>>>>     required when adding/removing memory and how it all play together with
>>>>     requests to online/offline memory from user space.
>>>>
>>>> Short summary: Onlining/offlining of memory requires the device_hotplug_lock
>>>> as of now.
>>>
>>> Well, I would tend to disagree here. You might be describing the current
>>> state of art but the device_hotplug_lock doesn't make much sense for the
>>> memory hotplug in principle. There is absolutely nothing in the core MM
>>
>> There are collisions with CPU hotplug that require this lock (when nodes
>> come and go as far as I remember). And there is the problematic lock
>> inversion that can happen when adding/remving memory. This all has to be
>> sorted out, we'll have to see if we really need it for
>> onlining/offlining, though, however ...
>>
> 
> Seems I get a little understanding on this part.
> 
> There are two hotplug:
>    * CPU hotplug 
>    * Memory hotplug.
> 
> There are two phase for Memory hotplug:
>    * physical add/remove 
>    * logical online/offline
> 
> All of them are protected by device_hotplug_lock now, so we need to be
> careful to release this in any case. Is my understanding correct?

Yes, e.g. the acpi driver always held the device_hotplug_lock (due to
possible problems with concurrent cpu/memory hot(un)plug). Onlining
offlining of devices (including cpu/memory) from user space always held
the device_hotplug_lock. So this part was executed sequentially for a
long time.

I recently made sure that any adding/removing/onlining/offlining
correctly grabs the device_hotplug_lock AND the mem_hotplug_lock in any
case (because it was inconsistent and broken), so it is all executed
sequentially.

So when getting rid of mem_hotplug_lock we only have to care about all
users that don't take the device_hotplug_lock.

> 
>>> that would require this lock. The current state just uses a BKL in some
>>> sense and we really want to get rid of that longterm. This patch is a tiny
>>> step in that direction and I suspect many more will need to come on the
>>> way. We really want to end up with a clear scope of each lock being
>>> taken. A project for a brave soul...
>>
>> ... for now I don't consider "optimize for parallel
>> onlining/offlining/adding/removing of memory and cpus" really necessary.
>> What is necessary indeed is to not slowdown the whole system just
>> because some memory is coming/going. Therefore I agree, this patch is a
>> step into the right direction.
>>
> 
> Agree.
> 
> The target is to accelerate the hot-plug without slow down the normal
> process. 

Indeed!


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm, hotplug: protect nr_zones with pgdat_resize_lock()
  2018-11-26  9:06             ` Wei Yang
@ 2018-11-26 10:03               ` Michal Hocko
  2018-11-27  0:18                 ` Wei Yang
  0 siblings, 1 reply; 40+ messages in thread
From: Michal Hocko @ 2018-11-26 10:03 UTC (permalink / raw)
  To: Wei Yang; +Cc: Oscar Salvador, Andrew Morton, Linux-MM

On Mon 26-11-18 09:06:54, Wei Yang wrote:
> On Mon, Nov 26, 2018 at 09:16:08AM +0100, Michal Hocko wrote:
> >On Mon 26-11-18 10:28:40, Wei Yang wrote:
> >[...]
> >> But I get some difficulty to understand this TODO. You want to get rid of
> >> these lock? While these locks seem necessary to protect those data of
> >> pgdat/zone. Would you mind sharing more on this statement?
> >
> >Why do we need this lock to be irqsave? Is there any caller that uses
> >the lock from the IRQ context?
> 
> I see you put the comment 'irqsave' in code, I thought this is the
> requirement bringing in by this commit. So this is copyed from somewhere
> else?

No, the irqsave lock has been there for a long time but it was not clear
to me whether it is still required. Maybe it never was. I just didn't
have time to look into that and put a TODO there. The code wouldn't be
less correct if I kept it.

> >From my understanding, we don't access pgdat from interrupt context.
> 
> BTW, one more confirmation. One irqsave lock means we can't do something
> during holding the lock, like sleep. Is my understanding correct?

You cannot sleep in any atomic context. IRQ safe lock only means that
IRQs are disabled along with the lock. The irqsave variant should be
taken when an IRQ context itself can take the lock. There is a lot of
documentation to clarify this e.g. Linux Device Drivers. I would
recommend to read through that.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm, hotplug: protect nr_zones with pgdat_resize_lock()
  2018-11-26 10:03               ` Michal Hocko
@ 2018-11-27  0:18                 ` Wei Yang
  0 siblings, 0 replies; 40+ messages in thread
From: Wei Yang @ 2018-11-27  0:18 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Wei Yang, Oscar Salvador, Andrew Morton, Linux-MM

On Mon, Nov 26, 2018 at 11:03:30AM +0100, Michal Hocko wrote:
>On Mon 26-11-18 09:06:54, Wei Yang wrote:
>> On Mon, Nov 26, 2018 at 09:16:08AM +0100, Michal Hocko wrote:
>> >On Mon 26-11-18 10:28:40, Wei Yang wrote:
>> >[...]
>> >> But I get some difficulty to understand this TODO. You want to get rid of
>> >> these lock? While these locks seem necessary to protect those data of
>> >> pgdat/zone. Would you mind sharing more on this statement?
>> >
>> >Why do we need this lock to be irqsave? Is there any caller that uses
>> >the lock from the IRQ context?
>> 
>> I see you put the comment 'irqsave' in code, I thought this is the
>> requirement bringing in by this commit. So this is copyed from somewhere
>> else?
>
>No, the irqsave lock has been there for a long time but it was not clear
>to me whether it is still required. Maybe it never was. I just didn't
>have time to look into that and put a TODO there. The code wouldn't be
>less correct if I kept it.
>

Let me summarize what you expect to do.

Go through all the users of pgdat_resize_lock, if none of them is called
from IRQ context, we could do the following change:

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index ffd9cd10fcf3..45a5affcab8a 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -272,14 +272,14 @@ static inline bool movable_node_is_enabled(void)
  * pgdat resizing functions
  */
 static inline
-void pgdat_resize_lock(struct pglist_data *pgdat, unsigned long *flags)
+void pgdat_resize_lock(struct pglist_data *pgdat)
 {
-	spin_lock_irqsave(&pgdat->node_size_lock, *flags);
+	spin_lock(&pgdat->node_size_lock);
 }
 static inline
-void pgdat_resize_unlock(struct pglist_data *pgdat, unsigned long *flags)
+void pgdat_resize_unlock(struct pglist_data *pgdat)
 {
-	spin_unlock_irqrestore(&pgdat->node_size_lock, *flags);
+	spin_unlock(&pgdat->node_size_lock);
 }
 static inline
 void pgdat_resize_init(struct pglist_data *pgdat)

>> >From my understanding, we don't access pgdat from interrupt context.
>> 
>> BTW, one more confirmation. One irqsave lock means we can't do something
>> during holding the lock, like sleep. Is my understanding correct?
>
>You cannot sleep in any atomic context. IRQ safe lock only means that
>IRQs are disabled along with the lock. The irqsave variant should be
>taken when an IRQ context itself can take the lock. There is a lot of
>documentation to clarify this e.g. Linux Device Drivers. I would
>recommend to read through that.
>

Thanks.

I took a look at this one which seems to resolve my confusion.

https://www.kernel.org/doc/Documentation/locking/spinlocks.txt

>-- 
>Michal Hocko
>SUSE Labs

-- 
Wei Yang
Help you, Help me

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection
  2018-11-26  9:24           ` David Hildenbrand
@ 2018-11-27  0:23             ` Wei Yang
  0 siblings, 0 replies; 40+ messages in thread
From: Wei Yang @ 2018-11-27  0:23 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Wei Yang, Michal Hocko, osalvador, akpm, linux-mm

On Mon, Nov 26, 2018 at 10:24:26AM +0100, David Hildenbrand wrote:
>On 26.11.18 02:44, Wei Yang wrote:
>> On Fri, Nov 23, 2018 at 09:46:52AM +0100, David Hildenbrand wrote:
>>> On 23.11.18 09:42, Michal Hocko wrote:
>>>> On Thu 22-11-18 16:26:40, David Hildenbrand wrote:
>>>>> On 22.11.18 11:12, Wei Yang wrote:
>>>>>> During online_pages phase, pgdat->nr_zones will be updated in case this
>>>>>> zone is empty.
>>>>>>
>>>>>> Currently the online_pages phase is protected by the global lock
>>>>>> mem_hotplug_begin(), which ensures there is no contention during the
>>>>>> update of nr_zones. But this global lock introduces scalability issues.
>>>>>>
>>>>>> This patch is a preparation for removing the global lock during
>>>>>> online_pages phase. Also this patch changes the documentation of
>>>>>> node_size_lock to include the protectioin of nr_zones.
>>>>>
>>>>> I looked into locking recently, and there is more to it.
>>>>>
>>>>> Please read:
>>>>>
>>>>> commit dee6da22efac451d361f5224a60be2796d847b51
>>>>> Author: David Hildenbrand <david@redhat.com>
>>>>> Date:   Tue Oct 30 15:10:44 2018 -0700
>>>>>
>>>>>     memory-hotplug.rst: add some details about locking internals
>>>>>     
>>>>>     Let's document the magic a bit, especially why device_hotplug_lock is
>>>>>     required when adding/removing memory and how it all play together with
>>>>>     requests to online/offline memory from user space.
>>>>>
>>>>> Short summary: Onlining/offlining of memory requires the device_hotplug_lock
>>>>> as of now.
>>>>
>>>> Well, I would tend to disagree here. You might be describing the current
>>>> state of art but the device_hotplug_lock doesn't make much sense for the
>>>> memory hotplug in principle. There is absolutely nothing in the core MM
>>>
>>> There are collisions with CPU hotplug that require this lock (when nodes
>>> come and go as far as I remember). And there is the problematic lock
>>> inversion that can happen when adding/remving memory. This all has to be
>>> sorted out, we'll have to see if we really need it for
>>> onlining/offlining, though, however ...
>>>
>> 
>> Seems I get a little understanding on this part.
>> 
>> There are two hotplug:
>>    * CPU hotplug 
>>    * Memory hotplug.
>> 
>> There are two phase for Memory hotplug:
>>    * physical add/remove 
>>    * logical online/offline
>> 
>> All of them are protected by device_hotplug_lock now, so we need to be
>> careful to release this in any case. Is my understanding correct?
>
>Yes, e.g. the acpi driver always held the device_hotplug_lock (due to
>possible problems with concurrent cpu/memory hot(un)plug). Onlining
>offlining of devices (including cpu/memory) from user space always held
>the device_hotplug_lock. So this part was executed sequentially for a
>long time.
>
>I recently made sure that any adding/removing/onlining/offlining
>correctly grabs the device_hotplug_lock AND the mem_hotplug_lock in any
>case (because it was inconsistent and broken), so it is all executed
>sequentially.
>
>So when getting rid of mem_hotplug_lock we only have to care about all
>users that don't take the device_hotplug_lock.
>

Thanks for your sharing.

So there are two global locks to protect the hotplug procedure.

  * device_hotplug_lock
  * mem_hotplug_lock

Seems even removing mem_hotplug_lock doesn't help much on scalability?

Maybe we need to move one by one.

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm, hotplug: protect nr_zones with pgdat_resize_lock()
  2018-11-26  8:16           ` Michal Hocko
  2018-11-26  9:06             ` Wei Yang
@ 2018-11-27  3:12             ` Wei Yang
  2018-11-27 13:16               ` Michal Hocko
  1 sibling, 1 reply; 40+ messages in thread
From: Wei Yang @ 2018-11-27  3:12 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Wei Yang, Oscar Salvador, Andrew Morton, Linux-MM

On Mon, Nov 26, 2018 at 09:16:08AM +0100, Michal Hocko wrote:
>On Mon 26-11-18 10:28:40, Wei Yang wrote:
>[...]
>> But I get some difficulty to understand this TODO. You want to get rid of
>> these lock? While these locks seem necessary to protect those data of
>> pgdat/zone. Would you mind sharing more on this statement?
>
>Why do we need this lock to be irqsave? Is there any caller that uses
>the lock from the IRQ context?

Went through the code, we have totally 9 place acquire
pgdat_resize_lock:

   lib/show_mem.c:         1    show_mem()
   mm/memory_hotplug.c:    4    online/offline_pages/__remove_zone()
   mm/page_alloc.c:        2    defer_init
   mm/sparse.c:            2    not necessary

Two places I am not sure:

   * show_mem() would be called from __alloc_pages_slowpath()
   * __remove_zone() is related to acpi_scan() on x86, may related to
     other method on different arch

I am not 100% for sure, while they looks like to be called in IRQ
context.

My ugly idea is:

   * drop pgdat_resize_lock in show_mem(), we don't change the value
     here. or replace this with a read/write lock?
   * can we adjust pgdat's range in offline_pages()? This would be
     consistent since we adjust them in online_pages().


>-- 
>Michal Hocko
>SUSE Labs

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm, hotplug: protect nr_zones with pgdat_resize_lock()
  2018-11-27  3:12             ` Wei Yang
@ 2018-11-27 13:16               ` Michal Hocko
  2018-11-27 23:56                 ` Wei Yang
  0 siblings, 1 reply; 40+ messages in thread
From: Michal Hocko @ 2018-11-27 13:16 UTC (permalink / raw)
  To: Wei Yang; +Cc: Oscar Salvador, Andrew Morton, Linux-MM

On Tue 27-11-18 03:12:00, Wei Yang wrote:
> On Mon, Nov 26, 2018 at 09:16:08AM +0100, Michal Hocko wrote:
> >On Mon 26-11-18 10:28:40, Wei Yang wrote:
> >[...]
> >> But I get some difficulty to understand this TODO. You want to get rid of
> >> these lock? While these locks seem necessary to protect those data of
> >> pgdat/zone. Would you mind sharing more on this statement?
> >
> >Why do we need this lock to be irqsave? Is there any caller that uses
> >the lock from the IRQ context?
> 
> Went through the code, we have totally 9 place acquire
> pgdat_resize_lock:
> 
>    lib/show_mem.c:         1    show_mem()
>    mm/memory_hotplug.c:    4    online/offline_pages/__remove_zone()
>    mm/page_alloc.c:        2    defer_init
>    mm/sparse.c:            2    not necessary
> 
> Two places I am not sure:
> 
>    * show_mem() would be called from __alloc_pages_slowpath()

This shouldn't really need the lock. It is a mostly debugging aid rather
than something that cannot tolarate racing with hotplug. What is the
worst case that can happen?

>    * __remove_zone() is related to acpi_scan() on x86, may related to
>      other method on different arch

This one really needs a lock qwith a larger scope anyway.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] mm, hotplug: protect nr_zones with pgdat_resize_lock()
  2018-11-27 13:16               ` Michal Hocko
@ 2018-11-27 23:56                 ` Wei Yang
  0 siblings, 0 replies; 40+ messages in thread
From: Wei Yang @ 2018-11-27 23:56 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Wei Yang, Oscar Salvador, Andrew Morton, Linux-MM

On Tue, Nov 27, 2018 at 02:16:58PM +0100, Michal Hocko wrote:
>On Tue 27-11-18 03:12:00, Wei Yang wrote:
>> On Mon, Nov 26, 2018 at 09:16:08AM +0100, Michal Hocko wrote:
>> >On Mon 26-11-18 10:28:40, Wei Yang wrote:
>> >[...]
>> >> But I get some difficulty to understand this TODO. You want to get rid of
>> >> these lock? While these locks seem necessary to protect those data of
>> >> pgdat/zone. Would you mind sharing more on this statement?
>> >
>> >Why do we need this lock to be irqsave? Is there any caller that uses
>> >the lock from the IRQ context?
>> 
>> Went through the code, we have totally 9 place acquire
>> pgdat_resize_lock:
>> 
>>    lib/show_mem.c:         1    show_mem()
>>    mm/memory_hotplug.c:    4    online/offline_pages/__remove_zone()
>>    mm/page_alloc.c:        2    defer_init
>>    mm/sparse.c:            2    not necessary
>> 
>> Two places I am not sure:
>> 
>>    * show_mem() would be called from __alloc_pages_slowpath()
>
>This shouldn't really need the lock. It is a mostly debugging aid rather
>than something that cannot tolarate racing with hotplug. What is the
>worst case that can happen?
>

Agree.

The worst case is debug information is not exact in case defer init or
hotplug happens at the same time. While this is a rare case.

If you think it is ok, I would suggest to remove the lock here.

>>    * __remove_zone() is related to acpi_scan() on x86, may related to
>>      other method on different arch
>
>This one really needs a lock qwith a larger scope anyway.

Based on my understanding, __remove_zone() happens at physical memory
remove phase. While for currently logic, we adjust zone information at
logic memory online phase.

They looks not consistent?

If we could do this at logical memory offline phase, we are sure this is
not in IRQ context.

>-- 
>Michal Hocko
>SUSE Labs

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v3] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection
  2018-11-22 10:12 ` [PATCH v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection Wei Yang
                     ` (2 preceding siblings ...)
  2018-11-22 15:26   ` David Hildenbrand
@ 2018-11-30  6:58   ` Wei Yang
  2018-11-30  9:30     ` David Hildenbrand
  2018-12-03 20:50     ` [PATCH v4] " Wei Yang
  3 siblings, 2 replies; 40+ messages in thread
From: Wei Yang @ 2018-11-30  6:58 UTC (permalink / raw)
  To: mhocko, osalvador, david; +Cc: akpm, linux-mm, Wei Yang

During online_pages phase, pgdat->nr_zones will be updated in case this
zone is empty.

Currently the online_pages phase is protected by the global lock
mem_hotplug_begin(), which ensures there is no contention during the
update of nr_zones. But this global lock introduces scalability issues.

The patch moves init_currently_empty_zone under both zone_span_writelock
and pgdat_resize_lock because both the pgdat state is changed (nr_zones)
and the zone's start_pfn. Also this patch changes the documentation
of node_size_lock to include the protectioin of nr_zones.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
CC: David Hildenbrand <david@redhat.com>

---
David, I may not catch you exact comment on the code or changelog. If I
missed, just let me know.

---
v3:
  * slightly modify the last paragraph of changelog based on Michal's
    comment
v2:
  * commit log changes
  * modify the code in move_pfn_range_to_zone() instead of in
    init_currently_empty_zone()
  * pgdat_resize_lock documentation change
---
 include/linux/mmzone.h | 7 ++++---
 mm/memory_hotplug.c    | 5 ++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 3d0c472438d2..37d9c5c3faa6 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -635,9 +635,10 @@ typedef struct pglist_data {
 #endif
 #if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_DEFERRED_STRUCT_PAGE_INIT)
 	/*
-	 * Must be held any time you expect node_start_pfn, node_present_pages
-	 * or node_spanned_pages stay constant.  Holding this will also
-	 * guarantee that any pfn_valid() stays that way.
+	 * Must be held any time you expect node_start_pfn,
+	 * node_present_pages, node_spanned_pages or nr_zones stay constant.
+	 * Holding this will also guarantee that any pfn_valid() stays that
+	 * way.
 	 *
 	 * pgdat_resize_lock() and pgdat_resize_unlock() are provided to
 	 * manipulate node_size_lock without checking for CONFIG_MEMORY_HOTPLUG
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 61972da38d93..f626e7e5f57b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -742,14 +742,13 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 	int nid = pgdat->node_id;
 	unsigned long flags;
 
-	if (zone_is_empty(zone))
-		init_currently_empty_zone(zone, start_pfn, nr_pages);
-
 	clear_zone_contiguous(zone);
 
 	/* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
 	pgdat_resize_lock(pgdat, &flags);
 	zone_span_writelock(zone);
+	if (zone_is_empty(zone))
+		init_currently_empty_zone(zone, start_pfn, nr_pages);
 	resize_zone_range(zone, start_pfn, nr_pages);
 	zone_span_writeunlock(zone);
 	resize_pgdat_range(pgdat, start_pfn, nr_pages);
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v3] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection
  2018-11-30  6:58   ` [PATCH v3] " Wei Yang
@ 2018-11-30  9:30     ` David Hildenbrand
  2018-12-01  0:27       ` Wei Yang
  2018-12-03 20:50     ` [PATCH v4] " Wei Yang
  1 sibling, 1 reply; 40+ messages in thread
From: David Hildenbrand @ 2018-11-30  9:30 UTC (permalink / raw)
  To: Wei Yang, mhocko, osalvador; +Cc: akpm, linux-mm

On 30.11.18 07:58, Wei Yang wrote:
> During online_pages phase, pgdat->nr_zones will be updated in case this
> zone is empty.
> 
> Currently the online_pages phase is protected by the global lock
> mem_hotplug_begin(), which ensures there is no contention during the
> update of nr_zones. But this global lock introduces scalability issues.
> 
> The patch moves init_currently_empty_zone under both zone_span_writelock
> and pgdat_resize_lock because both the pgdat state is changed (nr_zones)
> and the zone's start_pfn. Also this patch changes the documentation
> of node_size_lock to include the protectioin of nr_zones.

s/protectioin/protection/

> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
> CC: David Hildenbrand <david@redhat.com>
> 
> ---
> David, I may not catch you exact comment on the code or changelog. If I
> missed, just let me know.

I guess I would have rewritten it to something like the following

"
Currently the online_pages phase is protected by two global locks
(device_device_hotplug_lock and mem_hotplug_lock). Especial the latter
can result in scalability issues, as it will slow down code relying on
get_online_mems(). Let's prepare code for not having to rely on
get_online_mems() but instead some more fine grained locks.

During online_pages phase, pgdat->nr_zones will be updated in case the
zone is empty. Right now mem_hotplug_lock ensures that there is no
contention during the update of nr_zones.

The patch moves init_currently_empty_zone under both zone_span_writelock
and pgdat_resize_lock because both the pgdat state is changed (nr_zones)
and the zone's start_pfn. Also this patch changes the documentation
of node_size_lock to include the protection of nr_zones.
"

Does that make sense?

> 
> ---
> v3:
>   * slightly modify the last paragraph of changelog based on Michal's
>     comment
> v2:
>   * commit log changes
>   * modify the code in move_pfn_range_to_zone() instead of in
>     init_currently_empty_zone()
>   * pgdat_resize_lock documentation change
> ---
>  include/linux/mmzone.h | 7 ++++---
>  mm/memory_hotplug.c    | 5 ++---
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 3d0c472438d2..37d9c5c3faa6 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -635,9 +635,10 @@ typedef struct pglist_data {
>  #endif
>  #if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_DEFERRED_STRUCT_PAGE_INIT)
>  	/*
> -	 * Must be held any time you expect node_start_pfn, node_present_pages
> -	 * or node_spanned_pages stay constant.  Holding this will also
> -	 * guarantee that any pfn_valid() stays that way.
> +	 * Must be held any time you expect node_start_pfn,
> +	 * node_present_pages, node_spanned_pages or nr_zones stay constant.
> +	 * Holding this will also guarantee that any pfn_valid() stays that
> +	 * way.
>  	 *
>  	 * pgdat_resize_lock() and pgdat_resize_unlock() are provided to
>  	 * manipulate node_size_lock without checking for CONFIG_MEMORY_HOTPLUG
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 61972da38d93..f626e7e5f57b 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -742,14 +742,13 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
>  	int nid = pgdat->node_id;
>  	unsigned long flags;
>  
> -	if (zone_is_empty(zone))
> -		init_currently_empty_zone(zone, start_pfn, nr_pages);
> -
>  	clear_zone_contiguous(zone);
>  
>  	/* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
>  	pgdat_resize_lock(pgdat, &flags);
>  	zone_span_writelock(zone);
> +	if (zone_is_empty(zone))
> +		init_currently_empty_zone(zone, start_pfn, nr_pages);
>  	resize_zone_range(zone, start_pfn, nr_pages);
>  	zone_span_writeunlock(zone);
>  	resize_pgdat_range(pgdat, start_pfn, nr_pages);
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection
  2018-11-30  9:30     ` David Hildenbrand
@ 2018-12-01  0:27       ` Wei Yang
  2018-12-03 10:09         ` David Hildenbrand
  0 siblings, 1 reply; 40+ messages in thread
From: Wei Yang @ 2018-12-01  0:27 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Wei Yang, mhocko, osalvador, akpm, linux-mm

On Fri, Nov 30, 2018 at 10:30:22AM +0100, David Hildenbrand wrote:
>On 30.11.18 07:58, Wei Yang wrote:
>> During online_pages phase, pgdat->nr_zones will be updated in case this
>> zone is empty.
>> 
>> Currently the online_pages phase is protected by the global lock
>> mem_hotplug_begin(), which ensures there is no contention during the
>> update of nr_zones. But this global lock introduces scalability issues.
>> 
>> The patch moves init_currently_empty_zone under both zone_span_writelock
>> and pgdat_resize_lock because both the pgdat state is changed (nr_zones)
>> and the zone's start_pfn. Also this patch changes the documentation
>> of node_size_lock to include the protectioin of nr_zones.
>
>s/protectioin/protection/
>
>> 
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> Acked-by: Michal Hocko <mhocko@suse.com>
>> Reviewed-by: Oscar Salvador <osalvador@suse.de>
>> CC: David Hildenbrand <david@redhat.com>
>> 
>> ---
>> David, I may not catch you exact comment on the code or changelog. If I
>> missed, just let me know.
>
>I guess I would have rewritten it to something like the following
>
>"
>Currently the online_pages phase is protected by two global locks
>(device_device_hotplug_lock and mem_hotplug_lock). Especial the latter
>can result in scalability issues, as it will slow down code relying on
>get_online_mems(). Let's prepare code for not having to rely on
>get_online_mems() but instead some more fine grained locks.

I am not sure why we specify get_online_mems() here. mem_hotplug_lock is
grabed in many places besides this one. In my mind, each place introduce
scalability issue, not only this one.

Or you want to say, the mem_hotplug_lock will introduce scalability
issue in two place:

  * hotplug process itself
  * slab allocation process

The second one is more critical. And this is what we try to address?

>
>During online_pages phase, pgdat->nr_zones will be updated in case the
>zone is empty. Right now mem_hotplug_lock ensures that there is no
>contention during the update of nr_zones.
>
>The patch moves init_currently_empty_zone under both zone_span_writelock
>and pgdat_resize_lock because both the pgdat state is changed (nr_zones)
>and the zone's start_pfn. Also this patch changes the documentation
>of node_size_lock to include the protection of nr_zones.
>"
>
>Does that make sense?
>
>> 
>> ---
>> v3:
>>   * slightly modify the last paragraph of changelog based on Michal's
>>     comment
>> v2:
>>   * commit log changes
>>   * modify the code in move_pfn_range_to_zone() instead of in
>>     init_currently_empty_zone()
>>   * pgdat_resize_lock documentation change
>> ---
>>  include/linux/mmzone.h | 7 ++++---
>>  mm/memory_hotplug.c    | 5 ++---
>>  2 files changed, 6 insertions(+), 6 deletions(-)
>> 
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 3d0c472438d2..37d9c5c3faa6 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -635,9 +635,10 @@ typedef struct pglist_data {
>>  #endif
>>  #if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_DEFERRED_STRUCT_PAGE_INIT)
>>  	/*
>> -	 * Must be held any time you expect node_start_pfn, node_present_pages
>> -	 * or node_spanned_pages stay constant.  Holding this will also
>> -	 * guarantee that any pfn_valid() stays that way.
>> +	 * Must be held any time you expect node_start_pfn,
>> +	 * node_present_pages, node_spanned_pages or nr_zones stay constant.
>> +	 * Holding this will also guarantee that any pfn_valid() stays that
>> +	 * way.
>>  	 *
>>  	 * pgdat_resize_lock() and pgdat_resize_unlock() are provided to
>>  	 * manipulate node_size_lock without checking for CONFIG_MEMORY_HOTPLUG
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 61972da38d93..f626e7e5f57b 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -742,14 +742,13 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
>>  	int nid = pgdat->node_id;
>>  	unsigned long flags;
>>  
>> -	if (zone_is_empty(zone))
>> -		init_currently_empty_zone(zone, start_pfn, nr_pages);
>> -
>>  	clear_zone_contiguous(zone);
>>  
>>  	/* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
>>  	pgdat_resize_lock(pgdat, &flags);
>>  	zone_span_writelock(zone);
>> +	if (zone_is_empty(zone))
>> +		init_currently_empty_zone(zone, start_pfn, nr_pages);
>>  	resize_zone_range(zone, start_pfn, nr_pages);
>>  	zone_span_writeunlock(zone);
>>  	resize_pgdat_range(pgdat, start_pfn, nr_pages);
>> 
>
>
>-- 
>
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection
  2018-12-01  0:27       ` Wei Yang
@ 2018-12-03 10:09         ` David Hildenbrand
  2018-12-03 20:37           ` Wei Yang
  0 siblings, 1 reply; 40+ messages in thread
From: David Hildenbrand @ 2018-12-03 10:09 UTC (permalink / raw)
  To: Wei Yang; +Cc: mhocko, osalvador, akpm, linux-mm

On 01.12.18 01:27, Wei Yang wrote:
> On Fri, Nov 30, 2018 at 10:30:22AM +0100, David Hildenbrand wrote:
>> On 30.11.18 07:58, Wei Yang wrote:
>>> During online_pages phase, pgdat->nr_zones will be updated in case this
>>> zone is empty.
>>>
>>> Currently the online_pages phase is protected by the global lock
>>> mem_hotplug_begin(), which ensures there is no contention during the
>>> update of nr_zones. But this global lock introduces scalability issues.
>>>
>>> The patch moves init_currently_empty_zone under both zone_span_writelock
>>> and pgdat_resize_lock because both the pgdat state is changed (nr_zones)
>>> and the zone's start_pfn. Also this patch changes the documentation
>>> of node_size_lock to include the protectioin of nr_zones.
>>
>> s/protectioin/protection/
>>
>>>
>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>> Acked-by: Michal Hocko <mhocko@suse.com>
>>> Reviewed-by: Oscar Salvador <osalvador@suse.de>
>>> CC: David Hildenbrand <david@redhat.com>
>>>
>>> ---
>>> David, I may not catch you exact comment on the code or changelog. If I
>>> missed, just let me know.
>>
>> I guess I would have rewritten it to something like the following
>>
>> "
>> Currently the online_pages phase is protected by two global locks
>> (device_device_hotplug_lock and mem_hotplug_lock). Especial the latter
>> can result in scalability issues, as it will slow down code relying on
>> get_online_mems(). Let's prepare code for not having to rely on
>> get_online_mems() but instead some more fine grained locks.
> 
> I am not sure why we specify get_online_mems() here. mem_hotplug_lock is
> grabed in many places besides this one. In my mind, each place introduce
> scalability issue, not only this one.

mem_hotplug_lock is grabbed in write only when
adding/removing/onlining/offlining memory and when adding/removing
device memory. The read locker are the critical part for now.

> 
> Or you want to say, the mem_hotplug_lock will introduce scalability
> issue in two place:
> 
>   * hotplug process itself
>   * slab allocation process
> 
> The second one is more critical. And this is what we try to address?

Indeed, especially as the first usually (except device memory) also uses
the device_hotplug_lock, I only consider the second one critical.

Feel free to change this description to whatever you like.
As I already stated scalability of adding/removing/onlining/offlining is
not really an issue as of now (prove me wrong :) ). So I would not care
about including such information in this patch.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection
  2018-12-03 10:09         ` David Hildenbrand
@ 2018-12-03 20:37           ` Wei Yang
  0 siblings, 0 replies; 40+ messages in thread
From: Wei Yang @ 2018-12-03 20:37 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Wei Yang, mhocko, osalvador, akpm, linux-mm

On Mon, Dec 03, 2018 at 11:09:52AM +0100, David Hildenbrand wrote:
>On 01.12.18 01:27, Wei Yang wrote:
>> On Fri, Nov 30, 2018 at 10:30:22AM +0100, David Hildenbrand wrote:
>>> On 30.11.18 07:58, Wei Yang wrote:
>>>> During online_pages phase, pgdat->nr_zones will be updated in case this
>>>> zone is empty.
>>>>
>>>> Currently the online_pages phase is protected by the global lock
>>>> mem_hotplug_begin(), which ensures there is no contention during the
>>>> update of nr_zones. But this global lock introduces scalability issues.
>>>>
>>>> The patch moves init_currently_empty_zone under both zone_span_writelock
>>>> and pgdat_resize_lock because both the pgdat state is changed (nr_zones)
>>>> and the zone's start_pfn. Also this patch changes the documentation
>>>> of node_size_lock to include the protectioin of nr_zones.
>>>
>>> s/protectioin/protection/
>>>
>>>>
>>>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>>>> Acked-by: Michal Hocko <mhocko@suse.com>
>>>> Reviewed-by: Oscar Salvador <osalvador@suse.de>
>>>> CC: David Hildenbrand <david@redhat.com>
>>>>
>>>> ---
>>>> David, I may not catch you exact comment on the code or changelog. If I
>>>> missed, just let me know.
>>>
>>> I guess I would have rewritten it to something like the following
>>>
>>> "
>>> Currently the online_pages phase is protected by two global locks
>>> (device_device_hotplug_lock and mem_hotplug_lock). Especial the latter
>>> can result in scalability issues, as it will slow down code relying on
>>> get_online_mems(). Let's prepare code for not having to rely on
>>> get_online_mems() but instead some more fine grained locks.
>> 
>> I am not sure why we specify get_online_mems() here. mem_hotplug_lock is
>> grabed in many places besides this one. In my mind, each place introduce
>> scalability issue, not only this one.
>
>mem_hotplug_lock is grabbed in write only when
>adding/removing/onlining/offlining memory and when adding/removing
>device memory. The read locker are the critical part for now.
>
>> 
>> Or you want to say, the mem_hotplug_lock will introduce scalability
>> issue in two place:
>> 
>>   * hotplug process itself
>>   * slab allocation process
>> 
>> The second one is more critical. And this is what we try to address?
>
>Indeed, especially as the first usually (except device memory) also uses
>the device_hotplug_lock, I only consider the second one critical.
>
>Feel free to change this description to whatever you like.
>As I already stated scalability of adding/removing/onlining/offlining is
>not really an issue as of now (prove me wrong :) ). So I would not care
>about including such information in this patch.
>

Thanks for your information.

Let me try to reword the changelog.

>-- 
>
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v4] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection
  2018-11-30  6:58   ` [PATCH v3] " Wei Yang
  2018-11-30  9:30     ` David Hildenbrand
@ 2018-12-03 20:50     ` Wei Yang
  1 sibling, 0 replies; 40+ messages in thread
From: Wei Yang @ 2018-12-03 20:50 UTC (permalink / raw)
  To: mhocko, osalvador, david; +Cc: akpm, linux-mm, Wei Yang

During online_pages phase, pgdat->nr_zones will be updated in case this
zone is empty.

Currently the online_pages phase is protected by the global locks
(device_device_hotplug_lock and mem_hotplug_lock), which ensures there
is no contention during the update of nr_zones.

These global locks introduces scalability issues (especially the second
one), which slow down code relying on get_online_mems(). This is also a
preparation for not having to rely on get_online_mems() but instead some
more fine grained locks.

The patch moves init_currently_empty_zone under both zone_span_writelock
and pgdat_resize_lock because both the pgdat state is changed (nr_zones)
and the zone's start_pfn. Also this patch changes the documentation
of node_size_lock to include the protection of nr_zones.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
CC: David Hildenbrand <david@redhat.com>

---
v4:
  * mention the preparation for improving scalability by David's
    comment
v3:
  * slightly modify the last paragraph of changelog based on Michal's
    comment
v2:
  * commit log changes
  * modify the code in move_pfn_range_to_zone() instead of in
    init_currently_empty_zone()
  * pgdat_resize_lock documentation change
---
 include/linux/mmzone.h | 7 ++++---
 mm/memory_hotplug.c    | 5 ++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 3d0c472438d2..37d9c5c3faa6 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -635,9 +635,10 @@ typedef struct pglist_data {
 #endif
 #if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_DEFERRED_STRUCT_PAGE_INIT)
 	/*
-	 * Must be held any time you expect node_start_pfn, node_present_pages
-	 * or node_spanned_pages stay constant.  Holding this will also
-	 * guarantee that any pfn_valid() stays that way.
+	 * Must be held any time you expect node_start_pfn,
+	 * node_present_pages, node_spanned_pages or nr_zones stay constant.
+	 * Holding this will also guarantee that any pfn_valid() stays that
+	 * way.
 	 *
 	 * pgdat_resize_lock() and pgdat_resize_unlock() are provided to
 	 * manipulate node_size_lock without checking for CONFIG_MEMORY_HOTPLUG
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 61972da38d93..f626e7e5f57b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -742,14 +742,13 @@ void __ref move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 	int nid = pgdat->node_id;
 	unsigned long flags;
 
-	if (zone_is_empty(zone))
-		init_currently_empty_zone(zone, start_pfn, nr_pages);
-
 	clear_zone_contiguous(zone);
 
 	/* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
 	pgdat_resize_lock(pgdat, &flags);
 	zone_span_writelock(zone);
+	if (zone_is_empty(zone))
+		init_currently_empty_zone(zone, start_pfn, nr_pages);
 	resize_zone_range(zone, start_pfn, nr_pages);
 	zone_span_writeunlock(zone);
 	resize_pgdat_range(pgdat, start_pfn, nr_pages);
-- 
2.15.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2018-12-03 20:50 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-20  1:48 [PATCH] mm, hotplug: protect nr_zones with pgdat_resize_lock() Wei Yang
2018-11-20  7:31 ` Michal Hocko
2018-11-20  7:58   ` osalvador
2018-11-20  8:48     ` Michal Hocko
2018-11-21  2:52     ` Wei Yang
2018-11-21  7:15       ` Michal Hocko
2018-11-22  1:52         ` Wei Yang
2018-11-22  8:39           ` Michal Hocko
2018-11-26  2:28         ` Wei Yang
2018-11-26  8:16           ` Michal Hocko
2018-11-26  9:06             ` Wei Yang
2018-11-26 10:03               ` Michal Hocko
2018-11-27  0:18                 ` Wei Yang
2018-11-27  3:12             ` Wei Yang
2018-11-27 13:16               ` Michal Hocko
2018-11-27 23:56                 ` Wei Yang
2018-11-21  8:24       ` osalvador
2018-11-21  2:44   ` Wei Yang
2018-11-21  7:14     ` Michal Hocko
2018-11-22 10:12 ` [PATCH v2] mm, hotplug: move init_currently_empty_zone() under zone_span_lock protection Wei Yang
2018-11-22 10:15   ` Wei Yang
2018-11-22 10:29     ` Michal Hocko
2018-11-22 14:27       ` Wei Yang
2018-11-22 10:37   ` osalvador
2018-11-22 14:28     ` Wei Yang
2018-11-22 15:26   ` David Hildenbrand
2018-11-22 21:28     ` Wei Yang
2018-11-22 21:53       ` David Hildenbrand
2018-11-22 23:53         ` Wei Yang
2018-11-23  8:42     ` Michal Hocko
2018-11-23  8:46       ` David Hildenbrand
2018-11-26  1:44         ` Wei Yang
2018-11-26  9:24           ` David Hildenbrand
2018-11-27  0:23             ` Wei Yang
2018-11-30  6:58   ` [PATCH v3] " Wei Yang
2018-11-30  9:30     ` David Hildenbrand
2018-12-01  0:27       ` Wei Yang
2018-12-03 10:09         ` David Hildenbrand
2018-12-03 20:37           ` Wei Yang
2018-12-03 20:50     ` [PATCH v4] " Wei Yang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.