* [PATCH] mm/page_alloc: add zone to zonelist if populated
@ 2022-02-03  2:00 Wei Yang
  2022-02-03  9:25 ` David Hildenbrand
  2022-02-03  9:27 ` Michal Hocko
  0 siblings, 2 replies; 6+ messages in thread
From: Wei Yang @ 2022-02-03  2:00 UTC (permalink / raw)
  To: akpm, mhocko, mgorman; +Cc: linux-mm, linux-kernel, Wei Yang, David Hildenbrand

During memory hotplug, when onlining or offlining a zone, we need to rebuild
the zonelist for all nodes. The current behavior can drop a valid zone from
the zonelist, since only zones that pass managed_zone() are picked up.

There are two cases in which a zone has memory but is still !managed:

  * all pages were allocated via memblock
  * all pages were taken by ballooning / virtio-mem

This state may be temporary, since both of them may release some memory.
We then end up with a managed zone that is not in the zonelist.

This was introduced by commit 6aa303defb74 ("mm, vmscan: only allocate
and reclaim from zones with pages managed by the buddy allocator").
This patch restores the previous behavior.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Mel Gorman <mgorman@techsingularity.net>
CC: David Hildenbrand <david@redhat.com>
Fixes: 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index de15021a2887..b433a57ee76f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6092,7 +6092,7 @@ static int build_zonerefs_node(pg_data_t *pgdat, struct zoneref *zonerefs)
 	do {
 		zone_type--;
 		zone = pgdat->node_zones + zone_type;
-		if (managed_zone(zone)) {
+		if (populated_zone(zone)) {
 			zoneref_set_zone(zone, &zonerefs[nr_zones++]);
 			check_highest_zone(zone_type);
 		}
-- 
2.33.1
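
The effect of swapping the predicate can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: struct zone
here is a simplified stand-in with two counters, build_zonerefs() is a
hypothetical reduction of the loop in build_zonerefs_node(), and the two
helpers only mirror the idea that populated_zone() tests present pages while
managed_zone() tests pages handed to the buddy allocator.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified userspace stand-in for the kernel's struct zone (assumption). */
struct zone {
	const char *name;
	unsigned long present_pages;	/* pages physically present in the zone */
	unsigned long managed_pages;	/* pages handed to the buddy allocator */
};

/* A zone is populated as soon as it has any present pages. */
static bool populated_zone(const struct zone *zone)
{
	return zone->present_pages != 0;
}

/* A zone is managed only if the buddy allocator owns some of its pages. */
static bool managed_zone(const struct zone *zone)
{
	return zone->managed_pages != 0;
}

/*
 * Hypothetical model of the loop in build_zonerefs_node(): walk the zones
 * from highest to lowest and record those accepted by keep().
 * Returns the number of zone references written to refs[].
 */
static int build_zonerefs(const struct zone *zones, int nr_types,
			  bool (*keep)(const struct zone *),
			  const struct zone **refs)
{
	int nr_zones = 0;
	int zone_type = nr_types;

	assert(nr_types > 0);
	do {
		zone_type--;
		if (keep(&zones[zone_type]))
			refs[nr_zones++] = &zones[zone_type];
	} while (zone_type);

	return nr_zones;
}
```

With a zone whose pages are all held by a balloon driver (present_pages
non-zero, managed_pages zero), passing managed_zone as keep() leaves that
zone out of refs[], while passing populated_zone keeps it in.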



* Re: [PATCH] mm/page_alloc: add zone to zonelist if populated
  2022-02-03  2:00 [PATCH] mm/page_alloc: add zone to zonelist if populated Wei Yang
@ 2022-02-03  9:25 ` David Hildenbrand
  2022-02-06  2:11   ` Wei Yang
  2022-02-03  9:27 ` Michal Hocko
  1 sibling, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2022-02-03  9:25 UTC (permalink / raw)
  To: Wei Yang, akpm, mhocko, mgorman; +Cc: linux-mm, linux-kernel

On 03.02.22 03:00, Wei Yang wrote:
> During memory hotplug, when online/offline a zone, we need to rebuild
> the zonelist for all nodes. Current behavior would lose a valid zone in
> zonelist since only pick up managed_zone.
> 
> There are two cases for a zone with memory but still !managed.
> 
>   * all pages were allocated via memblock
>   * all pages were taken by ballooning / virtio-mem
> 
> This state maybe temporary, since both of them may release some memory.
> Then it end up with a managed zone not in zonelist.
> 
> This is introduced in 'commit 6aa303defb74 ("mm, vmscan: only allocate
> and reclaim from zones with pages managed by the buddy allocator")'.
> This patch restore the behavior.
> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> CC: Mel Gorman <mgorman@techsingularity.net>
> CC: David Hildenbrand <david@redhat.com>
> Fixes: 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")

That commit mentions that there used to be some ppc64 cases with fadump
where it might have been a real problem. Unfortunately, that commit
doesn't really tell what the performance implications are.

We'd have to know how many "permanent memblock" allocations we have,
that can never get freed.

> ---
>  mm/page_alloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index de15021a2887..b433a57ee76f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6092,7 +6092,7 @@ static int build_zonerefs_node(pg_data_t *pgdat, struct zoneref *zonerefs)
>  	do {
>  		zone_type--;
>  		zone = pgdat->node_zones + zone_type;
> -		if (managed_zone(zone)) {
> +		if (populated_zone(zone)) {
>  			zoneref_set_zone(zone, &zonerefs[nr_zones++]);
>  			check_highest_zone(zone_type);
>  		}

The comment above the function also says "Add all populated zones of a
node to the zonelist.", so one way or the other, that should be made
consistent.

-- 
Thanks,

David / dhildenb



* Re: [PATCH] mm/page_alloc: add zone to zonelist if populated
  2022-02-03  2:00 [PATCH] mm/page_alloc: add zone to zonelist if populated Wei Yang
  2022-02-03  9:25 ` David Hildenbrand
@ 2022-02-03  9:27 ` Michal Hocko
  2022-02-06  2:17   ` Wei Yang
  2022-03-16  0:40   ` Wei Yang
  1 sibling, 2 replies; 6+ messages in thread
From: Michal Hocko @ 2022-02-03  9:27 UTC (permalink / raw)
  To: Wei Yang; +Cc: akpm, mgorman, linux-mm, linux-kernel, David Hildenbrand

On Thu 03-02-22 02:00:22, Wei Yang wrote:
> During memory hotplug, when online/offline a zone, we need to rebuild
> the zonelist for all nodes. Current behavior would lose a valid zone in
> zonelist since only pick up managed_zone.
> 
> There are two cases for a zone with memory but still !managed.
> 
>   * all pages were allocated via memblock
>   * all pages were taken by ballooning / virtio-mem
> 
> This state maybe temporary, since both of them may release some memory.
> Then it end up with a managed zone not in zonelist.
> 
> This is introduced in 'commit 6aa303defb74 ("mm, vmscan: only allocate
> and reclaim from zones with pages managed by the buddy allocator")'.
> This patch restore the behavior.

It was introduced to fix a problem described in the changelog
(a FADUMP configuration making kswapd hog a CPU). You are not
explaining why the original issue cannot recur after this change.

I also think that this is more of a theoretical issue than a real-life
concern. It would be good to state that in the changelog as well.

That being said I am not against the change but the changelog needs more
explanation before I can ack it.

> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> CC: Mel Gorman <mgorman@techsingularity.net>
> CC: David Hildenbrand <david@redhat.com>
> Fixes: 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")

A Fixes tag should really be used only if the referenced commit breaks
something. I do not really see that being the case here.

Thanks!

> ---
>  mm/page_alloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index de15021a2887..b433a57ee76f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6092,7 +6092,7 @@ static int build_zonerefs_node(pg_data_t *pgdat, struct zoneref *zonerefs)
>  	do {
>  		zone_type--;
>  		zone = pgdat->node_zones + zone_type;
> -		if (managed_zone(zone)) {
> +		if (populated_zone(zone)) {
>  			zoneref_set_zone(zone, &zonerefs[nr_zones++]);
>  			check_highest_zone(zone_type);
>  		}
> -- 
> 2.33.1

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm/page_alloc: add zone to zonelist if populated
  2022-02-03  9:25 ` David Hildenbrand
@ 2022-02-06  2:11   ` Wei Yang
  0 siblings, 0 replies; 6+ messages in thread
From: Wei Yang @ 2022-02-06  2:11 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Wei Yang, akpm, mhocko, mgorman, linux-mm, linux-kernel

On Thu, Feb 03, 2022 at 10:25:51AM +0100, David Hildenbrand wrote:
>On 03.02.22 03:00, Wei Yang wrote:
>> During memory hotplug, when online/offline a zone, we need to rebuild
>> the zonelist for all nodes. Current behavior would lose a valid zone in
>> zonelist since only pick up managed_zone.
>> 
>> There are two cases for a zone with memory but still !managed.
>> 
>>   * all pages were allocated via memblock
>>   * all pages were taken by ballooning / virtio-mem
>> 
>> This state maybe temporary, since both of them may release some memory.
>> Then it end up with a managed zone not in zonelist.
>> 
>> This is introduced in 'commit 6aa303defb74 ("mm, vmscan: only allocate
>> and reclaim from zones with pages managed by the buddy allocator")'.
>> This patch restore the behavior.
>> 
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> CC: Mel Gorman <mgorman@techsingularity.net>
>> CC: David Hildenbrand <david@redhat.com>
>> Fixes: 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")
>
>That commit mentions that there used to be some ppc64 cases with fadump
>where it might have been a real problem. Unfortunately, that commit
>doesn't really tell what the performance implications are.
>

It mentions 100% CPU usage caused by commit 1d82de618ddd. So far I have not
found which part introduced this or how it was fixed.

>We'd have to know how many "permanent memblock" allocations we have,
>that can never get freed.
>

For the case in that commit, the memory is reserved for the crash kernel. I am
afraid it never gets freed.

But I am not sure about all the other cases.

-- 
Wei Yang
Help you, Help me


* Re: [PATCH] mm/page_alloc: add zone to zonelist if populated
  2022-02-03  9:27 ` Michal Hocko
@ 2022-02-06  2:17   ` Wei Yang
  2022-03-16  0:40   ` Wei Yang
  1 sibling, 0 replies; 6+ messages in thread
From: Wei Yang @ 2022-02-06  2:17 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Wei Yang, akpm, mgorman, linux-mm, linux-kernel, David Hildenbrand

On Thu, Feb 03, 2022 at 10:27:11AM +0100, Michal Hocko wrote:
>On Thu 03-02-22 02:00:22, Wei Yang wrote:
>> During memory hotplug, when online/offline a zone, we need to rebuild
>> the zonelist for all nodes. Current behavior would lose a valid zone in
>> zonelist since only pick up managed_zone.
>> 
>> There are two cases for a zone with memory but still !managed.
>> 
>>   * all pages were allocated via memblock
>>   * all pages were taken by ballooning / virtio-mem
>> 
>> This state maybe temporary, since both of them may release some memory.
>> Then it end up with a managed zone not in zonelist.
>> 
>> This is introduced in 'commit 6aa303defb74 ("mm, vmscan: only allocate
>> and reclaim from zones with pages managed by the buddy allocator")'.
>> This patch restore the behavior.
>
>It has been introduced to fix a problem described in the the changelog
>(FADUMP configuration making kswapd hogging a cpu). You are not
>explaining why the original issue is not possible after this change.
>

At first sight, kswapd works on pgdat->node_zones, which is not affected
by pgdat->node_zonelists.

I have not figured out the exact details yet and will need some time to look
into it. For that commit, I only found this link:
http://lkml.kernel.org/r/20160831195104.GB8119@techsingularity.net If there
are other discussions, pointers would be helpful.

>I also think that this is more of theoretical issue than anything that
>is a real life concern. It is good to state that in the changelog as
>well.
>
>That being said I am not against the change but the changelog needs more
>explanation before I can ack it.
>
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> CC: Mel Gorman <mgorman@techsingularity.net>
>> CC: David Hildenbrand <david@redhat.com>
>> Fixes: 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")
>
>Fixes tag should be really used only if the referenced commit breaks
>something. I do not really see this to be the case here.
>

Got it.

>Thanks!
>
>> ---
>>  mm/page_alloc.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index de15021a2887..b433a57ee76f 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6092,7 +6092,7 @@ static int build_zonerefs_node(pg_data_t *pgdat, struct zoneref *zonerefs)
>>  	do {
>>  		zone_type--;
>>  		zone = pgdat->node_zones + zone_type;
>> -		if (managed_zone(zone)) {
>> +		if (populated_zone(zone)) {
>>  			zoneref_set_zone(zone, &zonerefs[nr_zones++]);
>>  			check_highest_zone(zone_type);
>>  		}
>> -- 
>> 2.33.1
>
>-- 
>Michal Hocko
>SUSE Labs

-- 
Wei Yang
Help you, Help me


* Re: [PATCH] mm/page_alloc: add zone to zonelist if populated
  2022-02-03  9:27 ` Michal Hocko
  2022-02-06  2:17   ` Wei Yang
@ 2022-03-16  0:40   ` Wei Yang
  1 sibling, 0 replies; 6+ messages in thread
From: Wei Yang @ 2022-03-16  0:40 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Wei Yang, akpm, mgorman, linux-mm, linux-kernel, David Hildenbrand

On Thu, Feb 03, 2022 at 10:27:11AM +0100, Michal Hocko wrote:
>On Thu 03-02-22 02:00:22, Wei Yang wrote:
>> During memory hotplug, when online/offline a zone, we need to rebuild
>> the zonelist for all nodes. Current behavior would lose a valid zone in
>> zonelist since only pick up managed_zone.
>> 
>> There are two cases for a zone with memory but still !managed.
>> 
>>   * all pages were allocated via memblock
>>   * all pages were taken by ballooning / virtio-mem
>> 
>> This state maybe temporary, since both of them may release some memory.
>> Then it end up with a managed zone not in zonelist.
>> 
>> This is introduced in 'commit 6aa303defb74 ("mm, vmscan: only allocate
>> and reclaim from zones with pages managed by the buddy allocator")'.
>> This patch restore the behavior.
>
>It has been introduced to fix a problem described in the the changelog
>(FADUMP configuration making kswapd hogging a cpu). You are not
>explaining why the original issue is not possible after this change.
>

After some reading, here is what I found.

To prevent this problem from recurring, we need to make sure reclaim only
applies to managed zones. After going through the code, there are only two
places that do not guarantee this when iterating over zones:

  1. skip_throttle_noprogress()
  2. throttle_direct_reclaim()

Once we make sure vmscan only reclaims from managed zones, the original
problem cannot recur with this change applied.

BTW, there are two other places that use for_each_zone_zonelist_nodemask().
It is OK for them not to check managed_zone(), since they actually do a
node-based iteration.

If this looks good to you, I will adjust the changelog and send two patches
to fix the two places above.
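
The kind of check proposed for those two places can be sketched in
userspace. This is not the actual kernel code: struct zone is a simplified
stand-in, and count_reclaim_targets() is a hypothetical helper that only
shows the pattern of skipping unmanaged zones while walking a zonelist that
was built from populated zones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct zone (assumption). */
struct zone {
	const char *name;
	unsigned long present_pages;	/* pages physically present in the zone */
	unsigned long managed_pages;	/* pages handed to the buddy allocator */
};

/* A zone is managed only if the buddy allocator owns some of its pages. */
static bool managed_zone(const struct zone *zone)
{
	return zone->managed_pages != 0;
}

/*
 * Hypothetical helper: count the zones in a zonelist that reclaim should
 * act on. Populated but unmanaged zones stay in the list and are skipped
 * here, so reclaim never does work on them.
 */
static size_t count_reclaim_targets(const struct zone *const *zonelist,
				    size_t nr)
{
	size_t targets = 0;

	assert(zonelist != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (!managed_zone(zonelist[i]))
			continue;	/* nothing the buddy allocator could free */
		targets++;
	}

	return targets;
}
```

The design point is that the zonelist itself may contain every populated
zone, as long as each reclaim path filters on managed_zone() before acting.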

>I also think that this is more of theoretical issue than anything that
>is a real life concern. It is good to state that in the changelog as
>well.
>
>That being said I am not against the change but the changelog needs more
>explanation before I can ack it.
>
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> CC: Mel Gorman <mgorman@techsingularity.net>
>> CC: David Hildenbrand <david@redhat.com>
>> Fixes: 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")
>
>Fixes tag should be really used only if the referenced commit breaks
>something. I do not really see this to be the case here.
>
>Thanks!
>

-- 
Wei Yang
Help you, Help me


end of thread, other threads:[~2022-03-16  0:40 UTC | newest]

2022-02-03  2:00 [PATCH] mm/page_alloc: add zone to zonelist if populated Wei Yang
2022-02-03  9:25 ` David Hildenbrand
2022-02-06  2:11   ` Wei Yang
2022-02-03  9:27 ` Michal Hocko
2022-02-06  2:17   ` Wei Yang
2022-03-16  0:40   ` Wei Yang
