linux-kernel.vger.kernel.org archive mirror
* [PATCH v2] mm, page_alloc: fix build_zonerefs_node()
@ 2022-04-07 12:06 Juergen Gross
  2022-04-07 12:14 ` Michal Hocko
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Juergen Gross @ 2022-04-07 12:06 UTC (permalink / raw)
  To: xen-devel, linux-mm, linux-kernel
  Cc: Juergen Gross, Andrew Morton, stable,
	Marek Marczykowski-Górecki, Michal Hocko

Since commit 6aa303defb74 ("mm, vmscan: only allocate and reclaim from
zones with pages managed by the buddy allocator") only zones with free
memory are included in a built zonelist. This is problematic when e.g.
all memory of a zone has been ballooned out when zonelists are being
rebuilt.

The decision whether to rebuild the zonelists when onlining new memory
is based on populated_zone() returning 0 for the zone the memory
will be added to. The new zone is added to the zonelists only if it
has free memory pages (managed_zone() returns a non-zero value) after
the memory has been onlined. This implies that onlining memory will
always free the added pages to the allocator immediately, but this is
not true in all cases: when e.g. running as a Xen guest, the onlined
new memory will be added only to the ballooned memory list; it will be
freed only when the guest is being ballooned up afterwards.

Another problem with using managed_zone() for the decision whether a
zone is being added to the zonelists is that a zone with all memory
used will in fact be removed from all zonelists in case the zonelists
happen to be rebuilt.

Use populated_zone() when building a zonelist, as was done before
that commit.

Cc: stable@vger.kernel.org
Fixes: 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
---
V2:
- updated commit message (Michal Hocko)
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bdc8f60ae462..3d0662af3289 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6128,7 +6128,7 @@ static int build_zonerefs_node(pg_data_t *pgdat, struct zoneref *zonerefs)
 	do {
 		zone_type--;
 		zone = pgdat->node_zones + zone_type;
-		if (managed_zone(zone)) {
+		if (populated_zone(zone)) {
 			zoneref_set_zone(zone, &zonerefs[nr_zones++]);
 			check_highest_zone(zone_type);
 		}
-- 
2.34.1
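
Editorial illustration (not part of the posted patch): a minimal userspace
sketch of why the choice of predicate matters. It assumes a simplified
model in which populated_zone() tests a zone's present page count and
managed_zone() tests the count of pages handed to the buddy allocator.
struct zone_model, the model_* functions and the zone_filter type are
hypothetical names invented for this sketch; the real kernel structures
are more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct zone. */
struct zone_model {
	const char *name;
	unsigned long present_pages;	/* pages physically present in the zone */
	unsigned long managed_pages;	/* pages handed to the buddy allocator */
};

/* Model of populated_zone(): true if the zone has any present pages. */
static bool model_populated_zone(const struct zone_model *z)
{
	return z->present_pages != 0;
}

/* Model of managed_zone(): true if the allocator manages any pages. */
static bool model_managed_zone(const struct zone_model *z)
{
	return z->managed_pages != 0;
}

typedef bool (*zone_filter)(const struct zone_model *);

/*
 * Walk the zones from highest to lowest, mirroring the loop shape in
 * build_zonerefs_node(), and record every zone accepted by "keep".
 * "refs" must have room for "nr" entries. Returns the number recorded.
 */
static size_t model_build_zonerefs(const struct zone_model *zones, size_t nr,
				   zone_filter keep,
				   const struct zone_model **refs)
{
	size_t nr_refs = 0;
	size_t zone_type = nr;

	assert(zones != NULL && refs != NULL);

	while (zone_type > 0) {
		zone_type--;
		if (keep(&zones[zone_type]))
			refs[nr_refs++] = &zones[zone_type];
	}
	return nr_refs;
}
```

In this model, a zone whose pages are all present but still ballooned out
(present_pages > 0, managed_pages == 0) is skipped when the filter is
model_managed_zone and kept when it is model_populated_zone.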



* Re: [PATCH v2] mm, page_alloc: fix build_zonerefs_node()
  2022-04-07 12:06 [PATCH v2] mm, page_alloc: fix build_zonerefs_node() Juergen Gross
@ 2022-04-07 12:14 ` Michal Hocko
  2022-04-07 12:45 ` David Hildenbrand
  2022-04-07 22:44 ` Andrew Morton
  2 siblings, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2022-04-07 12:14 UTC (permalink / raw)
  To: Juergen Gross
  Cc: xen-devel, linux-mm, linux-kernel, Andrew Morton, stable,
	Marek Marczykowski-Górecki, Mel Gorman

[CC Mel]

On Thu 07-04-22 14:06:37, Juergen Gross wrote:
> Since commit 6aa303defb74 ("mm, vmscan: only allocate and reclaim from
> zones with pages managed by the buddy allocator") only zones with free
> memory are included in a built zonelist. This is problematic when e.g.
> all memory of a zone has been ballooned out when zonelists are being
> rebuilt.
> 
> The decision whether to rebuild the zonelists when onlining new memory
> is done based on populated_zone() returning 0 for the zone the memory
> will be added to. The new zone is added to the zonelists only, if it
> has free memory pages (managed_zone() returns a non-zero value) after
> the memory has been onlined. This implies, that onlining memory will
> always free the added pages to the allocator immediately, but this is
> not true in all cases: when e.g. running as a Xen guest the onlined
> new memory will be added only to the ballooned memory list, it will be
> freed only when the guest is being ballooned up afterwards.

Thanks, this is much clearer!
 
> Another problem with using managed_zone() for the decision whether a
> zone is being added to the zonelists is, that a zone with all memory
> used will in fact be removed from all zonelists in case the zonelists
> happen to be rebuilt.
> 
> Use populated_zone() when building a zonelist as it has been done
> before that commit.
> 
> Cc: stable@vger.kernel.org
> Fixes: 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Acked-by: Michal Hocko <mhocko@suse.com>
> ---
> V2:
> - updated commit message (Michal Hocko)
> ---
>  mm/page_alloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bdc8f60ae462..3d0662af3289 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6128,7 +6128,7 @@ static int build_zonerefs_node(pg_data_t *pgdat, struct zoneref *zonerefs)
>  	do {
>  		zone_type--;
>  		zone = pgdat->node_zones + zone_type;
> -		if (managed_zone(zone)) {
> +		if (populated_zone(zone)) {
>  			zoneref_set_zone(zone, &zonerefs[nr_zones++]);
>  			check_highest_zone(zone_type);
>  		}
> -- 
> 2.34.1

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v2] mm, page_alloc: fix build_zonerefs_node()
  2022-04-07 12:06 [PATCH v2] mm, page_alloc: fix build_zonerefs_node() Juergen Gross
  2022-04-07 12:14 ` Michal Hocko
@ 2022-04-07 12:45 ` David Hildenbrand
  2022-04-07 12:53   ` Juergen Gross
  2022-04-07 22:44 ` Andrew Morton
  2 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2022-04-07 12:45 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, linux-mm, linux-kernel
  Cc: Andrew Morton, stable, Marek Marczykowski-Górecki, Michal Hocko

On 07.04.22 14:06, Juergen Gross wrote:
> Since commit 6aa303defb74 ("mm, vmscan: only allocate and reclaim from
> zones with pages managed by the buddy allocator") only zones with free
> memory are included in a built zonelist. This is problematic when e.g.
> all memory of a zone has been ballooned out when zonelists are being
> rebuilt.
> 
> The decision whether to rebuild the zonelists when onlining new memory
> is done based on populated_zone() returning 0 for the zone the memory
> will be added to. The new zone is added to the zonelists only, if it
> has free memory pages (managed_zone() returns a non-zero value) after
> the memory has been onlined. This implies, that onlining memory will
> always free the added pages to the allocator immediately, but this is
> not true in all cases: when e.g. running as a Xen guest the onlined
> new memory will be added only to the ballooned memory list, it will be
> freed only when the guest is being ballooned up afterwards.
> 
> Another problem with using managed_zone() for the decision whether a
> zone is being added to the zonelists is, that a zone with all memory
> used will in fact be removed from all zonelists in case the zonelists
> happen to be rebuilt.
> 
> Use populated_zone() when building a zonelist as it has been done
> before that commit.
> 
> Cc: stable@vger.kernel.org
> Fixes: 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Acked-by: Michal Hocko <mhocko@suse.com>
> ---
> V2:
> - updated commit message (Michal Hocko)
> ---
>  mm/page_alloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bdc8f60ae462..3d0662af3289 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6128,7 +6128,7 @@ static int build_zonerefs_node(pg_data_t *pgdat, struct zoneref *zonerefs)
>  	do {
>  		zone_type--;
>  		zone = pgdat->node_zones + zone_type;
> -		if (managed_zone(zone)) {
> +		if (populated_zone(zone)) {
>  			zoneref_set_zone(zone, &zonerefs[nr_zones++]);
>  			check_highest_zone(zone_type);
>  		}

Did you drop my Ack?

Also, I'd appreciate getting CCed on patches where I commented.

-- 
Thanks,

David / dhildenb



* Re: [PATCH v2] mm, page_alloc: fix build_zonerefs_node()
  2022-04-07 12:45 ` David Hildenbrand
@ 2022-04-07 12:53   ` Juergen Gross
  0 siblings, 0 replies; 6+ messages in thread
From: Juergen Gross @ 2022-04-07 12:53 UTC (permalink / raw)
  To: David Hildenbrand, xen-devel, linux-mm, linux-kernel
  Cc: Andrew Morton, stable, Marek Marczykowski-Górecki, Michal Hocko



On 07.04.22 14:45, David Hildenbrand wrote:
> On 07.04.22 14:06, Juergen Gross wrote:
>> Since commit 6aa303defb74 ("mm, vmscan: only allocate and reclaim from
>> zones with pages managed by the buddy allocator") only zones with free
>> memory are included in a built zonelist. This is problematic when e.g.
>> all memory of a zone has been ballooned out when zonelists are being
>> rebuilt.
>>
>> The decision whether to rebuild the zonelists when onlining new memory
>> is done based on populated_zone() returning 0 for the zone the memory
>> will be added to. The new zone is added to the zonelists only, if it
>> has free memory pages (managed_zone() returns a non-zero value) after
>> the memory has been onlined. This implies, that onlining memory will
>> always free the added pages to the allocator immediately, but this is
>> not true in all cases: when e.g. running as a Xen guest the onlined
>> new memory will be added only to the ballooned memory list, it will be
>> freed only when the guest is being ballooned up afterwards.
>>
>> Another problem with using managed_zone() for the decision whether a
>> zone is being added to the zonelists is, that a zone with all memory
>> used will in fact be removed from all zonelists in case the zonelists
>> happen to be rebuilt.
>>
>> Use populated_zone() when building a zonelist as it has been done
>> before that commit.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator")
>> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> Acked-by: Michal Hocko <mhocko@suse.com>
>> ---
>> V2:
>> - updated commit message (Michal Hocko)
>> ---
>>   mm/page_alloc.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index bdc8f60ae462..3d0662af3289 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6128,7 +6128,7 @@ static int build_zonerefs_node(pg_data_t *pgdat, struct zoneref *zonerefs)
>>   	do {
>>   		zone_type--;
>>   		zone = pgdat->node_zones + zone_type;
>> -		if (managed_zone(zone)) {
>> +		if (populated_zone(zone)) {
>>   			zoneref_set_zone(zone, &zonerefs[nr_zones++]);
>>   			check_highest_zone(zone_type);
>>   		}
> 
> Did you drop my Ack?

Oh, sorry for that.

> Also, I'd appreciate getting CCed on patches where I commented.

Will do in the future.


Juergen



* Re: [PATCH v2] mm, page_alloc: fix build_zonerefs_node()
  2022-04-07 12:06 [PATCH v2] mm, page_alloc: fix build_zonerefs_node() Juergen Gross
  2022-04-07 12:14 ` Michal Hocko
  2022-04-07 12:45 ` David Hildenbrand
@ 2022-04-07 22:44 ` Andrew Morton
  2022-04-08  5:50   ` Juergen Gross
  2 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2022-04-07 22:44 UTC (permalink / raw)
  To: Juergen Gross
  Cc: xen-devel, linux-mm, linux-kernel, stable,
	Marek Marczykowski-Górecki, Michal Hocko

On Thu,  7 Apr 2022 14:06:37 +0200 Juergen Gross <jgross@suse.com> wrote:

> Since commit 6aa303defb74 ("mm, vmscan: only allocate and reclaim from
> zones with pages managed by the buddy allocator")

Six years ago!

> only zones with free
> memory are included in a built zonelist. This is problematic when e.g.
> all memory of a zone has been ballooned out when zonelists are being
> rebuilt.
> 
> The decision whether to rebuild the zonelists when onlining new memory
> is done based on populated_zone() returning 0 for the zone the memory
> will be added to. The new zone is added to the zonelists only, if it
> has free memory pages (managed_zone() returns a non-zero value) after
> the memory has been onlined. This implies, that onlining memory will
> always free the added pages to the allocator immediately, but this is
> not true in all cases: when e.g. running as a Xen guest the onlined
> new memory will be added only to the ballooned memory list, it will be
> freed only when the guest is being ballooned up afterwards.
> 
> Another problem with using managed_zone() for the decision whether a
> zone is being added to the zonelists is, that a zone with all memory
> used will in fact be removed from all zonelists in case the zonelists
> happen to be rebuilt.
> 
> Use populated_zone() when building a zonelist as it has been done
> before that commit.
> 
> Cc: stable@vger.kernel.org

Some details, please.  Is this really serious enough to warrant
backporting?  Is some new workload/usage pattern causing people to hit
this?



* Re: [PATCH v2] mm, page_alloc: fix build_zonerefs_node()
  2022-04-07 22:44 ` Andrew Morton
@ 2022-04-08  5:50   ` Juergen Gross
  0 siblings, 0 replies; 6+ messages in thread
From: Juergen Gross @ 2022-04-08  5:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: xen-devel, linux-mm, linux-kernel, stable,
	Marek Marczykowski-Górecki, Michal Hocko



On 08.04.22 00:44, Andrew Morton wrote:
> On Thu,  7 Apr 2022 14:06:37 +0200 Juergen Gross <jgross@suse.com> wrote:
> 
>> Since commit 6aa303defb74 ("mm, vmscan: only allocate and reclaim from
>> zones with pages managed by the buddy allocator")
> 
> Six years ago!
> 
>> only zones with free
>> memory are included in a built zonelist. This is problematic when e.g.
>> all memory of a zone has been ballooned out when zonelists are being
>> rebuilt.
>>
>> The decision whether to rebuild the zonelists when onlining new memory
>> is done based on populated_zone() returning 0 for the zone the memory
>> will be added to. The new zone is added to the zonelists only, if it
>> has free memory pages (managed_zone() returns a non-zero value) after
>> the memory has been onlined. This implies, that onlining memory will
>> always free the added pages to the allocator immediately, but this is
>> not true in all cases: when e.g. running as a Xen guest the onlined
>> new memory will be added only to the ballooned memory list, it will be
>> freed only when the guest is being ballooned up afterwards.
>>
>> Another problem with using managed_zone() for the decision whether a
>> zone is being added to the zonelists is, that a zone with all memory
>> used will in fact be removed from all zonelists in case the zonelists
>> happen to be rebuilt.
>>
>> Use populated_zone() when building a zonelist as it has been done
>> before that commit.
>>
>> Cc: stable@vger.kernel.org
> 
> Some details, please.  Is this really serious enough to warrant
> backporting?  Is some new workload/usage pattern causing people to hit
> this?

Yes. There was a report that QubesOS (based on Xen) is hitting this
problem. Xen switched to using the zone device functionality in
kernel 5.9, and QubesOS wants to use memory hotplugging for guests in
order to be able to start a guest with minimal memory and expand it
as needed. This was the report leading to the patch.
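
Editorial illustration (not part of the original reply): the hotplug
scenario described in the commit message can be sketched in userspace as
follows. The model assumes that a zonelist rebuild is triggered only when
memory is onlined into a zone that was not populated before, and that
ballooning up later does not trigger a rebuild. struct hp_zone and the
hp_* functions are hypothetical names for this sketch, and the
use_populated flag merely selects between the pre-patch and post-patch
filter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a single hotpluggable zone. */
struct hp_zone {
	unsigned long present_pages;	/* pages onlined into the zone */
	unsigned long managed_pages;	/* pages released to the allocator */
	bool in_zonelist;		/* result of the last zonelist rebuild */
};

/*
 * Rebuild the zonelist membership of the zone.
 * use_populated == false models the managed_zone() filter (pre-patch),
 * use_populated == true models the populated_zone() filter (post-patch).
 */
static void hp_rebuild_zonelist(struct hp_zone *z, bool use_populated)
{
	z->in_zonelist = use_populated ? (z->present_pages != 0)
				       : (z->managed_pages != 0);
}

/*
 * Online nr_pages into the zone. freed_now is how many of them reach the
 * allocator immediately; the commit message describes a Xen guest case
 * where this is 0 because the pages go to the ballooned memory list.
 * A rebuild happens only if the zone was unpopulated beforehand.
 */
static void hp_online_pages(struct hp_zone *z, unsigned long nr_pages,
			    unsigned long freed_now, bool use_populated)
{
	bool need_rebuild = (z->present_pages == 0);

	assert(freed_now <= nr_pages);

	z->present_pages += nr_pages;
	z->managed_pages += freed_now;

	if (need_rebuild)
		hp_rebuild_zonelist(z, use_populated);
}

/* Balloon up: pages reach the allocator, but no rebuild is triggered. */
static void hp_balloon_up(struct hp_zone *z, unsigned long nr_pages)
{
	z->managed_pages += nr_pages;
}
```

With the pre-patch filter the zone is evaluated once, while it still has
no managed pages, and is never reconsidered in this model; with the
post-patch filter it is added at onlining time and becomes usable as soon
as the guest is ballooned up.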


Juergen



