linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] hugetlb: Fix confusing behavior
@ 2022-04-01 10:12 Peng Liu
  2022-04-01 10:12 ` [PATCH v2 1/2] hugetlb: Fix hugepages_setup when deal with pernode Peng Liu
  2022-04-01 10:12 ` [PATCH v2 2/2] hugetlb: Fix return value of __setup handlers Peng Liu
  0 siblings, 2 replies; 11+ messages in thread
From: Peng Liu @ 2022-04-01 10:12 UTC (permalink / raw)
  To: mike.kravetz, akpm, yaozhenguo1, linux-mm, linux-kernel, stable
  Cc: liupeng256

When config pernode, the behavior of 1G hugepage is different
from 2M hugepage, patch 1 will fix this. To avoid printing
useless information, __setup() is modified to always return'1'.

v1->v2:
  Split "return 1" into a single patch as suggested by Mike.

Peng Liu (2):
  hugetlb: Fix hugepages_setup when deal with pernode
  hugetlb: Fix return value of __setup handlers

 mm/hugetlb.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

-- 
2.18.0.huawei.25



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/2] hugetlb: Fix hugepages_setup when deal with pernode
  2022-04-01 10:12 [PATCH v2 0/2] hugetlb: Fix confusing behavior Peng Liu
@ 2022-04-01 10:12 ` Peng Liu
  2022-04-01 10:43   ` David Hildenbrand
  2022-04-01 10:12 ` [PATCH v2 2/2] hugetlb: Fix return value of __setup handlers Peng Liu
  1 sibling, 1 reply; 11+ messages in thread
From: Peng Liu @ 2022-04-01 10:12 UTC (permalink / raw)
  To: mike.kravetz, akpm, yaozhenguo1, linux-mm, linux-kernel, stable
  Cc: liupeng256

Hugepages can be specified to pernode since "hugetlbfs: extend
the definition of hugepages parameter to support node allocation",
but the following problem is observed.

Confusing behavior is observed when both 1G and 2M hugepage is set
after "numa=off".
 cmdline hugepage settings:
  hugepagesz=1G hugepages=0:3,1:3
  hugepagesz=2M hugepages=0:1024,1:1024
 results:
  HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
  HugeTLB registered 2.00 MiB page size, pre-allocated 1024 pages

Furthermore, confusing behavior can be also observed when invalid
node behind valid node.

To fix this, hugetlb_hstate_alloc_pages should be called even when
hugepages_setup going to invalid.

Cc: <stable@vger.kernel.org>
Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
Signed-off-by: Peng Liu <liupeng256@huawei.com>
---
 mm/hugetlb.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b34f50156f7e..9cd746432ca9 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4131,6 +4131,7 @@ static int __init hugepages_setup(char *s)
 	int count;
 	unsigned long tmp;
 	char *p = s;
+	int ret = 1;
 
 	if (!parsed_valid_hugepagesz) {
 		pr_warn("HugeTLB: hugepages=%s does not follow a valid hugepagesz, ignoring\n", s);
@@ -4189,6 +4190,7 @@ static int __init hugepages_setup(char *s)
 		}
 	}
 
+out:
 	/*
 	 * Global state is always initialized later in hugetlb_init.
 	 * But we need to allocate gigantic hstates here early to still
@@ -4199,11 +4201,12 @@ static int __init hugepages_setup(char *s)
 
 	last_mhp = mhp;
 
-	return 1;
+	return ret;
 
 invalid:
 	pr_warn("HugeTLB: Invalid hugepages parameter %s\n", p);
-	return 0;
+	ret = 0;
+	goto out;
 }
 __setup("hugepages=", hugepages_setup);
 
-- 
2.18.0.huawei.25



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/2] hugetlb: Fix return value of __setup handlers
  2022-04-01 10:12 [PATCH v2 0/2] hugetlb: Fix confusing behavior Peng Liu
  2022-04-01 10:12 ` [PATCH v2 1/2] hugetlb: Fix hugepages_setup when deal with pernode Peng Liu
@ 2022-04-01 10:12 ` Peng Liu
  2022-04-01 10:46   ` David Hildenbrand
  1 sibling, 1 reply; 11+ messages in thread
From: Peng Liu @ 2022-04-01 10:12 UTC (permalink / raw)
  To: mike.kravetz, akpm, yaozhenguo1, linux-mm, linux-kernel, stable
  Cc: liupeng256

When __setup() return '0', using invalid option values causes the
entire kernel boot option string to be reported as Unknown. Hugetlb
calls __setup() and will return '0' when set invalid parameter
string.

The following phenomenon is observed:
 cmdline:
  hugepagesz=1Y hugepages=1
 dmesg:
  HugeTLB: unsupported hugepagesz=1Y
  HugeTLB: hugepages=1 does not follow a valid hugepagesz, ignoring
  Unknown kernel command line parameters "hugepagesz=1Y hugepages=1"

Since hugetlb will print warn or error information before return for
invalid parameter string, just use return '1' to avoid print again.

Signed-off-by: Peng Liu <liupeng256@huawei.com>
---
 mm/hugetlb.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9cd746432ca9..6dde34c115c9 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4131,12 +4131,11 @@ static int __init hugepages_setup(char *s)
 	int count;
 	unsigned long tmp;
 	char *p = s;
-	int ret = 1;
 
 	if (!parsed_valid_hugepagesz) {
 		pr_warn("HugeTLB: hugepages=%s does not follow a valid hugepagesz, ignoring\n", s);
 		parsed_valid_hugepagesz = true;
-		return 0;
+		return 1;
 	}
 
 	/*
@@ -4152,7 +4151,7 @@ static int __init hugepages_setup(char *s)
 
 	if (mhp == last_mhp) {
 		pr_warn("HugeTLB: hugepages= specified twice without interleaving hugepagesz=, ignoring hugepages=%s\n", s);
-		return 0;
+		return 1;
 	}
 
 	while (*p) {
@@ -4163,7 +4162,7 @@ static int __init hugepages_setup(char *s)
 		if (p[count] == ':') {
 			if (!hugetlb_node_alloc_supported()) {
 				pr_warn("HugeTLB: architecture can't support node specific alloc, ignoring!\n");
-				return 0;
+				return 1;
 			}
 			if (tmp >= nr_online_nodes)
 				goto invalid;
@@ -4201,11 +4200,10 @@ static int __init hugepages_setup(char *s)
 
 	last_mhp = mhp;
 
-	return ret;
+	return 1;
 
 invalid:
 	pr_warn("HugeTLB: Invalid hugepages parameter %s\n", p);
-	ret = 0;
 	goto out;
 }
 __setup("hugepages=", hugepages_setup);
@@ -4227,7 +4225,7 @@ static int __init hugepagesz_setup(char *s)
 
 	if (!arch_hugetlb_valid_size(size)) {
 		pr_err("HugeTLB: unsupported hugepagesz=%s\n", s);
-		return 0;
+		return 1;
 	}
 
 	h = size_to_hstate(size);
@@ -4242,7 +4240,7 @@ static int __init hugepagesz_setup(char *s)
 		if (!parsed_default_hugepagesz ||  h != &default_hstate ||
 		    default_hstate.max_huge_pages) {
 			pr_warn("HugeTLB: hugepagesz=%s specified twice, ignoring\n", s);
-			return 0;
+			return 1;
 		}
 
 		/*
@@ -4273,14 +4271,14 @@ static int __init default_hugepagesz_setup(char *s)
 	parsed_valid_hugepagesz = false;
 	if (parsed_default_hugepagesz) {
 		pr_err("HugeTLB: default_hugepagesz previously specified, ignoring %s\n", s);
-		return 0;
+		return 1;
 	}
 
 	size = (unsigned long)memparse(s, NULL);
 
 	if (!arch_hugetlb_valid_size(size)) {
 		pr_err("HugeTLB: unsupported default_hugepagesz=%s\n", s);
-		return 0;
+		return 1;
 	}
 
 	hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
-- 
2.18.0.huawei.25



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/2] hugetlb: Fix hugepages_setup when deal with pernode
  2022-04-01 10:12 ` [PATCH v2 1/2] hugetlb: Fix hugepages_setup when deal with pernode Peng Liu
@ 2022-04-01 10:43   ` David Hildenbrand
  2022-04-01 17:23     ` Mike Kravetz
  2022-04-02  2:36     ` liupeng (DM)
  0 siblings, 2 replies; 11+ messages in thread
From: David Hildenbrand @ 2022-04-01 10:43 UTC (permalink / raw)
  To: Peng Liu, mike.kravetz, akpm, yaozhenguo1, linux-mm,
	linux-kernel, stable

On 01.04.22 12:12, Peng Liu wrote:
> Hugepages can be specified to pernode since "hugetlbfs: extend
> the definition of hugepages parameter to support node allocation",
> but the following problem is observed.
> 
> Confusing behavior is observed when both 1G and 2M hugepage is set
> after "numa=off".
>  cmdline hugepage settings:
>   hugepagesz=1G hugepages=0:3,1:3
>   hugepagesz=2M hugepages=0:1024,1:1024
>  results:
>   HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
>   HugeTLB registered 2.00 MiB page size, pre-allocated 1024 pages
> 
> Furthermore, confusing behavior can be also observed when invalid
> node behind valid node.
> 
> To fix this, hugetlb_hstate_alloc_pages should be called even when
> hugepages_setup going to invalid.

Shouldn't we bail out if someone requests node-specific allocations but
we are not running with NUMA?

What's the result after your change?

> 
> Cc: <stable@vger.kernel.org>

I am not sure if this is really stable material.

> Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
> Signed-off-by: Peng Liu <liupeng256@huawei.com>

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] hugetlb: Fix return value of __setup handlers
  2022-04-01 10:12 ` [PATCH v2 2/2] hugetlb: Fix return value of __setup handlers Peng Liu
@ 2022-04-01 10:46   ` David Hildenbrand
  2022-04-02  1:33     ` liupeng (DM)
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2022-04-01 10:46 UTC (permalink / raw)
  To: Peng Liu, mike.kravetz, akpm, yaozhenguo1, linux-mm,
	linux-kernel, stable

On 01.04.22 12:12, Peng Liu wrote:
> When __setup() return '0', using invalid option values causes the
> entire kernel boot option string to be reported as Unknown. Hugetlb
> calls __setup() and will return '0' when set invalid parameter
> string.
> 
> The following phenomenon is observed:
>  cmdline:
>   hugepagesz=1Y hugepages=1
>  dmesg:
>   HugeTLB: unsupported hugepagesz=1Y
>   HugeTLB: hugepages=1 does not follow a valid hugepagesz, ignoring
>   Unknown kernel command line parameters "hugepagesz=1Y hugepages=1"
> 
> Since hugetlb will print warn or error information before return for
> invalid parameter string, just use return '1' to avoid print again.
> 
> Signed-off-by: Peng Liu <liupeng256@huawei.com>
> ---
>  mm/hugetlb.c | 18 ++++++++----------
>  1 file changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 9cd746432ca9..6dde34c115c9 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4131,12 +4131,11 @@ static int __init hugepages_setup(char *s)
>  	int count;
>  	unsigned long tmp;
>  	char *p = s;
> -	int ret = 1;

Adding this in #1 to remove it in #2 is a bit sub-optimal IMHO.


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/2] hugetlb: Fix hugepages_setup when deal with pernode
  2022-04-01 10:43   ` David Hildenbrand
@ 2022-04-01 17:23     ` Mike Kravetz
  2022-04-04 10:41       ` David Hildenbrand
  2022-04-02  2:36     ` liupeng (DM)
  1 sibling, 1 reply; 11+ messages in thread
From: Mike Kravetz @ 2022-04-01 17:23 UTC (permalink / raw)
  To: David Hildenbrand, Peng Liu, akpm, yaozhenguo1, linux-mm,
	linux-kernel, stable

On 4/1/22 03:43, David Hildenbrand wrote:
> On 01.04.22 12:12, Peng Liu wrote:
>> Hugepages can be specified to pernode since "hugetlbfs: extend
>> the definition of hugepages parameter to support node allocation",
>> but the following problem is observed.
>>
>> Confusing behavior is observed when both 1G and 2M hugepage is set
>> after "numa=off".
>>  cmdline hugepage settings:
>>   hugepagesz=1G hugepages=0:3,1:3
>>   hugepagesz=2M hugepages=0:1024,1:1024
>>  results:
>>   HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
>>   HugeTLB registered 2.00 MiB page size, pre-allocated 1024 pages
>>
>> Furthermore, confusing behavior can be also observed when invalid
>> node behind valid node.
>>
>> To fix this, hugetlb_hstate_alloc_pages should be called even when
>> hugepages_setup going to invalid.
> 
> Shouldn't we bail out if someone requests node-specific allocations but
> we are not running with NUMA?

I thought about this as well, and could not come up with a good answer.
Certainly, nobody SHOULD specify both 'numa=off' and ask for node specific
allocations on the same command line.  I would have no problem bailing out
in such situations.  But, I think that would also require the hugetlb command
line processing to look for such situations.

One could also argue that if there is only a single node (not numa=off on
command line) and someone specifies node local allocations we should bail.

I was 'thinking' about a situation where we had multiple nodes and node
local allocations were 'hard coded' via grub or something.  Then, for some
reason one node fails to come up on a reboot.  Should we bail on all the
hugetlb allocations, or should we try to allocate on the still available
nodes?

When I went back and reread the reason for this change, I see that it is
primarily for 'some debugging and test cases'.

> 
> What's the result after your change?
> 
>>
>> Cc: <stable@vger.kernel.org>
> 
> I am not sure if this is really stable material.

Right now, we partially and inconsistently process node specific allocations
if there are missing nodes.  We allocate 'regular' hugetlb pages on existing
nodes.  But, we do not allocate gigantic hugetlb pages on existing nodes.

I believe this is worth fixing in stable.

Since the behavior for missing nodes was not really spelled out when node
specific allocations were introduced, I think an acceptable stable fix could
be to bail.

In any case, I think we need to do something.

> 
>> Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
>> Signed-off-by: Peng Liu <liupeng256@huawei.com>
> 

-- 
Mike Kravetz


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] hugetlb: Fix return value of __setup handlers
  2022-04-01 10:46   ` David Hildenbrand
@ 2022-04-02  1:33     ` liupeng (DM)
  2022-04-04 10:25       ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: liupeng (DM) @ 2022-04-02  1:33 UTC (permalink / raw)
  To: David Hildenbrand, mike.kravetz, akpm, yaozhenguo1, linux-mm,
	linux-kernel, stable

[-- Attachment #1: Type: text/plain, Size: 1358 bytes --]


On 2022/4/1 18:46, David Hildenbrand wrote:
> On 01.04.22 12:12, Peng Liu wrote:
>> When __setup() return '0', using invalid option values causes the
>> entire kernel boot option string to be reported as Unknown. Hugetlb
>> calls __setup() and will return '0' when set invalid parameter
>> string.
>>
>> The following phenomenon is observed:
>>   cmdline:
>>    hugepagesz=1Y hugepages=1
>>   dmesg:
>>    HugeTLB: unsupported hugepagesz=1Y
>>    HugeTLB: hugepages=1 does not follow a valid hugepagesz, ignoring
>>    Unknown kernel command line parameters "hugepagesz=1Y hugepages=1"
>>
>> Since hugetlb will print warn or error information before return for
>> invalid parameter string, just use return '1' to avoid print again.
>>
>> Signed-off-by: Peng Liu<liupeng256@huawei.com>
>> ---
>>   mm/hugetlb.c | 18 ++++++++----------
>>   1 file changed, 8 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index 9cd746432ca9..6dde34c115c9 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -4131,12 +4131,11 @@ static int __init hugepages_setup(char *s)
>>   	int count;
>>   	unsigned long tmp;
>>   	char *p = s;
>> -	int ret = 1;
> Adding this in #1 to remove it in #2 is a bit sub-optimal IMHO.
>
For #2, which is not necessary for stable, #1 may be needed for stable, 
this is why we split #2 into a single patch.

[-- Attachment #2: Type: text/html, Size: 1888 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/2] hugetlb: Fix hugepages_setup when deal with pernode
  2022-04-01 10:43   ` David Hildenbrand
  2022-04-01 17:23     ` Mike Kravetz
@ 2022-04-02  2:36     ` liupeng (DM)
  1 sibling, 0 replies; 11+ messages in thread
From: liupeng (DM) @ 2022-04-02  2:36 UTC (permalink / raw)
  To: David Hildenbrand, mike.kravetz, akpm, yaozhenguo1, linux-mm,
	linux-kernel, stable


On 2022/4/1 18:43, David Hildenbrand wrote:
> On 01.04.22 12:12, Peng Liu wrote:
>> Hugepages can be specified to pernode since "hugetlbfs: extend
>> the definition of hugepages parameter to support node allocation",
>> but the following problem is observed.
>>
>> Confusing behavior is observed when both 1G and 2M hugepage is set
>> after "numa=off".
>>   cmdline hugepage settings:
>>    hugepagesz=1G hugepages=0:3,1:3
>>    hugepagesz=2M hugepages=0:1024,1:1024
>>   results:
>>    HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
>>    HugeTLB registered 2.00 MiB page size, pre-allocated 1024 pages
>>
>> Furthermore, confusing behavior can be also observed when invalid
>> node behind valid node.
>>
>> To fix this, hugetlb_hstate_alloc_pages should be called even when
>> hugepages_setup going to invalid.
> Shouldn't we bail out if someone requests node-specific allocations but
> we are not running with NUMA?
>
> What's the result after your change?
>
>> Cc: <stable@vger.kernel.org>
> I am not sure if this is really stable material.

This change will make 1G-huge-page consistent with 2M-huge-page when
an invalid node is configured. After this patch, all per node huge pages
will allocate until an invalid node.

Thus, the basic question is "what will lead to an invalid node".
1) Some debugging and test cases as Mike suggested.
2) When part of physical memory or cpu is broken and bios not report
the node with physical damage, but still use the original grub.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] hugetlb: Fix return value of __setup handlers
  2022-04-02  1:33     ` liupeng (DM)
@ 2022-04-04 10:25       ` David Hildenbrand
  0 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2022-04-04 10:25 UTC (permalink / raw)
  To: liupeng (DM),
	mike.kravetz, akpm, yaozhenguo1, linux-mm, linux-kernel, stable

On 02.04.22 03:33, liupeng (DM) wrote:
> 
> On 2022/4/1 18:46, David Hildenbrand wrote:
>> On 01.04.22 12:12, Peng Liu wrote:
>>> When __setup() return '0', using invalid option values causes the
>>> entire kernel boot option string to be reported as Unknown. Hugetlb
>>> calls __setup() and will return '0' when set invalid parameter
>>> string.
>>>
>>> The following phenomenon is observed:
>>>  cmdline:
>>>   hugepagesz=1Y hugepages=1
>>>  dmesg:
>>>   HugeTLB: unsupported hugepagesz=1Y
>>>   HugeTLB: hugepages=1 does not follow a valid hugepagesz, ignoring
>>>   Unknown kernel command line parameters "hugepagesz=1Y hugepages=1"
>>>
>>> Since hugetlb will print warn or error information before return for
>>> invalid parameter string, just use return '1' to avoid print again.
>>>
>>> Signed-off-by: Peng Liu <liupeng256@huawei.com>
>>> ---
>>>  mm/hugetlb.c | 18 ++++++++----------
>>>  1 file changed, 8 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index 9cd746432ca9..6dde34c115c9 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -4131,12 +4131,11 @@ static int __init hugepages_setup(char *s)
>>>  	int count;
>>>  	unsigned long tmp;
>>>  	char *p = s;
>>> -	int ret = 1;
>> Adding this in #1 to remove it in #2 is a bit sub-optimal IMHO.
>>
> For #2, which is not necessary for stable, #1 may be needed for stable,
> this is why we split #2 into a single patch.
> 

Again, I don't think #1 is stable material, sorry.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/2] hugetlb: Fix hugepages_setup when deal with pernode
  2022-04-01 17:23     ` Mike Kravetz
@ 2022-04-04 10:41       ` David Hildenbrand
  2022-04-04 23:48         ` Mike Kravetz
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2022-04-04 10:41 UTC (permalink / raw)
  To: Mike Kravetz, Peng Liu, akpm, yaozhenguo1, linux-mm,
	linux-kernel, stable

On 01.04.22 19:23, Mike Kravetz wrote:
> On 4/1/22 03:43, David Hildenbrand wrote:
>> On 01.04.22 12:12, Peng Liu wrote:
>>> Hugepages can be specified to pernode since "hugetlbfs: extend
>>> the definition of hugepages parameter to support node allocation",
>>> but the following problem is observed.
>>>
>>> Confusing behavior is observed when both 1G and 2M hugepage is set
>>> after "numa=off".
>>>  cmdline hugepage settings:
>>>   hugepagesz=1G hugepages=0:3,1:3
>>>   hugepagesz=2M hugepages=0:1024,1:1024
>>>  results:
>>>   HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
>>>   HugeTLB registered 2.00 MiB page size, pre-allocated 1024 pages
>>>
>>> Furthermore, confusing behavior can be also observed when invalid
>>> node behind valid node.
>>>
>>> To fix this, hugetlb_hstate_alloc_pages should be called even when
>>> hugepages_setup going to invalid.
>>
>> Shouldn't we bail out if someone requests node-specific allocations but
>> we are not running with NUMA?
> 
> I thought about this as well, and could not come up with a good answer.
> Certainly, nobody SHOULD specify both 'numa=off' and ask for node specific
> allocations on the same command line.  I would have no problem bailing out
> in such situations.  But, I think that would also require the hugetlb command
> line processing to look for such situations.

Yes. Right now I see

if (tmp >= nr_online_nodes)
	goto invalid;

Which seems a little strange, because IIUC, it's the number of online
nodes, which is completely wrong with a sparse online bitmap. Just
imagine node 0 and node 2 are online, and node 1 is offline. Assuming
that "node < 2" is valid is wrong.

Why don't we check for node_online() and bail out if that is not the
case? Is it too early for that check? But why does comparing against
nr_online_nodes() work, then?


Having that said, I'm not sure if all usage of nr_online_nodes in
mm/hugetlb.c is wrong, with a sparse online bitmap. Outside of that,
it's really just used for "nr_online_nodes > 1". I might be wrong, though.

> 
> One could also argue that if there is only a single node (not numa=off on
> command line) and someone specifies node local allocations we should bail.

I assume "numa=off" is always parsed before hugepages_setup() is called,
right? So we can just rely on the actual numa information.


> 
> I was 'thinking' about a situation where we had multiple nodes and node
> local allocations were 'hard coded' via grub or something.  Then, for some
> reason one node fails to come up on a reboot.  Should we bail on all the
> hugetlb allocations, or should we try to allocate on the still available
> nodes?

Depends on what "bail" means. Printing a warning and stopping to
allocate further is certainly good enough for my taste :)

> 
> When I went back and reread the reason for this change, I see that it is
> primarily for 'some debugging and test cases'.
> 
>>
>> What's the result after your change?
>>
>>>
>>> Cc: <stable@vger.kernel.org>
>>
>> I am not sure if this is really stable material.
> 
> Right now, we partially and inconsistently process node specific allocations
> if there are missing nodes.  We allocate 'regular' hugetlb pages on existing
> nodes.  But, we do not allocate gigantic hugetlb pages on existing nodes.
> 
> I believe this is worth fixing in stable.

I am skeptical.

https://www.kernel.org/doc/Documentation/process/stable-kernel-rules.rst

" - It must fix a real bug that bothers people (not a, "This could be a
   problem..." type thing)."

While the current behavior is suboptimal, it's certainly not an urgent
bug (?) and the kernel will boot and work just fine. As you mentioned
"nobody SHOULD specify both 'numa=off' and ask for node specific
allocations on the same command line.", this is just a corner case.

Adjusting it upstream -- okay. Backporting to stable? I don't think so.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/2] hugetlb: Fix hugepages_setup when deal with pernode
  2022-04-04 10:41       ` David Hildenbrand
@ 2022-04-04 23:48         ` Mike Kravetz
  0 siblings, 0 replies; 11+ messages in thread
From: Mike Kravetz @ 2022-04-04 23:48 UTC (permalink / raw)
  To: David Hildenbrand, Peng Liu, akpm, yaozhenguo1, linux-mm,
	linux-kernel, stable

On 4/4/22 03:41, David Hildenbrand wrote:
> On 01.04.22 19:23, Mike Kravetz wrote:
>> On 4/1/22 03:43, David Hildenbrand wrote:
>>> On 01.04.22 12:12, Peng Liu wrote:
>>>> Hugepages can be specified to pernode since "hugetlbfs: extend
>>>> the definition of hugepages parameter to support node allocation",
>>>> but the following problem is observed.
>>>>
>>>> Confusing behavior is observed when both 1G and 2M hugepage is set
>>>> after "numa=off".
>>>>  cmdline hugepage settings:
>>>>   hugepagesz=1G hugepages=0:3,1:3
>>>>   hugepagesz=2M hugepages=0:1024,1:1024
>>>>  results:
>>>>   HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
>>>>   HugeTLB registered 2.00 MiB page size, pre-allocated 1024 pages
>>>>
>>>> Furthermore, confusing behavior can be also observed when invalid
>>>> node behind valid node.
>>>>
>>>> To fix this, hugetlb_hstate_alloc_pages should be called even when
>>>> hugepages_setup going to invalid.
>>>
>>> Shouldn't we bail out if someone requests node-specific allocations but
>>> we are not running with NUMA?
>>
>> I thought about this as well, and could not come up with a good answer.
>> Certainly, nobody SHOULD specify both 'numa=off' and ask for node specific
>> allocations on the same command line.  I would have no problem bailing out
>> in such situations.  But, I think that would also require the hugetlb command
>> line processing to look for such situations.
> 
> Yes. Right now I see
> 
> if (tmp >= nr_online_nodes)
> 	goto invalid;
> 
> Which seems a little strange, because IIUC, it's the number of online
> nodes, which is completely wrong with a sparse online bitmap. Just
> imagine node 0 and node 2 are online, and node 1 is offline. Assuming
> that "node < 2" is valid is wrong.
> 
> Why don't we check for node_online() and bail out if that is not the
> case? Is it too early for that check? But why does comparing against
> nr_online_nodes() work, then?
> 
> 
> Having that said, I'm not sure if all usage of nr_online_nodes in
> mm/hugetlb.c is wrong, with a sparse online bitmap. Outside of that,
> it's really just used for "nr_online_nodes > 1". I might be wrong, though.

I think you are correct.  My bad for not being more thorough in reviewing
the original patch that added this code.  My incorrect assumption was that
a sparse node map was only possible via offline operations which could not
happen this early in boot.  I now see that a sparse map can be presented
by fw/bios/etc.  So, yes I do believe we need to check for online nodes.

-- 
Mike Kravetz

> 
>>
>> One could also argue that if there is only a single node (not numa=off on
>> command line) and someone specifies node local allocations we should bail.
> 
> I assume "numa=off" is always parsed before hugepages_setup() is called,
> right? So we can just rely on the actual numa information.
> 
> 
>>
>> I was 'thinking' about a situation where we had multiple nodes and node
>> local allocations were 'hard coded' via grub or something.  Then, for some
>> reason one node fails to come up on a reboot.  Should we bail on all the
>> hugetlb allocations, or should we try to allocate on the still available
>> nodes?
> 
> Depends on what "bail" means. Printing a warning and stopping to
> allocate further is certainly good enough for my taste :)
> 
>>
>> When I went back and reread the reason for this change, I see that it is
>> primarily for 'some debugging and test cases'.
>>
>>>
>>> What's the result after your change?
>>>
>>>>
>>>> Cc: <stable@vger.kernel.org>
>>>
>>> I am not sure if this is really stable material.
>>
>> Right now, we partially and inconsistently process node specific allocations
>> if there are missing nodes.  We allocate 'regular' hugetlb pages on existing
>> nodes.  But, we do not allocate gigantic hugetlb pages on existing nodes.
>>
>> I believe this is worth fixing in stable.
> 
> I am skeptical.
> 
> https://www.kernel.org/doc/Documentation/process/stable-kernel-rules.rst
> 
> " - It must fix a real bug that bothers people (not a, "This could be a
>    problem..." type thing)."
> 
> While the current behavior is suboptimal, it's certainly not an urgent
> bug (?) and the kernel will boot and work just fine. As you mentioned
> "nobody SHOULD specify both 'numa=off' and ask for node specific
> allocations on the same command line.", this is just a corner case.
> 
> Adjusting it upstream -- okay. Backporting to stable? I don't think so.
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-04-04 23:49 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-01 10:12 [PATCH v2 0/2] hugetlb: Fix confusing behavior Peng Liu
2022-04-01 10:12 ` [PATCH v2 1/2] hugetlb: Fix hugepages_setup when deal with pernode Peng Liu
2022-04-01 10:43   ` David Hildenbrand
2022-04-01 17:23     ` Mike Kravetz
2022-04-04 10:41       ` David Hildenbrand
2022-04-04 23:48         ` Mike Kravetz
2022-04-02  2:36     ` liupeng (DM)
2022-04-01 10:12 ` [PATCH v2 2/2] hugetlb: Fix return value of __setup handlers Peng Liu
2022-04-01 10:46   ` David Hildenbrand
2022-04-02  1:33     ` liupeng (DM)
2022-04-04 10:25       ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).