linux-kernel.vger.kernel.org archive mirror
* [RESEND] x86/numa: move setting parsed numa node to num_add_memblk
@ 2017-12-01 10:13 zhong jiang
  2017-12-11 12:03 ` Michal Hocko
  0 siblings, 1 reply; 5+ messages in thread
From: zhong jiang @ 2017-12-01 10:13 UTC (permalink / raw)
  To: mhocko, iamjoonsoo.kim, mgorman, minchan, vbabka, akpm
  Cc: linux-kernel, linux-mm

The ACPI tables are very much like user input, so on some architectures
they are likely to introduce unreasonable nodes. However, the code does
not ignore such a node and bail out in time, which results in an
unnecessary print.
E.g. on x86, a node whose start is equal to its end is unreasonable:
numa_add_memblk() fails to add it but still returns 0.

Meanwhile, arm64 sets the node in numa_nodes_parsed twice after NUMA
adds a memblk successfully, but x86 does not, because x86's
numa_add_memblk() does not set it.

In view of the above problems, I think this needs to be improved.
Add a check here to bypass the invalid memblk node.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
---
 arch/x86/mm/amdtopology.c | 1 -
 arch/x86/mm/numa.c        | 3 ++-
 drivers/acpi/numa.c       | 5 ++++-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/amdtopology.c b/arch/x86/mm/amdtopology.c
index 91f501b..7657042 100644
--- a/arch/x86/mm/amdtopology.c
+++ b/arch/x86/mm/amdtopology.c
@@ -151,7 +151,6 @@ int __init amd_numa_init(void)
 
 		prevbase = base;
 		numa_add_memblk(nodeid, base, limit);
-		node_set(nodeid, numa_nodes_parsed);
 	}
 
 	if (!nodes_weight(numa_nodes_parsed))
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 25504d5..8f87f26 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -150,6 +150,8 @@ static int __init numa_add_memblk_to(int nid, u64 start, u64 end,
 	mi->blk[mi->nr_blks].end = end;
 	mi->blk[mi->nr_blks].nid = nid;
 	mi->nr_blks++;
+
+	node_set(nid, numa_nodes_parsed);
 	return 0;
 }
 
@@ -693,7 +695,6 @@ static int __init dummy_numa_init(void)
 	printk(KERN_INFO "Faking a node at [mem %#018Lx-%#018Lx]\n",
 	       0LLU, PFN_PHYS(max_pfn) - 1);
 
-	node_set(0, numa_nodes_parsed);
 	numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
 
 	return 0;
diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
index 917f1cc..f2e33cb 100644
--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -294,7 +294,9 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
 		goto out_err_bad_srat;
 	}
 
-	node_set(node, numa_nodes_parsed);
+	/* some architecture is likely to ignore a unreasonable node */
+	if (!node_isset(node, numa_nodes_parsed))
+		goto out;
 
 	pr_info("SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]%s%s\n",
 		node, pxm,
@@ -309,6 +311,7 @@ void __init acpi_numa_slit_init(struct acpi_table_slit *slit)
 
 	max_possible_pfn = max(max_possible_pfn, PFN_UP(end - 1));
 
+out:
 	return 0;
 out_err_bad_srat:
 	bad_srat();
-- 
1.8.3.1
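
The behaviour the changelog describes can be modelled in userspace. The
sketch below is not kernel code: the model_* names, the limits, and the
plain bitmask standing in for numa_nodes_parsed are all assumptions made
for illustration. It only mirrors the idea of the patch, namely that a
zero-length or invalid range is ignored with a return value of 0, and
that the node is marked as parsed only once a block has been recorded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_NODES 8   /* hypothetical limit for this sketch */
#define MODEL_MAX_BLKS  16  /* hypothetical limit for this sketch */

struct model_blk {
	uint64_t start;
	uint64_t end;
	int nid;
};

/* Hypothetical stand-in for the memblk table plus the parsed-node mask. */
struct model_meminfo {
	int nr_blks;
	struct model_blk blk[MODEL_MAX_BLKS];
	unsigned int nodes_parsed; /* one bit per node id */
};

/* Record a range for a node; ignored ranges still return 0. */
static int model_add_memblk(struct model_meminfo *mi, int nid,
			    uint64_t start, uint64_t end)
{
	assert(mi != NULL);

	/* A zero-length range is ignored, but this is not an error. */
	if (start == end)
		return 0;

	/* An invalid range or node id is ignored as well. */
	if (start > end || nid < 0 || nid >= MODEL_MAX_NODES)
		return 0;

	if (mi->nr_blks >= MODEL_MAX_BLKS)
		return -1;

	mi->blk[mi->nr_blks].start = start;
	mi->blk[mi->nr_blks].end = end;
	mi->blk[mi->nr_blks].nid = nid;
	mi->nr_blks++;

	/* The point of the patch: mark the node only after a block was stored. */
	mi->nodes_parsed |= 1u << nid;
	return 0;
}

/* Return 1 if at least one block was recorded for the node, else 0. */
static int model_node_parsed(const struct model_meminfo *mi, int nid)
{
	assert(mi != NULL && nid >= 0 && nid < MODEL_MAX_NODES);
	return (mi->nodes_parsed >> nid) & 1u;
}
```

In this model a caller cannot tell an ignored range from a stored one by
the return value alone; it has to look at the parsed-node mask, which is
what the added node_isset() check in the patch relies on.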


* Re: [RESEND] x86/numa: move setting parsed numa node to num_add_memblk
  2017-12-01 10:13 [RESEND] x86/numa: move setting parsed numa node to num_add_memblk zhong jiang
@ 2017-12-11 12:03 ` Michal Hocko
  2017-12-11 12:59   ` zhong jiang
  0 siblings, 1 reply; 5+ messages in thread
From: Michal Hocko @ 2017-12-11 12:03 UTC (permalink / raw)
  To: zhong jiang
  Cc: iamjoonsoo.kim, mgorman, minchan, vbabka, akpm, linux-kernel, linux-mm

On Fri 01-12-17 18:13:52, zhong jiang wrote:
> The ACPI tables are very much like user input, so on some architectures
> they are likely to introduce unreasonable nodes. However, the code does
> not ignore such a node and bail out in time, which results in an
> unnecessary print.
> E.g. on x86, a node whose start is equal to its end is unreasonable:
> numa_add_memblk() fails to add it but still returns 0.
> 
> Meanwhile, arm64 sets the node in numa_nodes_parsed twice after NUMA
> adds a memblk successfully, but x86 does not, because x86's
> numa_add_memblk() does not set it.

I am sorry but I still fail to understand what the actual problem is.
You said that x86 will print a message. Alright, at least you know that
the platform provides nonsense ACPI/SRAT? tables and you can complain.
But does the kernel misbehave? In what way?

-- 
Michal Hocko
SUSE Labs


* Re: [RESEND] x86/numa: move setting parsed numa node to num_add_memblk
  2017-12-11 12:03 ` Michal Hocko
@ 2017-12-11 12:59   ` zhong jiang
  2017-12-11 13:45     ` Michal Hocko
  0 siblings, 1 reply; 5+ messages in thread
From: zhong jiang @ 2017-12-11 12:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: iamjoonsoo.kim, mgorman, minchan, vbabka, akpm, linux-kernel, linux-mm

On 2017/12/11 20:03, Michal Hocko wrote:
> I am sorry but I still fail to understand what the actual problem is.
> You said that x86 will print a message. Alright, at least you know that
> the platform provides nonsense ACPI/SRAT? tables and you can complain.
> But does the kernel misbehave? In what way?
Judging from the following code, the node is expected to be reasonable.
Otherwise, if we only want to complain, the function should bail out
right after printing the message about the unreasonable node.

        node_set(node, numa_nodes_parsed);

        pr_info("SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]%s%s\n",
                node, pxm,
                (unsigned long long) start, (unsigned long long) end - 1,
                hotpluggable ? " hotplug" : "",
                ma->flags & ACPI_SRAT_MEM_NON_VOLATILE ? " non-volatile" : "");

        /* Mark hotplug range in memblock. */
        if (hotpluggable && memblock_mark_hotplug(start, ma->length))
                pr_warn("SRAT: Failed to mark hotplug range [mem %#010Lx-%#010Lx] in memblock\n",
                        (unsigned long long)start, (unsigned long long)end - 1);

        max_possible_pfn = max(max_possible_pfn, PFN_UP(end - 1));

        return 0;
out_err_bad_srat:
        bad_srat();

In addition, arm64 sets the node in numa_nodes_parsed twice after adding
a memblk successfully, because numa_add_memblk() already performs
node_set(*, *).

        if (numa_add_memblk(node, start, end) < 0) {
                pr_err("SRAT: Failed to add memblk to node %u [mem %#010Lx-%#010Lx]\n",
                       node, (unsigned long long) start,
                       (unsigned long long) end - 1);
                goto out_err_bad_srat;
        }

        node_set(node, numa_nodes_parsed);
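
The caller-side flow that the patch aims for can be sketched in userspace
as follows. This is only a model under stated assumptions: the
srat_model_* names, the limits, and the bitmask standing in for
numa_nodes_parsed are hypothetical and not taken from the kernel. The
add step marks the node itself, and the caller skips the report and the
upper-bound update when the node was not marked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SRAT_MODEL_MAX_NODES 32 /* hypothetical limit for this sketch */
#define SRAT_MODEL_MAX_BLKS  4  /* hypothetical limit for this sketch */

/* Hypothetical stand-in for the state touched while parsing SRAT entries. */
struct srat_model {
	unsigned int nodes_parsed; /* one bit per node id */
	int nr_blks;               /* number of ranges accepted so far */
	int reported;              /* number of "SRAT: Node ..." lines printed */
	uint64_t max_end;          /* stand-in for max_possible_pfn */
};

/* Stand-in for numa_add_memblk(): marks the node only when a range is kept. */
static int srat_model_add(struct srat_model *m, int nid,
			  uint64_t start, uint64_t end)
{
	assert(m != NULL);

	/* Zero-length and invalid ranges are ignored without an error. */
	if (start >= end || nid < 0 || nid >= SRAT_MODEL_MAX_NODES)
		return 0;

	if (m->nr_blks >= SRAT_MODEL_MAX_BLKS)
		return -1;

	m->nr_blks++;
	m->nodes_parsed |= 1u << nid;
	return 0;
}

/* Stand-in for the per-entry handler with the added "is the node set" check. */
static int srat_model_handle_entry(struct srat_model *m, int nid,
				   uint64_t start, uint64_t end)
{
	if (srat_model_add(m, nid, start, end) < 0)
		return -1; /* corresponds to the bad-SRAT error path */

	/* Node was never accepted: bail out quietly instead of reporting it. */
	if (nid < 0 || nid >= SRAT_MODEL_MAX_NODES ||
	    !(m->nodes_parsed & (1u << nid)))
		return 0;

	printf("SRAT: Node %d [mem %#010llx-%#010llx]\n", nid,
	       (unsigned long long)start, (unsigned long long)(end - 1));
	m->reported++;

	if (end > m->max_end)
		m->max_end = end;
	return 0;
}
```

One property of this model worth noting: the check is per node, not per
range, so a zero-length range for a node that an earlier valid range
already marked would still be reported.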

Thanks
zhong jiang


* Re: [RESEND] x86/numa: move setting parsed numa node to num_add_memblk
  2017-12-11 12:59   ` zhong jiang
@ 2017-12-11 13:45     ` Michal Hocko
  2017-12-12  6:52       ` zhong jiang
  0 siblings, 1 reply; 5+ messages in thread
From: Michal Hocko @ 2017-12-11 13:45 UTC (permalink / raw)
  To: zhong jiang
  Cc: iamjoonsoo.kim, mgorman, minchan, vbabka, akpm, linux-kernel, linux-mm

On Mon 11-12-17 20:59:29, zhong jiang wrote:
> Judging from the following code, the node is expected to be reasonable.
> Otherwise, if we only want to complain, the function should bail out
> right after printing the message about the unreasonable node.
> 
>           node_set(node, numa_nodes_parsed);
> 
>         pr_info("SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]%s%s\n",
>                 node, pxm,
>                 (unsigned long long) start, (unsigned long long) end - 1,
>                 hotpluggable ? " hotplug" : "",
>                 ma->flags & ACPI_SRAT_MEM_NON_VOLATILE ? " non-volatile" : "");
> 
>         /* Mark hotplug range in memblock. */
>         if (hotpluggable && memblock_mark_hotplug(start, ma->length))
>                 pr_warn("SRAT: Failed to mark hotplug range [mem %#010Lx-%#010Lx] in memblock\n",
>                         (unsigned long long)start, (unsigned long long)end - 1);
> 
>         max_possible_pfn = max(max_possible_pfn, PFN_UP(end - 1));
> 
>         return 0;
> out_err_bad_srat:
>         bad_srat();
> 
> In addition, arm64 sets the node in numa_nodes_parsed twice after adding
> a memblk successfully, because numa_add_memblk() already performs
> node_set(*, *).
> 
>          if (numa_add_memblk(node, start, end) < 0) {
>                 pr_err("SRAT: Failed to add memblk to node %u [mem %#010Lx-%#010Lx]\n",
>                        node, (unsigned long long) start,
>                        (unsigned long long) end - 1);
>                 goto out_err_bad_srat;
>         }
> 
>         node_set(node, numa_nodes_parsed);

I am sorry but I _do not_ understand how this answers my simple
question. You are describing the code flow, which doesn't really explain
what the _user_ or _runtime_ visible effect is. Anybody reading this
changelog will have to scratch his head to understand what the heck
this fixes and whether the patch needs to be considered for backporting.
See my point?
-- 
Michal Hocko
SUSE Labs


* Re: [RESEND] x86/numa: move setting parsed numa node to num_add_memblk
  2017-12-11 13:45     ` Michal Hocko
@ 2017-12-12  6:52       ` zhong jiang
  0 siblings, 0 replies; 5+ messages in thread
From: zhong jiang @ 2017-12-12  6:52 UTC (permalink / raw)
  To: Michal Hocko
  Cc: iamjoonsoo.kim, mgorman, minchan, vbabka, akpm, linux-kernel, linux-mm

On 2017/12/11 21:45, Michal Hocko wrote:
> I am sorry but I _do not_ understand how this answers my simple
> question. You are describing the code flow, which doesn't really explain
> what the _user_ or _runtime_ visible effect is. Anybody reading this
> changelog will have to scratch his head to understand what the heck
> this fixes and whether the patch needs to be considered for backporting.
> See my point?
There is no visible effect for the user. IMO, it is simply a better
optimization. Maybe I spent too many words on explaining how the patch
works. :-[

I found the code messy while reading it, without hitting a real issue.

 Thanks
 zhong jiang
 
