* [PATCH] ia64: Ensure proper NUMA distance and possible map initialization
@ 2021-03-18 13:06 Valentin Schneider
  2021-03-19 14:47 ` John Paul Adrian Glaubitz
  2021-03-24 18:54 ` Andrew Morton
  0 siblings, 2 replies; 6+ messages in thread
From: Valentin Schneider @ 2021-03-18 13:06 UTC
  To: linux-kernel, linux-ia64, debian-ia64
  Cc: John Paul Adrian Glaubitz, Peter Zijlstra (Intel),
	Ingo Molnar, Vincent Guittot, Dietmar Eggemann,
	Sergei Trofimovich, Anatoly Pugachev

John Paul reported a warning about bogus NUMA distance values spurred by
commit:

  620a6dc40754 ("sched/topology: Make sched_init_numa() use a set for the deduplicating sort")

In this case, the afflicted machine comes up with a reported 256 possible
nodes, all of which are 0 distance away from one another. This was
previously silently ignored, but is now caught by the aforementioned
commit.

The culprit is ia64's node_possible_map which remains unchanged from its
initialization value of NODE_MASK_ALL. In John's case, the machine doesn't
have any SRAT nor SLIT table, but AIUI the possible map remains untouched
regardless of what ACPI tables end up being parsed. Thus, !online &&
possible nodes remain with a bogus distance of 0 (distances \in [0, 9] are
"reserved and have no meaning" as per the ACPI spec).

Follow x86 / drivers/base/arch_numa's example and set the possible map to
the parsed map, which in this case seems to be the online map.

Link: http://lore.kernel.org/r/255d6b5d-194e-eb0e-ecdd-97477a534441@physik.fu-berlin.de
Fixes: 620a6dc40754 ("sched/topology: Make sched_init_numa() use a set for the deduplicating sort")
Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
This might need an earlier Fixes: tag, but all of this is quite old and
dusty (the git blame rabbit hole leads me to ~2008/2007)

Alternatively, can we deprecate ia64 already?
---
 arch/ia64/kernel/acpi.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
index a5636524af76..e2af6b172200 100644
--- a/arch/ia64/kernel/acpi.c
+++ b/arch/ia64/kernel/acpi.c
@@ -446,7 +446,8 @@ void __init acpi_numa_fixup(void)
 	if (srat_num_cpus == 0) {
 		node_set_online(0);
 		node_cpuid[0].phys_id = hard_smp_processor_id();
-		return;
+		slit_distance(0, 0) = LOCAL_DISTANCE;
+		goto out;
 	}
 
 	/*
@@ -489,7 +490,7 @@ void __init acpi_numa_fixup(void)
 			for (j = 0; j < MAX_NUMNODES; j++)
 				slit_distance(i, j) = i == j ?
 					LOCAL_DISTANCE : REMOTE_DISTANCE;
-		return;
+		goto out;
 	}
 
 	memset(numa_slit, -1, sizeof(numa_slit));
@@ -514,6 +515,8 @@ void __init acpi_numa_fixup(void)
 		printk("\n");
 	}
 #endif
+out:
+	node_possible_map = node_online_map;
 }
 #endif				/* CONFIG_ACPI_NUMA */
 
-- 
2.25.1
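
The reasoning in the commit message can be sketched as a small user-space model. This is only an illustration under stated assumptions: `struct numa_model`, `MAX_NODES`, `LOCAL_DIST`, and every `model_*` function are hypothetical stand-ins invented for this sketch, not kernel symbols, and the bogus-distance count is a simplified approximation of the kind of range check the scheduler commit performs, not its actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for this sketch; none of these are kernel symbols. */
#define MAX_NODES  8   /* small stand-in for MAX_NUMNODES (256 in the report) */
#define LOCAL_DIST 10  /* smallest meaningful distance; 0..9 are reserved */

struct numa_model {
	bool possible[MAX_NODES];
	bool online[MAX_NODES];
	unsigned char distance[MAX_NODES][MAX_NODES];
};

/* Initial state: every node is possible (like NODE_MASK_ALL), distances unset. */
static void model_init(struct numa_model *m)
{
	memset(m, 0, sizeof(*m));
	for (int i = 0; i < MAX_NODES; i++)
		m->possible[i] = true;
}

static void model_set_online(struct numa_model *m, int node)
{
	assert(node >= 0 && node < MAX_NODES);
	m->online[node] = true;
}

/* Models the patched no-SRAT path: node 0 online, local distance to itself. */
static void model_no_srat(struct numa_model *m)
{
	model_set_online(m, 0);
	m->distance[0][0] = LOCAL_DIST;
}

/* Models the fix: narrow the possible map down to the online map. */
static void model_apply_fix(struct numa_model *m)
{
	memcpy(m->possible, m->online, sizeof(m->possible));
}

/* Counts pairs of possible nodes whose distance falls in the reserved range. */
static int model_count_bogus(const struct numa_model *m)
{
	int bogus = 0;

	for (int i = 0; i < MAX_NODES; i++) {
		for (int j = 0; j < MAX_NODES; j++) {
			if (m->possible[i] && m->possible[j] &&
			    m->distance[i][j] < LOCAL_DIST)
				bogus++;
		}
	}
	return bogus;
}
```

In this model, calling `model_init()` and `model_no_srat()` leaves every pair except (0, 0) with a reserved distance, because the offline nodes are still marked possible. After `model_apply_fix()` only node 0 remains possible, so no reserved distance is visible any more.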



* Re: [PATCH] ia64: Ensure proper NUMA distance and possible map initialization
  2021-03-18 13:06 [PATCH] ia64: Ensure proper NUMA distance and possible map initialization Valentin Schneider
@ 2021-03-19 14:47 ` John Paul Adrian Glaubitz
  2021-03-19 19:10   ` Sergei Trofimovich
  2021-03-24 18:54 ` Andrew Morton
  1 sibling, 1 reply; 6+ messages in thread
From: John Paul Adrian Glaubitz @ 2021-03-19 14:47 UTC
  To: Valentin Schneider, linux-kernel, linux-ia64, debian-ia64
  Cc: Peter Zijlstra (Intel),
	Ingo Molnar, Vincent Guittot, Dietmar Eggemann,
	Sergei Trofimovich, Anatoly Pugachev

Hi Valentin!

On 3/18/21 2:06 PM, Valentin Schneider wrote:
> John Paul reported a warning about bogus NUMA distance values spurred by
> commit:
> 
>   620a6dc40754 ("sched/topology: Make sched_init_numa() use a set for the deduplicating sort")
> 
> In this case, the afflicted machine comes up with a reported 256 possible
> nodes, all of which are 0 distance away from one another. This was
> previously silently ignored, but is now caught by the aforementioned
> commit.
> 
> The culprit is ia64's node_possible_map which remains unchanged from its
> initialization value of NODE_MASK_ALL. In John's case, the machine doesn't
> have any SRAT nor SLIT table, but AIUI the possible map remains untouched
> regardless of what ACPI tables end up being parsed. Thus, !online &&
> possible nodes remain with a bogus distance of 0 (distances \in [0, 9] are
> "reserved and have no meaning" as per the ACPI spec).
> 
> Follow x86 / drivers/base/arch_numa's example and set the possible map to
> the parsed map, which in this case seems to be the online map.
> 
> Link: http://lore.kernel.org/r/255d6b5d-194e-eb0e-ecdd-97477a534441@physik.fu-berlin.de
> Fixes: 620a6dc40754 ("sched/topology: Make sched_init_numa() use a set for the deduplicating sort")
> Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
> ---
> This might need an earlier Fixes: tag, but all of this is quite old and
> dusty (the git blame rabbit hole leads me to ~2008/2007)
> 
> Alternatively, can we deprecate ia64 already?
> ---
>  arch/ia64/kernel/acpi.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
> index a5636524af76..e2af6b172200 100644
> --- a/arch/ia64/kernel/acpi.c
> +++ b/arch/ia64/kernel/acpi.c
> @@ -446,7 +446,8 @@ void __init acpi_numa_fixup(void)
>  	if (srat_num_cpus == 0) {
>  		node_set_online(0);
>  		node_cpuid[0].phys_id = hard_smp_processor_id();
> -		return;
> +		slit_distance(0, 0) = LOCAL_DISTANCE;
> +		goto out;
>  	}
>  
>  	/*
> @@ -489,7 +490,7 @@ void __init acpi_numa_fixup(void)
>  			for (j = 0; j < MAX_NUMNODES; j++)
>  				slit_distance(i, j) = i == j ?
>  					LOCAL_DISTANCE : REMOTE_DISTANCE;
> -		return;
> +		goto out;
>  	}
>  
>  	memset(numa_slit, -1, sizeof(numa_slit));
> @@ -514,6 +515,8 @@ void __init acpi_numa_fixup(void)
>  		printk("\n");
>  	}
>  #endif
> +out:
> +	node_possible_map = node_online_map;
>  }
>  #endif				/* CONFIG_ACPI_NUMA */
>  
> 

Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>

Could you send this patch through Andrew Morton's tree? The ia64 port currently
has no maintainer, so we have to use an alternative tree.

@Sergei: Could you test/ack this patch as well?

Thanks,
Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



* Re: [PATCH] ia64: Ensure proper NUMA distance and possible map initialization
  2021-03-19 14:47 ` John Paul Adrian Glaubitz
@ 2021-03-19 19:10   ` Sergei Trofimovich
  2021-03-20 19:02     ` John Paul Adrian Glaubitz
  0 siblings, 1 reply; 6+ messages in thread
From: Sergei Trofimovich @ 2021-03-19 19:10 UTC
  To: John Paul Adrian Glaubitz
  Cc: Valentin Schneider, linux-kernel, linux-ia64, debian-ia64,
	Peter Zijlstra (Intel),
	Ingo Molnar, Vincent Guittot, Dietmar Eggemann, Anatoly Pugachev

On Fri, 19 Mar 2021 15:47:09 +0100
John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> wrote:

> Hi Valentin!
> 
> On 3/18/21 2:06 PM, Valentin Schneider wrote:
> > John Paul reported a warning about bogus NUMA distance values spurred by
> > commit:
> > 
> >   620a6dc40754 ("sched/topology: Make sched_init_numa() use a set for the deduplicating sort")
> > 
> > In this case, the afflicted machine comes up with a reported 256 possible
> > nodes, all of which are 0 distance away from one another. This was
> > previously silently ignored, but is now caught by the aforementioned
> > commit.
> > 
> > The culprit is ia64's node_possible_map which remains unchanged from its
> > initialization value of NODE_MASK_ALL. In John's case, the machine doesn't
> > have any SRAT nor SLIT table, but AIUI the possible map remains untouched
> > regardless of what ACPI tables end up being parsed. Thus, !online &&
> > possible nodes remain with a bogus distance of 0 (distances \in [0, 9] are
> > "reserved and have no meaning" as per the ACPI spec).
> > 
> > Follow x86 / drivers/base/arch_numa's example and set the possible map to
> > the parsed map, which in this case seems to be the online map.
> > 
> > Link: http://lore.kernel.org/r/255d6b5d-194e-eb0e-ecdd-97477a534441@physik.fu-berlin.de
> > Fixes: 620a6dc40754 ("sched/topology: Make sched_init_numa() use a set for the deduplicating sort")
> > Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
> > Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
> > ---
> > This might need an earlier Fixes: tag, but all of this is quite old and
> > dusty (the git blame rabbit hole leads me to ~2008/2007)
> > 
> > Alternatively, can we deprecate ia64 already?
> > ---
> >  arch/ia64/kernel/acpi.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
> > index a5636524af76..e2af6b172200 100644
> > --- a/arch/ia64/kernel/acpi.c
> > +++ b/arch/ia64/kernel/acpi.c
> > @@ -446,7 +446,8 @@ void __init acpi_numa_fixup(void)
> >  	if (srat_num_cpus == 0) {
> >  		node_set_online(0);
> >  		node_cpuid[0].phys_id = hard_smp_processor_id();
> > -		return;
> > +		slit_distance(0, 0) = LOCAL_DISTANCE;
> > +		goto out;
> >  	}
> >  
> >  	/*
> > @@ -489,7 +490,7 @@ void __init acpi_numa_fixup(void)
> >  			for (j = 0; j < MAX_NUMNODES; j++)
> >  				slit_distance(i, j) = i == j ?
> >  					LOCAL_DISTANCE : REMOTE_DISTANCE;
> > -		return;
> > +		goto out;
> >  	}
> >  
> >  	memset(numa_slit, -1, sizeof(numa_slit));
> > @@ -514,6 +515,8 @@ void __init acpi_numa_fixup(void)
> >  		printk("\n");
> >  	}
> >  #endif
> > +out:
> > +	node_possible_map = node_online_map;
> >  }
> >  #endif				/* CONFIG_ACPI_NUMA */
> >  
> >   
> 
> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
> 
> Could you send this patch through Andrew Morton's tree? The ia64 port currently
> has no maintainer, so we have to use an alternative tree.
> 
> @Sergei: Could you test/ack this patch as well?

Booted successfully without problems on rx3600.

Tested-by: Sergei Trofimovich <slyfox@gentoo.org>


-- 

  Sergei


* Re: [PATCH] ia64: Ensure proper NUMA distance and possible map initialization
  2021-03-19 19:10   ` Sergei Trofimovich
@ 2021-03-20 19:02     ` John Paul Adrian Glaubitz
  0 siblings, 0 replies; 6+ messages in thread
From: John Paul Adrian Glaubitz @ 2021-03-20 19:02 UTC
  To: Sergei Trofimovich
  Cc: Valentin Schneider, linux-kernel, linux-ia64, debian-ia64,
	Peter Zijlstra (Intel),
	Ingo Molnar, Vincent Guittot, Dietmar Eggemann, Anatoly Pugachev,
	Andrew Morton

On 3/19/21 8:10 PM, Sergei Trofimovich wrote:
> On Fri, 19 Mar 2021 15:47:09 +0100
> John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> wrote:
> 
>> Hi Valentin!
>>
>> On 3/18/21 2:06 PM, Valentin Schneider wrote:
>>> John Paul reported a warning about bogus NUMA distance values spurred by
>>> commit:
>>>
>>>   620a6dc40754 ("sched/topology: Make sched_init_numa() use a set for the deduplicating sort")
>>>
>>> In this case, the afflicted machine comes up with a reported 256 possible
>>> nodes, all of which are 0 distance away from one another. This was
>>> previously silently ignored, but is now caught by the aforementioned
>>> commit.
>>>
>>> The culprit is ia64's node_possible_map which remains unchanged from its
>>> initialization value of NODE_MASK_ALL. In John's case, the machine doesn't
>>> have any SRAT nor SLIT table, but AIUI the possible map remains untouched
>>> regardless of what ACPI tables end up being parsed. Thus, !online &&
>>> possible nodes remain with a bogus distance of 0 (distances \in [0, 9] are
>>> "reserved and have no meaning" as per the ACPI spec).
>>>
>>> Follow x86 / drivers/base/arch_numa's example and set the possible map to
>>> the parsed map, which in this case seems to be the online map.
>>>
>>> Link: http://lore.kernel.org/r/255d6b5d-194e-eb0e-ecdd-97477a534441@physik.fu-berlin.de
>>> Fixes: 620a6dc40754 ("sched/topology: Make sched_init_numa() use a set for the deduplicating sort")
>>> Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
>>> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
>>> ---
>>> This might need an earlier Fixes: tag, but all of this is quite old and
>>> dusty (the git blame rabbit hole leads me to ~2008/2007)
>>>
>>> Alternatively, can we deprecate ia64 already?
>>> ---
>>>  arch/ia64/kernel/acpi.c | 7 +++++--
>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
>>> index a5636524af76..e2af6b172200 100644
>>> --- a/arch/ia64/kernel/acpi.c
>>> +++ b/arch/ia64/kernel/acpi.c
>>> @@ -446,7 +446,8 @@ void __init acpi_numa_fixup(void)
>>>  	if (srat_num_cpus == 0) {
>>>  		node_set_online(0);
>>>  		node_cpuid[0].phys_id = hard_smp_processor_id();
>>> -		return;
>>> +		slit_distance(0, 0) = LOCAL_DISTANCE;
>>> +		goto out;
>>>  	}
>>>  
>>>  	/*
>>> @@ -489,7 +490,7 @@ void __init acpi_numa_fixup(void)
>>>  			for (j = 0; j < MAX_NUMNODES; j++)
>>>  				slit_distance(i, j) = i == j ?
>>>  					LOCAL_DISTANCE : REMOTE_DISTANCE;
>>> -		return;
>>> +		goto out;
>>>  	}
>>>  
>>>  	memset(numa_slit, -1, sizeof(numa_slit));
>>> @@ -514,6 +515,8 @@ void __init acpi_numa_fixup(void)
>>>  		printk("\n");
>>>  	}
>>>  #endif
>>> +out:
>>> +	node_possible_map = node_online_map;
>>>  }
>>>  #endif				/* CONFIG_ACPI_NUMA */
>>>  
>>>   
>>
>> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
>>
>> Could you send this patch through Andrew Morton's tree? The ia64 port currently
>> has no maintainer, so we have to use an alternative tree.
>>
>> @Sergei: Could you test/ack this patch as well?
> 
> Booted successfully without problems on rx3600.
> 
> Tested-by: Sergei Trofimovich <slyfox@gentoo.org>

Great, thanks!

@Andrew: Could you pick up this patch through your tree?

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



* Re: [PATCH] ia64: Ensure proper NUMA distance and possible map initialization
  2021-03-18 13:06 [PATCH] ia64: Ensure proper NUMA distance and possible map initialization Valentin Schneider
  2021-03-19 14:47 ` John Paul Adrian Glaubitz
@ 2021-03-24 18:54 ` Andrew Morton
  2021-03-24 18:59   ` John Paul Adrian Glaubitz
  1 sibling, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2021-03-24 18:54 UTC
  To: Valentin Schneider
  Cc: linux-kernel, linux-ia64, debian-ia64, John Paul Adrian Glaubitz,
	Peter Zijlstra (Intel),
	Ingo Molnar, Vincent Guittot, Dietmar Eggemann,
	Sergei Trofimovich, Anatoly Pugachev

On Thu, 18 Mar 2021 13:06:17 +0000 Valentin Schneider <valentin.schneider@arm.com> wrote:

> John Paul reported a warning about bogus NUMA distance values spurred by
> commit:
> 
>   620a6dc40754 ("sched/topology: Make sched_init_numa() use a set for the deduplicating sort")
> 
> In this case, the afflicted machine comes up with a reported 256 possible
> nodes, all of which are 0 distance away from one another. This was
> previously silently ignored, but is now caught by the aforementioned
> commit.
> 
> The culprit is ia64's node_possible_map which remains unchanged from its
> initialization value of NODE_MASK_ALL. In John's case, the machine doesn't
> have any SRAT nor SLIT table, but AIUI the possible map remains untouched
> regardless of what ACPI tables end up being parsed. Thus, !online &&
> possible nodes remain with a bogus distance of 0 (distances \in [0, 9] are
> "reserved and have no meaning" as per the ACPI spec).
> 
> Follow x86 / drivers/base/arch_numa's example and set the possible map to
> the parsed map, which in this case seems to be the online map.
> 
> Link: http://lore.kernel.org/r/255d6b5d-194e-eb0e-ecdd-97477a534441@physik.fu-berlin.de
> Fixes: 620a6dc40754 ("sched/topology: Make sched_init_numa() use a set for the deduplicating sort")
> Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
> ---
> This might need an earlier Fixes: tag, but all of this is quite old and
> dusty (the git blame rabbit hole leads me to ~2008/2007)
> 

Thanks.  Is this worth a cc:stable tag?


* Re: [PATCH] ia64: Ensure proper NUMA distance and possible map initialization
  2021-03-24 18:54 ` Andrew Morton
@ 2021-03-24 18:59   ` John Paul Adrian Glaubitz
  0 siblings, 0 replies; 6+ messages in thread
From: John Paul Adrian Glaubitz @ 2021-03-24 18:59 UTC
  To: Andrew Morton, Valentin Schneider
  Cc: linux-kernel, linux-ia64, debian-ia64, Peter Zijlstra (Intel),
	Ingo Molnar, Vincent Guittot, Dietmar Eggemann,
	Sergei Trofimovich, Anatoly Pugachev

Hi!

On 3/24/21 7:54 PM, Andrew Morton wrote:
> On Thu, 18 Mar 2021 13:06:17 +0000 Valentin Schneider <valentin.schneider@arm.com> wrote:
> 
>> John Paul reported a warning about bogus NUMA distance values spurred by
>> commit:
>>
>>   620a6dc40754 ("sched/topology: Make sched_init_numa() use a set for the deduplicating sort")
>>
>> In this case, the afflicted machine comes up with a reported 256 possible
>> nodes, all of which are 0 distance away from one another. This was
>> previously silently ignored, but is now caught by the aforementioned
>> commit.
>>
>> The culprit is ia64's node_possible_map which remains unchanged from its
>> initialization value of NODE_MASK_ALL. In John's case, the machine doesn't
>> have any SRAT nor SLIT table, but AIUI the possible map remains untouched
>> regardless of what ACPI tables end up being parsed. Thus, !online &&
>> possible nodes remain with a bogus distance of 0 (distances \in [0, 9] are
>> "reserved and have no meaning" as per the ACPI spec).
>>
>> Follow x86 / drivers/base/arch_numa's example and set the possible map to
>> the parsed map, which in this case seems to be the online map.
>>
>> Link: http://lore.kernel.org/r/255d6b5d-194e-eb0e-ecdd-97477a534441@physik.fu-berlin.de
>> Fixes: 620a6dc40754 ("sched/topology: Make sched_init_numa() use a set for the deduplicating sort")
>> Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
>> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
>> ---
>> This might need an earlier Fixes: tag, but all of this is quite old and
>> dusty (the git blame rabbit hole leads me to ~2008/2007)
>>
> 
> Thanks.  Is this worth a cc:stable tag?

Looks like the regression was introduced in 5.12-rc1, so no need for backporting.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaubitz@debian.org
`. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



end of thread, other threads:[~2021-03-24 19:00 UTC | newest]

Thread overview: 6+ messages
2021-03-18 13:06 [PATCH] ia64: Ensure proper NUMA distance and possible map initialization Valentin Schneider
2021-03-19 14:47 ` John Paul Adrian Glaubitz
2021-03-19 19:10   ` Sergei Trofimovich
2021-03-20 19:02     ` John Paul Adrian Glaubitz
2021-03-24 18:54 ` Andrew Morton
2021-03-24 18:59   ` John Paul Adrian Glaubitz
