linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH] arch/arm64: Fix topology initialization for core scheduling
@ 2022-03-22 16:03 Phil Auld
  2022-03-29 14:02 ` Dietmar Eggemann
  0 siblings, 1 reply; 8+ messages in thread
From: Phil Auld @ 2022-03-22 16:03 UTC
  To: linux-kernel
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Peter Zijlstra,
	linux-arm-kernel

Some arm64 systems rely on store_cpu_topology() to set up the real topology.
This needs to be done before the call to notify_cpu_starting(), which
tells the scheduler about the CPU; otherwise, the core scheduling data
structures are set up in a way that does not match the actual topology.

Without this change stress-ng (which enables core scheduling in its prctl 
tests) causes a warning and then a crash (trimmed for legibility):

[ 1853.805168] ------------[ cut here ]------------
[ 1853.809784] task_rq(b)->core != rq->core
[ 1853.809792] WARNING: CPU: 117 PID: 0 at kernel/sched/fair.c:11102 cfs_prio_less+0x1b4/0x1c4
...
[ 1854.015210] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
...
[ 1854.231256] Call trace:
[ 1854.233689]  pick_next_task+0x3dc/0x81c
[ 1854.237512]  __schedule+0x10c/0x4cc
[ 1854.240988]  schedule_idle+0x34/0x54

Fixes: 9edeaea1bc45 ("sched: Core-wide rq->lock")
Signed-off-by: Phil Auld <pauld@redhat.com>
---
This is a similar issue to 
  f2703def339c ("MIPS: smp: fill in sibling and core maps earlier") 
which fixed it for MIPS. 



 arch/arm64/kernel/smp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 27df5c1e6baa..3b46041f2b97 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -234,6 +234,7 @@ asmlinkage notrace void secondary_start_kernel(void)
 	 * Log the CPU info before it is marked online and might get read.
 	 */
 	cpuinfo_store_cpu();
+	store_cpu_topology(cpu);
 
 	/*
 	 * Enable GIC and timers.
@@ -242,7 +243,6 @@ asmlinkage notrace void secondary_start_kernel(void)
 
 	ipi_setup(cpu);
 
-	store_cpu_topology(cpu);
 	numa_add_cpu(cpu);
 
 	/*
-- 
2.18.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: [PATCH] arch/arm64: Fix topology initialization for core scheduling
  2022-03-22 16:03 [PATCH] arch/arm64: Fix topology initialization for core scheduling Phil Auld
@ 2022-03-29 14:02 ` Dietmar Eggemann
  2022-03-29 15:20   ` Phil Auld
  0 siblings, 1 reply; 8+ messages in thread
From: Dietmar Eggemann @ 2022-03-29 14:02 UTC
  To: Phil Auld, linux-kernel
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, Peter Zijlstra,
	linux-arm-kernel

On 22/03/2022 17:03, Phil Auld wrote:
> Some arm64 systems rely on store_cpu_topology() to set up the real topology.
> This needs to be done before the call to notify_cpu_starting(), which
> tells the scheduler about the CPU; otherwise, the core scheduling data
> structures are set up in a way that does not match the actual topology.
> 
> Without this change stress-ng (which enables core scheduling in its prctl 
> tests) causes a warning and then a crash (trimmed for legibility):
> 
> [ 1853.805168] ------------[ cut here ]------------
> [ 1853.809784] task_rq(b)->core != rq->core
> [ 1853.809792] WARNING: CPU: 117 PID: 0 at kernel/sched/fair.c:11102 cfs_prio_less+0x1b4/0x1c4
> ...
> [ 1854.015210] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
> ...
> [ 1854.231256] Call trace:
> [ 1854.233689]  pick_next_task+0x3dc/0x81c
> [ 1854.237512]  __schedule+0x10c/0x4cc
> [ 1854.240988]  schedule_idle+0x34/0x54
> 
> Fixes: 9edeaea1bc45 ("sched: Core-wide rq->lock")
> Signed-off-by: Phil Auld <pauld@redhat.com>
> ---
> This is a similar issue to 
>   f2703def339c ("MIPS: smp: fill in sibling and core maps earlier") 
> which fixed it for MIPS.

I assume this is for a machine which relies on MPIDR-based setup
(package_id == -1)? I.e. it doesn't have proper ACPI/(DT) data for
topology setup.

Tried on a ThunderX2 by disabling parse_acpi_topology() but then I end
up with a machine w/o SMT, so `stress-ng --prctl N` doesn't show this issue.

Which machine were you using?

>  arch/arm64/kernel/smp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 27df5c1e6baa..3b46041f2b97 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -234,6 +234,7 @@ asmlinkage notrace void secondary_start_kernel(void)
>  	 * Log the CPU info before it is marked online and might get read.
>  	 */
>  	cpuinfo_store_cpu();
> +	store_cpu_topology(cpu);
>  
>  	/*
>  	 * Enable GIC and timers.
> @@ -242,7 +243,6 @@ asmlinkage notrace void secondary_start_kernel(void)
>  
>  	ipi_setup(cpu);
>  
> -	store_cpu_topology(cpu);
>  	numa_add_cpu(cpu);
>  
>  	/*



* Re: [PATCH] arch/arm64: Fix topology initialization for core scheduling
  2022-03-29 14:02 ` Dietmar Eggemann
@ 2022-03-29 15:20   ` Phil Auld
  2022-03-29 18:55     ` Dietmar Eggemann
  0 siblings, 1 reply; 8+ messages in thread
From: Phil Auld @ 2022-03-29 15:20 UTC
  To: Dietmar Eggemann
  Cc: linux-kernel, Catalin Marinas, Will Deacon, Mark Rutland,
	Peter Zijlstra, linux-arm-kernel

On Tue, Mar 29, 2022 at 04:02:22PM +0200 Dietmar Eggemann wrote:
> On 22/03/2022 17:03, Phil Auld wrote:
> > Some arm64 systems rely on store_cpu_topology() to set up the real topology.
> > This needs to be done before the call to notify_cpu_starting(), which
> > tells the scheduler about the CPU; otherwise, the core scheduling data
> > structures are set up in a way that does not match the actual topology.
> > 
> > Without this change stress-ng (which enables core scheduling in its prctl 
> > tests) causes a warning and then a crash (trimmed for legibility):
> > 
> > [ 1853.805168] ------------[ cut here ]------------
> > [ 1853.809784] task_rq(b)->core != rq->core
> > [ 1853.809792] WARNING: CPU: 117 PID: 0 at kernel/sched/fair.c:11102 cfs_prio_less+0x1b4/0x1c4
> > ...
> > [ 1854.015210] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
> > ...
> > [ 1854.231256] Call trace:
> > [ 1854.233689]  pick_next_task+0x3dc/0x81c
> > [ 1854.237512]  __schedule+0x10c/0x4cc
> > [ 1854.240988]  schedule_idle+0x34/0x54
> > 
> > Fixes: 9edeaea1bc45 ("sched: Core-wide rq->lock")
> > Signed-off-by: Phil Auld <pauld@redhat.com>
> > ---
> > This is a similar issue to 
> >   f2703def339c ("MIPS: smp: fill in sibling and core maps earlier") 
> > which fixed it for MIPS.
> 
> I assume this is for a machine which relies on MPIDR-based setup
> (package_id == -1)? I.e. it doesn't have proper ACPI/(DT) data for
> topology setup.

Yes, that's my understanding. No PPTT.

> 
> Tried on a ThunderX2 by disabling parse_acpi_topology() but then I end
> up with a machine w/o SMT, so `stress-ng --prctl N` doesn't show this issue.
>
> Which machine were you using?

This instance is an HPE Apollo 70 set to smt-4.  I believe it's ThunderX2
chips.

ARM (CN9980-2200LG4077-Y21-G) 


Thanks,
Phil

> 
> >  arch/arm64/kernel/smp.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > index 27df5c1e6baa..3b46041f2b97 100644
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -234,6 +234,7 @@ asmlinkage notrace void secondary_start_kernel(void)
> >  	 * Log the CPU info before it is marked online and might get read.
> >  	 */
> >  	cpuinfo_store_cpu();
> > +	store_cpu_topology(cpu);
> >  
> >  	/*
> >  	 * Enable GIC and timers.
> > @@ -242,7 +243,6 @@ asmlinkage notrace void secondary_start_kernel(void)
> >  
> >  	ipi_setup(cpu);
> >  
> > -	store_cpu_topology(cpu);
> >  	numa_add_cpu(cpu);
> >  
> >  	/*
> 

-- 



* Re: [PATCH] arch/arm64: Fix topology initialization for core scheduling
  2022-03-29 15:20   ` Phil Auld
@ 2022-03-29 18:55     ` Dietmar Eggemann
  2022-03-29 19:50       ` Phil Auld
  0 siblings, 1 reply; 8+ messages in thread
From: Dietmar Eggemann @ 2022-03-29 18:55 UTC
  To: Phil Auld
  Cc: linux-kernel, Catalin Marinas, Will Deacon, Mark Rutland,
	Peter Zijlstra, linux-arm-kernel

On 29/03/2022 17:20, Phil Auld wrote:
> On Tue, Mar 29, 2022 at 04:02:22PM +0200 Dietmar Eggemann wrote:
>> On 22/03/2022 17:03, Phil Auld wrote:

[...]

>> I assume this is for a machine which relies on MPIDR-based setup
>> (package_id == -1)? I.e. it doesn't have proper ACPI/(DT) data for
>> topology setup.
> 
> Yes, that's my understanding. No PPTT.
> 
>>
>> Tried on a ThunderX2 by disabling parse_acpi_topology() but then I end
>> up with a machine w/o SMT, so `stress-ng --prctl N` doesn't show this issue.
>>
>> Which machine were you using?
> 
> This instance is an HPE Apollo 70 set to smt-4.  I believe it's ThunderX2
> chips.
> 
> ARM (CN9980-2200LG4077-Y21-G) 
I'm using the same processor just with ACPI/PPTT.

# sudo dmidecode -t 4 | grep "Part Number"
	Part Number: CN9980-2200LG4077-21-Y-G
	Part Number: CN9980-2200LG4077-21-Y-G

# cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0,32,64,96

# cat /sys/kernel/debug/sched/domains/cpu0/domain*/name
SMT
MC
NUMA

But no matter whether I disable parse_acpi_topology() or just force
`cpu_topology[cpu].package_id = -1` in this function, I always end up with:

# cat /sys/kernel/debug/sched/domains/cpu0/domain*/name
MC
NUMA

# cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0

so no SMT sched domain. The MPIDR-based topology fallback code in
store_cpu_topology() forces `cpuid_topo->thread_id  = -1`.

IMHO this is why on my machine I don't see this issue while running:

root@oss-apollo7007:~# stress-ng --prctl 256 -t 60
stress-ng: info:  [2388042] dispatching hogs: 256 prctl

Is there something I miss in my setup to provoke this issue?

[...]


* Re: [PATCH] arch/arm64: Fix topology initialization for core scheduling
  2022-03-29 18:55     ` Dietmar Eggemann
@ 2022-03-29 19:50       ` Phil Auld
  2022-03-30 15:48         ` Dietmar Eggemann
  0 siblings, 1 reply; 8+ messages in thread
From: Phil Auld @ 2022-03-29 19:50 UTC
  To: Dietmar Eggemann
  Cc: linux-kernel, Catalin Marinas, Will Deacon, Mark Rutland,
	Peter Zijlstra, linux-arm-kernel

On Tue, Mar 29, 2022 at 08:55:08PM +0200 Dietmar Eggemann wrote:
> On 29/03/2022 17:20, Phil Auld wrote:
> > On Tue, Mar 29, 2022 at 04:02:22PM +0200 Dietmar Eggemann wrote:
> >> On 22/03/2022 17:03, Phil Auld wrote:
> 
> [...]
> 
> >> I assume this is for a machine which relies on MPIDR-based setup
> >> (package_id == -1)? I.e. it doesn't have proper ACPI/(DT) data for
> >> topology setup.
> > 
> > Yes, that's my understanding. No PPTT.
> > 
> >>
> >> Tried on a ThunderX2 by disabling parse_acpi_topology() but then I end
> >> up with a machine w/o SMT, so `stress-ng --prctl N` doesn't show this issue.
> >>
> >> Which machine were you using?
> > 
> > This instance is an HPE Apollo 70 set to smt-4.  I believe it's ThunderX2
> > chips.
> > 
> > ARM (CN9980-2200LG4077-Y21-G) 
> I'm using the same processor just with ACPI/PPTT.
>

Maybe I'm misinformed about these systems having no PPTT...  

I'm reclaiming the system. Is there a way I can tell from userspace?


> # sudo dmidecode -t 4 | grep "Part Number"
> 	Part Number: CN9980-2200LG4077-21-Y-G
> 	Part Number: CN9980-2200LG4077-21-Y-G
> 
> # cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
> 0,32,64,96
> 
> # cat /sys/kernel/debug/sched/domains/cpu0/domain*/name
> SMT
> MC
> NUMA
> 
> But no matter whether I disable parse_acpi_topology() or just force
> `cpu_topology[cpu].package_id = -1` in this function, I always end up with:
> 
> # cat /sys/kernel/debug/sched/domains/cpu0/domain*/name
> MC
> NUMA
> 
> # cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
> 0
> 
> so no SMT sched domain. The MPIDR-based topology fallback code in
> store_cpu_topology() forces `cpuid_topo->thread_id  = -1`.

Right. Since I'm getting SMT, it must not have package_id == -1.
In that case you should be able to reproduce it, because it must
be that the call to update_siblings_masks() is required. That
appears to be called only from store_cpu_topology(), which runs
after the scheduler has already set up the core pointers.

The fix could be the same, but I should reword the commit message,
since I'd think it affects all SMT arm systems.

Or maybe the ACPI topology code should call update_siblings_masks().


> 
> IMHO this is why on my machine I don't see this issue while running:
> 
> root@oss-apollo7007:~# stress-ng --prctl 256 -t 60
> stress-ng: info:  [2388042] dispatching hogs: 256 prctl
> 
> Is there something I miss in my setup to provoke this issue?
>

Make sure you have a stress-ng that is new enough and built against
headers that have the CORE_SCHED prctls defined.


BTW, thanks for taking a look.


Cheers,
Phil

> [...]
> 

-- 



* Re: [PATCH] arch/arm64: Fix topology initialization for core scheduling
  2022-03-29 19:50       ` Phil Auld
@ 2022-03-30 15:48         ` Dietmar Eggemann
  2022-03-30 15:52           ` Phil Auld
  2022-03-30 16:07           ` Phil Auld
  0 siblings, 2 replies; 8+ messages in thread
From: Dietmar Eggemann @ 2022-03-30 15:48 UTC
  To: Phil Auld
  Cc: linux-kernel, Catalin Marinas, Will Deacon, Mark Rutland,
	Peter Zijlstra, linux-arm-kernel

On 29/03/2022 21:50, Phil Auld wrote:
> On Tue, Mar 29, 2022 at 08:55:08PM +0200 Dietmar Eggemann wrote:
>> On 29/03/2022 17:20, Phil Auld wrote:
>>> On Tue, Mar 29, 2022 at 04:02:22PM +0200 Dietmar Eggemann wrote:
>>>> On 22/03/2022 17:03, Phil Auld wrote:

[...]

>>> This instance is an HPE Apollo 70 set to smt-4.  I believe it's ThunderX2
>>> chips.
>>>
>>> ARM (CN9980-2200LG4077-Y21-G) 
>> I'm using the same processor just with ACPI/PPTT.
>>
> 
> Maybe I'm misinformed about these systems having no PPTT...  
> 
> I'm reclaiming the system. Is there a way I can tell from userspace?

# cat /sys/firmware/acpi/tables/PPTT > pptt.dat
# iasl -d pptt.dat
# vim pptt.dsl

[...]

>> so no SMT sched domain. The MPIDR-based topology fallback code in
>> store_cpu_topology() forces `cpuid_topo->thread_id  = -1`.
> 
> Right. Since I'm getting SMT, it must not have package_id == -1.
> In that case you should be able to reproduce it, because it must
> be that the call to update_siblings_masks() is required. That
> appears to be called only from store_cpu_topology(), which runs
> after the scheduler has already set up the core pointers.
> 
> The fix could be the same, but I should reword the commit message,
> since I'd think it affects all SMT arm systems.
> 
> Or maybe the ACPI topology code should call update_siblings_masks().
>>
>> IMHO this is why on my machine I don't see this issue while running:
>>
>> root@oss-apollo7007:~# stress-ng --prctl 256 -t 60
>> stress-ng: info:  [2388042] dispatching hogs: 256 prctl
>>
>> Is there something I miss in my setup to provoke this issue?
>>
> 
> Make sure you have a stress-ng that is new enough and built against
> headers that have the CORE_SCHED prctls defined.

Ah, I was using a pretty old version 0.11.07. Now I switched to 0.13.12
which includes:

  9038e442b92d - stress-prctl: add Linux 5.14 PR_SCHED_CORE prctl

To get SCHED_CORE activated in stress-prctl.c, as a quick hack, I had to
add the definitions of PR_SCHED_CORE, PR_SCHED_CORE_GET, etc. to this file.

Now the issue you described triggers on this machine immediately.


* Re: [PATCH] arch/arm64: Fix topology initialization for core scheduling
  2022-03-30 15:48         ` Dietmar Eggemann
@ 2022-03-30 15:52           ` Phil Auld
  2022-03-30 16:07           ` Phil Auld
  1 sibling, 0 replies; 8+ messages in thread
From: Phil Auld @ 2022-03-30 15:52 UTC
  To: Dietmar Eggemann
  Cc: linux-kernel, Catalin Marinas, Will Deacon, Mark Rutland,
	Peter Zijlstra, linux-arm-kernel

On Wed, Mar 30, 2022 at 05:48:34PM +0200 Dietmar Eggemann wrote:
> On 29/03/2022 21:50, Phil Auld wrote:
> > On Tue, Mar 29, 2022 at 08:55:08PM +0200 Dietmar Eggemann wrote:
> >> On 29/03/2022 17:20, Phil Auld wrote:
> >>> On Tue, Mar 29, 2022 at 04:02:22PM +0200 Dietmar Eggemann wrote:
> >>>> On 22/03/2022 17:03, Phil Auld wrote:
> 
> [...]
> 
> >>> This instance is an HPE Apollo 70 set to smt-4.  I believe it's ThunderX2
> >>> chips.
> >>>
> >>> ARM (CN9980-2200LG4077-Y21-G) 
> >> I'm using the same processor just with ACPI/PPTT.
> >>
> > 
> > Maybe I'm misinformed about these systems having no PPTT...  
> > 
> > I'm reclaiming the system. Is there a way I can tell from userspace?
> 
> # cat /sys/firmware/acpi/tables/PPTT > pptt.dat
> # iasl -d pptt.dat
> # vim pptt.dsl
>

Thanks, I'll give that a try. I suspect these are the same as yours, though,
and I was just mistaken :)


> [...]
> 
> >> so no SMT sched domain. The MPIDR-based topology fallback code in
> >> store_cpu_topology() forces `cpuid_topo->thread_id  = -1`.
> > 
> > Right. Since I'm getting SMT, it must not have package_id == -1.
> > In that case you should be able to reproduce it, because it must
> > be that the call to update_siblings_masks() is required. That
> > appears to be called only from store_cpu_topology(), which runs
> > after the scheduler has already set up the core pointers.
> > 
> > The fix could be the same, but I should reword the commit message,
> > since I'd think it affects all SMT arm systems.
> > 
> > Or maybe the ACPI topology code should call update_siblings_masks().
> >>
> >> IMHO this is why on my machine I don't see this issue while running:
> >>
> >> root@oss-apollo7007:~# stress-ng --prctl 256 -t 60
> >> stress-ng: info:  [2388042] dispatching hogs: 256 prctl
> >>
> >> Is there something I miss in my setup to provoke this issue?
> >>
> > 
> > Make sure you have a stress-ng that is new enough and built against
> > headers that have the CORE_SCHED prctls defined.
> 
> Ah, I was using a pretty old version 0.11.07. Now I switched to 0.13.12
> which includes:
> 
>   9038e442b92d - stress-prctl: add Linux 5.14 PR_SCHED_CORE prctl
> 
> To get SCHED_CORE activated in stress-prctl.c, as a quick hack, I had to
> add the definitions of PR_SCHED_CORE, PR_SCHED_CORE_GET, etc. to this file.
> 
> Now the issue you described triggers on this machine immediately.
>

Great!  I'll repost the patch with a more accurate commit message then.

And if you come up with something different, that works for me too. Let
me know and I'll test it here.


Cheers,
Phil


-- 



* Re: [PATCH] arch/arm64: Fix topology initialization for core scheduling
  2022-03-30 15:48         ` Dietmar Eggemann
  2022-03-30 15:52           ` Phil Auld
@ 2022-03-30 16:07           ` Phil Auld
  1 sibling, 0 replies; 8+ messages in thread
From: Phil Auld @ 2022-03-30 16:07 UTC
  To: Dietmar Eggemann
  Cc: linux-kernel, Catalin Marinas, Will Deacon, Mark Rutland,
	Peter Zijlstra, linux-arm-kernel

On Wed, Mar 30, 2022 at 05:48:34PM +0200 Dietmar Eggemann wrote:
> On 29/03/2022 21:50, Phil Auld wrote:
> > On Tue, Mar 29, 2022 at 08:55:08PM +0200 Dietmar Eggemann wrote:
> >> On 29/03/2022 17:20, Phil Auld wrote:
> >>> On Tue, Mar 29, 2022 at 04:02:22PM +0200 Dietmar Eggemann wrote:
> >>>> On 22/03/2022 17:03, Phil Auld wrote:
> 
> [...]
> 
> >>> This instance is an HPE Apollo 70 set to smt-4.  I believe it's ThunderX2
> >>> chips.
> >>>
> >>> ARM (CN9980-2200LG4077-Y21-G) 
> >> I'm using the same processor just with ACPI/PPTT.
> >>
> > 
> > Maybe I'm misinformed about these systems having no PPTT...  
> > 
> > I'm reclaiming the system. Is there a way I can tell from userspace?
> 
> # cat /sys/firmware/acpi/tables/PPTT > pptt.dat
> # iasl -d pptt.dat
> # vim pptt.dsl
> 

I don't have iasl, but

# strings pptt.dat 
PPTT
ServerCL
 CAVM


So that looks like it has a PPTT entry.  


Cheers,
Phil


> [...]
> 
> >> so no SMT sched domain. The MPIDR-based topology fallback code in
> >> store_cpu_topology() forces `cpuid_topo->thread_id  = -1`.
> > 
> > Right. Since I'm getting SMT, it must not have package_id == -1.
> > In that case you should be able to reproduce it, because it must
> > be that the call to update_siblings_masks() is required. That
> > appears to be called only from store_cpu_topology(), which runs
> > after the scheduler has already set up the core pointers.
> > 
> > The fix could be the same, but I should reword the commit message,
> > since I'd think it affects all SMT arm systems.
> > 
> > Or maybe the ACPI topology code should call update_siblings_masks().
> >>
> >> IMHO this is why on my machine I don't see this issue while running:
> >>
> >> root@oss-apollo7007:~# stress-ng --prctl 256 -t 60
> >> stress-ng: info:  [2388042] dispatching hogs: 256 prctl
> >>
> >> Is there something I miss in my setup to provoke this issue?
> >>
> > 
> > Make sure you have a stress-ng that is new enough and built against
> > headers that have the CORE_SCHED prctls defined.
> 
> Ah, I was using a pretty old version 0.11.07. Now I switched to 0.13.12
> which includes:
> 
>   9038e442b92d - stress-prctl: add Linux 5.14 PR_SCHED_CORE prctl
> 
> To get SCHED_CORE activated in stress-prctl.c, as a quick hack, I had to
> add the definitions of PR_SCHED_CORE, PR_SCHED_CORE_GET, etc. to this file.
> 
> Now the issue you described triggers on this machine immediately.
> 

-- 



Thread overview: 8+ messages
2022-03-22 16:03 [PATCH] arch/arm64: Fix topology initialization for core scheduling Phil Auld
2022-03-29 14:02 ` Dietmar Eggemann
2022-03-29 15:20   ` Phil Auld
2022-03-29 18:55     ` Dietmar Eggemann
2022-03-29 19:50       ` Phil Auld
2022-03-30 15:48         ` Dietmar Eggemann
2022-03-30 15:52           ` Phil Auld
2022-03-30 16:07           ` Phil Auld
