LinuxPPC-Dev Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 1/1] powerpc/smp: Set numa node before updating mask
@ 2021-04-01 15:42 Srikar Dronamraju
  2021-04-01 22:51 ` Nathan Lynch
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Srikar Dronamraju @ 2021-04-01 15:42 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Srikar Dronamraju,
	Peter Zijlstra, Scott Cheloha, Geetika Moolchandani, Ingo Molnar,
	linuxppc-dev, Valentin Schneider

Geethika reported a trace when doing a dlpar CPU add.

------------[ cut here ]------------
WARNING: CPU: 152 PID: 1134 at kernel/sched/topology.c:2057
CPU: 152 PID: 1134 Comm: kworker/152:1 Not tainted 5.12.0-rc5-master #5
Workqueue: events cpuset_hotplug_workfn
NIP:  c0000000001cfc14 LR: c0000000001cfc10 CTR: c0000000007e3420
REGS: c0000034a08eb260 TRAP: 0700   Not tainted  (5.12.0-rc5-master+)
MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28828422  XER: 00000020
CFAR: c0000000001fd888 IRQMASK: 0 #012GPR00: c0000000001cfc10
c0000034a08eb500 c000000001f35400 0000000000000027 #012GPR04:
c0000035abaa8010 c0000035abb30a00 0000000000000027 c0000035abaa8018
#012GPR08: 0000000000000023 c0000035abaaef48 00000035aa540000
c0000035a49dffe8 #012GPR12: 0000000028828424 c0000035bf1a1c80
0000000000000497 0000000000000004 #012GPR16: c00000000347a258
0000000000000140 c00000000203d468 c000000001a1a490 #012GPR20:
c000000001f9c160 c0000034adf70920 c0000034aec9fd20 0000000100087bd3
#012GPR24: 0000000100087bd3 c0000035b3de09f8 0000000000000030
c0000035b3de09f8 #012GPR28: 0000000000000028 c00000000347a280
c0000034aefe0b00 c0000000010a2a68
NIP [c0000000001cfc14] build_sched_domains+0x6a4/0x1500
LR [c0000000001cfc10] build_sched_domains+0x6a0/0x1500
Call Trace:
[c0000034a08eb500] [c0000000001cfc10] build_sched_domains+0x6a0/0x1500 (unreliable)
[c0000034a08eb640] [c0000000001d1e6c] partition_sched_domains_locked+0x3ec/0x530
[c0000034a08eb6e0] [c0000000002936d4] rebuild_sched_domains_locked+0x524/0xbf0
[c0000034a08eb7e0] [c000000000296bb0] rebuild_sched_domains+0x40/0x70
[c0000034a08eb810] [c000000000296e74] cpuset_hotplug_workfn+0x294/0xe20
[c0000034a08ebc30] [c000000000178dd0] process_one_work+0x300/0x670
[c0000034a08ebd10] [c0000000001791b8] worker_thread+0x78/0x520
[c0000034a08ebda0] [c000000000185090] kthread+0x1a0/0x1b0
[c0000034a08ebe10] [c00000000000ccec] ret_from_kernel_thread+0x5c/0x70
Instruction dump:
7d2903a6 4e800421 e8410018 7f67db78 7fe6fb78 7f45d378 7f84e378 7c681b78
3c62ff1a 3863c6f8 4802dc35 60000000 <0fe00000> 3920fff4 f9210070 e86100a0
---[ end trace 532d9066d3d4d7ec ]---

Some of the per-CPU masks use cpu_cpu_mask as a filter to limit the search
for related CPUs. On a dlpar add of a CPU, update cpu_cpu_mask before
updating the per-CPU masks. This will ensure the cpu_cpu_mask is updated
correctly before its used in setting the masks. Setting the numa_node will
ensure that when cpu_cpu_mask() gets called, the correct node number is
used. This code movement helped fix the above call trace.

Cc: linuxppc-dev@lists.ozlabs.org
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Scott Cheloha <cheloha@linux.ibm.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>
Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/smp.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 5a4d59a1070d..1a99d75679a8 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1521,6 +1521,9 @@ void start_secondary(void *unused)
 
 	vdso_getcpu_init();
 #endif
+	set_numa_node(numa_cpu_lookup_table[cpu]);
+	set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));
+
 	/* Update topology CPU masks */
 	add_cpu_to_masks(cpu);
 
@@ -1539,9 +1542,6 @@ void start_secondary(void *unused)
 			shared_caches = true;
 	}
 
-	set_numa_node(numa_cpu_lookup_table[cpu]);
-	set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));
-
 	smp_wmb();
 	notify_cpu_starting(cpu);
 	set_cpu_online(cpu, true);
-- 
2.27.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/1] powerpc/smp: Set numa node before updating mask
  2021-04-01 15:42 [PATCH 1/1] powerpc/smp: Set numa node before updating mask Srikar Dronamraju
@ 2021-04-01 22:51 ` Nathan Lynch
  2021-04-02  3:18   ` Srikar Dronamraju
  2021-04-09 10:14   ` Srikar Dronamraju
  2021-04-13 12:25 ` Nathan Lynch
  2021-04-19  4:00 ` Michael Ellerman
  2 siblings, 2 replies; 12+ messages in thread
From: Nathan Lynch @ 2021-04-01 22:51 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Gautham R Shenoy, Srikar Dronamraju, Peter Zijlstra,
	Scott Cheloha, Geetika Moolchandani, Ingo Molnar, Laurent Dufour,
	linuxppc-dev, Valentin Schneider

Hi Srikar,

Thanks for figuring this out.

Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
>
> Some of the per-CPU masks use cpu_cpu_mask as a filter to limit the search
> for related CPUs. On a dlpar add of a CPU, update cpu_cpu_mask before
> updating the per-CPU masks. This will ensure the cpu_cpu_mask is updated
> correctly before its used in setting the masks. Setting the numa_node will
> ensure that when cpu_cpu_mask() gets called, the correct node number is
> used. This code movement helped fix the above call trace.
>
>
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 5a4d59a1070d..1a99d75679a8 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1521,6 +1521,9 @@ void start_secondary(void *unused)
>  
>  	vdso_getcpu_init();
>  #endif
> +	set_numa_node(numa_cpu_lookup_table[cpu]);
> +	set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));
> +
>  	/* Update topology CPU masks */
>  	add_cpu_to_masks(cpu);
>  
> @@ -1539,9 +1542,6 @@ void start_secondary(void *unused)
>  			shared_caches = true;
>  	}
>  
> -	set_numa_node(numa_cpu_lookup_table[cpu]);
> -	set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));
> -

Regardless of your change: at boot time, this set of calls to
set_numa_node() and set_numa_mem() is redundant, right? Because
smp_prepare_cpus() has:

	for_each_possible_cpu(cpu) {
		...
		if (cpu_present(cpu)) {
			set_cpu_numa_node(cpu, numa_cpu_lookup_table[cpu]);
			set_cpu_numa_mem(cpu,
				local_memory_node(numa_cpu_lookup_table[cpu]));
		}

I would rather that, when onlining a CPU that happens to have been
dynamically added after boot, we enter start_secondary() with conditions
equivalent to those at boot time. Or as close to that as is practical.

So I'd suggest that pseries_add_processor() be made to update
these things when the CPUs are marked present, before onlining them.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/1] powerpc/smp: Set numa node before updating mask
  2021-04-01 22:51 ` Nathan Lynch
@ 2021-04-02  3:18   ` Srikar Dronamraju
  2021-04-07 12:19     ` Nathan Lynch
  2021-04-09 10:14   ` Srikar Dronamraju
  1 sibling, 1 reply; 12+ messages in thread
From: Srikar Dronamraju @ 2021-04-02  3:18 UTC (permalink / raw)
  To: Nathan Lynch
  Cc: Gautham R Shenoy, Peter Zijlstra, Scott Cheloha,
	Geetika Moolchandani, Valentin Schneider, Laurent Dufour,
	linuxppc-dev, Ingo Molnar

* Nathan Lynch <nathanl@linux.ibm.com> [2021-04-01 17:51:05]:

Thanks Nathan for reviewing.

> > -	set_numa_node(numa_cpu_lookup_table[cpu]);
> > -	set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));
> > -
> 
> Regardless of your change: at boot time, this set of calls to
> set_numa_node() and set_numa_mem() is redundant, right? Because
> smp_prepare_cpus() has:
> 
> 	for_each_possible_cpu(cpu) {
> 		...
> 		if (cpu_present(cpu)) {
> 			set_cpu_numa_node(cpu, numa_cpu_lookup_table[cpu]);
> 			set_cpu_numa_mem(cpu,
> 				local_memory_node(numa_cpu_lookup_table[cpu]));
> 		}
> 
> I would rather that, when onlining a CPU that happens to have been
> dynamically added after boot, we enter start_secondary() with conditions
> equivalent to those at boot time. Or as close to that as is practical.
> 
> So I'd suggest that pseries_add_processor() be made to update
> these things when the CPUs are marked present, before onlining them.

In pseries_add_processor, we are only marking the cpu as present. i.e
I believe numa_setup_cpu() would not have been called. So we may not have a
way to associate the CPU to the node. Otherwise we will have to call
numa_setup_cpu() or the hcall_vphn.

We could try calling numa_setup_cpu() immediately after we set the
CPU to be present, but that would be one more extra hcall + I dont know if
there are any more steps needed before CPU being made present and
associating the CPU to the node. Are we sure the node is already online? For
the numa_mem, we are better of if the zonelists for the node are built.

or the other solution would be to call this in map_cpu_to_node().
Here also we have to be sure the zonelists for the node are already built.

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/1] powerpc/smp: Set numa node before updating mask
  2021-04-02  3:18   ` Srikar Dronamraju
@ 2021-04-07 12:19     ` Nathan Lynch
  2021-04-07 16:49       ` Srikar Dronamraju
  0 siblings, 1 reply; 12+ messages in thread
From: Nathan Lynch @ 2021-04-07 12:19 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Gautham R Shenoy, Peter Zijlstra, Scott Cheloha,
	Geetika Moolchandani, Valentin Schneider, Laurent Dufour,
	linuxppc-dev, Ingo Molnar

Sorry for the delay in following up here.

Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
>> > -	set_numa_node(numa_cpu_lookup_table[cpu]);
>> > -	set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));
>> > -
>> 
>> Regardless of your change: at boot time, this set of calls to
>> set_numa_node() and set_numa_mem() is redundant, right? Because
>> smp_prepare_cpus() has:
>> 
>> 	for_each_possible_cpu(cpu) {
>> 		...
>> 		if (cpu_present(cpu)) {
>> 			set_cpu_numa_node(cpu, numa_cpu_lookup_table[cpu]);
>> 			set_cpu_numa_mem(cpu,
>> 				local_memory_node(numa_cpu_lookup_table[cpu]));
>> 		}
>> 
>> I would rather that, when onlining a CPU that happens to have been
>> dynamically added after boot, we enter start_secondary() with conditions
>> equivalent to those at boot time. Or as close to that as is practical.
>> 
>> So I'd suggest that pseries_add_processor() be made to update
>> these things when the CPUs are marked present, before onlining them.
>
> In pseries_add_processor, we are only marking the cpu as present. i.e
> I believe numa_setup_cpu() would not have been called. So we may not have a
> way to associate the CPU to the node. Otherwise we will have to call
> numa_setup_cpu() or the hcall_vphn.
>
> We could try calling numa_setup_cpu() immediately after we set the
> CPU to be present, but that would be one more extra hcall + I dont know if
> there are any more steps needed before CPU being made present and
> associating the CPU to the node.

An additional hcall in this path doesn't seem too expensive.

> Are we sure the node is already online?

I see that dlpar_online_cpu() calls find_and_online_cpu_nid(), so yes I
think that's covered.

> For the numa_mem, we are better of if the zonelists for the node are
> built.
>
> or the other solution would be to call this in map_cpu_to_node().
> Here also we have to be sure the zonelists for the node are already
> built.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/1] powerpc/smp: Set numa node before updating mask
  2021-04-07 12:19     ` Nathan Lynch
@ 2021-04-07 16:49       ` Srikar Dronamraju
  2021-04-07 19:46         ` Nathan Lynch
  0 siblings, 1 reply; 12+ messages in thread
From: Srikar Dronamraju @ 2021-04-07 16:49 UTC (permalink / raw)
  To: Nathan Lynch
  Cc: Gautham R Shenoy, Peter Zijlstra, Scott Cheloha,
	Geetika Moolchandani, Valentin Schneider, Laurent Dufour,
	linuxppc-dev, Ingo Molnar

* Nathan Lynch <nathanl@linux.ibm.com> [2021-04-07 07:19:10]:

> Sorry for the delay in following up here.
> 

No issues.

> >> So I'd suggest that pseries_add_processor() be made to update
> >> these things when the CPUs are marked present, before onlining them.
> >
> > In pseries_add_processor, we are only marking the cpu as present. i.e
> > I believe numa_setup_cpu() would not have been called. So we may not have a
> > way to associate the CPU to the node. Otherwise we will have to call
> > numa_setup_cpu() or the hcall_vphn.
> >
> > We could try calling numa_setup_cpu() immediately after we set the
> > CPU to be present, but that would be one more extra hcall + I dont know if
> > there are any more steps needed before CPU being made present and
> > associating the CPU to the node.
> 
> An additional hcall in this path doesn't seem too expensive.
> 
> > Are we sure the node is already online?
> 
> I see that dlpar_online_cpu() calls find_and_online_cpu_nid(), so yes I
> think that's covered.

Okay, 

Can we just call set_cpu_numa_node() at the end of map_cpu_to_node().
The advantage would be the update to numa_cpu_lookup_table and cpu_to_node
would happen at the same time and would be in sync.

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/1] powerpc/smp: Set numa node before updating mask
  2021-04-07 16:49       ` Srikar Dronamraju
@ 2021-04-07 19:46         ` Nathan Lynch
  2021-04-08 11:11           ` Srikar Dronamraju
  0 siblings, 1 reply; 12+ messages in thread
From: Nathan Lynch @ 2021-04-07 19:46 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Gautham R Shenoy, Peter Zijlstra, Scott Cheloha,
	Geetika Moolchandani, Valentin Schneider, Laurent Dufour,
	linuxppc-dev, Ingo Molnar

Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:

> * Nathan Lynch <nathanl@linux.ibm.com> [2021-04-07 07:19:10]:
>
>> Sorry for the delay in following up here.
>> 
>
> No issues.
>
>> >> So I'd suggest that pseries_add_processor() be made to update
>> >> these things when the CPUs are marked present, before onlining them.
>> >
>> > In pseries_add_processor, we are only marking the cpu as present. i.e
>> > I believe numa_setup_cpu() would not have been called. So we may not have a
>> > way to associate the CPU to the node. Otherwise we will have to call
>> > numa_setup_cpu() or the hcall_vphn.
>> >
>> > We could try calling numa_setup_cpu() immediately after we set the
>> > CPU to be present, but that would be one more extra hcall + I dont know if
>> > there are any more steps needed before CPU being made present and
>> > associating the CPU to the node.
>> 
>> An additional hcall in this path doesn't seem too expensive.
>> 
>> > Are we sure the node is already online?
>> 
>> I see that dlpar_online_cpu() calls find_and_online_cpu_nid(), so yes I
>> think that's covered.
>
> Okay, 
>
> Can we just call set_cpu_numa_node() at the end of map_cpu_to_node().
> The advantage would be the update to numa_cpu_lookup_table and cpu_to_node
> would happen at the same time and would be in sync.

I don't know. I guess this question just makes me wonder whether powerpc
needs to have the additional lookup table. How is it different from the
generic per_cpu numa_node?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/1] powerpc/smp: Set numa node before updating mask
  2021-04-07 19:46         ` Nathan Lynch
@ 2021-04-08 11:11           ` Srikar Dronamraju
  2021-04-08 20:00             ` Nathan Lynch
  0 siblings, 1 reply; 12+ messages in thread
From: Srikar Dronamraju @ 2021-04-08 11:11 UTC (permalink / raw)
  To: Nathan Lynch
  Cc: Gautham R Shenoy, Peter Zijlstra, Scott Cheloha,
	Geetika Moolchandani, Valentin Schneider, Laurent Dufour,
	linuxppc-dev, Ingo Molnar

* Nathan Lynch <nathanl@linux.ibm.com> [2021-04-07 14:46:24]:

> Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> 
> > * Nathan Lynch <nathanl@linux.ibm.com> [2021-04-07 07:19:10]:
> >
> >> Sorry for the delay in following up here.
> >> 
> >
> > No issues.
> >
> >> >> So I'd suggest that pseries_add_processor() be made to update
> >> >> these things when the CPUs are marked present, before onlining them.
> >> >
> >> > In pseries_add_processor, we are only marking the cpu as present. i.e
> >> > I believe numa_setup_cpu() would not have been called. So we may not have a
> >> > way to associate the CPU to the node. Otherwise we will have to call
> >> > numa_setup_cpu() or the hcall_vphn.
> >> >
> >> > We could try calling numa_setup_cpu() immediately after we set the
> >> > CPU to be present, but that would be one more extra hcall + I dont know if
> >> > there are any more steps needed before CPU being made present and
> >> > associating the CPU to the node.
> >> 
> >> An additional hcall in this path doesn't seem too expensive.
> >> 
> >> > Are we sure the node is already online?
> >> 
> >> I see that dlpar_online_cpu() calls find_and_online_cpu_nid(), so yes I
> >> think that's covered.
> >
> > Okay, 
> >
> > Can we just call set_cpu_numa_node() at the end of map_cpu_to_node().
> > The advantage would be the update to numa_cpu_lookup_table and cpu_to_node
> > would happen at the same time and would be in sync.
> 
> I don't know. I guess this question just makes me wonder whether powerpc
> needs to have the additional lookup table. How is it different from the
> generic per_cpu numa_node?

lookup table is for early cpu to node i.e when per_cpu variables may not be
available. This would mean that calling set_numa_node/set_cpu_numa_node from
map_cpu_to_node() may not always be an option, since map_cpu_to_node() does
end up getting called very early in the system.

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/1] powerpc/smp: Set numa node before updating mask
  2021-04-08 11:11           ` Srikar Dronamraju
@ 2021-04-08 20:00             ` Nathan Lynch
  0 siblings, 0 replies; 12+ messages in thread
From: Nathan Lynch @ 2021-04-08 20:00 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Gautham R Shenoy, Peter Zijlstra, Scott Cheloha,
	Geetika Moolchandani, Valentin Schneider, Laurent Dufour,
	linuxppc-dev, Ingo Molnar

Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> * Nathan Lynch <nathanl@linux.ibm.com> [2021-04-07 14:46:24]:
>> I don't know. I guess this question just makes me wonder whether powerpc
>> needs to have the additional lookup table. How is it different from the
>> generic per_cpu numa_node?
>
> lookup table is for early cpu to node i.e when per_cpu variables may not be
> available. This would mean that calling set_numa_node/set_cpu_numa_node from
> map_cpu_to_node() may not always be an option, since map_cpu_to_node() does
> end up getting called very early in the system.

Ah that's right, thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/1] powerpc/smp: Set numa node before updating mask
  2021-04-01 22:51 ` Nathan Lynch
  2021-04-02  3:18   ` Srikar Dronamraju
@ 2021-04-09 10:14   ` Srikar Dronamraju
  2021-04-13 12:23     ` Nathan Lynch
  1 sibling, 1 reply; 12+ messages in thread
From: Srikar Dronamraju @ 2021-04-09 10:14 UTC (permalink / raw)
  To: Nathan Lynch
  Cc: Gautham R Shenoy, Peter Zijlstra, Scott Cheloha,
	Geetika Moolchandani, Valentin Schneider, Laurent Dufour,
	linuxppc-dev, Ingo Molnar

Hey Nathan,

>
> Regardless of your change: at boot time, this set of calls to
> set_numa_node() and set_numa_mem() is redundant, right? Because
> smp_prepare_cpus() has:
>
> 	for_each_possible_cpu(cpu) {
> 		...
> 		if (cpu_present(cpu)) {
> 			set_cpu_numa_node(cpu, numa_cpu_lookup_table[cpu]);
> 			set_cpu_numa_mem(cpu,
> 				local_memory_node(numa_cpu_lookup_table[cpu]));
> 		}
>
> I would rather that, when onlining a CPU that happens to have been
> dynamically added after boot, we enter start_secondary() with conditions
> equivalent to those at boot time. Or as close to that as is practical.
>
> So I'd suggest that pseries_add_processor() be made to update
> these things when the CPUs are marked present, before onlining them.

I did try updating at pseries_add_processor when things were being marked as
present. But looks like the zonelists may not be updated at that time.
I saw couple of crashes in local_memory_node() when dplar adding CPUs.
(Appending the patch that causes these crash to this mail for your reference)

[  293.669901] Kernel attempted to read user page (1508) - exploit attempt? (uid: 0)                                                                                                                     [1309/17174]
[  293.669925] BUG: Kernel NULL pointer dereference on read at 0x00001508
[  293.669935] Faulting instruction address: 0xc000000000484ae0
[  293.669947] Oops: Kernel access of bad area, sig: 11 [#1]
[  293.669956] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[  293.669969] Modules linked in: nft_counter nft_compat rpadlpar_io rpaphp
mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag
af_packet_diag netlink_diag bonding tls nft_fib_inet nft_fib_ipv4 nft_ fib_ipv6
nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set
nf_tables nfnetlink dm_multipath pseries_rng xts vmx_c rypto uio_pdrv_genirq
uio binfmt_misc ip_tables xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth
scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod fuse
[  293.670101] CPU: 17 PID: 3183 Comm: drmgr Not tainted 5.12.0-rc5-master+ #2
[  293.670113] NIP:  c000000000484ae0 LR: c00000000010a548 CTR: 00000000006dd130
[  293.670121] REGS: c0000025a9997160 TRAP: 0300   Not tainted  (5.12.0-rc5-master+)
[  293.670131] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 48008422  XER: 00000008
[  293.670155] CFAR: c00000000010a544 DAR: 0000000000001508 DSISR: 40000000 IRQMASK: 0
[  293.670155] GPR00: c00000000010a548 c0000025a9997400 c000000001f55500 0000000000000000
[  293.670155] GPR04: c000000001a3c598 0000000000000006 0000000000000027 c0000035873b8018
[  293.670155] GPR08: 0000000000000023 c0000000020464f8 00000035894d0000 c000003584c8ffe8
[  293.670155] GPR12: 0000000028008424 c0000035bf22ce80 0000000000000000 0000000000000000
[  293.670155] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  293.670155] GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000001fbc160
[  293.670155] GPR24: 0000000000000002 0000000000000200 c000000001fc04c8 0000000000000001
[  293.670155] GPR28: c0000000010c6fc8 c000000001fc08a0 c000002f8fb99e60 0000000000000040
[  293.670263] NIP [c000000000484ae0] local_memory_node+0x20/0x80
[  293.670281] LR [c00000000010a548] pseries_add_processor+0x368/0x3b0
[  293.670292] Call Trace:
[  293.670297] [c0000025a9997400] [c00000000010a524] pseries_add_processor+0x344/0x3b0 (unreliable)
[  293.670311] [c0000025a99976c0] [c00000000010a790] pseries_smp_notifier+0x200/0x2a0
[  293.670322] [c0000025a9997780] [c000000000197288] blocking_notifier_call_chain+0xa8/0x110
[  293.670335] [c0000025a99977d0] [c000000000b27010] of_attach_node+0xc0/0x110
[  293.670347] [c0000025a9997830] [c0000000001032a0] dlpar_attach_node+0x30/0x70
[  293.670358] [c0000025a99978a0] [c000000000109ac0] dlpar_cpu_add+0x1d0/0x510
[  293.670368] [c0000025a9997980] [c00000000010a898] dlpar_cpu+0x68/0x6e0
[  293.670377] [c0000025a9997a80] [c0000000001036b8] handle_dlpar_errorlog+0xf8/0x190
[  293.670388] [c0000025a9997af0] [c000000000103928] dlpar_store+0x178/0x4a0
[  293.670396] [c0000025a9997bb0] [c0000000007df050] kobj_attr_store+0x30/0x50
[  293.670408] [c0000025a9997bd0] [c00000000062f0b0] sysfs_kf_write+0x70/0xb0
[  293.670421] [c0000025a9997c10] [c00000000062d4e0] kernfs_fop_write_iter+0x1d0/0x280
[  293.670432] [c0000025a9997c60] [c00000000051673c] new_sync_write+0x14c/0x1d0
[  293.670445] [c0000025a9997d00] [c000000000519df4] vfs_write+0x264/0x380
[  293.670455] [c0000025a9997d60] [c00000000051a0ec] ksys_write+0x7c/0x140
[  293.670464] [c0000025a9997db0] [c000000000032af0] system_call_exception+0x150/0x290
[  293.670475] [c0000025a9997e10] [c00000000000d45c] system_call_common+0xec/0x278
[  293.670486] --- interrupt: c00 at 0x20000025bd74

That leaves us with just 2 options for now.
1. Update numa_mem later and only update numa_node here.
- Over a longer period of time, this would be more confusing since we
may lose track of why we are splitting the set of numa_node and numa_mem.

or
2. Use my earlier patch.

My choice would be to go with my earlier patch.
Please do let me know your thoughts on the same.

-- 
Thanks and Regards
Srikar Dronamraju

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 3beeb030cd78..1cdd83703f93 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -44,6 +44,7 @@ extern void __init dump_numa_cpu_topology(void);
 
 extern int sysfs_add_device_to_node(struct device *dev, int nid);
 extern void sysfs_remove_device_from_node(struct device *dev, int nid);
+extern int numa_setup_cpu(unsigned long cpu);
 
 static inline void update_numa_cpu_lookup_table(unsigned int cpu, int node)
 {
@@ -81,6 +82,11 @@ static inline void sysfs_remove_device_from_node(struct device *dev,
 {
 }
 
+static int numa_setup_cpu(unsigned long cpu)
+{
+	return first_online_node;
+}
+
 static inline void update_numa_cpu_lookup_table(unsigned int cpu, int node) {}
 
 static inline int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 5a4d59a1070d..0d0c71ba4672 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1539,9 +1539,6 @@ void start_secondary(void *unused)
 			shared_caches = true;
 	}
 
-	set_numa_node(numa_cpu_lookup_table[cpu]);
-	set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));
-
 	smp_wmb();
 	notify_cpu_starting(cpu);
 	set_cpu_online(cpu, true);
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index f2bf98bdcea2..11914a28db67 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -501,7 +501,7 @@ static int vphn_get_nid(long unused)
  * Figure out to which domain a cpu belongs and stick it there.
  * Return the id of the domain used.
  */
-static int numa_setup_cpu(unsigned long lcpu)
+int numa_setup_cpu(unsigned long lcpu)
 {
 	struct device_node *cpu;
 	int fcpu = cpu_first_thread_sibling(lcpu);
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 12cbffd3c2e3..311fbe916d5b 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -198,9 +198,14 @@ static int pseries_add_processor(struct device_node *np)
 	}
 
 	for_each_cpu(cpu, tmp) {
+		int nid;
+
 		BUG_ON(cpu_present(cpu));
 		set_cpu_present(cpu, true);
 		set_hard_smp_processor_id(cpu, be32_to_cpu(*intserv++));
+		nid = numa_setup_cpu(cpu);
+		set_cpu_numa_node(cpu, nid);
+		set_cpu_numa_mem(cpu, local_memory_node(nid));
 	}
 	err = 0;
 out_unlock:


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/1] powerpc/smp: Set numa node before updating mask
  2021-04-09 10:14   ` Srikar Dronamraju
@ 2021-04-13 12:23     ` Nathan Lynch
  0 siblings, 0 replies; 12+ messages in thread
From: Nathan Lynch @ 2021-04-13 12:23 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Gautham R Shenoy, Peter Zijlstra, Scott Cheloha,
	Geetika Moolchandani, Valentin Schneider, Laurent Dufour,
	linuxppc-dev, Ingo Molnar

Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> That leaves us with just 2 options for now.
> 1. Update numa_mem later and only update numa_node here.
> - Over a longer period of time, this would be more confusing since we
> may lose track of why we are splitting the set of numa_node and numa_mem.
>
> or
> 2. Use my earlier patch.
>
> My choice would be to go with my earlier patch.
> Please do let me know your thoughts on the same.

OK, agreed. Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/1] powerpc/smp: Set numa node before updating mask
  2021-04-01 15:42 [PATCH 1/1] powerpc/smp: Set numa node before updating mask Srikar Dronamraju
  2021-04-01 22:51 ` Nathan Lynch
@ 2021-04-13 12:25 ` Nathan Lynch
  2021-04-19  4:00 ` Michael Ellerman
  2 siblings, 0 replies; 12+ messages in thread
From: Nathan Lynch @ 2021-04-13 12:25 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Gautham R Shenoy, Srikar Dronamraju, Peter Zijlstra,
	Scott Cheloha, Geetika Moolchandani, Ingo Molnar, linuxppc-dev,
	Valentin Schneider

Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
>
> Some of the per-CPU masks use cpu_cpu_mask as a filter to limit the search
> for related CPUs. On a dlpar add of a CPU, update cpu_cpu_mask before
> updating the per-CPU masks. This will ensure the cpu_cpu_mask is updated
> correctly before its used in setting the masks. Setting the numa_node will
> ensure that when cpu_cpu_mask() gets called, the correct node number is
> used. This code movement helped fix the above call trace.
>
> Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>

Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/1] powerpc/smp: Set numa node before updating mask
  2021-04-01 15:42 [PATCH 1/1] powerpc/smp: Set numa node before updating mask Srikar Dronamraju
  2021-04-01 22:51 ` Nathan Lynch
  2021-04-13 12:25 ` Nathan Lynch
@ 2021-04-19  4:00 ` Michael Ellerman
  2 siblings, 0 replies; 12+ messages in thread
From: Michael Ellerman @ 2021-04-19  4:00 UTC (permalink / raw)
  To: Srikar Dronamraju, Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Peter Zijlstra, Scott Cheloha,
	Geetika Moolchandani, Ingo Molnar, linuxppc-dev,
	Valentin Schneider

On Thu, 1 Apr 2021 21:12:00 +0530, Srikar Dronamraju wrote:
> Geethika reported a trace when doing a dlpar CPU add.
> 
> ------------[ cut here ]------------
> WARNING: CPU: 152 PID: 1134 at kernel/sched/topology.c:2057
> CPU: 152 PID: 1134 Comm: kworker/152:1 Not tainted 5.12.0-rc5-master #5
> Workqueue: events cpuset_hotplug_workfn
> NIP:  c0000000001cfc14 LR: c0000000001cfc10 CTR: c0000000007e3420
> REGS: c0000034a08eb260 TRAP: 0700   Not tainted  (5.12.0-rc5-master+)
> MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28828422  XER: 00000020
> CFAR: c0000000001fd888 IRQMASK: 0 #012GPR00: c0000000001cfc10
> c0000034a08eb500 c000000001f35400 0000000000000027 #012GPR04:
> c0000035abaa8010 c0000035abb30a00 0000000000000027 c0000035abaa8018
> #012GPR08: 0000000000000023 c0000035abaaef48 00000035aa540000
> c0000035a49dffe8 #012GPR12: 0000000028828424 c0000035bf1a1c80
> 0000000000000497 0000000000000004 #012GPR16: c00000000347a258
> 0000000000000140 c00000000203d468 c000000001a1a490 #012GPR20:
> c000000001f9c160 c0000034adf70920 c0000034aec9fd20 0000000100087bd3
> #012GPR24: 0000000100087bd3 c0000035b3de09f8 0000000000000030
> c0000035b3de09f8 #012GPR28: 0000000000000028 c00000000347a280
> c0000034aefe0b00 c0000000010a2a68
> NIP [c0000000001cfc14] build_sched_domains+0x6a4/0x1500
> LR [c0000000001cfc10] build_sched_domains+0x6a0/0x1500
> Call Trace:
> [c0000034a08eb500] [c0000000001cfc10] build_sched_domains+0x6a0/0x1500 (unreliable)
> [c0000034a08eb640] [c0000000001d1e6c] partition_sched_domains_locked+0x3ec/0x530
> [c0000034a08eb6e0] [c0000000002936d4] rebuild_sched_domains_locked+0x524/0xbf0
> [c0000034a08eb7e0] [c000000000296bb0] rebuild_sched_domains+0x40/0x70
> [c0000034a08eb810] [c000000000296e74] cpuset_hotplug_workfn+0x294/0xe20
> [c0000034a08ebc30] [c000000000178dd0] process_one_work+0x300/0x670
> [c0000034a08ebd10] [c0000000001791b8] worker_thread+0x78/0x520
> [c0000034a08ebda0] [c000000000185090] kthread+0x1a0/0x1b0
> [c0000034a08ebe10] [c00000000000ccec] ret_from_kernel_thread+0x5c/0x70
> Instruction dump:
> 7d2903a6 4e800421 e8410018 7f67db78 7fe6fb78 7f45d378 7f84e378 7c681b78
> 3c62ff1a 3863c6f8 4802dc35 60000000 <0fe00000> 3920fff4 f9210070 e86100a0
> ---[ end trace 532d9066d3d4d7ec ]---
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/smp: Set numa node before updating mask
      https://git.kernel.org/powerpc/c/6980d13f0dd189846887bbbfa43793d9a41768d3

cheers

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, back to index

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-01 15:42 [PATCH 1/1] powerpc/smp: Set numa node before updating mask Srikar Dronamraju
2021-04-01 22:51 ` Nathan Lynch
2021-04-02  3:18   ` Srikar Dronamraju
2021-04-07 12:19     ` Nathan Lynch
2021-04-07 16:49       ` Srikar Dronamraju
2021-04-07 19:46         ` Nathan Lynch
2021-04-08 11:11           ` Srikar Dronamraju
2021-04-08 20:00             ` Nathan Lynch
2021-04-09 10:14   ` Srikar Dronamraju
2021-04-13 12:23     ` Nathan Lynch
2021-04-13 12:25 ` Nathan Lynch
2021-04-19  4:00 ` Michael Ellerman

LinuxPPC-Dev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linuxppc-dev/0 linuxppc-dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linuxppc-dev linuxppc-dev/ https://lore.kernel.org/linuxppc-dev \
		linuxppc-dev@lists.ozlabs.org linuxppc-dev@ozlabs.org
	public-inbox-index linuxppc-dev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.ozlabs.lists.linuxppc-dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git