LinuxPPC-Dev Archive on lore.kernel.org
* [PATCH v3 0/5] Early node associativity
@ 2019-09-06 13:50 Srikar Dronamraju
  2019-09-06 13:50 ` [PATCH v3 1/5] powerpc/vphn: Check for error from hcall_vphn Srikar Dronamraju
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Srikar Dronamraju @ 2019-09-06 13:50 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Srikar Dronamraju, Nicholas Piggin, Abdul Haleem,
	Satheesh Rajendran, linuxppc-dev

Abdul reported a warning on a shared lpar:
"WARNING: workqueue cpumask: online intersect > possible intersect".
This is because the per-node workqueue possible cpumask is set very early
in the boot process, even before the system has queried the home node
associativity. The per-node workqueue online cpumask, however, gets updated
dynamically. Hence there is a chance that the per-node workqueue online
cpumask becomes a superset of the per-node workqueue possible cpumask.

Link for v1: https://patchwork.ozlabs.org/patch/1151658
Changelog: v1->v2
 - Handled comments from Nathan Lynch.

Link for v2: http://lkml.kernel.org/r/20190829055023.6171-1-srikar@linux.vnet.ibm.com
Changelog: v2->v3
 - Handled comments from Nathan Lynch.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>

Srikar Dronamraju (5):
  powerpc/vphn: Check for error from hcall_vphn
  powerpc/numa: Handle extra hcall_vphn error cases
  powerpc/numa: Use cpu node map of first sibling thread
  powerpc/numa: Early request for home node associativity
  powerpc/numa: Remove late request for home node associativity

 arch/powerpc/include/asm/topology.h   |  4 --
 arch/powerpc/kernel/smp.c             |  5 --
 arch/powerpc/mm/numa.c                | 96 ++++++++++++++++++++++++++---------
 arch/powerpc/platforms/pseries/vphn.c |  3 +-
 4 files changed, 74 insertions(+), 34 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3 1/5] powerpc/vphn: Check for error from hcall_vphn
  2019-09-06 13:50 [PATCH v3 0/5] Early node associativity Srikar Dronamraju
@ 2019-09-06 13:50 ` Srikar Dronamraju
  2019-09-06 13:50 ` [PATCH v3 2/5] powerpc/numa: Handle extra hcall_vphn error cases Srikar Dronamraju
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Srikar Dronamraju @ 2019-09-06 13:50 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Satheesh Rajendran, linuxppc-dev,
	Srikar Dronamraju, Nicholas Piggin

There is no value in unpacking the associativity if the
H_HOME_NODE_ASSOCIATIVITY hcall has returned an error.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
---
Changelog (v1->v2):
- Split the patch into two (suggested by Nathan).

 arch/powerpc/platforms/pseries/vphn.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/vphn.c b/arch/powerpc/platforms/pseries/vphn.c
index 3f07bf6..cca474a 100644
--- a/arch/powerpc/platforms/pseries/vphn.c
+++ b/arch/powerpc/platforms/pseries/vphn.c
@@ -82,7 +82,8 @@ long hcall_vphn(unsigned long cpu, u64 flags, __be32 *associativity)
 	long retbuf[PLPAR_HCALL9_BUFSIZE] = {0};
 
 	rc = plpar_hcall9(H_HOME_NODE_ASSOCIATIVITY, retbuf, flags, cpu);
-	vphn_unpack_associativity(retbuf, associativity);
+	if (rc == H_SUCCESS)
+		vphn_unpack_associativity(retbuf, associativity);
 
 	return rc;
 }
-- 
1.8.3.1



* [PATCH v3 2/5] powerpc/numa: Handle extra hcall_vphn error cases
  2019-09-06 13:50 [PATCH v3 0/5] Early node associativity Srikar Dronamraju
  2019-09-06 13:50 ` [PATCH v3 1/5] powerpc/vphn: Check for error from hcall_vphn Srikar Dronamraju
@ 2019-09-06 13:50 ` Srikar Dronamraju
  2019-09-06 13:50 ` [PATCH v3 3/5] powerpc/numa: Use cpu node map of first sibling thread Srikar Dronamraju
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Srikar Dronamraju @ 2019-09-06 13:50 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Satheesh Rajendran, linuxppc-dev,
	Srikar Dronamraju, Nicholas Piggin

Currently the code handles only the H_FUNCTION, H_SUCCESS and H_HARDWARE
return codes. However, hcall_vphn can return other codes as well. Handle
the H_PARAMETER return code explicitly, and handle all remaining return
codes under the default case.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
---
Changelog (v1->v2):
 Handled comments from Nathan:
  - Split this patch out of patch 1.
  - Corrected a spot where a call to stop_topology_update() was missed.
  - Use pr_err_ratelimited() instead of printk().

 arch/powerpc/mm/numa.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 50d68d2..8fbe57c 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1191,23 +1191,30 @@ static long vphn_get_associativity(unsigned long cpu,
 				VPHN_FLAG_VCPU, associativity);
 
 	switch (rc) {
+	case H_SUCCESS:
+		dbg("VPHN hcall succeeded. Reset polling...\n");
+		timed_topology_update(0);
+		goto out;
+
 	case H_FUNCTION:
-		printk_once(KERN_INFO
-			"VPHN is not supported. Disabling polling...\n");
-		stop_topology_update();
+		pr_err_ratelimited("VPHN unsupported. Disabling polling...\n");
 		break;
 	case H_HARDWARE:
-		printk(KERN_ERR
-			"hcall_vphn() experienced a hardware fault "
+		pr_err_ratelimited("hcall_vphn() experienced a hardware fault "
 			"preventing VPHN. Disabling polling...\n");
-		stop_topology_update();
 		break;
-	case H_SUCCESS:
-		dbg("VPHN hcall succeeded. Reset polling...\n");
-		timed_topology_update(0);
+	case H_PARAMETER:
+		pr_err_ratelimited("hcall_vphn() was passed an invalid parameter. "
+			"Disabling polling...\n");
+		break;
+	default:
+		pr_err_ratelimited("hcall_vphn() returned %ld. Disabling polling...\n",
+			rc);
 		break;
 	}
 
+	stop_topology_update();
+out:
 	return rc;
 }
 
-- 
1.8.3.1



* [PATCH v3 3/5] powerpc/numa: Use cpu node map of first sibling thread
  2019-09-06 13:50 [PATCH v3 0/5] Early node associativity Srikar Dronamraju
  2019-09-06 13:50 ` [PATCH v3 1/5] powerpc/vphn: Check for error from hcall_vphn Srikar Dronamraju
  2019-09-06 13:50 ` [PATCH v3 2/5] powerpc/numa: Handle extra hcall_vphn error cases Srikar Dronamraju
@ 2019-09-06 13:50 ` Srikar Dronamraju
  2019-09-11 14:48   ` Nathan Lynch
  2019-09-06 13:50 ` [PATCH v3 4/5] powerpc/numa: Early request for home node associativity Srikar Dronamraju
  2019-09-06 13:50 ` [PATCH v3 5/5] powerpc/numa: Remove late " Srikar Dronamraju
  4 siblings, 1 reply; 12+ messages in thread
From: Srikar Dronamraju @ 2019-09-06 13:50 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Satheesh Rajendran, linuxppc-dev,
	Srikar Dronamraju, Nicholas Piggin

All sibling threads of a core have to be part of the same node. To ensure
that all sibling threads map to the same node, always look up/update the
cpu-to-node map of the first thread in the core.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 8fbe57c..d0af9a2 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -467,15 +467,20 @@ static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
  */
 static int numa_setup_cpu(unsigned long lcpu)
 {
-	int nid = NUMA_NO_NODE;
 	struct device_node *cpu;
+	int fcpu = cpu_first_thread_sibling(lcpu);
+	int nid = NUMA_NO_NODE;
 
 	/*
 	 * If a valid cpu-to-node mapping is already available, use it
 	 * directly instead of querying the firmware, since it represents
 	 * the most recent mapping notified to us by the platform (eg: VPHN).
+	 * Since the cpu_to_node binding remains the same for all threads
+	 * in the core, if a valid cpu-to-node mapping is already available
+	 * for the first thread in the core, use it.
 	 */
-	if ((nid = numa_cpu_lookup_table[lcpu]) >= 0) {
+	nid = numa_cpu_lookup_table[fcpu];
+	if (nid >= 0) {
 		map_cpu_to_node(lcpu, nid);
 		return nid;
 	}
@@ -496,6 +501,16 @@ static int numa_setup_cpu(unsigned long lcpu)
 	if (nid < 0 || !node_possible(nid))
 		nid = first_online_node;
 
+	/*
+	 * Update for the first thread of the core. All threads of a core
+	 * have to be part of the same node. This not only avoids querying
+	 * for every other thread in the core, but also avoids a case
+	 * where a virtual node associativity change causes subsequent
+	 * threads of a core to be associated with a different nid.
+	 */
+	if (fcpu != lcpu)
+		map_cpu_to_node(fcpu, nid);
+
 	map_cpu_to_node(lcpu, nid);
 	of_node_put(cpu);
 out:
-- 
1.8.3.1



* [PATCH v3 4/5] powerpc/numa: Early request for home node associativity
  2019-09-06 13:50 [PATCH v3 0/5] Early node associativity Srikar Dronamraju
                   ` (2 preceding siblings ...)
  2019-09-06 13:50 ` [PATCH v3 3/5] powerpc/numa: Use cpu node map of first sibling thread Srikar Dronamraju
@ 2019-09-06 13:50 ` Srikar Dronamraju
  2019-09-06 13:50 ` [PATCH v3 5/5] powerpc/numa: Remove late " Srikar Dronamraju
  4 siblings, 0 replies; 12+ messages in thread
From: Srikar Dronamraju @ 2019-09-06 13:50 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Satheesh Rajendran, linuxppc-dev,
	Srikar Dronamraju, Nicholas Piggin

Currently the kernel detects if it is running on a shared lpar platform
and requests home node associativity before the scheduler sched_domains
are set up. However, between the time NUMA setup is initialized and the
request for home node associativity, the workqueue code initializes its
per-node cpumask. The per-node workqueue possible cpumask may turn invalid
after the home node associativity update, resulting in weird situations
like the workqueue possible cpumask being a subset of the workqueue
online cpumask.

This can be fixed by requesting home node associativity earlier, just
before NUMA setup. However, at NUMA setup time the kernel may not be in
a position to detect if it is running on a shared lpar platform. So
request home node associativity unconditionally and, if the request
fails, fall back on the device tree property.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
---
Changelog (v1->v2):
- Handled comments from Nathan Lynch
  * Don't depend on pacas being set up for the hwid

Changelog (v2->v3):
- Handled comments from Nathan Lynch
  * Use the first thread of the core for the cpu-to-node map.
  * Get the hardware id in numa_setup_cpu().

 arch/powerpc/mm/numa.c | 45 ++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 40 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index d0af9a2..5bb11ef 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -461,13 +461,27 @@ static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
 	return nid;
 }
 
+static int vphn_get_nid(long hwid)
+{
+	__be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
+	long rc;
+
+	rc = hcall_vphn(hwid, VPHN_FLAG_VCPU, associativity);
+	if (rc == H_SUCCESS)
+		return associativity_to_nid(associativity);
+
+	return NUMA_NO_NODE;
+}
+
 /*
  * Figure out to which domain a cpu belongs and stick it there.
+ * cpu_to_phys_id is only valid between smp_setup_cpu_maps() and
+ * smp_setup_pacas(). If called outside this window, set get_hwid to true.
  * Return the id of the domain used.
  */
-static int numa_setup_cpu(unsigned long lcpu)
+static int numa_setup_cpu(unsigned long lcpu, bool get_hwid)
 {
-	struct device_node *cpu;
+	struct device_node *cpu = NULL;
 	int fcpu = cpu_first_thread_sibling(lcpu);
 	int nid = NUMA_NO_NODE;
 
@@ -485,6 +499,27 @@ static int numa_setup_cpu(unsigned long lcpu)
 		return nid;
 	}
 
+	/*
+	 * On a shared lpar, the device tree will not have the node
+	 * associativity. At this time the lppaca, or its __old_status
+	 * field, may not be updated, so the kernel cannot detect whether
+	 * it is on a shared lpar. Hence request an explicit associativity
+	 * irrespective of whether the lpar is shared or dedicated, and use
+	 * the device tree property as a fallback.
+	 */
+	if (firmware_has_feature(FW_FEATURE_VPHN)) {
+		long hwid;
+
+		if (get_hwid)
+			hwid = get_hard_smp_processor_id(lcpu);
+		else
+			hwid = cpu_to_phys_id[lcpu];
+		nid = vphn_get_nid(hwid);
+	}
+
+	if (nid != NUMA_NO_NODE)
+		goto out_present;
+
 	cpu = of_get_cpu_node(lcpu, NULL);
 
 	if (!cpu) {
@@ -496,6 +531,7 @@ static int numa_setup_cpu(unsigned long lcpu)
 	}
 
 	nid = of_node_to_nid_single(cpu);
+	of_node_put(cpu);
 
 out_present:
 	if (nid < 0 || !node_possible(nid))
@@ -512,7 +548,6 @@ static int numa_setup_cpu(unsigned long lcpu)
 		map_cpu_to_node(fcpu, nid);
 
 	map_cpu_to_node(lcpu, nid);
-	of_node_put(cpu);
 out:
 	return nid;
 }
@@ -543,7 +578,7 @@ static int ppc_numa_cpu_prepare(unsigned int cpu)
 {
 	int nid;
 
-	nid = numa_setup_cpu(cpu);
+	nid = numa_setup_cpu(cpu, true);
 	verify_cpu_node_mapping(cpu, nid);
 	return 0;
 }
@@ -890,7 +925,7 @@ void __init mem_topology_setup(void)
 	reset_numa_cpu_lookup_table();
 
 	for_each_present_cpu(cpu)
-		numa_setup_cpu(cpu);
+		numa_setup_cpu(cpu, false);
 }
 
 void __init initmem_init(void)
-- 
1.8.3.1



* [PATCH v3 5/5] powerpc/numa: Remove late request for home node associativity
  2019-09-06 13:50 [PATCH v3 0/5] Early node associativity Srikar Dronamraju
                   ` (3 preceding siblings ...)
  2019-09-06 13:50 ` [PATCH v3 4/5] powerpc/numa: Early request for home node associativity Srikar Dronamraju
@ 2019-09-06 13:50 ` " Srikar Dronamraju
  4 siblings, 0 replies; 12+ messages in thread
From: Srikar Dronamraju @ 2019-09-06 13:50 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Satheesh Rajendran, linuxppc-dev,
	Srikar Dronamraju, Nicholas Piggin

With the previous commit ("powerpc/numa: Early request for home node
associativity"), commit 2ea626306810 ("powerpc/topology: Get topology
for shared processors at boot"), which was requesting home node
associativity late, becomes redundant.

Hence remove the late request for home node associativity.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/topology.h | 4 ----
 arch/powerpc/kernel/smp.c           | 5 -----
 arch/powerpc/mm/numa.c              | 9 ---------
 3 files changed, 18 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 2f7e1ea..9bd396f 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -98,7 +98,6 @@ static inline int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
 extern int prrn_is_enabled(void);
 extern int find_and_online_cpu_nid(int cpu);
 extern int timed_topology_update(int nsecs);
-extern void __init shared_proc_topology_init(void);
 #else
 static inline int start_topology_update(void)
 {
@@ -121,9 +120,6 @@ static inline int timed_topology_update(int nsecs)
 	return 0;
 }
 
-#ifdef CONFIG_SMP
-static inline void shared_proc_topology_init(void) {}
-#endif
 #endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR */
 
 #include <asm-generic/topology.h>
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ea6adbf..cdd39a0 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1359,11 +1359,6 @@ void __init smp_cpus_done(unsigned int max_cpus)
 	if (smp_ops && smp_ops->bringup_done)
 		smp_ops->bringup_done();
 
-	/*
-	 * On a shared LPAR, associativity needs to be requested.
-	 * Hence, get numa topology before dumping cpu topology
-	 */
-	shared_proc_topology_init();
 	dump_numa_cpu_topology();
 
 #ifdef CONFIG_SCHED_SMT
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 5bb11ef..22a7bf5 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1625,15 +1625,6 @@ int prrn_is_enabled(void)
 	return prrn_enabled;
 }
 
-void __init shared_proc_topology_init(void)
-{
-	if (lppaca_shared_proc(get_lppaca())) {
-		bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask),
-			    nr_cpumask_bits);
-		numa_update_cpu_topology(false);
-	}
-}
-
 static int topology_read(struct seq_file *file, void *v)
 {
 	if (vphn_enabled || prrn_enabled)
-- 
1.8.3.1



* Re: [PATCH v3 3/5] powerpc/numa: Use cpu node map of first sibling thread
  2019-09-06 13:50 ` [PATCH v3 3/5] powerpc/numa: Use cpu node map of first sibling thread Srikar Dronamraju
@ 2019-09-11 14:48   ` Nathan Lynch
  2019-09-11 17:05     ` Srikar Dronamraju
  0 siblings, 1 reply; 12+ messages in thread
From: Nathan Lynch @ 2019-09-11 14:48 UTC (permalink / raw)
  To: Srikar Dronamraju, Michael Ellerman
  Cc: Satheesh Rajendran, linuxppc-dev, Srikar Dronamraju, Nicholas Piggin

Hi Srikar,

Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> @@ -467,15 +467,20 @@ static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
>   */
>  static int numa_setup_cpu(unsigned long lcpu)
>  {
> -	int nid = NUMA_NO_NODE;
>  	struct device_node *cpu;
> +	int fcpu = cpu_first_thread_sibling(lcpu);
> +	int nid = NUMA_NO_NODE;
>  
>  	/*
>  	 * If a valid cpu-to-node mapping is already available, use it
>  	 * directly instead of querying the firmware, since it represents
>  	 * the most recent mapping notified to us by the platform (eg: VPHN).
> +	 * Since the cpu_to_node binding remains the same for all threads
> +	 * in the core, if a valid cpu-to-node mapping is already available
> +	 * for the first thread in the core, use it.
>  	 */
> -	if ((nid = numa_cpu_lookup_table[lcpu]) >= 0) {
> +	nid = numa_cpu_lookup_table[fcpu];
> +	if (nid >= 0) {
>  		map_cpu_to_node(lcpu, nid);
>  		return nid;
>  	}

Yes, we need something like this to prevent a VPHN change that occurs
concurrently with onlining a core's threads from messing us up.

Is it a good assumption that the first thread of a sibling group will
have its mapping initialized first? I think the answer is yes for boot,
but hotplug... not so sure.


> @@ -496,6 +501,16 @@ static int numa_setup_cpu(unsigned long lcpu)
>  	if (nid < 0 || !node_possible(nid))
>  		nid = first_online_node;
>  
> +	/*
> +	 * Update for the first thread of the core. All threads of a core
> +	 * have to be part of the same node. This not only avoids querying
> +	 * for every other thread in the core, but also avoids a case
> +	 * where a virtual node associativity change causes subsequent
> +	 * threads of a core to be associated with a different nid.
> +	 */
> +	if (fcpu != lcpu)
> +		map_cpu_to_node(fcpu, nid);
> +

OK, I see that this somewhat addresses my concern above. But changing
this mapping for a remote cpu is unsafe except under specific
circumstances. I think this should first assert:

* numa_cpu_lookup_table[fcpu] == NUMA_NO_NODE
* cpu_online(fcpu) == false

to document and enforce the conditions that must hold for this to be OK.


* Re: [PATCH v3 3/5] powerpc/numa: Use cpu node map of first sibling thread
  2019-09-11 14:48   ` Nathan Lynch
@ 2019-09-11 17:05     ` Srikar Dronamraju
  2019-09-12 16:41       ` Nathan Lynch
  0 siblings, 1 reply; 12+ messages in thread
From: Srikar Dronamraju @ 2019-09-11 17:05 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: Satheesh Rajendran, linuxppc-dev, Nicholas Piggin

Hi Nathan, 

Thanks for your reviews.

> > -	if ((nid = numa_cpu_lookup_table[lcpu]) >= 0) {
> > +	nid = numa_cpu_lookup_table[fcpu];
> > +	if (nid >= 0) {
> >  		map_cpu_to_node(lcpu, nid);
> >  		return nid;
> >  	}
> 
> Yes, we need to something like this to prevent a VPHN change that occurs
> concurrently with onlining a core's threads from messing us up.
> 
> Is it a good assumption that the first thread of a sibling group will
> have its mapping initialized first? I think the answer is yes for boot,
> but hotplug... not so sure.
> 
> 
> > @@ -496,6 +501,16 @@ static int numa_setup_cpu(unsigned long lcpu)
> >  	if (nid < 0 || !node_possible(nid))
> >  		nid = first_online_node;
> >  
> > +	/*
> > +	 * Update for the first thread of the core. All threads of a core
> > +	 * have to be part of the same node. This not only avoids querying
> > +	 * for every other thread in the core, but also avoids a case
> > +	 * where a virtual node associativity change causes subsequent
> > +	 * threads of a core to be associated with a different nid.
> > +	 */
> > +	if (fcpu != lcpu)
> > +		map_cpu_to_node(fcpu, nid);
> > +
> 
> OK, I see that this somewhat addresses my concern above. But changing
> this mapping for a remote cpu is unsafe except under specific
> circumstances. I think this should first assert:
> 
> * numa_cpu_lookup_table[fcpu] == NUMA_NO_NODE
> * cpu_online(fcpu) == false
> 
> to document and enforce the conditions that must hold for this to be OK.

I do understand that we shouldn't be modifying the nid for a different cpu.

We just checked above that the mapping for the first cpu doesn't exist.
If the first cpu (or remote cpu, as you call it) were online, then its
mapping would already exist and we would return even before we get here.

nid = numa_cpu_lookup_table[fcpu];
if (nid >= 0) {
	map_cpu_to_node(lcpu, nid);
	return nid;
}

Currently numa_setup_cpu() is only called at very early boot and in cpu
hotplug. At hotplug time, the onlining of cpus is serialized. Right? Do we
see a chance of the remote cpu changing its state as we set its nid here?

Also, let's say we add the assertion and for some unknown reason it fails.
How do we handle the failure case? We can't get out without setting the
nid. We can't continue setting the nid. Should we panic the system given
that the check a few lines above has turned out to be false? Probably not,
as I think we can live with it.

Any thoughts?

-- 
Thanks and Regards
Srikar Dronamraju



* Re: [PATCH v3 3/5] powerpc/numa: Use cpu node map of first sibling thread
  2019-09-11 17:05     ` Srikar Dronamraju
@ 2019-09-12 16:41       ` Nathan Lynch
  2019-09-12 17:23         ` Srikar Dronamraju
  0 siblings, 1 reply; 12+ messages in thread
From: Nathan Lynch @ 2019-09-12 16:41 UTC (permalink / raw)
  To: Srikar Dronamraju; +Cc: Satheesh Rajendran, linuxppc-dev, Nicholas Piggin

Hi Srikar,

Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
>> > @@ -496,6 +501,16 @@ static int numa_setup_cpu(unsigned long lcpu)
>> >  	if (nid < 0 || !node_possible(nid))
>> >  		nid = first_online_node;
>> >  
>> > +	/*
>> > +	 * Update for the first thread of the core. All threads of a core
>> > +	 * have to be part of the same node. This not only avoids querying
>> > +	 * for every other thread in the core, but also avoids a case
>> > +	 * where a virtual node associativity change causes subsequent
>> > +	 * threads of a core to be associated with a different nid.
>> > +	 */
>> > +	if (fcpu != lcpu)
>> > +		map_cpu_to_node(fcpu, nid);
>> > +
>> 
>> OK, I see that this somewhat addresses my concern above. But changing
>> this mapping for a remote cpu is unsafe except under specific
>> circumstances. I think this should first assert:
>> 
>> * numa_cpu_lookup_table[fcpu] == NUMA_NO_NODE
>> * cpu_online(fcpu) == false
>> 
>> to document and enforce the conditions that must hold for this to be OK.
>
> I do understand that we shouldn't be modifying the nid for a different cpu.
>
> We just checked above that the mapping for the first cpu doesnt exist.
> If the first cpu (or remote cpu as you coin it) was online, then its
> mapping should have existed and we return even before we come here.

I agree that is how the code will work with your change, and I'm fine
with simply warning if fcpu is offline.

The point is to make this rule more explicit in the code for the benefit
of future readers and to catch violations of it by future changes. There
is a fair amount of code remaining in this file and elsewhere in
arch/powerpc that was written under the impression that changing the
cpu-node relationship at runtime is OK.


> nid = numa_cpu_lookup_table[fcpu];
> if (nid >= 0) {
> 	map_cpu_to_node(lcpu, nid);
> 	return nid;
> }
>
> Currently numa_setup_cpus is only called at very early boot and in cpu
> hotplug. At hotplug time, the oneline of cpus is serialized. Right? Do we 
> see a chance of remote cpu changing its state as we set its nid here?
>
> Also lets say if we assert and for some unknown reason the assertion fails.
> How do we handle the failure case?  We cant get out without setting
> the nid. We cant continue setting the nid. Should we panic the system given
> that the check a few lines above is now turning out to be false? Probably
> no, as I think we can live with it.
>
> Any thoughts?

I think just WARN_ON(cpu_online(fcpu)) would be satisfactory. In my
experience, the downstream effects of violating this condition are
varied and quite difficult to debug. Seems only appropriate to emit a
warning and stack trace before the OS inevitably becomes unstable.


* Re: [PATCH v3 3/5] powerpc/numa: Use cpu node map of first sibling thread
  2019-09-12 16:41       ` Nathan Lynch
@ 2019-09-12 17:23         ` Srikar Dronamraju
  2019-09-12 18:15           ` Nathan Lynch
  0 siblings, 1 reply; 12+ messages in thread
From: Srikar Dronamraju @ 2019-09-12 17:23 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: Satheesh Rajendran, linuxppc-dev, Nicholas Piggin

> 
> I think just WARN_ON(cpu_online(fcpu)) would be satisfactory. In my
> experience, the downstream effects of violating this condition are
> varied and quite difficult to debug. Seems only appropriate to emit a
> warning and stack trace before the OS inevitably becomes unstable.

I still have to try it, but wouldn't this be a problem for the boot cpu?
I mean, the boot cpu would be marked online while it tries to do
numa_setup_cpu(). No?

-- 
Thanks and Regards
Srikar Dronamraju



* Re: [PATCH v3 3/5] powerpc/numa: Use cpu node map of first sibling thread
  2019-09-12 17:23         ` Srikar Dronamraju
@ 2019-09-12 18:15           ` Nathan Lynch
  2019-09-13  5:44             ` Srikar Dronamraju
  0 siblings, 1 reply; 12+ messages in thread
From: Nathan Lynch @ 2019-09-12 18:15 UTC (permalink / raw)
  To: Srikar Dronamraju; +Cc: Satheesh Rajendran, linuxppc-dev, Nicholas Piggin

Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:

>> 
>> I think just WARN_ON(cpu_online(fcpu)) would be satisfactory. In my
>> experience, the downstream effects of violating this condition are
>> varied and quite difficult to debug. Seems only appropriate to emit a
>> warning and stack trace before the OS inevitably becomes unstable.
>
> I still have to try but wouldn't this be a problem for the boot-cpu?
> I mean boot-cpu would be marked online while it tries to do numa_setup_cpu.
> No?

This is what I mean:

 +      if (fcpu != lcpu) {
 +              WARN_ON(cpu_online(fcpu));
 +              map_cpu_to_node(fcpu, nid);
 +      }

I.e. if we're modifying the mapping for a remote cpu, warn if it's
online.

I don't think this would warn on the boot cpu -- I would expect fcpu and
lcpu to be the same and this branch would not be taken.


* Re: [PATCH v3 3/5] powerpc/numa: Use cpu node map of first sibling thread
  2019-09-12 18:15           ` Nathan Lynch
@ 2019-09-13  5:44             ` Srikar Dronamraju
  0 siblings, 0 replies; 12+ messages in thread
From: Srikar Dronamraju @ 2019-09-13  5:44 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: Satheesh Rajendran, linuxppc-dev, Nicholas Piggin

* Nathan Lynch <nathanl@linux.ibm.com> [2019-09-12 13:15:03]:

> Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> 
> >> 
> >> I think just WARN_ON(cpu_online(fcpu)) would be satisfactory. In my
> >> experience, the downstream effects of violating this condition are
> >> varied and quite difficult to debug. Seems only appropriate to emit a
> >> warning and stack trace before the OS inevitably becomes unstable.
> >
> > I still have to try but wouldn't this be a problem for the boot-cpu?
> > I mean boot-cpu would be marked online while it tries to do numa_setup_cpu.
> > No?
> 
> This is what I mean:
> 
>  +      if (fcpu != lcpu) {
>  +              WARN_ON(cpu_online(fcpu));
>  +              map_cpu_to_node(fcpu, nid);
>  +      }
> 

Yes this should work. Will send an updated patch with this change.

> I.e. if we're modifying the mapping for a remote cpu, warn if it's
> online.
> 
> I don't think this would warn on the boot cpu -- I would expect fcpu and
> lcpu to be the same and this branch would not be taken.

-- 
Thanks and Regards
Srikar Dronamraju



end of thread

Thread overview: 12+ messages
2019-09-06 13:50 [PATCH v3 0/5] Early node associativity Srikar Dronamraju
2019-09-06 13:50 ` [PATCH v3 1/5] powerpc/vphn: Check for error from hcall_vphn Srikar Dronamraju
2019-09-06 13:50 ` [PATCH v3 2/5] powerpc/numa: Handle extra hcall_vphn error cases Srikar Dronamraju
2019-09-06 13:50 ` [PATCH v3 3/5] powerpc/numa: Use cpu node map of first sibling thread Srikar Dronamraju
2019-09-11 14:48   ` Nathan Lynch
2019-09-11 17:05     ` Srikar Dronamraju
2019-09-12 16:41       ` Nathan Lynch
2019-09-12 17:23         ` Srikar Dronamraju
2019-09-12 18:15           ` Nathan Lynch
2019-09-13  5:44             ` Srikar Dronamraju
2019-09-06 13:50 ` [PATCH v3 4/5] powerpc/numa: Early request for home node associativity Srikar Dronamraju
2019-09-06 13:50 ` [PATCH v3 5/5] powerpc/numa: Remove late " Srikar Dronamraju
