All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/7] arm64: numa/topology/smp: update the cpumasks for CPU hotplug
@ 2018-07-06 11:02 Sudeep Holla
  2018-07-06 11:02 ` [PATCH v3 1/7] arm64: topology: refactor reset_cpu_topology to add support for removing topology Sudeep Holla
                   ` (7 more replies)
  0 siblings, 8 replies; 25+ messages in thread
From: Sudeep Holla @ 2018-07-06 11:02 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

Commit 37c3ec2d810f ("arm64: topology: divorce MC scheduling domain from core_siblings")
was reported to cause some hotplug and system suspend regressions when
it was merged. On investigation, it was found that unlike x86/PPC,
arm64 doesn't update the cpu and numa masks on CPU hotplug. That's
somewhat expected from the scheduler.

Since these changes were bit invasive as a solution to the above
mentioned regression, as small change was temporarily applied as a fix.
This series updates the cpu and numa masks on CPU hotplug and reverts
that temporary fix.

It would be good to get this tested(CPU hotplug - few and all CPUs in
a socket) on multi-socket/NUMA systems from Cavium and Huawei/Hisilicon.

Regards,
Sudeep

v2->v3:
	- Removed confusing boolean reset argument and inlined reset
	  part in reset_cpu_topology
	- Moved cpu_llc_shared_mask to asm/topology.h along with other
	  similar macros and renamed it topology_llc_cpumask
	- Use NUMA_NO_NODE instead of checking for any -ve integer

v1->v2:
	- Rebased on v4.18-rc1 and hence do revert of the temporary fix
	  that was merged for v4.18
	- Removed one of the wrong use of possible_mask

Sudeep Holla (7):
  arm64: topology: refactor reset_cpu_topology to add support for removing topology
  arm64: numa: separate out updates to percpu nodeid and NUMA node cpumap
  arm64: topology: add support to remove cpu topology sibling masks
  arm64: topology: restrict updating siblings_masks to online cpus only
  arm64: smp: remove cpu and numa topology information when hotplugging out CPU
  arm64: topology: rename llc_siblings to align with other struct members
  arm64: topology: re-introduce numa mask check for scheduler MC selection

 arch/arm64/include/asm/numa.h     |  4 +++
 arch/arm64/include/asm/topology.h |  4 ++-
 arch/arm64/kernel/smp.c           |  5 ++++
 arch/arm64/kernel/topology.c      | 58 +++++++++++++++++++++++++++------------
 arch/arm64/mm/numa.c              | 29 ++++++++++++++------
 5 files changed, 74 insertions(+), 26 deletions(-)

--
2.7.4

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 1/7] arm64: topology: refactor reset_cpu_topology to add support for removing topology
  2018-07-06 11:02 [PATCH v3 0/7] arm64: numa/topology/smp: update the cpumasks for CPU hotplug Sudeep Holla
@ 2018-07-06 11:02 ` Sudeep Holla
  2018-07-06 11:02 ` [PATCH v3 2/7] arm64: numa: separate out updates to percpu nodeid and NUMA node cpumap Sudeep Holla
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Sudeep Holla @ 2018-07-06 11:02 UTC (permalink / raw)
  To: linux-arm-kernel

Currently reset_cpu_topology clears all the CPU topology information
and resets to default values. However we may need to just clear the
information when we hotplug out the CPU. In preparation to add the
support the same, let's refactor reset_cpu_topology to just reset
the information and move clearing out the topology information to
clear_cpu_topology.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
---
 arch/arm64/kernel/topology.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index f845a8617812..b64733c5ea28 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -293,6 +293,19 @@ void store_cpu_topology(unsigned int cpuid)
 	update_siblings_masks(cpuid);
 }
 
+static void clear_cpu_topology(int cpu)
+{
+	struct cpu_topology *cpu_topo = &cpu_topology[cpu];
+
+	cpumask_clear(&cpu_topo->llc_siblings);
+	cpumask_set_cpu(cpu, &cpu_topo->llc_siblings);
+
+	cpumask_clear(&cpu_topo->core_sibling);
+	cpumask_set_cpu(cpu, &cpu_topo->core_sibling);
+	cpumask_clear(&cpu_topo->thread_sibling);
+	cpumask_set_cpu(cpu, &cpu_topo->thread_sibling);
+}
+
 static void __init reset_cpu_topology(void)
 {
 	unsigned int cpu;
@@ -303,15 +316,9 @@ static void __init reset_cpu_topology(void)
 		cpu_topo->thread_id = -1;
 		cpu_topo->core_id = 0;
 		cpu_topo->package_id = -1;
-
 		cpu_topo->llc_id = -1;
-		cpumask_clear(&cpu_topo->llc_siblings);
-		cpumask_set_cpu(cpu, &cpu_topo->llc_siblings);
 
-		cpumask_clear(&cpu_topo->core_sibling);
-		cpumask_set_cpu(cpu, &cpu_topo->core_sibling);
-		cpumask_clear(&cpu_topo->thread_sibling);
-		cpumask_set_cpu(cpu, &cpu_topo->thread_sibling);
+		clear_cpu_topology(cpu);
 	}
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 2/7] arm64: numa: separate out updates to percpu nodeid and NUMA node cpumap
  2018-07-06 11:02 [PATCH v3 0/7] arm64: numa/topology/smp: update the cpumasks for CPU hotplug Sudeep Holla
  2018-07-06 11:02 ` [PATCH v3 1/7] arm64: topology: refactor reset_cpu_topology to add support for removing topology Sudeep Holla
@ 2018-07-06 11:02 ` Sudeep Holla
  2018-07-06 11:02 ` [PATCH v3 3/7] arm64: topology: add support to remove cpu topology sibling masks Sudeep Holla
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Sudeep Holla @ 2018-07-06 11:02 UTC (permalink / raw)
  To: linux-arm-kernel

Currently numa_clear_node removes both cpu information from the NUMA
node cpumap as well as the NUMA node id from the cpu. Similarly
numa_store_cpu_info updates both percpu nodeid and NUMA cpumap.

However we need to retain the numa node id for the cpu and only remove
the cpu information from the numa node cpumap during CPU hotplug out.
The same can be extended for hotplugging in the CPU.

This patch separates out numa_{add,remove}_cpu from numa_clear_node and
numa_store_cpu_info.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
---
 arch/arm64/include/asm/numa.h |  4 ++++
 arch/arm64/kernel/smp.c       |  2 ++
 arch/arm64/mm/numa.c          | 29 +++++++++++++++++++++--------
 3 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
index 01bc46d5b43a..626ad01e83bf 100644
--- a/arch/arm64/include/asm/numa.h
+++ b/arch/arm64/include/asm/numa.h
@@ -35,10 +35,14 @@ void __init numa_set_distance(int from, int to, int distance);
 void __init numa_free_distance(void);
 void __init early_map_cpu_to_node(unsigned int cpu, int nid);
 void numa_store_cpu_info(unsigned int cpu);
+void numa_add_cpu(unsigned int cpu);
+void numa_remove_cpu(unsigned int cpu);
 
 #else	/* CONFIG_NUMA */
 
 static inline void numa_store_cpu_info(unsigned int cpu) { }
+static inline void numa_add_cpu(unsigned int cpu) { }
+static inline void numa_remove_cpu(unsigned int cpu) { }
 static inline void arm64_numa_init(void) { }
 static inline void early_map_cpu_to_node(unsigned int cpu, int nid) { }
 
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 2faa9863d2e5..2a315f32fad3 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -225,6 +225,7 @@ asmlinkage notrace void secondary_start_kernel(void)
 	notify_cpu_starting(cpu);
 
 	store_cpu_topology(cpu);
+	numa_add_cpu(cpu);
 
 	/*
 	 * OK, now it's safe to let the boot CPU continue.  Wait for
@@ -679,6 +680,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 	this_cpu = smp_processor_id();
 	store_cpu_topology(this_cpu);
 	numa_store_cpu_info(this_cpu);
+	numa_add_cpu(this_cpu);
 
 	/*
 	 * If UP is mandated by "nosmp" (which implies "maxcpus=0"), don't set
diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index dad128ba98bf..146c04ceaa51 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -70,19 +70,32 @@ EXPORT_SYMBOL(cpumask_of_node);
 
 #endif
 
-static void map_cpu_to_node(unsigned int cpu, int nid)
+static void numa_update_cpu(unsigned int cpu, bool remove)
 {
-	set_cpu_numa_node(cpu, nid);
-	if (nid >= 0)
+	int nid = cpu_to_node(cpu);
+
+	if (nid == NUMA_NO_NODE)
+		return;
+
+	if (remove)
+		cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
+	else
 		cpumask_set_cpu(cpu, node_to_cpumask_map[nid]);
 }
 
-void numa_clear_node(unsigned int cpu)
+void numa_add_cpu(unsigned int cpu)
 {
-	int nid = cpu_to_node(cpu);
+	numa_update_cpu(cpu, false);
+}
 
-	if (nid >= 0)
-		cpumask_clear_cpu(cpu, node_to_cpumask_map[nid]);
+void numa_remove_cpu(unsigned int cpu)
+{
+	numa_update_cpu(cpu, true);
+}
+
+void numa_clear_node(unsigned int cpu)
+{
+	numa_remove_cpu(cpu);
 	set_cpu_numa_node(cpu, NUMA_NO_NODE);
 }
 
@@ -116,7 +129,7 @@ static void __init setup_node_to_cpumask_map(void)
  */
 void numa_store_cpu_info(unsigned int cpu)
 {
-	map_cpu_to_node(cpu, cpu_to_node_map[cpu]);
+	set_cpu_numa_node(cpu, cpu_to_node_map[cpu]);
 }
 
 void __init early_map_cpu_to_node(unsigned int cpu, int nid)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 3/7] arm64: topology: add support to remove cpu topology sibling masks
  2018-07-06 11:02 [PATCH v3 0/7] arm64: numa/topology/smp: update the cpumasks for CPU hotplug Sudeep Holla
  2018-07-06 11:02 ` [PATCH v3 1/7] arm64: topology: refactor reset_cpu_topology to add support for removing topology Sudeep Holla
  2018-07-06 11:02 ` [PATCH v3 2/7] arm64: numa: separate out updates to percpu nodeid and NUMA node cpumap Sudeep Holla
@ 2018-07-06 11:02 ` Sudeep Holla
  2018-07-06 11:02 ` [PATCH v3 4/7] arm64: topology: restrict updating siblings_masks to online cpus only Sudeep Holla
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Sudeep Holla @ 2018-07-06 11:02 UTC (permalink / raw)
  To: linux-arm-kernel

This patch adds support to remove all the CPU topology information using
clear_cpu_topology and also resetting the sibling information on other
sibling CPUs. This will be used in cpu_disable so that all the topology
sibling information is removed on CPU hotplug out.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
---
 arch/arm64/include/asm/topology.h |  2 ++
 arch/arm64/kernel/topology.c      | 14 ++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index df48212f767b..4c073d28f9e4 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -20,9 +20,11 @@ extern struct cpu_topology cpu_topology[NR_CPUS];
 #define topology_core_id(cpu)		(cpu_topology[cpu].core_id)
 #define topology_core_cpumask(cpu)	(&cpu_topology[cpu].core_sibling)
 #define topology_sibling_cpumask(cpu)	(&cpu_topology[cpu].thread_sibling)
+#define topology_llc_cpumask(cpu)	(&cpu_topology[cpu].llc_siblings)
 
 void init_cpu_topology(void);
 void store_cpu_topology(unsigned int cpuid);
+void remove_cpu_topology(unsigned int cpuid);
 const struct cpumask *cpu_coregroup_mask(int cpu);
 
 #ifdef CONFIG_NUMA
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index b64733c5ea28..cb81df545359 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -322,6 +322,20 @@ static void __init reset_cpu_topology(void)
 	}
 }
 
+void remove_cpu_topology(unsigned int cpu)
+{
+	int sibling;
+
+	for_each_cpu(sibling, topology_core_cpumask(cpu))
+		cpumask_clear_cpu(cpu, topology_core_cpumask(sibling));
+	for_each_cpu(sibling, topology_sibling_cpumask(cpu))
+		cpumask_clear_cpu(cpu, topology_sibling_cpumask(sibling));
+	for_each_cpu(sibling, topology_llc_cpumask(cpu))
+		cpumask_clear_cpu(cpu, topology_llc_cpumask(sibling));
+
+	clear_cpu_topology(cpu);
+}
+
 #ifdef CONFIG_ACPI
 /*
  * Propagate the topology information of the processor_topology_node tree to the
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 4/7] arm64: topology: restrict updating siblings_masks to online cpus only
  2018-07-06 11:02 [PATCH v3 0/7] arm64: numa/topology/smp: update the cpumasks for CPU hotplug Sudeep Holla
                   ` (2 preceding siblings ...)
  2018-07-06 11:02 ` [PATCH v3 3/7] arm64: topology: add support to remove cpu topology sibling masks Sudeep Holla
@ 2018-07-06 11:02 ` Sudeep Holla
  2018-07-06 11:02 ` [PATCH v3 5/7] arm64: smp: remove cpu and numa topology information when hotplugging out CPU Sudeep Holla
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Sudeep Holla @ 2018-07-06 11:02 UTC (permalink / raw)
  To: linux-arm-kernel

It's incorrect to iterate over all the possible CPUs to update the
sibling masks when any CPU is hotplugged in. In case the topology
siblings masks of the CPU is removed when is it hotplugged out, we
end up updating those masks when one of it's sibling is powered up
again. This will provide inconsistent view.

Further, since the CPU calling update_sibling_masks is yet to be set
online, there's no need to compare itself with each online CPU when
updating the siblings masks.

This patch restricts updation of sibling masks only for CPUs that are
already online. It also the drops the unnecessary cpuid check.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
---
 arch/arm64/kernel/topology.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index cb81df545359..1a3f8ab455d0 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -231,7 +231,7 @@ static void update_siblings_masks(unsigned int cpuid)
 	int cpu;
 
 	/* update core and thread sibling masks */
-	for_each_possible_cpu(cpu) {
+	for_each_online_cpu(cpu) {
 		cpu_topo = &cpu_topology[cpu];
 
 		if (cpuid_topo->llc_id == cpu_topo->llc_id) {
@@ -243,15 +243,13 @@ static void update_siblings_masks(unsigned int cpuid)
 			continue;
 
 		cpumask_set_cpu(cpuid, &cpu_topo->core_sibling);
-		if (cpu != cpuid)
-			cpumask_set_cpu(cpu, &cpuid_topo->core_sibling);
+		cpumask_set_cpu(cpu, &cpuid_topo->core_sibling);
 
 		if (cpuid_topo->core_id != cpu_topo->core_id)
 			continue;
 
 		cpumask_set_cpu(cpuid, &cpu_topo->thread_sibling);
-		if (cpu != cpuid)
-			cpumask_set_cpu(cpu, &cpuid_topo->thread_sibling);
+		cpumask_set_cpu(cpu, &cpuid_topo->thread_sibling);
 	}
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 5/7] arm64: smp: remove cpu and numa topology information when hotplugging out CPU
  2018-07-06 11:02 [PATCH v3 0/7] arm64: numa/topology/smp: update the cpumasks for CPU hotplug Sudeep Holla
                   ` (3 preceding siblings ...)
  2018-07-06 11:02 ` [PATCH v3 4/7] arm64: topology: restrict updating siblings_masks to online cpus only Sudeep Holla
@ 2018-07-06 11:02 ` Sudeep Holla
  2018-07-17 12:58     ` Geert Uytterhoeven
  2018-07-06 11:02 ` [PATCH v3 6/7] arm64: topology: rename llc_siblings to align with other struct members Sudeep Holla
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Sudeep Holla @ 2018-07-06 11:02 UTC (permalink / raw)
  To: linux-arm-kernel

We already repopulate the information on CPU hotplug-in, so we can safely
remove the CPU topology and NUMA cpumap information during CPU hotplug
out operation. This will help to provide the correct cpumask for
scheduler domains.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
---
 arch/arm64/kernel/smp.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 2a315f32fad3..a44cc09059b5 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -279,6 +279,9 @@ int __cpu_disable(void)
 	if (ret)
 		return ret;
 
+	remove_cpu_topology(cpu);
+	numa_remove_cpu(cpu);
+
 	/*
 	 * Take this CPU offline.  Once we clear this, we can't return,
 	 * and we must not schedule until we're ready to give up the cpu.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 6/7] arm64: topology: rename llc_siblings to align with other struct members
  2018-07-06 11:02 [PATCH v3 0/7] arm64: numa/topology/smp: update the cpumasks for CPU hotplug Sudeep Holla
                   ` (4 preceding siblings ...)
  2018-07-06 11:02 ` [PATCH v3 5/7] arm64: smp: remove cpu and numa topology information when hotplugging out CPU Sudeep Holla
@ 2018-07-06 11:02 ` Sudeep Holla
  2018-07-06 11:02 ` [PATCH v3 7/7] arm64: topology: re-introduce numa mask check for scheduler MC selection Sudeep Holla
  2018-07-10 21:51 ` [PATCH v3 0/7] arm64: numa/topology/smp: update the cpumasks for CPU hotplug Jeremy Linton
  7 siblings, 0 replies; 25+ messages in thread
From: Sudeep Holla @ 2018-07-06 11:02 UTC (permalink / raw)
  To: linux-arm-kernel

Similar to core_sibling and thread_sibling, it's better to align and
rename llc_siblings to llc_sibling.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
---
 arch/arm64/include/asm/topology.h |  4 ++--
 arch/arm64/kernel/topology.c      | 12 ++++++------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index 4c073d28f9e4..49a0fee4f89b 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -11,7 +11,7 @@ struct cpu_topology {
 	int llc_id;
 	cpumask_t thread_sibling;
 	cpumask_t core_sibling;
-	cpumask_t llc_siblings;
+	cpumask_t llc_sibling;
 };
 
 extern struct cpu_topology cpu_topology[NR_CPUS];
@@ -20,7 +20,7 @@ extern struct cpu_topology cpu_topology[NR_CPUS];
 #define topology_core_id(cpu)		(cpu_topology[cpu].core_id)
 #define topology_core_cpumask(cpu)	(&cpu_topology[cpu].core_sibling)
 #define topology_sibling_cpumask(cpu)	(&cpu_topology[cpu].thread_sibling)
-#define topology_llc_cpumask(cpu)	(&cpu_topology[cpu].llc_siblings)
+#define topology_llc_cpumask(cpu)	(&cpu_topology[cpu].llc_sibling)
 
 void init_cpu_topology(void);
 void store_cpu_topology(unsigned int cpuid);
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 1a3f8ab455d0..03b0ed2480aa 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -218,8 +218,8 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
 	const cpumask_t *core_mask = &cpu_topology[cpu].core_sibling;
 
 	if (cpu_topology[cpu].llc_id != -1) {
-		if (cpumask_subset(&cpu_topology[cpu].llc_siblings, core_mask))
-			core_mask = &cpu_topology[cpu].llc_siblings;
+		if (cpumask_subset(&cpu_topology[cpu].llc_sibling, core_mask))
+			core_mask = &cpu_topology[cpu].llc_sibling;
 	}
 
 	return core_mask;
@@ -235,8 +235,8 @@ static void update_siblings_masks(unsigned int cpuid)
 		cpu_topo = &cpu_topology[cpu];
 
 		if (cpuid_topo->llc_id == cpu_topo->llc_id) {
-			cpumask_set_cpu(cpu, &cpuid_topo->llc_siblings);
-			cpumask_set_cpu(cpuid, &cpu_topo->llc_siblings);
+			cpumask_set_cpu(cpu, &cpuid_topo->llc_sibling);
+			cpumask_set_cpu(cpuid, &cpu_topo->llc_sibling);
 		}
 
 		if (cpuid_topo->package_id != cpu_topo->package_id)
@@ -295,8 +295,8 @@ static void clear_cpu_topology(int cpu)
 {
 	struct cpu_topology *cpu_topo = &cpu_topology[cpu];
 
-	cpumask_clear(&cpu_topo->llc_siblings);
-	cpumask_set_cpu(cpu, &cpu_topo->llc_siblings);
+	cpumask_clear(&cpu_topo->llc_sibling);
+	cpumask_set_cpu(cpu, &cpu_topo->llc_sibling);
 
 	cpumask_clear(&cpu_topo->core_sibling);
 	cpumask_set_cpu(cpu, &cpu_topo->core_sibling);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 7/7] arm64: topology: re-introduce numa mask check for scheduler MC selection
  2018-07-06 11:02 [PATCH v3 0/7] arm64: numa/topology/smp: update the cpumasks for CPU hotplug Sudeep Holla
                   ` (5 preceding siblings ...)
  2018-07-06 11:02 ` [PATCH v3 6/7] arm64: topology: rename llc_siblings to align with other struct members Sudeep Holla
@ 2018-07-06 11:02 ` Sudeep Holla
  2018-07-10 21:51 ` [PATCH v3 0/7] arm64: numa/topology/smp: update the cpumasks for CPU hotplug Jeremy Linton
  7 siblings, 0 replies; 25+ messages in thread
From: Sudeep Holla @ 2018-07-06 11:02 UTC (permalink / raw)
  To: linux-arm-kernel

Commit 37c3ec2d810f ("arm64: topology: divorce MC scheduling domain from
core_siblings") selected the smallest of LLC, socket siblings, and NUMA
node siblings to ensure that the sched domain we build for the MC layer
isn't larger than the DIE above it or it's shrunk to the socket or NUMA
node if LLC exist acrosis NUMA node/chiplets.

Commit acd32e52e4e0 ("arm64: topology: Avoid checking numa mask for
scheduler MC selection") reverted the NUMA siblings checks since the
CPU topology masks weren't updated on hotplug at that time.

This patch re-introduces numa mask check as the CPU and NUMA topology
is now updated in hotplug paths. Effectively, this patch does the
partial revert of commit acd32e52e4e0.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
---
 arch/arm64/kernel/topology.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 03b0ed2480aa..0825c4a856e3 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -215,8 +215,13 @@ EXPORT_SYMBOL_GPL(cpu_topology);
 
 const struct cpumask *cpu_coregroup_mask(int cpu)
 {
-	const cpumask_t *core_mask = &cpu_topology[cpu].core_sibling;
+	const cpumask_t *core_mask = cpumask_of_node(cpu_to_node(cpu));
 
+	/* Find the smaller of NUMA, core or LLC siblings */
+	if (cpumask_subset(&cpu_topology[cpu].core_sibling, core_mask)) {
+		/* not numa in package, lets use the package siblings */
+		core_mask = &cpu_topology[cpu].core_sibling;
+	}
 	if (cpu_topology[cpu].llc_id != -1) {
 		if (cpumask_subset(&cpu_topology[cpu].llc_sibling, core_mask))
 			core_mask = &cpu_topology[cpu].llc_sibling;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 0/7] arm64: numa/topology/smp: update the cpumasks for CPU hotplug
  2018-07-06 11:02 [PATCH v3 0/7] arm64: numa/topology/smp: update the cpumasks for CPU hotplug Sudeep Holla
                   ` (6 preceding siblings ...)
  2018-07-06 11:02 ` [PATCH v3 7/7] arm64: topology: re-introduce numa mask check for scheduler MC selection Sudeep Holla
@ 2018-07-10 21:51 ` Jeremy Linton
  7 siblings, 0 replies; 25+ messages in thread
From: Jeremy Linton @ 2018-07-10 21:51 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On 07/06/2018 06:02 AM, Sudeep Holla wrote:
> Hi,
> 
> Commit 37c3ec2d810f ("arm64: topology: divorce MC scheduling domain from core_siblings")
> was reported to cause some hotplug and system suspend regressions when
> it was merged. On investigation, it was found that unlike x86/PPC,
> arm64 doesn't update the cpu and numa masks on CPU hotplug. That's
> somewhat expected from the scheduler.
> 
> Since these changes were bit invasive as a solution to the above
> mentioned regression, as small change was temporarily applied as a fix.
> This series updates the cpu and numa masks on CPU hotplug and reverts
> that temporary fix.
> 
> It would be good to get this tested(CPU hotplug - few and all CPUs in
> a socket) on multi-socket/NUMA systems from Cavium and Huawei/Hisilicon.

It looks good, and I tested it in ACPI/PPTT mode with a single socket 
thunderX2, booting with maxcpus and online/offlineing cores. No problems 
found. I did a similar set of tests with a juno, as well as with a 
kernel compiled with CONFIG_NUMA off on the juno, a configuration that 
previously had problems.


Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>

Thanks,


> 
> Regards,
> Sudeep
> 
> v2->v3:
> 	- Removed confusing boolean reset argument and inlined reset
> 	  part in reset_cpu_topology
> 	- Moved cpu_llc_shared_mask to asm/topology.h along with other
> 	  similar macros and renamed it topology_llc_cpumask
> 	- Use NUMA_NO_NODE instead of checking for any -ve integer
> 
> v1->v2:
> 	- Rebased on v4.18-rc1 and hence do revert of the temporary fix
> 	  that was merged for v4.18
> 	- Removed one of the wrong use of possible_mask
> 
> Sudeep Holla (7):
>    arm64: topology: refactor reset_cpu_topology to add support for removing topology
>    arm64: numa: separate out updates to percpu nodeid and NUMA node cpumap
>    arm64: topology: add support to remove cpu topology sibling masks
>    arm64: topology: restrict updating siblings_masks to online cpus only
>    arm64: smp: remove cpu and numa topology information when hotplugging out CPU
>    arm64: topology: rename llc_siblings to align with other struct members
>    arm64: topology: re-introduce numa mask check for scheduler MC selection
> 
>   arch/arm64/include/asm/numa.h     |  4 +++
>   arch/arm64/include/asm/topology.h |  4 ++-
>   arch/arm64/kernel/smp.c           |  5 ++++
>   arch/arm64/kernel/topology.c      | 58 +++++++++++++++++++++++++++------------
>   arch/arm64/mm/numa.c              | 29 ++++++++++++++------
>   5 files changed, 74 insertions(+), 26 deletions(-)
> 
> --
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 5/7] arm64: smp: remove cpu and numa topology information when hotplugging out CPU
  2018-07-06 11:02 ` [PATCH v3 5/7] arm64: smp: remove cpu and numa topology information when hotplugging out CPU Sudeep Holla
@ 2018-07-17 12:58     ` Geert Uytterhoeven
  0 siblings, 0 replies; 25+ messages in thread
From: Geert Uytterhoeven @ 2018-07-17 12:58 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: Linux ARM, Catalin Marinas, Will Deacon, Jeremy Linton,
	Hanjun Guo, ganapatrao.kulkarni, morten.rasmussen, Linux-Renesas

Hi Sudeep,

On Fri, Jul 6, 2018 at 1:04 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
> We already repopulate the information on CPU hotplug-in, so we can safely
> remove the CPU topology and NUMA cpumap information during CPU hotplug
> out operation. This will help to provide the correct cpumask for
> scheduler domains.
>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
> Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

This is now commit 7f9545aa1a91a9a4 ("arm64: smp: remove cpu and numa
topology information when hotplugging out CPU") in arm64/for-next/core, to
which I bisected a PSCI checker regression on systems with two CPU clusters.

Dmesg on R-Car H3 (4xCA57+4xCA53) before/after:

     psci_checker: PSCI checker started using 8 CPUs

8 CPU cores detected.

     psci_checker: Starting hotplug tests
     psci_checker: Trying to turn off and on again all CPUs
     CPU1: shutdown
     psci: CPU1 killed.
     CPU2: shutdown
     psci: CPU2 killed.
    -NOHZ: local_softirq_pending 55
     CPU3: shutdown
     psci: CPU3 killed.
    -NOHZ: local_softirq_pending 51
     CPU4: shutdown
     psci: CPU4 killed.
     NOHZ: local_softirq_pending 55
     CPU5: shutdown
     psci: CPU5 killed.
     NOHZ: local_softirq_pending 55
     CPU6: shutdown
     psci: CPU6 killed.
     NOHZ: local_softirq_pending 55
     CPU7: shutdown
     psci: CPU7 killed.
     Detected PIPT I-cache on CPU1
     CPU1: Booted secondary processor 0x0000000001 [0x411fd073]
     Detected PIPT I-cache on CPU2
     CPU2: Booted secondary processor 0x0000000002 [0x411fd073]
     Detected PIPT I-cache on CPU3
     CPU3: Booted secondary processor 0x0000000003 [0x411fd073]
     Detected VIPT I-cache on CPU4
     CPU4: Booted secondary processor 0x0000000100 [0x410fd034]
     cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 1198080 KHz
     cpufreq: cpufreq_online: CPU4: Unlisted initial frequency changed
to: 1200000 KHz
     Detected VIPT I-cache on CPU5
     CPU5: Booted secondary processor 0x0000000101 [0x410fd034]
     Detected VIPT I-cache on CPU6
     CPU6: Booted secondary processor 0x0000000102 [0x410fd034]
     Detected VIPT I-cache on CPU7
     CPU7: Booted secondary processor 0x0000000103 [0x410fd034]

All but CPU0 tested, as expected.

    psci_checker: Trying to turn off and on again group 0 (CPUs 0-3)

4 big CPU cores detected.

     CPU1: shutdown
     psci: CPU1 killed.
    -NOHZ: local_softirq_pending 55
    +NOHZ: local_softirq_pending 51
     CPU2: shutdown
     psci: CPU2 killed.
     NOHZ: local_softirq_pending 51
     CPU3: shutdown
     psci: CPU3 killed.
     Detected PIPT I-cache on CPU1
     CPU1: Booted secondary processor 0x0000000001 [0x411fd073]
     Detected PIPT I-cache on CPU2
     CPU2: Booted secondary processor 0x0000000002 [0x411fd073]
     Detected PIPT I-cache on CPU3
     CPU3: Booted secondary processor 0x0000000003 [0x411fd073]

All but CPU0 tested, as expected.

    psci_checker: Trying to turn off and on again group 1 (CPUs 4-7)

4 LITTLE CPU cores detected.

     CPU4: shutdown
     psci: CPU4 killed.
     NOHZ: local_softirq_pending 55
    -CPU5: shutdown
    -psci: CPU5 killed.
    -NOHZ: local_softirq_pending 55
    -CPU6: shutdown
    -psci: CPU6 killed.
    -NOHZ: local_softirq_pending 55
    -CPU7: shutdown
    -psci: CPU7 killed.
     Detected VIPT I-cache on CPU4
     CPU4: Booted secondary processor 0x0000000100 [0x410fd034]
    -cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 1198080 KHz
    -cpufreq: cpufreq_online: CPU4: Unlisted initial frequency changed
to: 1200000 KHz
    -Detected VIPT I-cache on CPU5
    -CPU5: Booted secondary processor 0x0000000101 [0x410fd034]
    -Detected VIPT I-cache on CPU6
    -CPU6: Booted secondary processor 0x0000000102 [0x410fd034]
    -Detected VIPT I-cache on CPU7
    -CPU7: Booted secondary processor 0x0000000103 [0x410fd034]

Woops, CPU5-7 are not tested.

    psci_checker: Hotplug tests passed OK


> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -279,6 +279,9 @@ int __cpu_disable(void)
>         if (ret)
>                 return ret;
>
> +       remove_cpu_topology(cpu);
> +       numa_remove_cpu(cpu);
> +
>         /*
>          * Take this CPU offline.  Once we clear this, we can't return,
>          * and we must not schedule until we're ready to give up the cpu.

A simple revert is not sufficient, as that causes

    watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [cpuhp/2:21]

Do you have an idea how to fix this?
Thanks!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 5/7] arm64: smp: remove cpu and numa topology information when hotplugging out CPU
@ 2018-07-17 12:58     ` Geert Uytterhoeven
  0 siblings, 0 replies; 25+ messages in thread
From: Geert Uytterhoeven @ 2018-07-17 12:58 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Sudeep,

On Fri, Jul 6, 2018 at 1:04 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
> We already repopulate the information on CPU hotplug-in, so we can safely
> remove the CPU topology and NUMA cpumap information during CPU hotplug
> out operation. This will help to provide the correct cpumask for
> scheduler domains.
>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
> Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

This is now commit 7f9545aa1a91a9a4 ("arm64: smp: remove cpu and numa
topology information when hotplugging out CPU") in arm64/for-next/core, to
which I bisected a PSCI checker regression on systems with two CPU clusters.

Dmesg on R-Car H3 (4xCA57+4xCA53) before/after:

     psci_checker: PSCI checker started using 8 CPUs

8 CPU cores detected.

     psci_checker: Starting hotplug tests
     psci_checker: Trying to turn off and on again all CPUs
     CPU1: shutdown
     psci: CPU1 killed.
     CPU2: shutdown
     psci: CPU2 killed.
    -NOHZ: local_softirq_pending 55
     CPU3: shutdown
     psci: CPU3 killed.
    -NOHZ: local_softirq_pending 51
     CPU4: shutdown
     psci: CPU4 killed.
     NOHZ: local_softirq_pending 55
     CPU5: shutdown
     psci: CPU5 killed.
     NOHZ: local_softirq_pending 55
     CPU6: shutdown
     psci: CPU6 killed.
     NOHZ: local_softirq_pending 55
     CPU7: shutdown
     psci: CPU7 killed.
     Detected PIPT I-cache on CPU1
     CPU1: Booted secondary processor 0x0000000001 [0x411fd073]
     Detected PIPT I-cache on CPU2
     CPU2: Booted secondary processor 0x0000000002 [0x411fd073]
     Detected PIPT I-cache on CPU3
     CPU3: Booted secondary processor 0x0000000003 [0x411fd073]
     Detected VIPT I-cache on CPU4
     CPU4: Booted secondary processor 0x0000000100 [0x410fd034]
     cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 1198080 KHz
     cpufreq: cpufreq_online: CPU4: Unlisted initial frequency changed
to: 1200000 KHz
     Detected VIPT I-cache on CPU5
     CPU5: Booted secondary processor 0x0000000101 [0x410fd034]
     Detected VIPT I-cache on CPU6
     CPU6: Booted secondary processor 0x0000000102 [0x410fd034]
     Detected VIPT I-cache on CPU7
     CPU7: Booted secondary processor 0x0000000103 [0x410fd034]

All but CPU0 tested, as expected.

    psci_checker: Trying to turn off and on again group 0 (CPUs 0-3)

4 big CPU cores detected.

     CPU1: shutdown
     psci: CPU1 killed.
    -NOHZ: local_softirq_pending 55
    +NOHZ: local_softirq_pending 51
     CPU2: shutdown
     psci: CPU2 killed.
     NOHZ: local_softirq_pending 51
     CPU3: shutdown
     psci: CPU3 killed.
     Detected PIPT I-cache on CPU1
     CPU1: Booted secondary processor 0x0000000001 [0x411fd073]
     Detected PIPT I-cache on CPU2
     CPU2: Booted secondary processor 0x0000000002 [0x411fd073]
     Detected PIPT I-cache on CPU3
     CPU3: Booted secondary processor 0x0000000003 [0x411fd073]

All but CPU0 tested, as expected.

    psci_checker: Trying to turn off and on again group 1 (CPUs 4-7)

4 LITTLE CPU cores detected.

     CPU4: shutdown
     psci: CPU4 killed.
     NOHZ: local_softirq_pending 55
    -CPU5: shutdown
    -psci: CPU5 killed.
    -NOHZ: local_softirq_pending 55
    -CPU6: shutdown
    -psci: CPU6 killed.
    -NOHZ: local_softirq_pending 55
    -CPU7: shutdown
    -psci: CPU7 killed.
     Detected VIPT I-cache on CPU4
     CPU4: Booted secondary processor 0x0000000100 [0x410fd034]
    -cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 1198080 KHz
    -cpufreq: cpufreq_online: CPU4: Unlisted initial frequency changed
to: 1200000 KHz
    -Detected VIPT I-cache on CPU5
    -CPU5: Booted secondary processor 0x0000000101 [0x410fd034]
    -Detected VIPT I-cache on CPU6
    -CPU6: Booted secondary processor 0x0000000102 [0x410fd034]
    -Detected VIPT I-cache on CPU7
    -CPU7: Booted secondary processor 0x0000000103 [0x410fd034]

Woops, CPU5-7 are not tested.

    psci_checker: Hotplug tests passed OK


> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -279,6 +279,9 @@ int __cpu_disable(void)
>         if (ret)
>                 return ret;
>
> +       remove_cpu_topology(cpu);
> +       numa_remove_cpu(cpu);
> +
>         /*
>          * Take this CPU offline.  Once we clear this, we can't return,
>          * and we must not schedule until we're ready to give up the cpu.

A simple revert is not sufficient, as that causes

    watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [cpuhp/2:21]

Do you have an idea how to fix this?
Thanks!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 5/7] arm64: smp: remove cpu and numa topology information when hotplugging out CPU
  2018-07-17 12:58     ` Geert Uytterhoeven
@ 2018-07-17 14:05       ` Sudeep Holla
  -1 siblings, 0 replies; 25+ messages in thread
From: Sudeep Holla @ 2018-07-17 14:05 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Linux ARM, Catalin Marinas, Will Deacon, Jeremy Linton,
	Hanjun Guo, ganapatrao.kulkarni, morten.rasmussen, Linux-Renesas

On Tue, Jul 17, 2018 at 02:58:14PM +0200, Geert Uytterhoeven wrote:
> Hi Sudeep,
>
> On Fri, Jul 6, 2018 at 1:04 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
> > We already repopulate the information on CPU hotplug-in, so we can safely
> > remove the CPU topology and NUMA cpumap information during CPU hotplug
> > out operation. This will help to provide the correct cpumask for
> > scheduler domains.
> >
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will.deacon@arm.com>
> > Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
> > Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
> > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
>
> This is now commit 7f9545aa1a91a9a4 ("arm64: smp: remove cpu and numa
> topology information when hotplugging out CPU") in arm64/for-next/core, to
> which I bisected a PSCI checker regression on systems with two CPU clusters.
>
> Dmesg on R-Car H3 (4xCA57+4xCA53) before/after:
>
>      psci_checker: PSCI checker started using 8 CPUs
>
> 8 CPU cores detected.
>
>      psci_checker: Starting hotplug tests
>      psci_checker: Trying to turn off and on again all CPUs
>      CPU1: shutdown
>      psci: CPU1 killed.
>      CPU2: shutdown
>      psci: CPU2 killed.
>     -NOHZ: local_softirq_pending 55
>      CPU3: shutdown
>      psci: CPU3 killed.
>     -NOHZ: local_softirq_pending 51
>      CPU4: shutdown
>      psci: CPU4 killed.
>      NOHZ: local_softirq_pending 55
>      CPU5: shutdown
>      psci: CPU5 killed.
>      NOHZ: local_softirq_pending 55
>      CPU6: shutdown
>      psci: CPU6 killed.
>      NOHZ: local_softirq_pending 55
>      CPU7: shutdown
>      psci: CPU7 killed.
>      Detected PIPT I-cache on CPU1
>      CPU1: Booted secondary processor 0x0000000001 [0x411fd073]
>      Detected PIPT I-cache on CPU2
>      CPU2: Booted secondary processor 0x0000000002 [0x411fd073]
>      Detected PIPT I-cache on CPU3
>      CPU3: Booted secondary processor 0x0000000003 [0x411fd073]
>      Detected VIPT I-cache on CPU4
>      CPU4: Booted secondary processor 0x0000000100 [0x410fd034]
>      cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 1198080 KHz
>      cpufreq: cpufreq_online: CPU4: Unlisted initial frequency changed to: 1200000 KHz
>      Detected VIPT I-cache on CPU5
>      CPU5: Booted secondary processor 0x0000000101 [0x410fd034]
>      Detected VIPT I-cache on CPU6
>      CPU6: Booted secondary processor 0x0000000102 [0x410fd034]
>      Detected VIPT I-cache on CPU7
>      CPU7: Booted secondary processor 0x0000000103 [0x410fd034]
>
> All but CPU0 tested, as expected.
>

OK, does the firmware on this system not allow CPU0 to be hotplugged out ?

>     psci_checker: Trying to turn off and on again group 0 (CPUs 0-3)
>
> 4 big CPU cores detected.
>
>      CPU1: shutdown
>      psci: CPU1 killed.
>     -NOHZ: local_softirq_pending 55
>     +NOHZ: local_softirq_pending 51
>      CPU2: shutdown
>      psci: CPU2 killed.
>      NOHZ: local_softirq_pending 51
>      CPU3: shutdown
>      psci: CPU3 killed.
>      Detected PIPT I-cache on CPU1
>      CPU1: Booted secondary processor 0x0000000001 [0x411fd073]
>      Detected PIPT I-cache on CPU2
>      CPU2: Booted secondary processor 0x0000000002 [0x411fd073]
>      Detected PIPT I-cache on CPU3
>      CPU3: Booted secondary processor 0x0000000003 [0x411fd073]
>
> All but CPU0 tested, as expected.
>
>     psci_checker: Trying to turn off and on again group 1 (CPUs 4-7)
>
> 4 LITTLE CPU cores detected.
>
>      CPU4: shutdown
>      psci: CPU4 killed.
>      NOHZ: local_softirq_pending 55
>     -CPU5: shutdown
>     -psci: CPU5 killed.
>     -NOHZ: local_softirq_pending 55
>     -CPU6: shutdown
>     -psci: CPU6 killed.
>     -NOHZ: local_softirq_pending 55
>     -CPU7: shutdown
>     -psci: CPU7 killed.
>      Detected VIPT I-cache on CPU4
>      CPU4: Booted secondary processor 0x0000000100 [0x410fd034]
>     -cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 1198080 KHz
>     -cpufreq: cpufreq_online: CPU4: Unlisted initial frequency changed to: 1200000 KHz
>     -Detected VIPT I-cache on CPU5
>     -CPU5: Booted secondary processor 0x0000000101 [0x410fd034]
>     -Detected VIPT I-cache on CPU6
>     -CPU6: Booted secondary processor 0x0000000102 [0x410fd034]
>     -Detected VIPT I-cache on CPU7
>     -CPU7: Booted secondary processor 0x0000000103 [0x410fd034]
>
> Woops, CPU5-7 are not tested.
>

I don't understand what you mean by that. From the logs, it looks fine.
What do you mean by "CPU5-7 are not tested" ?

>     psci_checker: Hotplug tests passed OK
>
>
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -279,6 +279,9 @@ int __cpu_disable(void)
> >         if (ret)
> >                 return ret;
> >
> > +       remove_cpu_topology(cpu);
> > +       numa_remove_cpu(cpu);
> > +
> >         /*
> >          * Take this CPU offline.  Once we clear this, we can't return,
> >          * and we must not schedule until we're ready to give up the cpu.
>
> A simple revert is not sufficient, as that causes
>
>     watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [cpuhp/2:21]
>
> Do you have an idea how to fix this?

Sorry, but I am finding it hard to understand the issue from the log.
Also, it would be good to know your config. Is it defconfig - NUMA as before ?

--
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 5/7] arm64: smp: remove cpu and numa topology information when hotplugging out CPU
@ 2018-07-17 14:05       ` Sudeep Holla
  0 siblings, 0 replies; 25+ messages in thread
From: Sudeep Holla @ 2018-07-17 14:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 17, 2018 at 02:58:14PM +0200, Geert Uytterhoeven wrote:
> Hi Sudeep,
>
> On Fri, Jul 6, 2018 at 1:04 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
> > We already repopulate the information on CPU hotplug-in, so we can safely
> > remove the CPU topology and NUMA cpumap information during CPU hotplug
> > out operation. This will help to provide the correct cpumask for
> > scheduler domains.
> >
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will.deacon@arm.com>
> > Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
> > Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
> > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
>
> This is now commit 7f9545aa1a91a9a4 ("arm64: smp: remove cpu and numa
> topology information when hotplugging out CPU") in arm64/for-next/core, to
> which I bisected a PSCI checker regression on systems with two CPU clusters.
>
> Dmesg on R-Car H3 (4xCA57+4xCA53) before/after:
>
>      psci_checker: PSCI checker started using 8 CPUs
>
> 8 CPU cores detected.
>
>      psci_checker: Starting hotplug tests
>      psci_checker: Trying to turn off and on again all CPUs
>      CPU1: shutdown
>      psci: CPU1 killed.
>      CPU2: shutdown
>      psci: CPU2 killed.
>     -NOHZ: local_softirq_pending 55
>      CPU3: shutdown
>      psci: CPU3 killed.
>     -NOHZ: local_softirq_pending 51
>      CPU4: shutdown
>      psci: CPU4 killed.
>      NOHZ: local_softirq_pending 55
>      CPU5: shutdown
>      psci: CPU5 killed.
>      NOHZ: local_softirq_pending 55
>      CPU6: shutdown
>      psci: CPU6 killed.
>      NOHZ: local_softirq_pending 55
>      CPU7: shutdown
>      psci: CPU7 killed.
>      Detected PIPT I-cache on CPU1
>      CPU1: Booted secondary processor 0x0000000001 [0x411fd073]
>      Detected PIPT I-cache on CPU2
>      CPU2: Booted secondary processor 0x0000000002 [0x411fd073]
>      Detected PIPT I-cache on CPU3
>      CPU3: Booted secondary processor 0x0000000003 [0x411fd073]
>      Detected VIPT I-cache on CPU4
>      CPU4: Booted secondary processor 0x0000000100 [0x410fd034]
>      cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 1198080 KHz
>      cpufreq: cpufreq_online: CPU4: Unlisted initial frequency changed to: 1200000 KHz
>      Detected VIPT I-cache on CPU5
>      CPU5: Booted secondary processor 0x0000000101 [0x410fd034]
>      Detected VIPT I-cache on CPU6
>      CPU6: Booted secondary processor 0x0000000102 [0x410fd034]
>      Detected VIPT I-cache on CPU7
>      CPU7: Booted secondary processor 0x0000000103 [0x410fd034]
>
> All but CPU0 tested, as expected.
>

OK, does the firmware on this system not allow CPU0 to be hotplugged out ?

>     psci_checker: Trying to turn off and on again group 0 (CPUs 0-3)
>
> 4 big CPU cores detected.
>
>      CPU1: shutdown
>      psci: CPU1 killed.
>     -NOHZ: local_softirq_pending 55
>     +NOHZ: local_softirq_pending 51
>      CPU2: shutdown
>      psci: CPU2 killed.
>      NOHZ: local_softirq_pending 51
>      CPU3: shutdown
>      psci: CPU3 killed.
>      Detected PIPT I-cache on CPU1
>      CPU1: Booted secondary processor 0x0000000001 [0x411fd073]
>      Detected PIPT I-cache on CPU2
>      CPU2: Booted secondary processor 0x0000000002 [0x411fd073]
>      Detected PIPT I-cache on CPU3
>      CPU3: Booted secondary processor 0x0000000003 [0x411fd073]
>
> All but CPU0 tested, as expected.
>
>     psci_checker: Trying to turn off and on again group 1 (CPUs 4-7)
>
> 4 LITTLE CPU cores detected.
>
>      CPU4: shutdown
>      psci: CPU4 killed.
>      NOHZ: local_softirq_pending 55
>     -CPU5: shutdown
>     -psci: CPU5 killed.
>     -NOHZ: local_softirq_pending 55
>     -CPU6: shutdown
>     -psci: CPU6 killed.
>     -NOHZ: local_softirq_pending 55
>     -CPU7: shutdown
>     -psci: CPU7 killed.
>      Detected VIPT I-cache on CPU4
>      CPU4: Booted secondary processor 0x0000000100 [0x410fd034]
>     -cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 1198080 KHz
>     -cpufreq: cpufreq_online: CPU4: Unlisted initial frequency changed to: 1200000 KHz
>     -Detected VIPT I-cache on CPU5
>     -CPU5: Booted secondary processor 0x0000000101 [0x410fd034]
>     -Detected VIPT I-cache on CPU6
>     -CPU6: Booted secondary processor 0x0000000102 [0x410fd034]
>     -Detected VIPT I-cache on CPU7
>     -CPU7: Booted secondary processor 0x0000000103 [0x410fd034]
>
> Woops, CPU5-7 are not tested.
>

I don't understand what you mean by that. From the logs, it looks fine.
What do you mean by "CPU5-7 are not tested" ?

>     psci_checker: Hotplug tests passed OK
>
>
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -279,6 +279,9 @@ int __cpu_disable(void)
> >         if (ret)
> >                 return ret;
> >
> > +       remove_cpu_topology(cpu);
> > +       numa_remove_cpu(cpu);
> > +
> >         /*
> >          * Take this CPU offline.  Once we clear this, we can't return,
> >          * and we must not schedule until we're ready to give up the cpu.
>
> A simple revert is not sufficient, as that causes
>
>     watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [cpuhp/2:21]
>
> Do you have an idea how to fix this?

Sorry, but I am finding it hard to understand the issue from the log.
Also, it would be good to know your config. Is it defconfig - NUMA as before ?

--
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 5/7] arm64: smp: remove cpu and numa topology information when hotplugging out CPU
  2018-07-17 14:05       ` Sudeep Holla
@ 2018-07-17 15:06         ` Geert Uytterhoeven
  -1 siblings, 0 replies; 25+ messages in thread
From: Geert Uytterhoeven @ 2018-07-17 15:06 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: Linux ARM, Catalin Marinas, Will Deacon, Jeremy Linton,
	Hanjun Guo, ganapatrao.kulkarni, morten.rasmussen, Linux-Renesas

Hi Sudeep,

On Tue, Jul 17, 2018 at 4:05 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
> On Tue, Jul 17, 2018 at 02:58:14PM +0200, Geert Uytterhoeven wrote:
> > On Fri, Jul 6, 2018 at 1:04 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
> > > We already repopulate the information on CPU hotplug-in, so we can safely
> > > remove the CPU topology and NUMA cpumap information during CPU hotplug
> > > out operation. This will help to provide the correct cpumask for
> > > scheduler domains.
> > >
> > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > Cc: Will Deacon <will.deacon@arm.com>
> > > Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
> > > Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
> > > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> >
> > This is now commit 7f9545aa1a91a9a4 ("arm64: smp: remove cpu and numa
> > topology information when hotplugging out CPU") in arm64/for-next/core, to
> > which I bisected a PSCI checker regression on systems with two CPU clusters.
> >
> > Dmesg on R-Car H3 (4xCA57+4xCA53) before/after:
> >
> >      psci_checker: PSCI checker started using 8 CPUs
> >
> > 8 CPU cores detected.
> >
> >      psci_checker: Starting hotplug tests
> >      psci_checker: Trying to turn off and on again all CPUs
> >      CPU1: shutdown
> >      psci: CPU1 killed.
> >      CPU2: shutdown
> >      psci: CPU2 killed.
> >     -NOHZ: local_softirq_pending 55
> >      CPU3: shutdown
> >      psci: CPU3 killed.
> >     -NOHZ: local_softirq_pending 51
> >      CPU4: shutdown
> >      psci: CPU4 killed.
> >      NOHZ: local_softirq_pending 55
> >      CPU5: shutdown
> >      psci: CPU5 killed.
> >      NOHZ: local_softirq_pending 55
> >      CPU6: shutdown
> >      psci: CPU6 killed.
> >      NOHZ: local_softirq_pending 55
> >      CPU7: shutdown
> >      psci: CPU7 killed.
> >      Detected PIPT I-cache on CPU1
> >      CPU1: Booted secondary processor 0x0000000001 [0x411fd073]
> >      Detected PIPT I-cache on CPU2
> >      CPU2: Booted secondary processor 0x0000000002 [0x411fd073]
> >      Detected PIPT I-cache on CPU3
> >      CPU3: Booted secondary processor 0x0000000003 [0x411fd073]
> >      Detected VIPT I-cache on CPU4
> >      CPU4: Booted secondary processor 0x0000000100 [0x410fd034]
> >      cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 1198080 KHz
> >      cpufreq: cpufreq_online: CPU4: Unlisted initial frequency changed to: 1200000 KHz
> >      Detected VIPT I-cache on CPU5
> >      CPU5: Booted secondary processor 0x0000000101 [0x410fd034]
> >      Detected VIPT I-cache on CPU6
> >      CPU6: Booted secondary processor 0x0000000102 [0x410fd034]
> >      Detected VIPT I-cache on CPU7
> >      CPU7: Booted secondary processor 0x0000000103 [0x410fd034]
> >
> > All but CPU0 tested, as expected.
>
> OK, does the firmware on this system not allow CPU0 to be hotplugged out ?

Correct. But that's not the issue.

> >     psci_checker: Trying to turn off and on again group 0 (CPUs 0-3)
> >
> > 4 big CPU cores detected.
> >
> >      CPU1: shutdown
> >      psci: CPU1 killed.
> >     -NOHZ: local_softirq_pending 55
> >     +NOHZ: local_softirq_pending 51
> >      CPU2: shutdown
> >      psci: CPU2 killed.
> >      NOHZ: local_softirq_pending 51
> >      CPU3: shutdown
> >      psci: CPU3 killed.
> >      Detected PIPT I-cache on CPU1
> >      CPU1: Booted secondary processor 0x0000000001 [0x411fd073]
> >      Detected PIPT I-cache on CPU2
> >      CPU2: Booted secondary processor 0x0000000002 [0x411fd073]
> >      Detected PIPT I-cache on CPU3
> >      CPU3: Booted secondary processor 0x0000000003 [0x411fd073]
> >
> > All but CPU0 tested, as expected.
> >
> >     psci_checker: Trying to turn off and on again group 1 (CPUs 4-7)
> >
> > 4 LITTLE CPU cores detected.
> >
> >      CPU4: shutdown
> >      psci: CPU4 killed.
> >      NOHZ: local_softirq_pending 55
> >     -CPU5: shutdown
> >     -psci: CPU5 killed.
> >     -NOHZ: local_softirq_pending 55
> >     -CPU6: shutdown
> >     -psci: CPU6 killed.
> >     -NOHZ: local_softirq_pending 55
> >     -CPU7: shutdown
> >     -psci: CPU7 killed.
> >      Detected VIPT I-cache on CPU4
> >      CPU4: Booted secondary processor 0x0000000100 [0x410fd034]
> >     -cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 1198080 KHz
> >     -cpufreq: cpufreq_online: CPU4: Unlisted initial frequency changed to: 1200000 KHz
> >     -Detected VIPT I-cache on CPU5
> >     -CPU5: Booted secondary processor 0x0000000101 [0x410fd034]
> >     -Detected VIPT I-cache on CPU6
> >     -CPU6: Booted secondary processor 0x0000000102 [0x410fd034]
> >     -Detected VIPT I-cache on CPU7
> >     -CPU7: Booted secondary processor 0x0000000103 [0x410fd034]
> >
> > Woops, CPU5-7 are not tested.
>
> I don't understand what you mean by that. From the logs, it looks fine.
> What do you mean by "CPU5-7 are not tested" ?
> Sorry, but I am finding it hard to understand the issue from the log.

The above is not the log, but the diff between the log before and after the bad
commit. So lines prefixed with "-" are no longer printed.
Before, CPU4-7 were tested for hotplug in group 1.
After, only CPU4 is tested for hotplug in group 1.

> Also, it would be good to know your config. Is it defconfig - NUMA as before ?

arm64 defconfig + CONFIG_ARM_PSCI_CHECKER=y should do.

Thanks!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 5/7] arm64: smp: remove cpu and numa topology information when hotplugging out CPU
@ 2018-07-17 15:06         ` Geert Uytterhoeven
  0 siblings, 0 replies; 25+ messages in thread
From: Geert Uytterhoeven @ 2018-07-17 15:06 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Sudeep,

On Tue, Jul 17, 2018 at 4:05 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
> On Tue, Jul 17, 2018 at 02:58:14PM +0200, Geert Uytterhoeven wrote:
> > On Fri, Jul 6, 2018 at 1:04 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
> > > We already repopulate the information on CPU hotplug-in, so we can safely
> > > remove the CPU topology and NUMA cpumap information during CPU hotplug
> > > out operation. This will help to provide the correct cpumask for
> > > scheduler domains.
> > >
> > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > Cc: Will Deacon <will.deacon@arm.com>
> > > Tested-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
> > > Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
> > > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> >
> > This is now commit 7f9545aa1a91a9a4 ("arm64: smp: remove cpu and numa
> > topology information when hotplugging out CPU") in arm64/for-next/core, to
> > which I bisected a PSCI checker regression on systems with two CPU clusters.
> >
> > Dmesg on R-Car H3 (4xCA57+4xCA53) before/after:
> >
> >      psci_checker: PSCI checker started using 8 CPUs
> >
> > 8 CPU cores detected.
> >
> >      psci_checker: Starting hotplug tests
> >      psci_checker: Trying to turn off and on again all CPUs
> >      CPU1: shutdown
> >      psci: CPU1 killed.
> >      CPU2: shutdown
> >      psci: CPU2 killed.
> >     -NOHZ: local_softirq_pending 55
> >      CPU3: shutdown
> >      psci: CPU3 killed.
> >     -NOHZ: local_softirq_pending 51
> >      CPU4: shutdown
> >      psci: CPU4 killed.
> >      NOHZ: local_softirq_pending 55
> >      CPU5: shutdown
> >      psci: CPU5 killed.
> >      NOHZ: local_softirq_pending 55
> >      CPU6: shutdown
> >      psci: CPU6 killed.
> >      NOHZ: local_softirq_pending 55
> >      CPU7: shutdown
> >      psci: CPU7 killed.
> >      Detected PIPT I-cache on CPU1
> >      CPU1: Booted secondary processor 0x0000000001 [0x411fd073]
> >      Detected PIPT I-cache on CPU2
> >      CPU2: Booted secondary processor 0x0000000002 [0x411fd073]
> >      Detected PIPT I-cache on CPU3
> >      CPU3: Booted secondary processor 0x0000000003 [0x411fd073]
> >      Detected VIPT I-cache on CPU4
> >      CPU4: Booted secondary processor 0x0000000100 [0x410fd034]
> >      cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 1198080 KHz
> >      cpufreq: cpufreq_online: CPU4: Unlisted initial frequency changed to: 1200000 KHz
> >      Detected VIPT I-cache on CPU5
> >      CPU5: Booted secondary processor 0x0000000101 [0x410fd034]
> >      Detected VIPT I-cache on CPU6
> >      CPU6: Booted secondary processor 0x0000000102 [0x410fd034]
> >      Detected VIPT I-cache on CPU7
> >      CPU7: Booted secondary processor 0x0000000103 [0x410fd034]
> >
> > All but CPU0 tested, as expected.
>
> OK, does the firmware on this system not allow CPU0 to be hotplugged out ?

Correct. But that's not the issue.

> >     psci_checker: Trying to turn off and on again group 0 (CPUs 0-3)
> >
> > 4 big CPU cores detected.
> >
> >      CPU1: shutdown
> >      psci: CPU1 killed.
> >     -NOHZ: local_softirq_pending 55
> >     +NOHZ: local_softirq_pending 51
> >      CPU2: shutdown
> >      psci: CPU2 killed.
> >      NOHZ: local_softirq_pending 51
> >      CPU3: shutdown
> >      psci: CPU3 killed.
> >      Detected PIPT I-cache on CPU1
> >      CPU1: Booted secondary processor 0x0000000001 [0x411fd073]
> >      Detected PIPT I-cache on CPU2
> >      CPU2: Booted secondary processor 0x0000000002 [0x411fd073]
> >      Detected PIPT I-cache on CPU3
> >      CPU3: Booted secondary processor 0x0000000003 [0x411fd073]
> >
> > All but CPU0 tested, as expected.
> >
> >     psci_checker: Trying to turn off and on again group 1 (CPUs 4-7)
> >
> > 4 LITTLE CPU cores detected.
> >
> >      CPU4: shutdown
> >      psci: CPU4 killed.
> >      NOHZ: local_softirq_pending 55
> >     -CPU5: shutdown
> >     -psci: CPU5 killed.
> >     -NOHZ: local_softirq_pending 55
> >     -CPU6: shutdown
> >     -psci: CPU6 killed.
> >     -NOHZ: local_softirq_pending 55
> >     -CPU7: shutdown
> >     -psci: CPU7 killed.
> >      Detected VIPT I-cache on CPU4
> >      CPU4: Booted secondary processor 0x0000000100 [0x410fd034]
> >     -cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 1198080 KHz
> >     -cpufreq: cpufreq_online: CPU4: Unlisted initial frequency changed to: 1200000 KHz
> >     -Detected VIPT I-cache on CPU5
> >     -CPU5: Booted secondary processor 0x0000000101 [0x410fd034]
> >     -Detected VIPT I-cache on CPU6
> >     -CPU6: Booted secondary processor 0x0000000102 [0x410fd034]
> >     -Detected VIPT I-cache on CPU7
> >     -CPU7: Booted secondary processor 0x0000000103 [0x410fd034]
> >
> > Woops, CPU5-7 are not tested.
>
> I don't understand what you mean by that. From the logs, it looks fine.
> What do you mean by "CPU5-7 are not tested" ?
> Sorry, but I am finding it hard to understand the issue from the log.

The above is not the log, but the diff between the log before and after the bad
commit. So lines prefixed with "-" are no longer printed.
Before, CPU4-7 were tested for hotplug in group 1.
After, only CPU4 is tested for hotplug in group 1.

> Also, it would be good to know your config. Is it defconfig - NUMA as before ?

arm64 defconfig + CONFIG_ARM_PSCI_CHECKER=y should do.

Thanks!

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 5/7] arm64: smp: remove cpu and numa topology information when hotplugging out CPU
  2018-07-17 15:06         ` Geert Uytterhoeven
@ 2018-07-17 15:34           ` Sudeep Holla
  -1 siblings, 0 replies; 25+ messages in thread
From: Sudeep Holla @ 2018-07-17 15:34 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Sudeep Holla, Linux ARM, Catalin Marinas, Will Deacon,
	Jeremy Linton, Hanjun Guo, ganapatrao.kulkarni, morten.rasmussen,
	Linux-Renesas



On 17/07/18 16:06, Geert Uytterhoeven wrote:
> Hi Sudeep,
> 
> On Tue, Jul 17, 2018 at 4:05 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
>> On Tue, Jul 17, 2018 at 02:58:14PM +0200, Geert Uytterhoeven wrote:
>>> On Fri, Jul 6, 2018 at 1:04 PM Sudeep Holla <sudeep.holla@arm.com> wrote:

[...]

>>> All but CPU0 tested, as expected.
>>
>> OK, does the firmware on this system not allow CPU0 to be hotplugged out ?
> 
> Correct. But that's not the issue.
> 

Yes,  was just trying to understand the behavior.

[..]

>>> Woops, CPU5-7 are not tested.
>>
>> I don't understand what you mean by that. From the logs, it looks fine.
>> What do you mean by "CPU5-7 are not tested" ?
>> Sorry, but I am finding it hard to understand the issue from the log.
> 
> The above is not the log, but the diff between the log before and after the bad
> commit. So lines prefixed with "-" are no longer printed.
> Before, CPU4-7 were tested for hotplug in group 1.
> After, only CPU4 is tested for hotplug in group 1.
> 

Ah OK, I think I now understand what is the issue, will try to replicate
at my end and get back to you.

>> Also, it would be good to know your config. Is it defconfig - NUMA as before ?
> 
> arm64 defconfig + CONFIG_ARM_PSCI_CHECKER=y should do.
> 

OK thanks, that is simpler.

-- 
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 5/7] arm64: smp: remove cpu and numa topology information when hotplugging out CPU
@ 2018-07-17 15:34           ` Sudeep Holla
  0 siblings, 0 replies; 25+ messages in thread
From: Sudeep Holla @ 2018-07-17 15:34 UTC (permalink / raw)
  To: linux-arm-kernel



On 17/07/18 16:06, Geert Uytterhoeven wrote:
> Hi Sudeep,
> 
> On Tue, Jul 17, 2018 at 4:05 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
>> On Tue, Jul 17, 2018 at 02:58:14PM +0200, Geert Uytterhoeven wrote:
>>> On Fri, Jul 6, 2018 at 1:04 PM Sudeep Holla <sudeep.holla@arm.com> wrote:

[...]

>>> All but CPU0 tested, as expected.
>>
>> OK, does the firmware on this system not allow CPU0 to be hotplugged out ?
> 
> Correct. But that's not the issue.
> 

Yes,  was just trying to understand the behavior.

[..]

>>> Woops, CPU5-7 are not tested.
>>
>> I don't understand what you mean by that. From the logs, it looks fine.
>> What do you mean by "CPU5-7 are not tested" ?
>> Sorry, but I am finding it hard to understand the issue from the log.
> 
> The above is not the log, but the diff between the log before and after the bad
> commit. So lines prefixed with "-" are no longer printed.
> Before, CPU4-7 were tested for hotplug in group 1.
> After, only CPU4 is tested for hotplug in group 1.
> 

Ah OK, I think I now understand what is the issue, will try to replicate
at my end and get back to you.

>> Also, it would be good to know your config. Is it defconfig - NUMA as before ?
> 
> arm64 defconfig + CONFIG_ARM_PSCI_CHECKER=y should do.
> 

OK thanks, that is simpler.

-- 
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 5/7] arm64: smp: remove cpu and numa topology information when hotplugging out CPU
  2018-07-17 15:34           ` Sudeep Holla
@ 2018-07-17 15:55             ` Sudeep Holla
  -1 siblings, 0 replies; 25+ messages in thread
From: Sudeep Holla @ 2018-07-17 15:55 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Sudeep Holla, Linux ARM, Catalin Marinas, Will Deacon,
	Jeremy Linton, Hanjun Guo, ganapatrao.kulkarni, morten.rasmussen,
	Linux-Renesas



On 17/07/18 16:34, Sudeep Holla wrote:
> 
> 
> On 17/07/18 16:06, Geert Uytterhoeven wrote:
>> Hi Sudeep,
>>
>> On Tue, Jul 17, 2018 at 4:05 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
>>> On Tue, Jul 17, 2018 at 02:58:14PM +0200, Geert Uytterhoeven wrote:
>>>> On Fri, Jul 6, 2018 at 1:04 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
> 
> [...]
> 
>>>> All but CPU0 tested, as expected.
>>>
>>> OK, does the firmware on this system not allow CPU0 to be hotplugged out ?
>>
>> Correct. But that's not the issue.
>>
> 
> Yes,  was just trying to understand the behavior.
> 
> [..]
> 
>>>> Woops, CPU5-7 are not tested.
>>>
>>> I don't understand what you mean by that. From the logs, it looks fine.
>>> What do you mean by "CPU5-7 are not tested" ?
>>> Sorry, but I am finding it hard to understand the issue from the log.
>>
>> The above is not the log, but the diff between the log before and after the bad
>> commit. So lines prefixed with "-" are no longer printed.
>> Before, CPU4-7 were tested for hotplug in group 1.
>> After, only CPU4 is tested for hotplug in group 1.
>>
> 
> Ah OK, I think I now understand what is the issue, will try to replicate
> at my end and get back to you.
> 
Going through the code again, I think I understand the problem here.
We use the topology_core_mask pointers which are stashed in cpu_groups[]
But, the cpumask themselves will be getting modified as the cpus go up
and down, so we need to make a copy instead of just using the pointer.
I will see what we can do to fix that.

-- 
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 5/7] arm64: smp: remove cpu and numa topology information when hotplugging out CPU
@ 2018-07-17 15:55             ` Sudeep Holla
  0 siblings, 0 replies; 25+ messages in thread
From: Sudeep Holla @ 2018-07-17 15:55 UTC (permalink / raw)
  To: linux-arm-kernel



On 17/07/18 16:34, Sudeep Holla wrote:
> 
> 
> On 17/07/18 16:06, Geert Uytterhoeven wrote:
>> Hi Sudeep,
>>
>> On Tue, Jul 17, 2018 at 4:05 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
>>> On Tue, Jul 17, 2018 at 02:58:14PM +0200, Geert Uytterhoeven wrote:
>>>> On Fri, Jul 6, 2018 at 1:04 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
> 
> [...]
> 
>>>> All but CPU0 tested, as expected.
>>>
>>> OK, does the firmware on this system not allow CPU0 to be hotplugged out ?
>>
>> Correct. But that's not the issue.
>>
> 
> Yes,  was just trying to understand the behavior.
> 
> [..]
> 
>>>> Woops, CPU5-7 are not tested.
>>>
>>> I don't understand what you mean by that. From the logs, it looks fine.
>>> What do you mean by "CPU5-7 are not tested" ?
>>> Sorry, but I am finding it hard to understand the issue from the log.
>>
>> The above is not the log, but the diff between the log before and after the bad
>> commit. So lines prefixed with "-" are no longer printed.
>> Before, CPU4-7 were tested for hotplug in group 1.
>> After, only CPU4 is tested for hotplug in group 1.
>>
> 
> Ah OK, I think I now understand what is the issue, will try to replicate
> at my end and get back to you.
> 
Going through the code again, I think I understand the problem here.
We use the topology_core_mask pointers which are stashed in cpu_groups[]
But, the cpumask themselves will be getting modified as the cpus go up
and down, so we need to make a copy instead of just using the pointer.
I will see what we can do to fix that.

-- 
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 5/7] arm64: smp: remove cpu and numa topology information when hotplugging out CPU
  2018-07-17 15:55             ` Sudeep Holla
@ 2018-07-17 17:01               ` Sudeep Holla
  -1 siblings, 0 replies; 25+ messages in thread
From: Sudeep Holla @ 2018-07-17 17:01 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Linux ARM, Catalin Marinas, Will Deacon, Jeremy Linton,
	Hanjun Guo, ganapatrao.kulkarni, morten.rasmussen, Sudeep Holla,
	Mark Rutland, Lorenzo Pieralisi, Linux-Renesas

On Tue, Jul 17, 2018 at 04:55:22PM +0100, Sudeep Holla wrote:

[..]

> Going through the code again, I think I understand the problem here.
> We use the topology_core_mask pointers which are stashed in cpu_groups[]
> But, the cpumask themselves will be getting modified as the cpus go up
> and down, so we need to make a copy instead of just using the pointer.
> I will see what we can do to fix that.

This is what I could come up with. I haven't tested this but just compiled
it. Let me know if this resolves the issue.

-->8

From: Sudeep Holla <sudeep.holla@arm.com>
Date: Tue, 17 Jul 2018 17:45:20 +0100
Subject: [PATCH] drivers/firmware: psci_checker: stash and use topology_core_cpumask for hotplug tests

Commit 7f9545aa1a91 ("arm64: smp: remove cpu and numa topology information
when hotplugging out CPU") updates the cpu topology when the CPU is
hotplugged out. However the PSCI checker code uses the topology_core_cpumask
pointers for some of the cpu hotplug testing. Since the pointer to the
core_cpumask of the first CPU in the group is used, which when that CPU
itself is hotpugged out is just set to itself, the testing terminates
after that particular CPU is tested out. But the intention of this tests
is to cover all the CPU in the group.

In order to support that, we need to stash the topology_core_cpumask
before the start of the test and use that value instead of pointer to a
cpumask which will be updated on CPU hotplug.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
---
 drivers/firmware/psci_checker.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/firmware/psci_checker.c b/drivers/firmware/psci_checker.c
index bb1c068bff19..dcace6fa17b6 100644
--- a/drivers/firmware/psci_checker.c
+++ b/drivers/firmware/psci_checker.c
@@ -77,21 +77,20 @@ static int psci_ops_check(void)
 	return 0;
 }
 
-static int find_cpu_groups(const struct cpumask *cpus,
-			   const struct cpumask **cpu_groups)
+static int find_cpu_groups(cpumask_var_t *cpu_groups)
 {
 	unsigned int nb = 0;
 	cpumask_var_t tmp;
 
 	if (!alloc_cpumask_var(&tmp, GFP_KERNEL))
 		return -ENOMEM;
-	cpumask_copy(tmp, cpus);
+	cpumask_copy(tmp, cpu_online_mask);
 
 	while (!cpumask_empty(tmp)) {
 		const struct cpumask *cpu_group =
 			topology_core_cpumask(cpumask_any(tmp));
 
-		cpu_groups[nb++] = cpu_group;
+		cpumask_copy(cpu_groups[nb++], cpu_group);
 		cpumask_andnot(tmp, tmp, cpu_group);
 	}
 
@@ -169,16 +168,15 @@ static unsigned int down_and_up_cpus(const struct cpumask *cpus,
 static int hotplug_tests(void)
 {
 	int err;
-	cpumask_var_t offlined_cpus;
+	cpumask_var_t offlined_cpus, *cpu_groups;
 	int i, nb_cpu_group;
-	const struct cpumask **cpu_groups;
 	char *page_buf;
 
 	err = -ENOMEM;
 	if (!alloc_cpumask_var(&offlined_cpus, GFP_KERNEL))
 		return err;
 	/* We may have up to nb_available_cpus cpu_groups. */
-	cpu_groups = kmalloc_array(nb_available_cpus, sizeof(*cpu_groups),
+	cpu_groups = kmalloc_array(nb_available_cpus, sizeof(cpu_groups),
 				   GFP_KERNEL);
 	if (!cpu_groups)
 		goto out_free_cpus;
@@ -186,8 +184,12 @@ static int hotplug_tests(void)
 	if (!page_buf)
 		goto out_free_cpu_groups;
 
+	for (i = 0; i < nb_available_cpus; ++i)
+		if (!alloc_cpumask_var(&cpu_groups[i], GFP_KERNEL))
+			goto out_free_cpu_groups_var;
+
 	err = 0;
-	nb_cpu_group = find_cpu_groups(cpu_online_mask, cpu_groups);
+	nb_cpu_group = find_cpu_groups(cpu_groups);
 
 	/*
 	 * Of course the last CPU cannot be powered down and cpu_down() should
@@ -211,6 +213,9 @@ static int hotplug_tests(void)
 	}
 
 	free_page((unsigned long)page_buf);
+out_free_cpu_groups_var:
+	for (i = 0; i < nb_available_cpus; ++i)
+		free_cpumask_var(cpu_groups[i]);
 out_free_cpu_groups:
 	kfree(cpu_groups);
 out_free_cpus:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 5/7] arm64: smp: remove cpu and numa topology information when hotplugging out CPU
@ 2018-07-17 17:01               ` Sudeep Holla
  0 siblings, 0 replies; 25+ messages in thread
From: Sudeep Holla @ 2018-07-17 17:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 17, 2018 at 04:55:22PM +0100, Sudeep Holla wrote:

[..]

> Going through the code again, I think I understand the problem here.
> We use the topology_core_mask pointers which are stashed in cpu_groups[]
> But, the cpumask themselves will be getting modified as the cpus go up
> and down, so we need to make a copy instead of just using the pointer.
> I will see what we can do to fix that.

This is what I could come up with. I haven't tested this but just compiled
it. Let me know if this resolves the issue.

-->8

From: Sudeep Holla <sudeep.holla@arm.com>
Date: Tue, 17 Jul 2018 17:45:20 +0100
Subject: [PATCH] drivers/firmware: psci_checker: stash and use topology_core_cpumask for hotplug tests

Commit 7f9545aa1a91 ("arm64: smp: remove cpu and numa topology information
when hotplugging out CPU") updates the cpu topology when the CPU is
hotplugged out. However the PSCI checker code uses the topology_core_cpumask
pointers for some of the cpu hotplug testing. Since the pointer to the
core_cpumask of the first CPU in the group is used, which when that CPU
itself is hotpugged out is just set to itself, the testing terminates
after that particular CPU is tested out. But the intention of this tests
is to cover all the CPU in the group.

In order to support that, we need to stash the topology_core_cpumask
before the start of the test and use that value instead of pointer to a
cpumask which will be updated on CPU hotplug.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
---
 drivers/firmware/psci_checker.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/firmware/psci_checker.c b/drivers/firmware/psci_checker.c
index bb1c068bff19..dcace6fa17b6 100644
--- a/drivers/firmware/psci_checker.c
+++ b/drivers/firmware/psci_checker.c
@@ -77,21 +77,20 @@ static int psci_ops_check(void)
 	return 0;
 }
 
-static int find_cpu_groups(const struct cpumask *cpus,
-			   const struct cpumask **cpu_groups)
+static int find_cpu_groups(cpumask_var_t *cpu_groups)
 {
 	unsigned int nb = 0;
 	cpumask_var_t tmp;
 
 	if (!alloc_cpumask_var(&tmp, GFP_KERNEL))
 		return -ENOMEM;
-	cpumask_copy(tmp, cpus);
+	cpumask_copy(tmp, cpu_online_mask);
 
 	while (!cpumask_empty(tmp)) {
 		const struct cpumask *cpu_group =
 			topology_core_cpumask(cpumask_any(tmp));
 
-		cpu_groups[nb++] = cpu_group;
+		cpumask_copy(cpu_groups[nb++], cpu_group);
 		cpumask_andnot(tmp, tmp, cpu_group);
 	}
 
@@ -169,16 +168,15 @@ static unsigned int down_and_up_cpus(const struct cpumask *cpus,
 static int hotplug_tests(void)
 {
 	int err;
-	cpumask_var_t offlined_cpus;
+	cpumask_var_t offlined_cpus, *cpu_groups;
 	int i, nb_cpu_group;
-	const struct cpumask **cpu_groups;
 	char *page_buf;
 
 	err = -ENOMEM;
 	if (!alloc_cpumask_var(&offlined_cpus, GFP_KERNEL))
 		return err;
 	/* We may have up to nb_available_cpus cpu_groups. */
-	cpu_groups = kmalloc_array(nb_available_cpus, sizeof(*cpu_groups),
+	cpu_groups = kmalloc_array(nb_available_cpus, sizeof(cpu_groups),
 				   GFP_KERNEL);
 	if (!cpu_groups)
 		goto out_free_cpus;
@@ -186,8 +184,12 @@ static int hotplug_tests(void)
 	if (!page_buf)
 		goto out_free_cpu_groups;
 
+	for (i = 0; i < nb_available_cpus; ++i)
+		if (!alloc_cpumask_var(&cpu_groups[i], GFP_KERNEL))
+			goto out_free_cpu_groups_var;
+
 	err = 0;
-	nb_cpu_group = find_cpu_groups(cpu_online_mask, cpu_groups);
+	nb_cpu_group = find_cpu_groups(cpu_groups);
 
 	/*
 	 * Of course the last CPU cannot be powered down and cpu_down() should
@@ -211,6 +213,9 @@ static int hotplug_tests(void)
 	}
 
 	free_page((unsigned long)page_buf);
+out_free_cpu_groups_var:
+	for (i = 0; i < nb_available_cpus; ++i)
+		free_cpumask_var(cpu_groups[i]);
 out_free_cpu_groups:
 	kfree(cpu_groups);
 out_free_cpus:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 5/7] arm64: smp: remove cpu and numa topology information when hotplugging out CPU
  2018-07-17 17:01               ` Sudeep Holla
@ 2018-07-18  7:15                 ` Geert Uytterhoeven
  -1 siblings, 0 replies; 25+ messages in thread
From: Geert Uytterhoeven @ 2018-07-18  7:15 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: Linux ARM, Catalin Marinas, Will Deacon, Jeremy Linton,
	Hanjun Guo, ganapatrao.kulkarni, morten.rasmussen, Mark Rutland,
	Lorenzo Pieralisi, Linux-Renesas

Hi Sudeep,

On Tue, Jul 17, 2018 at 7:02 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
> On Tue, Jul 17, 2018 at 04:55:22PM +0100, Sudeep Holla wrote:
> > Going through the code again, I think I understand the problem here.
> > We use the topology_core_mask pointers which are stashed in cpu_groups[]
> > But, the cpumask themselves will be getting modified as the cpus go up
> > and down, so we need to make a copy instead of just using the pointer.
> > I will see what we can do to fix that.
>
> This is what I could come up with. I haven't tested this but just compiled
> it. Let me know if this resolves the issue.

Thanks, it did!

> From: Sudeep Holla <sudeep.holla@arm.com>
> Date: Tue, 17 Jul 2018 17:45:20 +0100
> Subject: [PATCH] drivers/firmware: psci_checker: stash and use topology_core_cpumask for hotplug tests
>
> Commit 7f9545aa1a91 ("arm64: smp: remove cpu and numa topology information
> when hotplugging out CPU") updates the cpu topology when the CPU is
> hotplugged out. However the PSCI checker code uses the topology_core_cpumask
> pointers for some of the cpu hotplug testing. Since the pointer to the
> core_cpumask of the first CPU in the group is used, which when that CPU
> itself is hotpugged out is just set to itself, the testing terminates
> after that particular CPU is tested out. But the intention of this tests
> is to cover all the CPU in the group.
>
> In order to support that, we need to stash the topology_core_cpumask
> before the start of the test and use that value instead of pointer to a
> cpumask which will be updated on CPU hotplug.
>
> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

Fixes: 7f9545aa1a91a9a4 ("arm64: smp: remove cpu and numa topology
information when hotplugging out CPU")
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 5/7] arm64: smp: remove cpu and numa topology information when hotplugging out CPU
@ 2018-07-18  7:15                 ` Geert Uytterhoeven
  0 siblings, 0 replies; 25+ messages in thread
From: Geert Uytterhoeven @ 2018-07-18  7:15 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Sudeep,

On Tue, Jul 17, 2018 at 7:02 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
> On Tue, Jul 17, 2018 at 04:55:22PM +0100, Sudeep Holla wrote:
> > Going through the code again, I think I understand the problem here.
> > We use the topology_core_mask pointers which are stashed in cpu_groups[]
> > But, the cpumask themselves will be getting modified as the cpus go up
> > and down, so we need to make a copy instead of just using the pointer.
> > I will see what we can do to fix that.
>
> This is what I could come up with. I haven't tested this but just compiled
> it. Let me know if this resolves the issue.

Thanks, it did!

> From: Sudeep Holla <sudeep.holla@arm.com>
> Date: Tue, 17 Jul 2018 17:45:20 +0100
> Subject: [PATCH] drivers/firmware: psci_checker: stash and use topology_core_cpumask for hotplug tests
>
> Commit 7f9545aa1a91 ("arm64: smp: remove cpu and numa topology information
> when hotplugging out CPU") updates the cpu topology when the CPU is
> hotplugged out. However the PSCI checker code uses the topology_core_cpumask
> pointers for some of the cpu hotplug testing. Since the pointer to the
> core_cpumask of the first CPU in the group is used, which when that CPU
> itself is hotpugged out is just set to itself, the testing terminates
> after that particular CPU is tested out. But the intention of this tests
> is to cover all the CPU in the group.
>
> In order to support that, we need to stash the topology_core_cpumask
> before the start of the test and use that value instead of pointer to a
> cpumask which will be updated on CPU hotplug.
>
> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>

Fixes: 7f9545aa1a91a9a4 ("arm64: smp: remove cpu and numa topology
information when hotplugging out CPU")
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 5/7] arm64: smp: remove cpu and numa topology information when hotplugging out CPU
  2018-07-18  7:15                 ` Geert Uytterhoeven
@ 2018-07-18 10:33                   ` Sudeep Holla
  -1 siblings, 0 replies; 25+ messages in thread
From: Sudeep Holla @ 2018-07-18 10:33 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Sudeep Holla, Linux ARM, Catalin Marinas, Will Deacon,
	Jeremy Linton, Hanjun Guo, ganapatrao.kulkarni, morten.rasmussen,
	Mark Rutland, Lorenzo Pieralisi, Linux-Renesas



On 18/07/18 08:15, Geert Uytterhoeven wrote:
> Hi Sudeep,
> 
> On Tue, Jul 17, 2018 at 7:02 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
>> On Tue, Jul 17, 2018 at 04:55:22PM +0100, Sudeep Holla wrote:
>>> Going through the code again, I think I understand the problem here.
>>> We use the topology_core_mask pointers which are stashed in cpu_groups[]
>>> But, the cpumask themselves will be getting modified as the cpus go up
>>> and down, so we need to make a copy instead of just using the pointer.
>>> I will see what we can do to fix that.
>>
>> This is what I could come up with. I haven't tested this but just compiled
>> it. Let me know if this resolves the issue.
> 
> Thanks, it did!
> 
>> From: Sudeep Holla <sudeep.holla@arm.com>
>> Date: Tue, 17 Jul 2018 17:45:20 +0100
>> Subject: [PATCH] drivers/firmware: psci_checker: stash and use topology_core_cpumask for hotplug tests
>>
>> Commit 7f9545aa1a91 ("arm64: smp: remove cpu and numa topology information
>> when hotplugging out CPU") updates the cpu topology when the CPU is
>> hotplugged out. However the PSCI checker code uses the topology_core_cpumask
>> pointers for some of the cpu hotplug testing. Since the pointer to the
>> core_cpumask of the first CPU in the group is used, which when that CPU
>> itself is hotpugged out is just set to itself, the testing terminates
>> after that particular CPU is tested out. But the intention of this tests
>> is to cover all the CPU in the group.
>>
>> In order to support that, we need to stash the topology_core_cpumask
>> before the start of the test and use that value instead of pointer to a
>> cpumask which will be updated on CPU hotplug.
>>
>> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
>> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> 
> Fixes: 7f9545aa1a91a9a4 ("arm64: smp: remove cpu and numa topology
> information when hotplugging out CPU")
> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
> 

Thanks, I will post the patch independently with small clean up to
reduce allocation of cpumask

-- 
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 5/7] arm64: smp: remove cpu and numa topology information when hotplugging out CPU
@ 2018-07-18 10:33                   ` Sudeep Holla
  0 siblings, 0 replies; 25+ messages in thread
From: Sudeep Holla @ 2018-07-18 10:33 UTC (permalink / raw)
  To: linux-arm-kernel



On 18/07/18 08:15, Geert Uytterhoeven wrote:
> Hi Sudeep,
> 
> On Tue, Jul 17, 2018 at 7:02 PM Sudeep Holla <sudeep.holla@arm.com> wrote:
>> On Tue, Jul 17, 2018 at 04:55:22PM +0100, Sudeep Holla wrote:
>>> Going through the code again, I think I understand the problem here.
>>> We use the topology_core_mask pointers which are stashed in cpu_groups[]
>>> But, the cpumask themselves will be getting modified as the cpus go up
>>> and down, so we need to make a copy instead of just using the pointer.
>>> I will see what we can do to fix that.
>>
>> This is what I could come up with. I haven't tested this but just compiled
>> it. Let me know if this resolves the issue.
> 
> Thanks, it did!
> 
>> From: Sudeep Holla <sudeep.holla@arm.com>
>> Date: Tue, 17 Jul 2018 17:45:20 +0100
>> Subject: [PATCH] drivers/firmware: psci_checker: stash and use topology_core_cpumask for hotplug tests
>>
>> Commit 7f9545aa1a91 ("arm64: smp: remove cpu and numa topology information
>> when hotplugging out CPU") updates the cpu topology when the CPU is
>> hotplugged out. However the PSCI checker code uses the topology_core_cpumask
>> pointers for some of the cpu hotplug testing. Since the pointer to the
>> core_cpumask of the first CPU in the group is used, which when that CPU
>> itself is hotpugged out is just set to itself, the testing terminates
>> after that particular CPU is tested out. But the intention of this tests
>> is to cover all the CPU in the group.
>>
>> In order to support that, we need to stash the topology_core_cpumask
>> before the start of the test and use that value instead of pointer to a
>> cpumask which will be updated on CPU hotplug.
>>
>> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
>> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> 
> Fixes: 7f9545aa1a91a9a4 ("arm64: smp: remove cpu and numa topology
> information when hotplugging out CPU")
> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
> 

Thanks, I will post the patch independently with small clean up to
reduce allocation of cpumask

-- 
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2018-07-18 11:10 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-06 11:02 [PATCH v3 0/7] arm64: numa/topology/smp: update the cpumasks for CPU hotplug Sudeep Holla
2018-07-06 11:02 ` [PATCH v3 1/7] arm64: topology: refactor reset_cpu_topology to add support for removing topology Sudeep Holla
2018-07-06 11:02 ` [PATCH v3 2/7] arm64: numa: separate out updates to percpu nodeid and NUMA node cpumap Sudeep Holla
2018-07-06 11:02 ` [PATCH v3 3/7] arm64: topology: add support to remove cpu topology sibling masks Sudeep Holla
2018-07-06 11:02 ` [PATCH v3 4/7] arm64: topology: restrict updating siblings_masks to online cpus only Sudeep Holla
2018-07-06 11:02 ` [PATCH v3 5/7] arm64: smp: remove cpu and numa topology information when hotplugging out CPU Sudeep Holla
2018-07-17 12:58   ` Geert Uytterhoeven
2018-07-17 12:58     ` Geert Uytterhoeven
2018-07-17 14:05     ` Sudeep Holla
2018-07-17 14:05       ` Sudeep Holla
2018-07-17 15:06       ` Geert Uytterhoeven
2018-07-17 15:06         ` Geert Uytterhoeven
2018-07-17 15:34         ` Sudeep Holla
2018-07-17 15:34           ` Sudeep Holla
2018-07-17 15:55           ` Sudeep Holla
2018-07-17 15:55             ` Sudeep Holla
2018-07-17 17:01             ` Sudeep Holla
2018-07-17 17:01               ` Sudeep Holla
2018-07-18  7:15               ` Geert Uytterhoeven
2018-07-18  7:15                 ` Geert Uytterhoeven
2018-07-18 10:33                 ` Sudeep Holla
2018-07-18 10:33                   ` Sudeep Holla
2018-07-06 11:02 ` [PATCH v3 6/7] arm64: topology: rename llc_siblings to align with other struct members Sudeep Holla
2018-07-06 11:02 ` [PATCH v3 7/7] arm64: topology: re-introduce numa mask check for scheduler MC selection Sudeep Holla
2018-07-10 21:51 ` [PATCH v3 0/7] arm64: numa/topology/smp: update the cpumasks for CPU hotplug Jeremy Linton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.