* [PATCH 00/11] Support for grouping cores
@ 2020-07-14  4:36 Srikar Dronamraju
  2020-07-14  4:36 ` [PATCH 01/11] powerpc/smp: Cache node for reuse Srikar Dronamraju
                   ` (11 more replies)
  0 siblings, 12 replies; 42+ messages in thread
From: Srikar Dronamraju @ 2020-07-14  4:36 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Srikar Dronamraju, Michael Ellerman,
	Anton Blanchard, linuxppc-dev, Nick Piggin

Clean up the existing powerpc topologies and add support for grouping cores.

Patch 7 of this series, "Powerpc/numa: Detect support for coregroup",
depends on
https://lore.kernel.org/linuxppc-dev/20200707140644.7241-1-srikar@linux.vnet.ibm.com/t/#u
However, it should be easy to rebase this patch without the dependency above.

On Power 8 Systems
------------------
$ tail /proc/cpuinfo
processor	: 255
cpu		: POWER8 (architected), altivec supported
clock		: 3724.000000MHz
revision	: 2.1 (pvr 004b 0201)

timebase	: 512000000
platform	: pSeries
model		: IBM,8408-E8E
machine		: CHRP IBM,8408-E8E
MMU		: Hash

Before the patchset
-------------------
$ cat /proc/sys/kernel/sched_domain/cpu0/domain*/name
SMT
DIE
NUMA
NUMA
$ head /proc/schedstat
version 15
timestamp 4295534931
cpu0 0 0 0 0 0 0 41389823338 17682779896 14117
domain0 00000000,00000000,00000000,00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00000000,00000000,00000000,00000000,00000000,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 00000000,00000000,00000000,00000000,00000000,00000000,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain3 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
cpu1 0 0 0 0 0 0 27087859050 152273672 10396
domain0 00000000,00000000,00000000,00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00000000,00000000,00000000,00000000,00000000,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

After the patchset
------------------
$ cat /proc/sys/kernel/sched_domain/cpu0/domain*/name
SMT
DIE
NUMA
NUMA
$ head /proc/schedstat
version 15
timestamp 4295534931
cpu0 0 0 0 0 0 0 41389823338 17682779896 14117
domain0 00000000,00000000,00000000,00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00000000,00000000,00000000,00000000,00000000,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 00000000,00000000,00000000,00000000,00000000,00000000,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain3 ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
cpu1 0 0 0 0 0 0 27087859050 152273672 10396
domain0 00000000,00000000,00000000,00000000,00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00000000,00000000,00000000,00000000,00000000,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
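
Note that on Power 8 the topology is identical before and after the
patchset: without device-tree coregroup support, no extra domain is
added.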

On Power 9 (with device-tree enablement to show coregroups).
-----------------------------------------------------------
$ tail /proc/cpuinfo
processor	: 127
cpu		: POWER9 (architected), altivec supported
clock		: 3000.000000MHz
revision	: 2.2 (pvr 004e 0202)

timebase	: 512000000
platform	: pSeries
model		: IBM,9008-22L
machine		: CHRP IBM,9008-22L
MMU		: Hash

Before patchset
---------------
$ cat /proc/sys/kernel/sched_domain/cpu0/domain*/name
SMT
CACHE
DIE
NUMA

$ head /proc/schedstat
version 15
timestamp 4318242208
cpu0 0 0 0 0 0 0 28077107004 4773387362 78205
domain0 00000000,00000000,00000000,00000055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 00000000,00000000,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain3 ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
cpu1 0 0 0 0 0 0 24177439200 413887604 75393
domain0 00000000,00000000,00000000,000000aa 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

After patchset
--------------
$ cat /proc/sys/kernel/sched_domain/cpu0/domain*/name
SMT
CACHE
MC
DIE
NUMA

$ head /proc/schedstat
version 15
timestamp 4318242208
cpu0 0 0 0 0 0 0 28077107004 4773387362 78205
domain0 00000000,00000000,00000000,00000055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00000000,00000000,00000000,000000ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 00000000,00000000,00000000,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain3 00000000,00000000,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain4 ffffffff,ffffffff,ffffffff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
cpu1 0 0 0 0 0 0 24177439200 413887604 75393
domain0 00000000,00000000,00000000,000000aa 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
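
To make the new level easier to see, here is how the domainN bitmasks
for cpu0 above decode (each comma-separated word covers 32 CPUs, most
significant word first):

	domain0 (SMT)   ...,00000055            -> CPUs {0,2,4,6} (small core)
	domain1 (CACHE) ...,000000ff            -> CPUs 0-7
	domain2 (MC)    ...,ffffffff            -> CPUs 0-31 (the coregroup)
	domain3 (DIE)   ...,ffffffff,ffffffff   -> CPUs 0-63
	domain4 (NUMA)  all ffffffff            -> CPUs 0-127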

Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: Michael Ellerman <michaele@au1.ibm.com>
Cc: Nick Piggin <npiggin@au1.ibm.com>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Neuling <mikey@linux.ibm.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>

Srikar Dronamraju (11):
  powerpc/smp: Cache node for reuse
  powerpc/smp: Merge Power9 topology with Power topology
  powerpc/smp: Move powerpc_topology above
  powerpc/smp: Enable small core scheduling sooner
  powerpc/smp: Dont assume l2-cache to be superset of sibling
  powerpc/smp: Generalize 2nd sched domain
  Powerpc/numa: Detect support for coregroup
  powerpc/smp: Allocate cpumask only after searching thread group
  Powerpc/smp: Create coregroup domain
  powerpc/smp: Implement cpu_to_coregroup_id
  powerpc/smp: Provide an ability to disable coregroup

 arch/powerpc/include/asm/smp.h      |   1 +
 arch/powerpc/include/asm/topology.h |  10 +
 arch/powerpc/kernel/smp.c           | 277 +++++++++++++++++-----------
 arch/powerpc/mm/numa.c              |  56 ++++--
 4 files changed, 226 insertions(+), 118 deletions(-)

-- 
2.17.1



* [PATCH 01/11] powerpc/smp: Cache node for reuse
  2020-07-14  4:36 [PATCH 00/11] Support for grouping cores Srikar Dronamraju
@ 2020-07-14  4:36 ` Srikar Dronamraju
  2020-07-17  4:51   ` Gautham R Shenoy
  2020-07-14  4:36 ` [PATCH 02/11] powerpc/smp: Merge Power9 topology with Power topology Srikar Dronamraju
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Srikar Dronamraju @ 2020-07-14  4:36 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Srikar Dronamraju, Michael Ellerman,
	Anton Blanchard, linuxppc-dev, Nick Piggin

cpu_to_node() is an inline function with access to a per-cpu variable.
However, when it is used repeatedly, it is cleaner to cache the result
in a local variable.

Also fix a build error seen in some configs,
"error: numa_cpu_lookup_table undeclared",
by guarding the lookup with CONFIG_NEED_MULTIPLE_NODES.

No functional change.

Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: Michael Ellerman <michaele@au1.ibm.com>
Cc: Nick Piggin <npiggin@au1.ibm.com>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Neuling <mikey@linux.ibm.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/smp.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 73199470c265..680c0edcc59d 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -843,7 +843,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 
 	DBG("smp_prepare_cpus\n");
 
-	/* 
+	/*
 	 * setup_cpu may need to be called on the boot cpu. We havent
 	 * spun any cpus up but lets be paranoid.
 	 */
@@ -854,20 +854,24 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 	cpu_callin_map[boot_cpuid] = 1;
 
 	for_each_possible_cpu(cpu) {
+		int node = cpu_to_node(cpu);
+
 		zalloc_cpumask_var_node(&per_cpu(cpu_sibling_map, cpu),
-					GFP_KERNEL, cpu_to_node(cpu));
+					GFP_KERNEL, node);
 		zalloc_cpumask_var_node(&per_cpu(cpu_l2_cache_map, cpu),
-					GFP_KERNEL, cpu_to_node(cpu));
+					GFP_KERNEL, node);
 		zalloc_cpumask_var_node(&per_cpu(cpu_core_map, cpu),
-					GFP_KERNEL, cpu_to_node(cpu));
+					GFP_KERNEL, node);
+#ifdef CONFIG_NEED_MULTIPLE_NODES
 		/*
 		 * numa_node_id() works after this.
 		 */
 		if (cpu_present(cpu)) {
-			set_cpu_numa_node(cpu, numa_cpu_lookup_table[cpu]);
-			set_cpu_numa_mem(cpu,
-				local_memory_node(numa_cpu_lookup_table[cpu]));
+			node = numa_cpu_lookup_table[cpu];
+			set_cpu_numa_node(cpu, node);
+			set_cpu_numa_mem(cpu, local_memory_node(node));
 		}
+#endif
 	}
 
 	/* Init the cpumasks so the boot CPU is related to itself */
-- 
2.17.1



* [PATCH 02/11] powerpc/smp: Merge Power9 topology with Power topology
  2020-07-14  4:36 [PATCH 00/11] Support for grouping cores Srikar Dronamraju
  2020-07-14  4:36 ` [PATCH 01/11] powerpc/smp: Cache node for reuse Srikar Dronamraju
@ 2020-07-14  4:36 ` Srikar Dronamraju
  2020-07-17  5:44   ` Gautham R Shenoy
  2020-07-14  4:36 ` [PATCH 03/11] powerpc/smp: Move powerpc_topology above Srikar Dronamraju
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Srikar Dronamraju @ 2020-07-14  4:36 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Srikar Dronamraju, Michael Ellerman,
	Anton Blanchard, linuxppc-dev, Nick Piggin

A new sched_domain_topology_level was added just for Power9. However,
the same can be achieved by merging powerpc_topology with
power9_topology, which also makes the code simpler, especially when
adding a new sched domain.

Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: Michael Ellerman <michaele@au1.ibm.com>
Cc: Nick Piggin <npiggin@au1.ibm.com>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Neuling <mikey@linux.ibm.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/smp.c | 33 ++++++++++-----------------------
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 680c0edcc59d..069ea4b21c6d 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1315,7 +1315,7 @@ int setup_profiling_timer(unsigned int multiplier)
 }
 
 #ifdef CONFIG_SCHED_SMT
-/* cpumask of CPUs with asymetric SMT dependancy */
+/* cpumask of CPUs with asymmetric SMT dependency */
 static int powerpc_smt_flags(void)
 {
 	int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
@@ -1328,14 +1328,6 @@ static int powerpc_smt_flags(void)
 }
 #endif
 
-static struct sched_domain_topology_level powerpc_topology[] = {
-#ifdef CONFIG_SCHED_SMT
-	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
-#endif
-	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
-	{ NULL, },
-};
-
 /*
  * P9 has a slightly odd architecture where pairs of cores share an L2 cache.
  * This topology makes it *much* cheaper to migrate tasks between adjacent cores
@@ -1353,7 +1345,13 @@ static int powerpc_shared_cache_flags(void)
  */
 static const struct cpumask *shared_cache_mask(int cpu)
 {
-	return cpu_l2_cache_mask(cpu);
+	if (shared_caches)
+		return cpu_l2_cache_mask(cpu);
+
+	if (has_big_cores)
+		return cpu_smallcore_mask(cpu);
+
+	return cpu_smt_mask(cpu);
 }
 
 #ifdef CONFIG_SCHED_SMT
@@ -1363,7 +1361,7 @@ static const struct cpumask *smallcore_smt_mask(int cpu)
 }
 #endif
 
-static struct sched_domain_topology_level power9_topology[] = {
+static struct sched_domain_topology_level powerpc_topology[] = {
 #ifdef CONFIG_SCHED_SMT
 	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
 #endif
@@ -1388,21 +1386,10 @@ void __init smp_cpus_done(unsigned int max_cpus)
 #ifdef CONFIG_SCHED_SMT
 	if (has_big_cores) {
 		pr_info("Big cores detected but using small core scheduling\n");
-		power9_topology[0].mask = smallcore_smt_mask;
 		powerpc_topology[0].mask = smallcore_smt_mask;
 	}
 #endif
-	/*
-	 * If any CPU detects that it's sharing a cache with another CPU then
-	 * use the deeper topology that is aware of this sharing.
-	 */
-	if (shared_caches) {
-		pr_info("Using shared cache scheduler topology\n");
-		set_sched_topology(power9_topology);
-	} else {
-		pr_info("Using standard scheduler topology\n");
-		set_sched_topology(powerpc_topology);
-	}
+	set_sched_topology(powerpc_topology);
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
-- 
2.17.1



* [PATCH 03/11] powerpc/smp: Move powerpc_topology above
  2020-07-14  4:36 [PATCH 00/11] Support for grouping cores Srikar Dronamraju
  2020-07-14  4:36 ` [PATCH 01/11] powerpc/smp: Cache node for reuse Srikar Dronamraju
  2020-07-14  4:36 ` [PATCH 02/11] powerpc/smp: Merge Power9 topology with Power topology Srikar Dronamraju
@ 2020-07-14  4:36 ` Srikar Dronamraju
  2020-07-17  5:45   ` Gautham R Shenoy
  2020-07-14  4:36 ` [PATCH 04/11] powerpc/smp: Enable small core scheduling sooner Srikar Dronamraju
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Srikar Dronamraju @ 2020-07-14  4:36 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Srikar Dronamraju, Michael Ellerman,
	Anton Blanchard, linuxppc-dev, Nick Piggin

Move the powerpc_topology definition earlier in the file. This lets
functions later in this file use it without needing forward
declarations.

No other functional changes.

Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: Michael Ellerman <michaele@au1.ibm.com>
Cc: Nick Piggin <npiggin@au1.ibm.com>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Neuling <mikey@linux.ibm.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/smp.c | 116 +++++++++++++++++++-------------------
 1 file changed, 58 insertions(+), 58 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 069ea4b21c6d..24529f6134aa 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -818,6 +818,64 @@ static int init_cpu_l1_cache_map(int cpu)
 	return err;
 }
 
+static bool shared_caches;
+
+#ifdef CONFIG_SCHED_SMT
+/* cpumask of CPUs with asymmetric SMT dependency */
+static int powerpc_smt_flags(void)
+{
+	int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
+
+	if (cpu_has_feature(CPU_FTR_ASYM_SMT)) {
+		printk_once(KERN_INFO "Enabling Asymmetric SMT scheduling\n");
+		flags |= SD_ASYM_PACKING;
+	}
+	return flags;
+}
+#endif
+
+/*
+ * P9 has a slightly odd architecture where pairs of cores share an L2 cache.
+ * This topology makes it *much* cheaper to migrate tasks between adjacent cores
+ * since the migrated task remains cache hot. We want to take advantage of this
+ * at the scheduler level so an extra topology level is required.
+ */
+static int powerpc_shared_cache_flags(void)
+{
+	return SD_SHARE_PKG_RESOURCES;
+}
+
+/*
+ * We can't just pass cpu_l2_cache_mask() directly because
+ * returns a non-const pointer and the compiler barfs on that.
+ */
+static const struct cpumask *shared_cache_mask(int cpu)
+{
+	if (shared_caches)
+		return cpu_l2_cache_mask(cpu);
+
+	if (has_big_cores)
+		return cpu_smallcore_mask(cpu);
+
+	return cpu_smt_mask(cpu);
+}
+
+#ifdef CONFIG_SCHED_SMT
+static const struct cpumask *smallcore_smt_mask(int cpu)
+{
+	return cpu_smallcore_mask(cpu);
+}
+#endif
+
+static struct sched_domain_topology_level powerpc_topology[] = {
+#ifdef CONFIG_SCHED_SMT
+	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
+#endif
+	{ shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE) },
+	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
+	{ NULL, },
+};
+
 static int init_big_cores(void)
 {
 	int cpu;
@@ -1249,8 +1307,6 @@ static void add_cpu_to_masks(int cpu)
 			set_cpus_related(cpu, i, cpu_core_mask);
 }
 
-static bool shared_caches;
-
 /* Activate a secondary processor. */
 void start_secondary(void *unused)
 {
@@ -1314,62 +1370,6 @@ int setup_profiling_timer(unsigned int multiplier)
 	return 0;
 }
 
-#ifdef CONFIG_SCHED_SMT
-/* cpumask of CPUs with asymmetric SMT dependency */
-static int powerpc_smt_flags(void)
-{
-	int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
-
-	if (cpu_has_feature(CPU_FTR_ASYM_SMT)) {
-		printk_once(KERN_INFO "Enabling Asymmetric SMT scheduling\n");
-		flags |= SD_ASYM_PACKING;
-	}
-	return flags;
-}
-#endif
-
-/*
- * P9 has a slightly odd architecture where pairs of cores share an L2 cache.
- * This topology makes it *much* cheaper to migrate tasks between adjacent cores
- * since the migrated task remains cache hot. We want to take advantage of this
- * at the scheduler level so an extra topology level is required.
- */
-static int powerpc_shared_cache_flags(void)
-{
-	return SD_SHARE_PKG_RESOURCES;
-}
-
-/*
- * We can't just pass cpu_l2_cache_mask() directly because
- * returns a non-const pointer and the compiler barfs on that.
- */
-static const struct cpumask *shared_cache_mask(int cpu)
-{
-	if (shared_caches)
-		return cpu_l2_cache_mask(cpu);
-
-	if (has_big_cores)
-		return cpu_smallcore_mask(cpu);
-
-	return cpu_smt_mask(cpu);
-}
-
-#ifdef CONFIG_SCHED_SMT
-static const struct cpumask *smallcore_smt_mask(int cpu)
-{
-	return cpu_smallcore_mask(cpu);
-}
-#endif
-
-static struct sched_domain_topology_level powerpc_topology[] = {
-#ifdef CONFIG_SCHED_SMT
-	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
-#endif
-	{ shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE) },
-	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
-	{ NULL, },
-};
-
 void __init smp_cpus_done(unsigned int max_cpus)
 {
 	/*
-- 
2.17.1



* [PATCH 04/11] powerpc/smp: Enable small core scheduling sooner
  2020-07-14  4:36 [PATCH 00/11] Support for grouping cores Srikar Dronamraju
                   ` (2 preceding siblings ...)
  2020-07-14  4:36 ` [PATCH 03/11] powerpc/smp: Move powerpc_topology above Srikar Dronamraju
@ 2020-07-14  4:36 ` Srikar Dronamraju
  2020-07-17  5:48   ` Gautham R Shenoy
  2020-07-20  7:47   ` Jordan Niethe
  2020-07-14  4:36 ` [PATCH 05/11] powerpc/smp: Dont assume l2-cache to be superset of sibling Srikar Dronamraju
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 42+ messages in thread
From: Srikar Dronamraju @ 2020-07-14  4:36 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Srikar Dronamraju, Michael Ellerman,
	Anton Blanchard, linuxppc-dev, Nick Piggin

Enable small core scheduling as soon as we detect that we are on a
system that supports thread groups. Doing so avoids a redundant check.

Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: Michael Ellerman <michaele@au1.ibm.com>
Cc: Nick Piggin <npiggin@au1.ibm.com>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Neuling <mikey@linux.ibm.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/smp.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 24529f6134aa..7d430fc536cc 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -892,6 +892,12 @@ static int init_big_cores(void)
 	}
 
 	has_big_cores = true;
+
+#ifdef CONFIG_SCHED_SMT
+	pr_info("Big cores detected. Using small core scheduling\n");
+	powerpc_topology[0].mask = smallcore_smt_mask;
+#endif
+
 	return 0;
 }
 
@@ -1383,12 +1389,6 @@ void __init smp_cpus_done(unsigned int max_cpus)
 
 	dump_numa_cpu_topology();
 
-#ifdef CONFIG_SCHED_SMT
-	if (has_big_cores) {
-		pr_info("Big cores detected but using small core scheduling\n");
-		powerpc_topology[0].mask = smallcore_smt_mask;
-	}
-#endif
 	set_sched_topology(powerpc_topology);
 }
 
-- 
2.17.1



* [PATCH 05/11] powerpc/smp: Dont assume l2-cache to be superset of sibling
  2020-07-14  4:36 [PATCH 00/11] Support for grouping cores Srikar Dronamraju
                   ` (3 preceding siblings ...)
  2020-07-14  4:36 ` [PATCH 04/11] powerpc/smp: Enable small core scheduling sooner Srikar Dronamraju
@ 2020-07-14  4:36 ` Srikar Dronamraju
  2020-07-14  5:40   ` Oliver O'Halloran
  2020-07-17  6:00   ` Gautham R Shenoy
  2020-07-14  4:36 ` [PATCH 06/11] powerpc/smp: Generalize 2nd sched domain Srikar Dronamraju
                   ` (6 subsequent siblings)
  11 siblings, 2 replies; 42+ messages in thread
From: Srikar Dronamraju @ 2020-07-14  4:36 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Srikar Dronamraju, Michael Ellerman,
	Anton Blanchard, linuxppc-dev, Nick Piggin

The current code assumes that the cpumask of CPUs sharing an L2 cache
will always be a superset of cpu_sibling_mask.

Stop making that assumption.

Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: Michael Ellerman <michaele@au1.ibm.com>
Cc: Nick Piggin <npiggin@au1.ibm.com>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Neuling <mikey@linux.ibm.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/smp.c | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 7d430fc536cc..875f57e41355 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1198,6 +1198,7 @@ static bool update_mask_by_l2(int cpu, struct cpumask *(*mask_fn)(int))
 	struct device_node *l2_cache, *np;
 	int i;
 
+	cpumask_set_cpu(cpu, mask_fn(cpu));
 	l2_cache = cpu_to_l2cache(cpu);
 	if (!l2_cache)
 		return false;
@@ -1284,29 +1285,30 @@ static void add_cpu_to_masks(int cpu)
 	 * add it to it's own thread sibling mask.
 	 */
 	cpumask_set_cpu(cpu, cpu_sibling_mask(cpu));
+	cpumask_set_cpu(cpu, cpu_core_mask(cpu));
 
 	for (i = first_thread; i < first_thread + threads_per_core; i++)
 		if (cpu_online(i))
 			set_cpus_related(i, cpu, cpu_sibling_mask);
 
 	add_cpu_to_smallcore_masks(cpu);
-	/*
-	 * Copy the thread sibling mask into the cache sibling mask
-	 * and mark any CPUs that share an L2 with this CPU.
-	 */
-	for_each_cpu(i, cpu_sibling_mask(cpu))
-		set_cpus_related(cpu, i, cpu_l2_cache_mask);
 	update_mask_by_l2(cpu, cpu_l2_cache_mask);
 
-	/*
-	 * Copy the cache sibling mask into core sibling mask and mark
-	 * any CPUs on the same chip as this CPU.
-	 */
-	for_each_cpu(i, cpu_l2_cache_mask(cpu))
-		set_cpus_related(cpu, i, cpu_core_mask);
+	if (pkg_id == -1) {
+		struct cpumask *(*mask)(int) = cpu_sibling_mask;
+
+		/*
+		 * Copy the sibling mask into core sibling mask and
+		 * mark any CPUs on the same chip as this CPU.
+		 */
+		if (shared_caches)
+			mask = cpu_l2_cache_mask;
+
+		for_each_cpu(i, mask(cpu))
+			set_cpus_related(cpu, i, cpu_core_mask);
 
-	if (pkg_id == -1)
 		return;
+	}
 
 	for_each_cpu(i, cpu_online_mask)
 		if (get_physical_package_id(i) == pkg_id)
-- 
2.17.1



* [PATCH 06/11] powerpc/smp: Generalize 2nd sched domain
  2020-07-14  4:36 [PATCH 00/11] Support for grouping cores Srikar Dronamraju
                   ` (4 preceding siblings ...)
  2020-07-14  4:36 ` [PATCH 05/11] powerpc/smp: Dont assume l2-cache to be superset of sibling Srikar Dronamraju
@ 2020-07-14  4:36 ` Srikar Dronamraju
  2020-07-17  6:37   ` Gautham R Shenoy
  2020-07-14  4:36 ` [PATCH 07/11] Powerpc/numa: Detect support for coregroup Srikar Dronamraju
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Srikar Dronamraju @ 2020-07-14  4:36 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Srikar Dronamraju, Michael Ellerman,
	Anton Blanchard, linuxppc-dev, Nick Piggin

Currently "CACHE" domain happens to be the 2nd sched domain as per
powerpc_topology. This domain will collapse if cpumask of l2-cache is
same as SMT domain. However we could generalize this domain such that it
could mean either be a "CACHE" domain or a "BIGCORE" domain.

While setting up the "CACHE" domain, check if shared_cache is already
set.
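
Concretely, once shared caches are detected, the BIGCORE slot of
powerpc_topology is rewritten in place (excerpted from the
smp_cpus_done() hunk below):

	powerpc_topology[bigcore_idx].mask = shared_cache_mask;
	powerpc_topology[bigcore_idx].sd_flags = powerpc_shared_cache_flags;

so the same 2nd level acts as "CACHE" on shared-cache systems and as
"BIGCORE" otherwise.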

Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: Michael Ellerman <michaele@au1.ibm.com>
Cc: Nick Piggin <npiggin@au1.ibm.com>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Neuling <mikey@linux.ibm.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/smp.c | 48 +++++++++++++++++++++++++++------------
 1 file changed, 34 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 875f57e41355..f8faf75135af 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -85,6 +85,14 @@ EXPORT_PER_CPU_SYMBOL(cpu_l2_cache_map);
 EXPORT_PER_CPU_SYMBOL(cpu_core_map);
 EXPORT_SYMBOL_GPL(has_big_cores);
 
+enum {
+#ifdef CONFIG_SCHED_SMT
+	smt_idx,
+#endif
+	bigcore_idx,
+	die_idx,
+};
+
 #define MAX_THREAD_LIST_SIZE	8
 #define THREAD_GROUP_SHARE_L1   1
 struct thread_groups {
@@ -851,13 +859,7 @@ static int powerpc_shared_cache_flags(void)
  */
 static const struct cpumask *shared_cache_mask(int cpu)
 {
-	if (shared_caches)
-		return cpu_l2_cache_mask(cpu);
-
-	if (has_big_cores)
-		return cpu_smallcore_mask(cpu);
-
-	return cpu_smt_mask(cpu);
+	return per_cpu(cpu_l2_cache_map, cpu);
 }
 
 #ifdef CONFIG_SCHED_SMT
@@ -867,11 +869,16 @@ static const struct cpumask *smallcore_smt_mask(int cpu)
 }
 #endif
 
+static const struct cpumask *cpu_bigcore_mask(int cpu)
+{
+	return cpu_core_mask(cpu);
+}
+
 static struct sched_domain_topology_level powerpc_topology[] = {
 #ifdef CONFIG_SCHED_SMT
 	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
 #endif
-	{ shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE) },
+	{ cpu_bigcore_mask, SD_INIT_NAME(BIGCORE) },
 	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
 	{ NULL, },
 };
@@ -895,7 +902,7 @@ static int init_big_cores(void)
 
 #ifdef CONFIG_SCHED_SMT
 	pr_info("Big cores detected. Using small core scheduling\n");
-	powerpc_topology[0].mask = smallcore_smt_mask;
+	powerpc_topology[smt_idx].mask = smallcore_smt_mask;
 #endif
 
 	return 0;
@@ -1319,7 +1326,6 @@ static void add_cpu_to_masks(int cpu)
 void start_secondary(void *unused)
 {
 	unsigned int cpu = smp_processor_id();
-	struct cpumask *(*sibling_mask)(int) = cpu_sibling_mask;
 
 	mmgrab(&init_mm);
 	current->active_mm = &init_mm;
@@ -1345,14 +1351,20 @@ void start_secondary(void *unused)
 	/* Update topology CPU masks */
 	add_cpu_to_masks(cpu);
 
-	if (has_big_cores)
-		sibling_mask = cpu_smallcore_mask;
 	/*
 	 * Check for any shared caches. Note that this must be done on a
 	 * per-core basis because one core in the pair might be disabled.
 	 */
-	if (!cpumask_equal(cpu_l2_cache_mask(cpu), sibling_mask(cpu)))
-		shared_caches = true;
+	if (!shared_caches) {
+		struct cpumask *(*sibling_mask)(int) = cpu_sibling_mask;
+		struct cpumask *mask = cpu_l2_cache_mask(cpu);
+
+		if (has_big_cores)
+			sibling_mask = cpu_smallcore_mask;
+
+		if (cpumask_weight(mask) > cpumask_weight(sibling_mask(cpu)))
+			shared_caches = true;
+	}
 
 	set_numa_node(numa_cpu_lookup_table[cpu]);
 	set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));
@@ -1390,6 +1402,14 @@ void __init smp_cpus_done(unsigned int max_cpus)
 		smp_ops->bringup_done();
 
 	dump_numa_cpu_topology();
+	if (shared_caches) {
+		pr_info("Using shared cache scheduler topology\n");
+		powerpc_topology[bigcore_idx].mask = shared_cache_mask;
+#ifdef CONFIG_SCHED_DEBUG
+		powerpc_topology[bigcore_idx].name = "CACHE";
+#endif
+		powerpc_topology[bigcore_idx].sd_flags = powerpc_shared_cache_flags;
+	}
 
 	set_sched_topology(powerpc_topology);
 }
-- 
2.17.1



* [PATCH 07/11] Powerpc/numa: Detect support for coregroup
  2020-07-14  4:36 [PATCH 00/11] Support for grouping cores Srikar Dronamraju
                   ` (5 preceding siblings ...)
  2020-07-14  4:36 ` [PATCH 06/11] powerpc/smp: Generalize 2nd sched domain Srikar Dronamraju
@ 2020-07-14  4:36 ` Srikar Dronamraju
  2020-07-17  8:08   ` Gautham R Shenoy
  2020-07-20 13:56   ` Michael Ellerman
  2020-07-14  4:36 ` [PATCH 08/11] powerpc/smp: Allocate cpumask only after searching thread group Srikar Dronamraju
                   ` (4 subsequent siblings)
  11 siblings, 2 replies; 42+ messages in thread
From: Srikar Dronamraju @ 2020-07-14  4:36 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Srikar Dronamraju, Michael Ellerman,
	Anton Blanchard, linuxppc-dev, Nick Piggin

Add support for grouping cores based on the device-tree classification.
- The last domain in the associativity-domains property always refers
to the core.
- If the primary reference domain happens to be the penultimate domain
in the associativity-domains device-tree property, then there are no
coregroups. However, if it is not the penultimate domain, then
coregroups exist. There can be more than one coregroup; for now we are
interested only in the last, i.e. the smallest, coregroup (illustrated
below).
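
As an illustration, with invented values (not from any real device
tree), suppose:

	min_common_depth = 4
	ibm,current-associativity-domains = <2 1 2 2 4 12 96>

The property is 7 words long. Word 4 (value 4) gives the number of NUMA
nodes and the last word (96) counts cores, so word 5 (12) describes a
domain between node and core, i.e. a coregroup. Since the length 7 is
greater than min_common_depth + 2, coregroup_enabled gets set.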

Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: Michael Ellerman <michaele@au1.ibm.com>
Cc: Nick Piggin <npiggin@au1.ibm.com>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Neuling <mikey@linux.ibm.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/smp.h |  1 +
 arch/powerpc/kernel/smp.c      |  1 +
 arch/powerpc/mm/numa.c         | 34 +++++++++++++++++++++-------------
 3 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 49a25e2400f2..5bdc17a7049f 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -28,6 +28,7 @@
 extern int boot_cpuid;
 extern int spinning_secondaries;
 extern u32 *cpu_to_phys_id;
+extern bool coregroup_enabled;
 
 extern void cpu_die(void);
 extern int cpu_to_chip_id(int cpu);
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index f8faf75135af..96e47450d9b3 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -74,6 +74,7 @@ static DEFINE_PER_CPU(int, cpu_state) = { 0 };
 
 struct task_struct *secondary_current;
 bool has_big_cores;
+bool coregroup_enabled;
 
 DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_map);
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index fc7b0505bdd8..a43eab455be4 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -887,7 +887,9 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
 static void __init find_possible_nodes(void)
 {
 	struct device_node *rtas;
-	u32 numnodes, i;
+	const __be32 *domains;
+	int prop_length, max_nodes;
+	u32 i;
 
 	if (!numa_enabled)
 		return;
@@ -896,25 +898,31 @@ static void __init find_possible_nodes(void)
 	if (!rtas)
 		return;
 
-	if (of_property_read_u32_index(rtas, "ibm,current-associativity-domains",
-				min_common_depth, &numnodes)) {
-		/*
-		 * ibm,current-associativity-domains is a fairly recent
-		 * property. If it doesn't exist, then fallback on
-		 * ibm,max-associativity-domains. Current denotes what the
-		 * platform can support compared to max which denotes what the
-		 * Hypervisor can support.
-		 */
-		if (of_property_read_u32_index(rtas, "ibm,max-associativity-domains",
-				min_common_depth, &numnodes))
+	/*
+	 * ibm,current-associativity-domains is a fairly recent property. If
+	 * it doesn't exist, then fallback on ibm,max-associativity-domains.
+	 * Current denotes what the platform can support compared to max
+	 * which denotes what the Hypervisor can support.
+	 */
+	domains = of_get_property(rtas, "ibm,current-associativity-domains",
+					&prop_length);
+	if (!domains) {
+		domains = of_get_property(rtas, "ibm,max-associativity-domains",
+					&prop_length);
+		if (!domains)
 			goto out;
 	}
 
-	for (i = 0; i < numnodes; i++) {
+	max_nodes = of_read_number(&domains[min_common_depth], 1);
+	for (i = 0; i < max_nodes; i++) {
 		if (!node_possible(i))
 			node_set(i, node_possible_map);
 	}
 
+	prop_length /= sizeof(int);
+	if (prop_length > min_common_depth + 2)
+		coregroup_enabled = 1;
+
 out:
 	of_node_put(rtas);
 }
-- 
2.17.1



* [PATCH 08/11] powerpc/smp: Allocate cpumask only after searching thread group
  2020-07-14  4:36 [PATCH 00/11] Support for grouping cores Srikar Dronamraju
                   ` (6 preceding siblings ...)
  2020-07-14  4:36 ` [PATCH 07/11] Powerpc/numa: Detect support for coregroup Srikar Dronamraju
@ 2020-07-14  4:36 ` Srikar Dronamraju
  2020-07-17  8:08   ` Gautham R Shenoy
  2020-07-14  4:36 ` [PATCH 09/11] Powerpc/smp: Create coregroup domain Srikar Dronamraju
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Srikar Dronamraju @ 2020-07-14  4:36 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Srikar Dronamraju, Michael Ellerman,
	Anton Blanchard, linuxppc-dev, Nick Piggin

If the cpumask is allocated before the thread-group search and the
search fails, the cpumask then needs to be freed. Instead, allocate
cpu_l1_cache_map only after the thread-group search has succeeded.

Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: Michael Ellerman <michaele@au1.ibm.com>
Cc: Nick Piggin <npiggin@au1.ibm.com>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Neuling <mikey@linux.ibm.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/smp.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 96e47450d9b3..ef19eeccd21e 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -797,10 +797,6 @@ static int init_cpu_l1_cache_map(int cpu)
 	if (err)
 		goto out;
 
-	zalloc_cpumask_var_node(&per_cpu(cpu_l1_cache_map, cpu),
-				GFP_KERNEL,
-				cpu_to_node(cpu));
-
 	cpu_group_start = get_cpu_thread_group_start(cpu, &tg);
 
 	if (unlikely(cpu_group_start == -1)) {
@@ -809,6 +805,9 @@ static int init_cpu_l1_cache_map(int cpu)
 		goto out;
 	}
 
+	zalloc_cpumask_var_node(&per_cpu(cpu_l1_cache_map, cpu),
+				GFP_KERNEL, cpu_to_node(cpu));
+
 	for (i = first_thread; i < first_thread + threads_per_core; i++) {
 		int i_group_start = get_cpu_thread_group_start(i, &tg);
 
-- 
2.17.1



* [PATCH 09/11] Powerpc/smp: Create coregroup domain
  2020-07-14  4:36 [PATCH 00/11] Support for grouping cores Srikar Dronamraju
                   ` (7 preceding siblings ...)
  2020-07-14  4:36 ` [PATCH 08/11] powerpc/smp: Allocate cpumask only after searching thread group Srikar Dronamraju
@ 2020-07-14  4:36 ` Srikar Dronamraju
  2020-07-17  8:19   ` Gautham R Shenoy
  2020-07-14  4:36 ` [PATCH 10/11] powerpc/smp: Implement cpu_to_coregroup_id Srikar Dronamraju
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 42+ messages in thread
From: Srikar Dronamraju @ 2020-07-14  4:36 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Srikar Dronamraju, Michael Ellerman,
	Anton Blanchard, linuxppc-dev, Nick Piggin

Add percpu coregroup maps and masks to create the coregroup domain.
If a coregroup doesn't exist, the coregroup domain will degenerate in
favour of the SMT/CACHE domain.
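
The degeneration works because, without coregroup support, the MC level
is simply pointed at the bigcore mask (see smp_prepare_cpus() below):

	powerpc_topology[mc_idx].mask = cpu_bigcore_mask;

With identical spans at both levels, the scheduler folds the MC domain
away.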

Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: Michael Ellerman <michaele@au1.ibm.com>
Cc: Nick Piggin <npiggin@au1.ibm.com>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Neuling <mikey@linux.ibm.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/topology.h | 10 ++++++++
 arch/powerpc/kernel/smp.c           | 37 +++++++++++++++++++++++++++++
 arch/powerpc/mm/numa.c              |  5 ++++
 3 files changed, 52 insertions(+)

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 2db7ba789720..34812c35018e 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -98,6 +98,7 @@ extern int stop_topology_update(void);
 extern int prrn_is_enabled(void);
 extern int find_and_online_cpu_nid(int cpu);
 extern int timed_topology_update(int nsecs);
+extern int cpu_to_coregroup_id(int cpu);
 #else
 static inline int start_topology_update(void)
 {
@@ -120,6 +121,15 @@ static inline int timed_topology_update(int nsecs)
 	return 0;
 }
 
+static inline int cpu_to_coregroup_id(int cpu)
+{
+#ifdef CONFIG_SMP
+	return cpu_to_core_id(cpu);
+#else
+	return 0;
+#endif
+}
+
 #endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR */
 
 #include <asm-generic/topology.h>
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ef19eeccd21e..bb25c13bbb79 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -80,6 +80,7 @@ DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_l2_cache_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_core_map);
+DEFINE_PER_CPU(cpumask_var_t, cpu_coregroup_map);
 
 EXPORT_PER_CPU_SYMBOL(cpu_sibling_map);
 EXPORT_PER_CPU_SYMBOL(cpu_l2_cache_map);
@@ -91,6 +92,7 @@ enum {
 	smt_idx,
 #endif
 	bigcore_idx,
+	mc_idx,
 	die_idx,
 };
 
@@ -869,6 +871,21 @@ static const struct cpumask *smallcore_smt_mask(int cpu)
 }
 #endif
 
+static struct cpumask *cpu_coregroup_mask(int cpu)
+{
+	return per_cpu(cpu_coregroup_map, cpu);
+}
+
+static bool has_coregroup_support(void)
+{
+	return coregroup_enabled;
+}
+
+static const struct cpumask *cpu_mc_mask(int cpu)
+{
+	return cpu_coregroup_mask(cpu);
+}
+
 static const struct cpumask *cpu_bigcore_mask(int cpu)
 {
 	return cpu_core_mask(cpu);
@@ -879,6 +896,7 @@ static struct sched_domain_topology_level powerpc_topology[] = {
 	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
 #endif
 	{ cpu_bigcore_mask, SD_INIT_NAME(BIGCORE) },
+	{ cpu_mc_mask, SD_INIT_NAME(MC) },
 	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
 	{ NULL, },
 };
@@ -933,6 +951,10 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 					GFP_KERNEL, node);
 		zalloc_cpumask_var_node(&per_cpu(cpu_core_map, cpu),
 					GFP_KERNEL, node);
+		if (has_coregroup_support())
+			zalloc_cpumask_var_node(&per_cpu(cpu_coregroup_map, cpu),
+						GFP_KERNEL, node);
+
 #ifdef CONFIG_NEED_MULTIPLE_NODES
 		/*
 		 * numa_node_id() works after this.
@@ -950,6 +972,11 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 	cpumask_set_cpu(boot_cpuid, cpu_l2_cache_mask(boot_cpuid));
 	cpumask_set_cpu(boot_cpuid, cpu_core_mask(boot_cpuid));
 
+	if (has_coregroup_support())
+		cpumask_set_cpu(boot_cpuid, cpu_coregroup_mask(boot_cpuid));
+	else
+		powerpc_topology[mc_idx].mask = cpu_bigcore_mask;
+
 	init_big_cores();
 	if (has_big_cores) {
 		cpumask_set_cpu(boot_cpuid,
@@ -1241,6 +1268,8 @@ static void remove_cpu_from_masks(int cpu)
 		set_cpus_unrelated(cpu, i, cpu_sibling_mask);
 		if (has_big_cores)
 			set_cpus_unrelated(cpu, i, cpu_smallcore_mask);
+		if (has_coregroup_support())
+			set_cpus_unrelated(cpu, i, cpu_coregroup_mask);
 	}
 }
 #endif
@@ -1301,6 +1330,14 @@ static void add_cpu_to_masks(int cpu)
 	add_cpu_to_smallcore_masks(cpu);
 	update_mask_by_l2(cpu, cpu_l2_cache_mask);
 
+	if (has_coregroup_support()) {
+		cpumask_set_cpu(cpu, cpu_coregroup_mask(cpu));
+		for_each_cpu(i, cpu_online_mask) {
+			if (cpu_to_coregroup_id(cpu) == cpu_to_coregroup_id(i))
+				set_cpus_related(cpu, i, cpu_coregroup_mask);
+		}
+	}
+
 	if (pkg_id == -1) {
 		struct cpumask *(*mask)(int) = cpu_sibling_mask;
 
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index a43eab455be4..d9ab9da85eab 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1695,6 +1695,11 @@ static const struct proc_ops topology_proc_ops = {
 	.proc_release	= single_release,
 };
 
+int cpu_to_coregroup_id(int cpu)
+{
+	return cpu_to_core_id(cpu);
+}
+
 static int topology_update_init(void)
 {
 	start_topology_update();
-- 
2.17.1



* [PATCH 10/11] powerpc/smp: Implement cpu_to_coregroup_id
  2020-07-14  4:36 [PATCH 00/11] Support for grouping cores Srikar Dronamraju
                   ` (8 preceding siblings ...)
  2020-07-14  4:36 ` [PATCH 09/11] Powerpc/smp: Create coregroup domain Srikar Dronamraju
@ 2020-07-14  4:36 ` Srikar Dronamraju
  2020-07-17  8:26   ` Gautham R Shenoy
  2020-07-14  4:36 ` [PATCH 11/11] powerpc/smp: Provide an ability to disable coregroup Srikar Dronamraju
  2020-07-14  5:06 ` [PATCH 00/11] Support for grouping cores Srikar Dronamraju
  11 siblings, 1 reply; 42+ messages in thread
From: Srikar Dronamraju @ 2020-07-14  4:36 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Srikar Dronamraju, Michael Ellerman,
	Anton Blanchard, linuxppc-dev, Nick Piggin

Look up the coregroup id from the associativity array.

If unable to detect the coregroup id, fall back on the core id. This
way the sched_domain degenerates and an extra sched domain is not
created.

Ideally this function should have been implemented in
arch/powerpc/kernel/smp.c. However, implementing it in mm/numa.c means
we don't need to find the primary domain again.

If the device-tree mentions more than one coregroup, then the kernel
implements only the last (i.e. the smallest) coregroup, which currently
corresponds to the penultimate domain in the device-tree.
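
A worked example with invented values: suppose VPHN returns

	associativity = { 6, 0, 0, 0, 2, 9, 153 }

and min_common_depth = 4. Then index = associativity[0] = 6, the core
is associativity[6] = 153, and the returned coregroup id is
associativity[index - 1] = associativity[5] = 9. The check
index > min_common_depth + 1 (6 > 5) holds, so the coregroup path is
taken.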

Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: Michael Ellerman <michaele@au1.ibm.com>
Cc: Nick Piggin <npiggin@au1.ibm.com>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Neuling <mikey@linux.ibm.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index d9ab9da85eab..4e85564ef62a 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1697,6 +1697,23 @@ static const struct proc_ops topology_proc_ops = {
 
 int cpu_to_coregroup_id(int cpu)
 {
+	__be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
+	int index;
+
+	if (cpu < 0 || cpu > nr_cpu_ids)
+		return -1;
+
+	if (!firmware_has_feature(FW_FEATURE_VPHN))
+		goto out;
+
+	if (vphn_get_associativity(cpu, associativity))
+		goto out;
+
+	index = of_read_number(associativity, 1);
+	if ((index > min_common_depth + 1) && coregroup_enabled)
+		return of_read_number(&associativity[index - 1], 1);
+
+out:
 	return cpu_to_core_id(cpu);
 }
 
-- 
2.17.1



* [PATCH 11/11] powerpc/smp: Provide an ability to disable coregroup
  2020-07-14  4:36 [PATCH 00/11] Support for grouping cores Srikar Dronamraju
                   ` (9 preceding siblings ...)
  2020-07-14  4:36 ` [PATCH 10/11] powerpc/smp: Implement cpu_to_coregroup_id Srikar Dronamraju
@ 2020-07-14  4:36 ` Srikar Dronamraju
  2020-07-17  8:28   ` Gautham R Shenoy
  2020-07-20 13:57   ` Michael Ellerman
  2020-07-14  5:06 ` [PATCH 00/11] Support for grouping cores Srikar Dronamraju
  11 siblings, 2 replies; 42+ messages in thread
From: Srikar Dronamraju @ 2020-07-14  4:36 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Srikar Dronamraju, Michael Ellerman,
	Anton Blanchard, linuxppc-dev, Nick Piggin

If users want to enable the coregroup sched_domain, they can boot with
the kernel parameter "coregroup_support=on".
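
Per the strstr() checks in the parser below, either spelling enables
it:

	coregroup_support=on
	coregroup_support=1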

Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: Michael Ellerman <michaele@au1.ibm.com>
Cc: Nick Piggin <npiggin@au1.ibm.com>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Neuling <mikey@linux.ibm.com>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/smp.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index bb25c13bbb79..c43909e6e8e9 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -118,6 +118,23 @@ struct smp_ops_t *smp_ops;
 volatile unsigned int cpu_callin_map[NR_CPUS];
 
 int smt_enabled_at_boot = 1;
+int coregroup_support;
+
+static int __init early_coregroup(char *p)
+{
+	if (!p)
+		return 0;
+
+	if (strstr(p, "on"))
+		coregroup_support = 1;
+
+	if (strstr(p, "1"))
+		coregroup_support = 1;
+
+	return 0;
+}
+
+early_param("coregroup_support", early_coregroup);
 
 /*
  * Returns 1 if the specified cpu should be brought up during boot.
@@ -878,7 +895,7 @@ static struct cpumask *cpu_coregroup_mask(int cpu)
 
 static bool has_coregroup_support(void)
 {
-	return coregroup_enabled;
+	return coregroup_enabled && coregroup_support;
 }
 
 static const struct cpumask *cpu_mc_mask(int cpu)
-- 
2.17.1



* Re: [PATCH 00/11] Support for grouping cores
  2020-07-14  4:36 [PATCH 00/11] Support for grouping cores Srikar Dronamraju
                   ` (10 preceding siblings ...)
  2020-07-14  4:36 ` [PATCH 11/11] powerpc/smp: Provide an ability to disable coregroup Srikar Dronamraju
@ 2020-07-14  5:06 ` Srikar Dronamraju
  11 siblings, 0 replies; 42+ messages in thread
From: Srikar Dronamraju @ 2020-07-14  5:06 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Michael Ellerman, Anton Blanchard, linuxppc-dev,
	Nick Piggin

* Srikar Dronamraju <srikar@linux.vnet.ibm.com> [2020-07-14 10:06:13]:

> 
> On Power 9 (with device-tree enablement to show coregroups).
> -----------------------------------------------------------

With Gautham's help, I tried to kexec into a newer kernel with a
modified dtb. However, when passing the dtb option, kexec would get
stuck.

Hence, as an alternative, I hardcoded some assumptions to test the
same. These hunks apply on top of all the patches.

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 8ec7ff05ae47..762aa573c313 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -56,6 +56,12 @@ static int n_mem_addr_cells, n_mem_size_cells;
 static int form1_affinity;
 
 #define MAX_DISTANCE_REF_POINTS 4
+/*
+ *  Hack 2: On Power9, there are 12 SMT8 cores per chip.
+ *  Hard code this for now.
+ */
+#define CORES_PER_CHIP 12
+#define CORES_PER_COREGROUP (CORES_PER_CHIP / 2)
 static int distance_ref_points_depth;
 static const __be32 *distance_ref_points;
 static int distance_lookup_table[MAX_NUMNODES][MAX_DISTANCE_REF_POINTS];
@@ -326,6 +332,13 @@ static int __init find_min_common_depth(void)
 
 	if (form1_affinity) {
 		depth = of_read_number(distance_ref_points, 1);
+		/*
+		 * Hack: Hack: Hack:
+		 * For now only on Phyp machines.
+		 */
+		if (!firmware_has_feature(FW_FEATURE_OPAL))
+			depth--;
 	} else {
 		if (distance_ref_points_depth < 2) {
 			printk(KERN_WARNING "NUMA: "
@@ -1247,8 +1260,8 @@ int cpu_to_coregroup_id(int cpu)
 		goto out;
 
 	index = of_read_number(associativity, 1);
-	if ((index > min_common_depth + 1) && coregroup_enabled)
-		return of_read_number(&associativity[index - 1], 1);
+	if ((index > min_common_depth + 1) && coregroup_enabled && has_big_cores)
+		return of_read_number(&associativity[index], 1) / CORES_PER_COREGROUP;
 
 out:
 	return cpu_to_core_id(cpu);
-- 
Thanks and Regards
Srikar Dronamraju


* Re: [PATCH 05/11] powerpc/smp: Dont assume l2-cache to be superset of sibling
  2020-07-14  4:36 ` [PATCH 05/11] powerpc/smp: Dont assume l2-cache to be superset of sibling Srikar Dronamraju
@ 2020-07-14  5:40   ` Oliver O'Halloran
  2020-07-14  6:30     ` Srikar Dronamraju
  2020-07-17  6:00   ` Gautham R Shenoy
  1 sibling, 1 reply; 42+ messages in thread
From: Oliver O'Halloran @ 2020-07-14  5:40 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Michael Ellerman, Anton Blanchard, linuxppc-dev,
	Nick Piggin

On Tue, Jul 14, 2020 at 2:45 PM Srikar Dronamraju
<srikar@linux.vnet.ibm.com> wrote:
>
> The current code assumes that the cpumask of CPUs sharing an L2 cache
> will always be a superset of cpu_sibling_mask.
>
> Stop making that assumption.

It's been a while since I looked, but I'm pretty sure the scheduler
requires child domains to be subsets of their parents. Why is this
necessary or a good idea?

> Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> Cc: Michael Ellerman <michaele@au1.ibm.com>
> Cc: Nick Piggin <npiggin@au1.ibm.com>
> Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: Michael Neuling <mikey@linux.ibm.com>
> Cc: Anton Blanchard <anton@au1.ibm.com>
> Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/smp.c | 28 +++++++++++++++-------------
>  1 file changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 7d430fc536cc..875f57e41355 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1198,6 +1198,7 @@ static bool update_mask_by_l2(int cpu, struct cpumask *(*mask_fn)(int))
>         struct device_node *l2_cache, *np;
>         int i;
>
> +       cpumask_set_cpu(cpu, mask_fn(cpu));

?

>         l2_cache = cpu_to_l2cache(cpu);
>         if (!l2_cache)
>                 return false;
> @@ -1284,29 +1285,30 @@ static void add_cpu_to_masks(int cpu)
>          * add it to it's own thread sibling mask.
>          */
>         cpumask_set_cpu(cpu, cpu_sibling_mask(cpu));
> +       cpumask_set_cpu(cpu, cpu_core_mask(cpu));
>
>         for (i = first_thread; i < first_thread + threads_per_core; i++)
>                 if (cpu_online(i))
>                         set_cpus_related(i, cpu, cpu_sibling_mask);
>
>         add_cpu_to_smallcore_masks(cpu);
> -       /*
> -        * Copy the thread sibling mask into the cache sibling mask
> -        * and mark any CPUs that share an L2 with this CPU.
> -        */
> -       for_each_cpu(i, cpu_sibling_mask(cpu))
> -               set_cpus_related(cpu, i, cpu_l2_cache_mask);
>         update_mask_by_l2(cpu, cpu_l2_cache_mask);
>
> -       /*
> -        * Copy the cache sibling mask into core sibling mask and mark
> -        * any CPUs on the same chip as this CPU.
> -        */
> -       for_each_cpu(i, cpu_l2_cache_mask(cpu))
> -               set_cpus_related(cpu, i, cpu_core_mask);
> +       if (pkg_id == -1) {
> +               struct cpumask *(*mask)(int) = cpu_sibling_mask;
> +
> +               /*
> +                * Copy the sibling mask into core sibling mask and
> +                * mark any CPUs on the same chip as this CPU.
> +                */
> +               if (shared_caches)
> +                       mask = cpu_l2_cache_mask;
> +
> +               for_each_cpu(i, mask(cpu))
> +                       set_cpus_related(cpu, i, cpu_core_mask);
>
> -       if (pkg_id == -1)
>                 return;
> +       }
>
>         for_each_cpu(i, cpu_online_mask)
>                 if (get_physical_package_id(i) == pkg_id)
> --
> 2.17.1
>


* Re: [PATCH 05/11] powerpc/smp: Dont assume l2-cache to be superset of sibling
  2020-07-14  5:40   ` Oliver O'Halloran
@ 2020-07-14  6:30     ` Srikar Dronamraju
  0 siblings, 0 replies; 42+ messages in thread
From: Srikar Dronamraju @ 2020-07-14  6:30 UTC (permalink / raw)
  To: Oliver O'Halloran
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Michael Ellerman, Anton Blanchard, linuxppc-dev,
	Nick Piggin

* Oliver O'Halloran <oohall@gmail.com> [2020-07-14 15:40:09]:

> On Tue, Jul 14, 2020 at 2:45 PM Srikar Dronamraju
> <srikar@linux.vnet.ibm.com> wrote:
> >
> > The current code assumes that the cpumask of CPUs sharing an L2 cache
> > will always be a superset of cpu_sibling_mask.
> >
> > Stop making that assumption.
> 
> It's been a while since I looked, but I'm pretty sure the scheduler
> requires child domains to be subsets of their parents. Why is this
> necessary or a good idea?

Thanks for looking into the patches.

Yes, the scheduler requires the child domains to be subsets of their
parents.

The current code assumes that the l2-cache mask is always a superset of
the sibling mask. However, there may be future processors for which
this does not hold.

Say, for example, a chip has 16 threads and only 8 of them share an
l2-cache, i.e. 8 threads act like a small core while all 16 threads act
like a big core. Then the assumption that the l2-cache mask is a
superset of cpu_sibling_mask would be wrong.
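
Illustrative masks for such a hypothetical chip:

	cpu_sibling_mask(cpu0)  = 0x0000ffff   (16 threads, the big core)
	cpu_l2_cache_mask(cpu0) = 0x000000ff   ( 8 threads sharing an L2)

Here the l2-cache mask is a strict subset of the sibling mask, which
breaks the superset assumption.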

> 
> > Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> > Cc: Michael Ellerman <michaele@au1.ibm.com>
> > Cc: Nick Piggin <npiggin@au1.ibm.com>
> > Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> > Cc: Michael Neuling <mikey@linux.ibm.com>
> > Cc: Anton Blanchard <anton@au1.ibm.com>
> > Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> > Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/kernel/smp.c | 28 +++++++++++++++-------------
> >  1 file changed, 15 insertions(+), 13 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> > index 7d430fc536cc..875f57e41355 100644
> > --- a/arch/powerpc/kernel/smp.c
> > +++ b/arch/powerpc/kernel/smp.c
> > @@ -1198,6 +1198,7 @@ static bool update_mask_by_l2(int cpu, struct cpumask *(*mask_fn)(int))
> >         struct device_node *l2_cache, *np;
> >         int i;
> >
> > +       cpumask_set_cpu(cpu, mask_fn(cpu));
> 
> ?

At the time the cpumasks are updated, the cpu is not yet part of
cpu_online_mask. So when we online/offline cpus, the masks would end up
not containing the cpu itself, which causes the scheduler to bork.

Previously (as we can see in the removed code below), this was done as
part of updating all the cpus in cpu_sibling_mask before calling
update_mask_by_l2.
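
In other words, a loop that walks only the online cpus never visits the
cpu being brought up. Roughly (a sketch, not the actual loop in
update_mask_by_l2; shares_l2() is a hypothetical helper):

	for_each_cpu(i, cpu_online_mask)	/* 'cpu' is not online yet */
		if (shares_l2(cpu, i))
			set_cpus_related(cpu, i, cpu_l2_cache_mask);

Without the explicit cpumask_set_cpu(cpu, mask_fn(cpu)), the mask would
never contain the cpu itself.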

> 
> >         l2_cache = cpu_to_l2cache(cpu);
> >         if (!l2_cache)
> >                 return false;
> > @@ -1284,29 +1285,30 @@ static void add_cpu_to_masks(int cpu)
> >          * add it to it's own thread sibling mask.
> >          */
> >         cpumask_set_cpu(cpu, cpu_sibling_mask(cpu));
> > +       cpumask_set_cpu(cpu, cpu_core_mask(cpu));
> >
> >         for (i = first_thread; i < first_thread + threads_per_core; i++)
> >                 if (cpu_online(i))
> >                         set_cpus_related(i, cpu, cpu_sibling_mask);
> >
> >         add_cpu_to_smallcore_masks(cpu);
> > -       /*
> > -        * Copy the thread sibling mask into the cache sibling mask
> > -        * and mark any CPUs that share an L2 with this CPU.
> > -        */
> > -       for_each_cpu(i, cpu_sibling_mask(cpu))
> > -               set_cpus_related(cpu, i, cpu_l2_cache_mask);

I am referring to the code above. It would have set the cpu in its own
cpumask. The rest of the cpus in cpu_sibling_mask get updated correctly
in update_mask_by_l2.

> >         update_mask_by_l2(cpu, cpu_l2_cache_mask);
> >
> > -       /*
> > -        * Copy the cache sibling mask into core sibling mask and mark
> > -        * any CPUs on the same chip as this CPU.
> > -        */

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 01/11] powerpc/smp: Cache node for reuse
  2020-07-14  4:36 ` [PATCH 01/11] powerpc/smp: Cache node for reuse Srikar Dronamraju
@ 2020-07-17  4:51   ` Gautham R Shenoy
  0 siblings, 0 replies; 42+ messages in thread
From: Gautham R Shenoy @ 2020-07-17  4:51 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Michael Ellerman, Anton Blanchard, linuxppc-dev,
	Nick Piggin

On Tue, Jul 14, 2020 at 10:06:14AM +0530, Srikar Dronamraju wrote:
> While cpu_to_node is an inline function with access to a per-cpu
> variable, when used repeatedly it may be cleaner to cache its result in
> a local variable.
> 
> Also fix a build error seen in some weird configs:
> "error: _numa_cpu_lookup_table_ undeclared"
> 
> No functional change
> 
> Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> Cc: Michael Ellerman <michaele@au1.ibm.com>
> Cc: Nick Piggin <npiggin@au1.ibm.com>
> Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: Michael Neuling <mikey@linux.ibm.com>
> Cc: Anton Blanchard <anton@au1.ibm.com>
> Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>


LGTM.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

> ---
>  arch/powerpc/kernel/smp.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 73199470c265..680c0edcc59d 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -843,7 +843,7 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> 
>  	DBG("smp_prepare_cpus\n");
> 
> -	/* 
> +	/*
>  	 * setup_cpu may need to be called on the boot cpu. We havent
>  	 * spun any cpus up but lets be paranoid.
>  	 */
> @@ -854,20 +854,24 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>  	cpu_callin_map[boot_cpuid] = 1;
> 
>  	for_each_possible_cpu(cpu) {
> +		int node = cpu_to_node(cpu);
> +
>  		zalloc_cpumask_var_node(&per_cpu(cpu_sibling_map, cpu),
> -					GFP_KERNEL, cpu_to_node(cpu));
> +					GFP_KERNEL, node);
>  		zalloc_cpumask_var_node(&per_cpu(cpu_l2_cache_map, cpu),
> -					GFP_KERNEL, cpu_to_node(cpu));
> +					GFP_KERNEL, node);
>  		zalloc_cpumask_var_node(&per_cpu(cpu_core_map, cpu),
> -					GFP_KERNEL, cpu_to_node(cpu));
> +					GFP_KERNEL, node);
> +#ifdef CONFIG_NEED_MULTIPLE_NODES
>  		/*
>  		 * numa_node_id() works after this.
>  		 */
>  		if (cpu_present(cpu)) {
> -			set_cpu_numa_node(cpu, numa_cpu_lookup_table[cpu]);
> -			set_cpu_numa_mem(cpu,
> -				local_memory_node(numa_cpu_lookup_table[cpu]));
> +			node = numa_cpu_lookup_table[cpu];
> +			set_cpu_numa_node(cpu, node);
> +			set_cpu_numa_mem(cpu, local_memory_node(node));
>  		}
> +#endif
>  	}
> 
>  	/* Init the cpumasks so the boot CPU is related to itself */
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 02/11] powerpc/smp: Merge Power9 topology with Power topology
  2020-07-14  4:36 ` [PATCH 02/11] powerpc/smp: Merge Power9 topology with Power topology Srikar Dronamraju
@ 2020-07-17  5:44   ` Gautham R Shenoy
  2020-07-20  8:10     ` Srikar Dronamraju
  0 siblings, 1 reply; 42+ messages in thread
From: Gautham R Shenoy @ 2020-07-17  5:44 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Michael Ellerman, Anton Blanchard, linuxppc-dev,
	Nick Piggin

Hi Srikar,

On Tue, Jul 14, 2020 at 10:06:15AM +0530, Srikar Dronamraju wrote:
> A new sched_domain_topology_level was added just for Power9. However the
> same can be achieved by merging powerpc_topology with power9_topology,
> which makes the code simpler, especially when adding a new sched
> domain.
> 
> Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> Cc: Michael Ellerman <michaele@au1.ibm.com>
> Cc: Nick Piggin <npiggin@au1.ibm.com>
> Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: Michael Neuling <mikey@linux.ibm.com>
> Cc: Anton Blanchard <anton@au1.ibm.com>
> Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/smp.c | 33 ++++++++++-----------------------
>  1 file changed, 10 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 680c0edcc59d..069ea4b21c6d 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1315,7 +1315,7 @@ int setup_profiling_timer(unsigned int multiplier)
>  }
> 
>  #ifdef CONFIG_SCHED_SMT
> -/* cpumask of CPUs with asymetric SMT dependancy */
> +/* cpumask of CPUs with asymmetric SMT dependency */
>  static int powerpc_smt_flags(void)
>  {
>  	int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
> @@ -1328,14 +1328,6 @@ static int powerpc_smt_flags(void)
>  }
>  #endif
> 
> -static struct sched_domain_topology_level powerpc_topology[] = {
> -#ifdef CONFIG_SCHED_SMT
> -	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
> -#endif
> -	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
> -	{ NULL, },
> -};
> -
>  /*
>   * P9 has a slightly odd architecture where pairs of cores share an L2 cache.
>   * This topology makes it *much* cheaper to migrate tasks between adjacent cores
> @@ -1353,7 +1345,13 @@ static int powerpc_shared_cache_flags(void)
>   */
>  static const struct cpumask *shared_cache_mask(int cpu)
>  {
> -	return cpu_l2_cache_mask(cpu);
> +	if (shared_caches)
> +		return cpu_l2_cache_mask(cpu);
> +
> +	if (has_big_cores)
> +		return cpu_smallcore_mask(cpu);
> +
> +	return cpu_smt_mask(cpu);
>  }
> 
>  #ifdef CONFIG_SCHED_SMT
> @@ -1363,7 +1361,7 @@ static const struct cpumask *smallcore_smt_mask(int cpu)
>  }
>  #endif
> 
> -static struct sched_domain_topology_level power9_topology[] = {
> +static struct sched_domain_topology_level powerpc_topology[] = {


>  #ifdef CONFIG_SCHED_SMT
>  	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
>  #endif
> @@ -1388,21 +1386,10 @@ void __init smp_cpus_done(unsigned int max_cpus)
>  #ifdef CONFIG_SCHED_SMT
>  	if (has_big_cores) {
>  		pr_info("Big cores detected but using small core scheduling\n");
> -		power9_topology[0].mask = smallcore_smt_mask;
>  		powerpc_topology[0].mask = smallcore_smt_mask;
>  	}
>  #endif
> -	/*
> -	 * If any CPU detects that it's sharing a cache with another CPU then
> -	 * use the deeper topology that is aware of this sharing.
> -	 */
> -	if (shared_caches) {
> -		pr_info("Using shared cache scheduler topology\n");
> -		set_sched_topology(power9_topology);
> -	} else {
> -		pr_info("Using standard scheduler topology\n");
> -		set_sched_topology(powerpc_topology);


Ok, so we will go with the three-level topology by default (SMT,
CACHE, DIE) and will rely on the sched-domain creation code to
degenerate the CACHE domain in case SMT and CACHE have the same set of
CPUs (on POWER8, for example).

From a cleanup perspective this is better, since we won't have to
worry about defining multiple topology structures, but from a
performance point of view, wouldn't we now pay an extra penalty for
degenerating the CACHE domains on POWER8-like systems each time
a CPU comes online?

Do we know how bad it is? If the degeneration takes a few extra
microseconds, that should be ok I suppose.
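
For reference, the degeneration decision is roughly an equal-span
comparison between a domain and its parent. A simplified sketch of what
sd_parent_degenerate() in kernel/sched/topology.c checks (not the exact
code, which also compares domain flags):

	if (cpumask_equal(sched_domain_span(sd),
			  sched_domain_span(sd->parent)))
		/* the parent (CACHE on POWER8) can be collapsed */

so the per-online cost should be this comparison plus freeing the
redundant domain.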

--
Thanks and Regards
gautham.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 03/11] powerpc/smp: Move powerpc_topology above
  2020-07-14  4:36 ` [PATCH 03/11] powerpc/smp: Move powerpc_topology above Srikar Dronamraju
@ 2020-07-17  5:45   ` Gautham R Shenoy
  0 siblings, 0 replies; 42+ messages in thread
From: Gautham R Shenoy @ 2020-07-17  5:45 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Michael Ellerman, Anton Blanchard, linuxppc-dev,
	Nick Piggin

On Tue, Jul 14, 2020 at 10:06:16AM +0530, Srikar Dronamraju wrote:
> Just moving the powerpc_topology description higher up in the file.
> This lets it use functions defined in this file without needing forward
> declarations.
> 
> No other functional changes
> 
> Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> Cc: Michael Ellerman <michaele@au1.ibm.com>
> Cc: Nick Piggin <npiggin@au1.ibm.com>
> Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: Michael Neuling <mikey@linux.ibm.com>
> Cc: Anton Blanchard <anton@au1.ibm.com>
> Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

> ---
>  arch/powerpc/kernel/smp.c | 116 +++++++++++++++++++-------------------
>  1 file changed, 58 insertions(+), 58 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 069ea4b21c6d..24529f6134aa 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -818,6 +818,64 @@ static int init_cpu_l1_cache_map(int cpu)
>  	return err;
>  }
> 
> +static bool shared_caches;
> +
> +#ifdef CONFIG_SCHED_SMT
> +/* cpumask of CPUs with asymmetric SMT dependency */
> +static int powerpc_smt_flags(void)
> +{
> +	int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
> +
> +	if (cpu_has_feature(CPU_FTR_ASYM_SMT)) {
> +		printk_once(KERN_INFO "Enabling Asymmetric SMT scheduling\n");
> +		flags |= SD_ASYM_PACKING;
> +	}
> +	return flags;
> +}
> +#endif
> +
> +/*
> + * P9 has a slightly odd architecture where pairs of cores share an L2 cache.
> + * This topology makes it *much* cheaper to migrate tasks between adjacent cores
> + * since the migrated task remains cache hot. We want to take advantage of this
> + * at the scheduler level so an extra topology level is required.
> + */
> +static int powerpc_shared_cache_flags(void)
> +{
> +	return SD_SHARE_PKG_RESOURCES;
> +}
> +
> +/*
> + * We can't just pass cpu_l2_cache_mask() directly because
> + * returns a non-const pointer and the compiler barfs on that.
> + */
> +static const struct cpumask *shared_cache_mask(int cpu)
> +{
> +	if (shared_caches)
> +		return cpu_l2_cache_mask(cpu);
> +
> +	if (has_big_cores)
> +		return cpu_smallcore_mask(cpu);
> +
> +	return cpu_smt_mask(cpu);
> +}
> +
> +#ifdef CONFIG_SCHED_SMT
> +static const struct cpumask *smallcore_smt_mask(int cpu)
> +{
> +	return cpu_smallcore_mask(cpu);
> +}
> +#endif
> +
> +static struct sched_domain_topology_level powerpc_topology[] = {
> +#ifdef CONFIG_SCHED_SMT
> +	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
> +#endif
> +	{ shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE) },
> +	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
> +	{ NULL, },
> +};
> +
>  static int init_big_cores(void)
>  {
>  	int cpu;
> @@ -1249,8 +1307,6 @@ static void add_cpu_to_masks(int cpu)
>  			set_cpus_related(cpu, i, cpu_core_mask);
>  }
> 
> -static bool shared_caches;
> -
>  /* Activate a secondary processor. */
>  void start_secondary(void *unused)
>  {
> @@ -1314,62 +1370,6 @@ int setup_profiling_timer(unsigned int multiplier)
>  	return 0;
>  }
> 
> -#ifdef CONFIG_SCHED_SMT
> -/* cpumask of CPUs with asymmetric SMT dependency */
> -static int powerpc_smt_flags(void)
> -{
> -	int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
> -
> -	if (cpu_has_feature(CPU_FTR_ASYM_SMT)) {
> -		printk_once(KERN_INFO "Enabling Asymmetric SMT scheduling\n");
> -		flags |= SD_ASYM_PACKING;
> -	}
> -	return flags;
> -}
> -#endif
> -
> -/*
> - * P9 has a slightly odd architecture where pairs of cores share an L2 cache.
> - * This topology makes it *much* cheaper to migrate tasks between adjacent cores
> - * since the migrated task remains cache hot. We want to take advantage of this
> - * at the scheduler level so an extra topology level is required.
> - */
> -static int powerpc_shared_cache_flags(void)
> -{
> -	return SD_SHARE_PKG_RESOURCES;
> -}
> -
> -/*
> - * We can't just pass cpu_l2_cache_mask() directly because
> - * returns a non-const pointer and the compiler barfs on that.
> - */
> -static const struct cpumask *shared_cache_mask(int cpu)
> -{
> -	if (shared_caches)
> -		return cpu_l2_cache_mask(cpu);
> -
> -	if (has_big_cores)
> -		return cpu_smallcore_mask(cpu);
> -
> -	return cpu_smt_mask(cpu);
> -}
> -
> -#ifdef CONFIG_SCHED_SMT
> -static const struct cpumask *smallcore_smt_mask(int cpu)
> -{
> -	return cpu_smallcore_mask(cpu);
> -}
> -#endif
> -
> -static struct sched_domain_topology_level powerpc_topology[] = {
> -#ifdef CONFIG_SCHED_SMT
> -	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
> -#endif
> -	{ shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE) },
> -	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
> -	{ NULL, },
> -};
> -
>  void __init smp_cpus_done(unsigned int max_cpus)
>  {
>  	/*
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 04/11] powerpc/smp: Enable small core scheduling sooner
  2020-07-14  4:36 ` [PATCH 04/11] powerpc/smp: Enable small core scheduling sooner Srikar Dronamraju
@ 2020-07-17  5:48   ` Gautham R Shenoy
  2020-07-20  7:20     ` Srikar Dronamraju
  2020-07-20  7:47   ` Jordan Niethe
  1 sibling, 1 reply; 42+ messages in thread
From: Gautham R Shenoy @ 2020-07-17  5:48 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Michael Ellerman, Anton Blanchard, linuxppc-dev,
	Nick Piggin

On Tue, Jul 14, 2020 at 10:06:17AM +0530, Srikar Dronamraju wrote:
> Enable small core scheduling as soon as we detect that we are in a
> system that supports thread groups. Doing so avoids a redundant
> check.
> 
> Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> Cc: Michael Ellerman <michaele@au1.ibm.com>
> Cc: Nick Piggin <npiggin@au1.ibm.com>
> Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: Michael Neuling <mikey@linux.ibm.com>
> Cc: Anton Blanchard <anton@au1.ibm.com>
> Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

I don't see a problem with this.

However, since we are now going to be maintaining a single topology
structure, wouldn't it be better to collate all the changes being made
to the mask_functions/flags/names of this structure within a single
function, so that it becomes easier to keep track of which changes go
into the topology and why?


> ---
>  arch/powerpc/kernel/smp.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 24529f6134aa..7d430fc536cc 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -892,6 +892,12 @@ static int init_big_cores(void)
>  	}
> 
>  	has_big_cores = true;
> +
> +#ifdef CONFIG_SCHED_SMT
> +	pr_info("Big cores detected. Using small core scheduling\n");
> +	powerpc_topology[0].mask = smallcore_smt_mask;
> +#endif
> +
>  	return 0;
>  }
> 
> @@ -1383,12 +1389,6 @@ void __init smp_cpus_done(unsigned int max_cpus)
> 
>  	dump_numa_cpu_topology();
> 
> -#ifdef CONFIG_SCHED_SMT
> -	if (has_big_cores) {
> -		pr_info("Big cores detected but using small core scheduling\n");
> -		powerpc_topology[0].mask = smallcore_smt_mask;
> -	}
> -#endif
>  	set_sched_topology(powerpc_topology);
>  }
> 
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 05/11] powerpc/smp: Dont assume l2-cache to be superset of sibling
  2020-07-14  4:36 ` [PATCH 05/11] powerpc/smp: Dont assume l2-cache to be superset of sibling Srikar Dronamraju
  2020-07-14  5:40   ` Oliver O'Halloran
@ 2020-07-17  6:00   ` Gautham R Shenoy
  2020-07-20  6:45     ` Srikar Dronamraju
  1 sibling, 1 reply; 42+ messages in thread
From: Gautham R Shenoy @ 2020-07-17  6:00 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Michael Ellerman, Anton Blanchard, linuxppc-dev,
	Nick Piggin

Hi Srikar,

On Tue, Jul 14, 2020 at 10:06:18AM +0530, Srikar Dronamraju wrote:
> Current code assumes that the cpumask of cpus sharing an l2-cache will
> always be a superset of cpu_sibling_mask.
> 
> Let's drop that assumption.
> 
> Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> Cc: Michael Ellerman <michaele@au1.ibm.com>
> Cc: Nick Piggin <npiggin@au1.ibm.com>
> Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: Michael Neuling <mikey@linux.ibm.com>
> Cc: Anton Blanchard <anton@au1.ibm.com>
> Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/smp.c | 28 +++++++++++++++-------------
>  1 file changed, 15 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 7d430fc536cc..875f57e41355 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1198,6 +1198,7 @@ static bool update_mask_by_l2(int cpu, struct cpumask *(*mask_fn)(int))
>  	struct device_node *l2_cache, *np;
>  	int i;
> 
> +	cpumask_set_cpu(cpu, mask_fn(cpu));

It would be good to add a comment on why we need to set the CPU in the
l2-mask even when we don't have an l2-cache domain.
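
Something along these lines, perhaps (just a sketch of the wording):

	/*
	 * The CPU being brought up is not yet in cpu_online_mask, so
	 * nothing else will add it to its own L2 mask; set it here so
	 * the mask is never empty, even when no l2-cache node exists.
	 */
	cpumask_set_cpu(cpu, mask_fn(cpu));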

>  	l2_cache = cpu_to_l2cache(cpu);
>  	if (!l2_cache)
>  		return false;
> @@ -1284,29 +1285,30 @@ static void add_cpu_to_masks(int cpu)
>  	 * add it to it's own thread sibling mask.
>  	 */
>  	cpumask_set_cpu(cpu, cpu_sibling_mask(cpu));
> +	cpumask_set_cpu(cpu, cpu_core_mask(cpu));
> 
>  	for (i = first_thread; i < first_thread + threads_per_core; i++)
>  		if (cpu_online(i))
>  			set_cpus_related(i, cpu, cpu_sibling_mask);
> 
>  	add_cpu_to_smallcore_masks(cpu);
> -	/*
> -	 * Copy the thread sibling mask into the cache sibling mask
> -	 * and mark any CPUs that share an L2 with this CPU.
> -	 */
> -	for_each_cpu(i, cpu_sibling_mask(cpu))
> -		set_cpus_related(cpu, i, cpu_l2_cache_mask);
>  	update_mask_by_l2(cpu, cpu_l2_cache_mask);
> 
> -	/*
> -	 * Copy the cache sibling mask into core sibling mask and mark
> -	 * any CPUs on the same chip as this CPU.
> -	 */
> -	for_each_cpu(i, cpu_l2_cache_mask(cpu))
> -		set_cpus_related(cpu, i, cpu_core_mask);
> +	if (pkg_id == -1) {
> +		struct cpumask *(*mask)(int) = cpu_sibling_mask;
> +
> +		/*
> +		 * Copy the sibling mask into core sibling mask and
> +		 * mark any CPUs on the same chip as this CPU.
> +		 */
> +		if (shared_caches)
> +			mask = cpu_l2_cache_mask;
> +


Now that we are decoupling the containment relationship between the
sibling mask and the l2-cache mask, should we set all the CPUs that are
in cpu_sibling_mask(cpu), as well as those in cpu_l2_cache_mask(cpu),
in cpu_core_mask?

> +		for_each_cpu(i, mask(cpu))
> +			set_cpus_related(cpu, i, cpu_core_mask);
> 
> -	if (pkg_id == -1)
>  		return;
> +	}
> 
>  	for_each_cpu(i, cpu_online_mask)
>  		if (get_physical_package_id(i) == pkg_id)
> -- 
> 2.17.1
> 
--
Thanks and Regards
gautham.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 06/11] powerpc/smp: Generalize 2nd sched domain
  2020-07-14  4:36 ` [PATCH 06/11] powerpc/smp: Generalize 2nd sched domain Srikar Dronamraju
@ 2020-07-17  6:37   ` Gautham R Shenoy
  2020-07-20  6:19     ` Srikar Dronamraju
  0 siblings, 1 reply; 42+ messages in thread
From: Gautham R Shenoy @ 2020-07-17  6:37 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Michael Ellerman, Anton Blanchard, linuxppc-dev,
	Nick Piggin

On Tue, Jul 14, 2020 at 10:06:19AM +0530, Srikar Dronamraju wrote:
> Currently "CACHE" domain happens to be the 2nd sched domain as per
> powerpc_topology. This domain will collapse if cpumask of l2-cache is
> same as SMT domain. However we could generalize this domain such that it
> could mean either be a "CACHE" domain or a "BIGCORE" domain.
> 
> While setting up the "CACHE" domain, check if shared_cache is already
> set.
> 
> Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> Cc: Michael Ellerman <michaele@au1.ibm.com>
> Cc: Nick Piggin <npiggin@au1.ibm.com>
> Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: Michael Neuling <mikey@linux.ibm.com>
> Cc: Anton Blanchard <anton@au1.ibm.com>
> Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/smp.c | 48 +++++++++++++++++++++++++++------------
>  1 file changed, 34 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 875f57e41355..f8faf75135af 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -85,6 +85,14 @@ EXPORT_PER_CPU_SYMBOL(cpu_l2_cache_map);
>  EXPORT_PER_CPU_SYMBOL(cpu_core_map);
>  EXPORT_SYMBOL_GPL(has_big_cores);
> 
> +enum {
> +#ifdef CONFIG_SCHED_SMT
> +	smt_idx,
> +#endif
> +	bigcore_idx,
> +	die_idx,
> +};
> +
>  #define MAX_THREAD_LIST_SIZE	8
>  #define THREAD_GROUP_SHARE_L1   1
>  struct thread_groups {
> @@ -851,13 +859,7 @@ static int powerpc_shared_cache_flags(void)
>   */
>  static const struct cpumask *shared_cache_mask(int cpu)
>  {
> -	if (shared_caches)
> -		return cpu_l2_cache_mask(cpu);
> -
> -	if (has_big_cores)
> -		return cpu_smallcore_mask(cpu);
> -
> -	return cpu_smt_mask(cpu);
> +	return per_cpu(cpu_l2_cache_map, cpu);
>  }
> 
>  #ifdef CONFIG_SCHED_SMT
> @@ -867,11 +869,16 @@ static const struct cpumask *smallcore_smt_mask(int cpu)
>  }
>  #endif
> 
> +static const struct cpumask *cpu_bigcore_mask(int cpu)
> +{
> +	return cpu_core_mask(cpu);

It should be cpu_smt_mask() if we want the redundant big-core to be
degenerated in favour of the SMT level on P8, no? Because
cpu_core_mask refers to all the CPUs that are in the same chip.


> +}
> +
>  static struct sched_domain_topology_level powerpc_topology[] = {
>  #ifdef CONFIG_SCHED_SMT
>  	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
>  #endif
> -	{ shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE) },
> +	{ cpu_bigcore_mask, SD_INIT_NAME(BIGCORE) },
>  	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
>  	{ NULL, },
>  };
> @@ -895,7 +902,7 @@ static int init_big_cores(void)
> 
>  #ifdef CONFIG_SCHED_SMT
>  	pr_info("Big cores detected. Using small core scheduling\n");
> -	powerpc_topology[0].mask = smallcore_smt_mask;
> +	powerpc_topology[smt_idx].mask = smallcore_smt_mask;
>  #endif
> 
>  	return 0;
> @@ -1319,7 +1326,6 @@ static void add_cpu_to_masks(int cpu)
>  void start_secondary(void *unused)
>  {
>  	unsigned int cpu = smp_processor_id();
> -	struct cpumask *(*sibling_mask)(int) = cpu_sibling_mask;
> 
>  	mmgrab(&init_mm);
>  	current->active_mm = &init_mm;
> @@ -1345,14 +1351,20 @@ void start_secondary(void *unused)
>  	/* Update topology CPU masks */
>  	add_cpu_to_masks(cpu);
> 
> -	if (has_big_cores)
> -		sibling_mask = cpu_smallcore_mask;
>  	/*
>  	 * Check for any shared caches. Note that this must be done on a
>  	 * per-core basis because one core in the pair might be disabled.
>  	 */
> -	if (!cpumask_equal(cpu_l2_cache_mask(cpu), sibling_mask(cpu)))
> -		shared_caches = true;
> +	if (!shared_caches) {
> +		struct cpumask *(*sibling_mask)(int) = cpu_sibling_mask;
> +		struct cpumask *mask = cpu_l2_cache_mask(cpu);
> +
> +		if (has_big_cores)
> +			sibling_mask = cpu_smallcore_mask;
> +
> +		if (cpumask_weight(mask) > cpumask_weight(sibling_mask(cpu)))
> +			shared_caches = true;

Shouldn't we use cpumask_subset() here ?

  			
> +	}
> 
>  	set_numa_node(numa_cpu_lookup_table[cpu]);
>  	set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));
> @@ -1390,6 +1402,14 @@ void __init smp_cpus_done(unsigned int max_cpus)
>  		smp_ops->bringup_done();
> 
>  	dump_numa_cpu_topology();
> +	if (shared_caches) {
> +		pr_info("Using shared cache scheduler topology\n");
> +		powerpc_topology[bigcore_idx].mask = shared_cache_mask;
> +#ifdef CONFIG_SCHED_DEBUG
> +		powerpc_topology[bigcore_idx].name = "CACHE";
> +#endif
> +		powerpc_topology[bigcore_idx].sd_flags = powerpc_shared_cache_flags;
> +	}


I would much rather that we have all the topology-fixups done in one
function.

fixup_topology(void)
{
	if (has_big_cores)
		powerpc_topology[smt_idx].mask = smallcore_smt_mask;

	if (shared_caches) {
		powerpc_topology[bigcore_idx].mask = shared_cache_mask;
#ifdef CONFIG_SCHED_DEBUG
		powerpc_topology[bigcore_idx].name = "CACHE";
#endif
		powerpc_topology[bigcore_idx].sd_flags = powerpc_shared_cache_flags;
	}

	/* Any other changes to the topology structure here */
}

And also as an optimization, get rid of the degenerate structures here
itself so that we don't pay an additional penalty while building the
sched-domains each time a CPU comes online.

> 
>  	set_sched_topology(powerpc_topology);
>  }
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 07/11] Powerpc/numa: Detect support for coregroup
  2020-07-14  4:36 ` [PATCH 07/11] Powerpc/numa: Detect support for coregroup Srikar Dronamraju
@ 2020-07-17  8:08   ` Gautham R Shenoy
  2020-07-20 13:56   ` Michael Ellerman
  1 sibling, 0 replies; 42+ messages in thread
From: Gautham R Shenoy @ 2020-07-17  8:08 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Michael Ellerman, Anton Blanchard, linuxppc-dev,
	Nick Piggin

On Tue, Jul 14, 2020 at 10:06:20AM +0530, Srikar Dronamraju wrote:
> Add support for grouping cores based on the device-tree classification.
> - The last domain in the associativity domains always refers to the
> core.
> - If the primary reference domain happens to be the penultimate domain
> in the associativity-domains device-tree property, then there are no
> coregroups. However if it's not the penultimate domain, then there are
> coregroups. There can be more than one coregroup; for now we are only
> interested in the last, i.e. the smallest, coregroup.
> 
> Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> Cc: Michael Ellerman <michaele@au1.ibm.com>
> Cc: Nick Piggin <npiggin@au1.ibm.com>
> Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: Michael Neuling <mikey@linux.ibm.com>
> Cc: Anton Blanchard <anton@au1.ibm.com>
> Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

This looks good to me from the core-group discovery point of view.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

> ---
>  arch/powerpc/include/asm/smp.h |  1 +
>  arch/powerpc/kernel/smp.c      |  1 +
>  arch/powerpc/mm/numa.c         | 34 +++++++++++++++++++++-------------
>  3 files changed, 23 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
> index 49a25e2400f2..5bdc17a7049f 100644
> --- a/arch/powerpc/include/asm/smp.h
> +++ b/arch/powerpc/include/asm/smp.h
> @@ -28,6 +28,7 @@
>  extern int boot_cpuid;
>  extern int spinning_secondaries;
>  extern u32 *cpu_to_phys_id;
> +extern bool coregroup_enabled;
> 
>  extern void cpu_die(void);
>  extern int cpu_to_chip_id(int cpu);
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index f8faf75135af..96e47450d9b3 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -74,6 +74,7 @@ static DEFINE_PER_CPU(int, cpu_state) = { 0 };
> 
>  struct task_struct *secondary_current;
>  bool has_big_cores;
> +bool coregroup_enabled;
> 
>  DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
>  DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_map);
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index fc7b0505bdd8..a43eab455be4 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -887,7 +887,9 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>  static void __init find_possible_nodes(void)
>  {
>  	struct device_node *rtas;
> -	u32 numnodes, i;
> +	const __be32 *domains;
> +	int prop_length, max_nodes;
> +	u32 i;
> 
>  	if (!numa_enabled)
>  		return;
> @@ -896,25 +898,31 @@ static void __init find_possible_nodes(void)
>  	if (!rtas)
>  		return;
> 
> -	if (of_property_read_u32_index(rtas, "ibm,current-associativity-domains",
> -				min_common_depth, &numnodes)) {
> -		/*
> -		 * ibm,current-associativity-domains is a fairly recent
> -		 * property. If it doesn't exist, then fallback on
> -		 * ibm,max-associativity-domains. Current denotes what the
> -		 * platform can support compared to max which denotes what the
> -		 * Hypervisor can support.
> -		 */
> -		if (of_property_read_u32_index(rtas, "ibm,max-associativity-domains",
> -				min_common_depth, &numnodes))
> +	/*
> +	 * ibm,current-associativity-domains is a fairly recent property. If
> +	 * it doesn't exist, then fallback on ibm,max-associativity-domains.
> +	 * Current denotes what the platform can support compared to max
> +	 * which denotes what the Hypervisor can support.
> +	 */
> +	domains = of_get_property(rtas, "ibm,current-associativity-domains",
> +					&prop_length);
> +	if (!domains) {
> +		domains = of_get_property(rtas, "ibm,max-associativity-domains",
> +					&prop_length);
> +		if (!domains)
>  			goto out;
>  	}
> 
> -	for (i = 0; i < numnodes; i++) {
> +	max_nodes = of_read_number(&domains[min_common_depth], 1);
> +	for (i = 0; i < max_nodes; i++) {
>  		if (!node_possible(i))
>  			node_set(i, node_possible_map);
>  	}
> 
> +	prop_length /= sizeof(int);
> +	if (prop_length > min_common_depth + 2)
> +		coregroup_enabled = 1;
> +
>  out:
>  	of_node_put(rtas);
>  }
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 08/11] powerpc/smp: Allocate cpumask only after searching thread group
  2020-07-14  4:36 ` [PATCH 08/11] powerpc/smp: Allocate cpumask only after searching thread group Srikar Dronamraju
@ 2020-07-17  8:08   ` Gautham R Shenoy
  0 siblings, 0 replies; 42+ messages in thread
From: Gautham R Shenoy @ 2020-07-17  8:08 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Michael Ellerman, Anton Blanchard, linuxppc-dev,
	Nick Piggin

On Tue, Jul 14, 2020 at 10:06:21AM +0530, Srikar Dronamraju wrote:
> If the cpumask is allocated earlier and the search fails, then it needs
> to be freed. However, cpu_l1_cache_map can simply be allocated after we
> search the thread group.
> 
> Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> Cc: Michael Ellerman <michaele@au1.ibm.com>
> Cc: Nick Piggin <npiggin@au1.ibm.com>
> Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: Michael Neuling <mikey@linux.ibm.com>
> Cc: Anton Blanchard <anton@au1.ibm.com>
> Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

Good fix.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

> ---
>  arch/powerpc/kernel/smp.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 96e47450d9b3..ef19eeccd21e 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -797,10 +797,6 @@ static int init_cpu_l1_cache_map(int cpu)
>  	if (err)
>  		goto out;
> 
> -	zalloc_cpumask_var_node(&per_cpu(cpu_l1_cache_map, cpu),
> -				GFP_KERNEL,
> -				cpu_to_node(cpu));
> -
>  	cpu_group_start = get_cpu_thread_group_start(cpu, &tg);
> 
>  	if (unlikely(cpu_group_start == -1)) {
> @@ -809,6 +805,9 @@ static int init_cpu_l1_cache_map(int cpu)
>  		goto out;
>  	}
> 
> +	zalloc_cpumask_var_node(&per_cpu(cpu_l1_cache_map, cpu),
> +				GFP_KERNEL, cpu_to_node(cpu));
> +
>  	for (i = first_thread; i < first_thread + threads_per_core; i++) {
>  		int i_group_start = get_cpu_thread_group_start(i, &tg);
> 
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 09/11] Powerpc/smp: Create coregroup domain
  2020-07-14  4:36 ` [PATCH 09/11] Powerpc/smp: Create coregroup domain Srikar Dronamraju
@ 2020-07-17  8:19   ` Gautham R Shenoy
  2020-07-17  8:23     ` Gautham R Shenoy
  2020-07-20  6:02     ` Srikar Dronamraju
  0 siblings, 2 replies; 42+ messages in thread
From: Gautham R Shenoy @ 2020-07-17  8:19 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Michael Ellerman, Anton Blanchard, linuxppc-dev,
	Nick Piggin

On Tue, Jul 14, 2020 at 10:06:22AM +0530, Srikar Dronamraju wrote:
> Add percpu coregroup maps and masks to create coregroup domain.
> If a coregroup doesn't exist, the coregroup domain will be degenerated
> in favour of SMT/CACHE domain.
> 
> Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> Cc: Michael Ellerman <michaele@au1.ibm.com>
> Cc: Nick Piggin <npiggin@au1.ibm.com>
> Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: Michael Neuling <mikey@linux.ibm.com>
> Cc: Anton Blanchard <anton@au1.ibm.com>
> Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/topology.h | 10 ++++++++
>  arch/powerpc/kernel/smp.c           | 37 +++++++++++++++++++++++++++++
>  arch/powerpc/mm/numa.c              |  5 ++++
>  3 files changed, 52 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
> index 2db7ba789720..34812c35018e 100644
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -98,6 +98,7 @@ extern int stop_topology_update(void);
>  extern int prrn_is_enabled(void);
>  extern int find_and_online_cpu_nid(int cpu);
>  extern int timed_topology_update(int nsecs);
> +extern int cpu_to_coregroup_id(int cpu);
>  #else
>  static inline int start_topology_update(void)
>  {
> @@ -120,6 +121,15 @@ static inline int timed_topology_update(int nsecs)
>  	return 0;
>  }
> 
> +static inline int cpu_to_coregroup_id(int cpu)
> +{
> +#ifdef CONFIG_SMP
> +	return cpu_to_core_id(cpu);
> +#else
> +	return 0;
> +#endif
> +}
> +
>  #endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR */
> 
>  #include <asm-generic/topology.h>
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index ef19eeccd21e..bb25c13bbb79 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -80,6 +80,7 @@ DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
>  DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_map);
>  DEFINE_PER_CPU(cpumask_var_t, cpu_l2_cache_map);
>  DEFINE_PER_CPU(cpumask_var_t, cpu_core_map);
> +DEFINE_PER_CPU(cpumask_var_t, cpu_coregroup_map);
> 
>  EXPORT_PER_CPU_SYMBOL(cpu_sibling_map);
>  EXPORT_PER_CPU_SYMBOL(cpu_l2_cache_map);
> @@ -91,6 +92,7 @@ enum {
>  	smt_idx,
>  #endif
>  	bigcore_idx,
> +	mc_idx,
>  	die_idx,
>  };
> 
> @@ -869,6 +871,21 @@ static const struct cpumask *smallcore_smt_mask(int cpu)
>  }
>  #endif
> 
> +static struct cpumask *cpu_coregroup_mask(int cpu)
> +{
> +	return per_cpu(cpu_coregroup_map, cpu);
> +}
> +
> +static bool has_coregroup_support(void)
> +{
> +	return coregroup_enabled;
> +}
> +
> +static const struct cpumask *cpu_mc_mask(int cpu)
> +{
> +	return cpu_coregroup_mask(cpu);
> +}
> +
>  static const struct cpumask *cpu_bigcore_mask(int cpu)
>  {
>  	return cpu_core_mask(cpu);
> @@ -879,6 +896,7 @@ static struct sched_domain_topology_level powerpc_topology[] = {
>  	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
>  #endif
>  	{ cpu_bigcore_mask, SD_INIT_NAME(BIGCORE) },
> +	{ cpu_mc_mask, SD_INIT_NAME(MC) },
>  	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
>  	{ NULL, },
>  };
> @@ -933,6 +951,10 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>  					GFP_KERNEL, node);
>  		zalloc_cpumask_var_node(&per_cpu(cpu_core_map, cpu),
>  					GFP_KERNEL, node);
> +		if (has_coregroup_support())
> +			zalloc_cpumask_var_node(&per_cpu(cpu_coregroup_map, cpu),
> +						GFP_KERNEL, node);
> +
>  #ifdef CONFIG_NEED_MULTIPLE_NODES
>  		/*
>  		 * numa_node_id() works after this.
> @@ -950,6 +972,11 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>  	cpumask_set_cpu(boot_cpuid, cpu_l2_cache_mask(boot_cpuid));
>  	cpumask_set_cpu(boot_cpuid, cpu_core_mask(boot_cpuid));
> 
> +	if (has_coregroup_support())
> +		cpumask_set_cpu(boot_cpuid, cpu_coregroup_mask(boot_cpuid));
> +	else
> +		powerpc_topology[mc_idx].mask = cpu_bigcore_mask;
> +

The else part could be moved to the common function where we are
modifying the other attributes of the topology array.


>  	init_big_cores();
>  	if (has_big_cores) {
>  		cpumask_set_cpu(boot_cpuid,
> @@ -1241,6 +1268,8 @@ static void remove_cpu_from_masks(int cpu)
>  		set_cpus_unrelated(cpu, i, cpu_sibling_mask);
>  		if (has_big_cores)
>  			set_cpus_unrelated(cpu, i, cpu_smallcore_mask);
> +		if (has_coregroup_support())
> +			set_cpus_unrelated(cpu, i, cpu_coregroup_mask);
>  	}
>  }
>  #endif
> @@ -1301,6 +1330,14 @@ static void add_cpu_to_masks(int cpu)
>  	add_cpu_to_smallcore_masks(cpu);
>  	update_mask_by_l2(cpu, cpu_l2_cache_mask);
> 
> +	if (has_coregroup_support()) {
> +		cpumask_set_cpu(cpu, cpu_coregroup_mask(cpu));
> +		for_each_cpu(i, cpu_online_mask) {
> +			if (cpu_to_coregroup_id(cpu) == cpu_to_coregroup_id(i))
> +				set_cpus_related(cpu, i, cpu_coregroup_mask);
> +		}
> +	}
> +
>  	if (pkg_id == -1) {
>  		struct cpumask *(*mask)(int) = cpu_sibling_mask;
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index a43eab455be4..d9ab9da85eab 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -1695,6 +1695,11 @@ static const struct proc_ops topology_proc_ops = {
>  	.proc_release	= single_release,
>  };
> 
> +int cpu_to_coregroup_id(int cpu)
> +{
> +	return cpu_to_core_id(cpu);
> +}


So, if has_coregroup_support() returns true, then since the core_group
identification is currently done through the core-id, the
coregroup_mask is going to be the same as the
cpu_core_mask/cpu_cpu_mask. Thus, we will be degenerating the DIE
domain. Right ? Instead we could have kept the core-group to be a
single bigcore by default, so that those domains can get degenerated
preserving the legacy SMT, DIE, NUMA hierarchy.




> +
>  static int topology_update_init(void)
>  {
>  	start_topology_update();
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 09/11] Powerpc/smp: Create coregroup domain
  2020-07-17  8:19   ` Gautham R Shenoy
@ 2020-07-17  8:23     ` Gautham R Shenoy
  2020-07-20  6:02     ` Srikar Dronamraju
  1 sibling, 0 replies; 42+ messages in thread
From: Gautham R Shenoy @ 2020-07-17  8:23 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Nathan Lynch, Oliver OHalloran, Michael Neuling,
	Srikar Dronamraju, Michael Ellerman, Anton Blanchard,
	linuxppc-dev, Nick Piggin

On Fri, Jul 17, 2020 at 01:49:26PM +0530, Gautham R Shenoy wrote:

> > +int cpu_to_coregroup_id(int cpu)
> > +{
> > +	return cpu_to_core_id(cpu);
> > +}
> 
> 
> So, if has_coregroup_support() returns true, then since the core_group
> identification is currently done through the core-id, the
> coregroup_mask is going to be the same as the
> cpu_core_mask/cpu_cpu_mask. Thus, we will be degenerating the DIE
> domain. Right ? Instead we could have kept the core-group to be a
> single bigcore by default, so that those domains can get degenerated
> preserving the legacy SMT, DIE, NUMA hierarchy.

Never mind, I got confused between core_id and cpu_core_mask (which
corresponds to the cores of a chip). cpu_to_core_id() does exactly
what we described above. Sorry for the noise.

I am ok with this patch, except that I would request that all the
changes to the topology structure be done within a single function, for
ease of tracking, instead of being distributed all over the code.


> 
> 
> 
> 
> > +
> >  static int topology_update_init(void)
> >  {
> >  	start_topology_update();
> > -- 
> > 2.17.1
> > 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 10/11] powerpc/smp: Implement cpu_to_coregroup_id
  2020-07-14  4:36 ` [PATCH 10/11] powerpc/smp: Implement cpu_to_coregroup_id Srikar Dronamraju
@ 2020-07-17  8:26   ` Gautham R Shenoy
  2020-07-20  5:48     ` Srikar Dronamraju
  0 siblings, 1 reply; 42+ messages in thread
From: Gautham R Shenoy @ 2020-07-17  8:26 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Michael Ellerman, Anton Blanchard, linuxppc-dev,
	Nick Piggin

On Tue, Jul 14, 2020 at 10:06:23AM +0530, Srikar Dronamraju wrote:
> Look up the coregroup id from the associativity array.
> 
> If unable to detect the coregroup id, fall back on the core id.
> This way we ensure the sched_domain degenerates and an extra sched
> domain is not created.
> 
> Ideally this function should have been implemented in
> arch/powerpc/kernel/smp.c. However if it's implemented in mm/numa.c, we
> don't need to find the primary domain again.
> 
> If the device-tree mentions more than one coregroup, then the kernel
> implements only the last, i.e. the smallest, coregroup, which currently
> corresponds to the penultimate domain in the device-tree.
> 
> Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> Cc: Michael Ellerman <michaele@au1.ibm.com>
> Cc: Nick Piggin <npiggin@au1.ibm.com>
> Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: Michael Neuling <mikey@linux.ibm.com>
> Cc: Anton Blanchard <anton@au1.ibm.com>
> Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
>  arch/powerpc/mm/numa.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index d9ab9da85eab..4e85564ef62a 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -1697,6 +1697,23 @@ static const struct proc_ops topology_proc_ops = {
> 
>  int cpu_to_coregroup_id(int cpu)
>  {
> +	__be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
> +	int index;
> +

It would be good to have an assert here to ensure that we are calling
this function only when coregroups are enabled.

Else, we may end up returning the penultimate index which maps to the
chip-id.
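
Something like this, say (just a sketch):

	WARN_ON_ONCE(!coregroup_enabled);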



> +	if (cpu < 0 || cpu > nr_cpu_ids)
> +		return -1;
> +
> +	if (!firmware_has_feature(FW_FEATURE_VPHN))
> +		goto out;
> +
> +	if (vphn_get_associativity(cpu, associativity))
> +		goto out;
> +
> +	index = of_read_number(associativity, 1);
> +	if ((index > min_common_depth + 1) && coregroup_enabled)
> +		return of_read_number(&associativity[index - 1], 1);
> +
> +out:
>  	return cpu_to_core_id(cpu);
>  }
> 
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 11/11] powerpc/smp: Provide an ability to disable coregroup
  2020-07-14  4:36 ` [PATCH 11/11] powerpc/smp: Provide an ability to disable coregroup Srikar Dronamraju
@ 2020-07-17  8:28   ` Gautham R Shenoy
  2020-07-20 13:57   ` Michael Ellerman
  1 sibling, 0 replies; 42+ messages in thread
From: Gautham R Shenoy @ 2020-07-17  8:28 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Michael Ellerman, Anton Blanchard, linuxppc-dev,
	Nick Piggin

On Tue, Jul 14, 2020 at 10:06:24AM +0530, Srikar Dronamraju wrote:
> If a user wants to enable the coregroup sched_domain, they can boot
> with the kernel parameter "coregroup_support=on".
> 
> Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> Cc: Michael Ellerman <michaele@au1.ibm.com>
> Cc: Nick Piggin <npiggin@au1.ibm.com>
> Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: Michael Neuling <mikey@linux.ibm.com>
> Cc: Anton Blanchard <anton@au1.ibm.com>
> Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>


We need this documented in the
Documentation/admin-guide/kernel-parameters.txt
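
Perhaps something along these lines (just a sketch of the wording; the
exact text is up to you):

	coregroup_support=	[PPC]
			Format: {on | 1}
			Enable the coregroup sched-domain on platforms
			whose device tree exposes coregroups.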

Other than that, the patch looks good to me.
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

> ---
>  arch/powerpc/kernel/smp.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index bb25c13bbb79..c43909e6e8e9 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -118,6 +118,23 @@ struct smp_ops_t *smp_ops;
>  volatile unsigned int cpu_callin_map[NR_CPUS];
> 
>  int smt_enabled_at_boot = 1;
> +int coregroup_support;
> +
> +static int __init early_coregroup(char *p)
> +{
> +	if (!p)
> +		return 0;
> +
> +	if (strstr(p, "on"))
> +		coregroup_support = 1;
> +
> +	if (strstr(p, "1"))
> +		coregroup_support = 1;
> +
> +	return 0;
> +}
> +
> +early_param("coregroup_support", early_coregroup);
> 
>  /*
>   * Returns 1 if the specified cpu should be brought up during boot.
> @@ -878,7 +895,7 @@ static struct cpumask *cpu_coregroup_mask(int cpu)
> 
>  static bool has_coregroup_support(void)
>  {
> -	return coregroup_enabled;
> +	return coregroup_enabled && coregroup_support;
>  }
> 
>  static const struct cpumask *cpu_mc_mask(int cpu)
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 10/11] powerpc/smp: Implement cpu_to_coregroup_id
  2020-07-17  8:26   ` Gautham R Shenoy
@ 2020-07-20  5:48     ` Srikar Dronamraju
  2020-07-20  9:10       ` Gautham R Shenoy
  0 siblings, 1 reply; 42+ messages in thread
From: Srikar Dronamraju @ 2020-07-20  5:48 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Nathan Lynch, Oliver OHalloran, Michael Neuling,
	Michael Ellerman, Anton Blanchard, linuxppc-dev, Nick Piggin

* Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-07-17 13:56:53]:

> On Tue, Jul 14, 2020 at 10:06:23AM +0530, Srikar Dronamraju wrote:
> > Look up the coregroup id from the associativity array.
> > 
> > If unable to detect the coregroup id, fall back on the core id.
> > This way we ensure the sched_domain degenerates and an extra sched
> > domain is not created.
> > 
> > Ideally this function should have been implemented in
> > arch/powerpc/kernel/smp.c. However if it's implemented in mm/numa.c, we
> > don't need to find the primary domain again.
> > 
> > If the device-tree mentions more than one coregroup, then the kernel
> > implements only the last, i.e. the smallest, coregroup, which currently
> > corresponds to the penultimate domain in the device-tree.
> > 
> > Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> > Cc: Michael Ellerman <michaele@au1.ibm.com>
> > Cc: Nick Piggin <npiggin@au1.ibm.com>
> > Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> > Cc: Michael Neuling <mikey@linux.ibm.com>
> > Cc: Anton Blanchard <anton@au1.ibm.com>
> > Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> > Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/mm/numa.c | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> > 
> > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> > index d9ab9da85eab..4e85564ef62a 100644
> > --- a/arch/powerpc/mm/numa.c
> > +++ b/arch/powerpc/mm/numa.c
> > @@ -1697,6 +1697,23 @@ static const struct proc_ops topology_proc_ops = {
> > 
> >  int cpu_to_coregroup_id(int cpu)
> >  {
> > +	__be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
> > +	int index;
> > +
> 
> It would be good to have an assert here to ensure that we are calling
> this function only when coregroups are enabled.
> 
> Else, we may end up returning the penultimate index which maps to the
> chip-id.
> 

We have a check below for exactly this reason; please see below.

> 
> 
> > +	if (cpu < 0 || cpu > nr_cpu_ids)
> > +		return -1;
> > +
> > +	if (!firmware_has_feature(FW_FEATURE_VPHN))
> > +		goto out;
> > +
> > +	if (vphn_get_associativity(cpu, associativity))
> > +		goto out;
> > +
> > +	index = of_read_number(associativity, 1);
> > +	if ((index > min_common_depth + 1) && coregroup_enabled)
> > +		return of_read_number(&associativity[index - 1], 1);

See ^above.

index would be the number of domains in the associativity array, and
min_common_depth would be where the primary domain, i.e. the chip-id,
is defined. So we read the penultimate domain if and only if it isn't
the primary domain aka the chip-id.
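
For illustration (hypothetical values, assuming min_common_depth == 2):

	associativity[0] = 4	/* index: the number of domains */
	associativity[1] = ...	/* outermost domain */
	associativity[2] = ...	/* primary domain, i.e. the chip-id */
	associativity[3] = ...	/* penultimate domain: the coregroup */
	associativity[4] = ...	/* last domain: the core */

Here index (4) > min_common_depth + 1 (3), so associativity[index - 1]
refers to a coregroup distinct from the chip-id.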

What other checks/assertions can we add?


> > +
> > +out:
> >  	return cpu_to_core_id(cpu);
> >  }
> > 
> > -- 
> > 2.17.1
> > 

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 09/11] Powerpc/smp: Create coregroup domain
  2020-07-17  8:19   ` Gautham R Shenoy
  2020-07-17  8:23     ` Gautham R Shenoy
@ 2020-07-20  6:02     ` Srikar Dronamraju
  1 sibling, 0 replies; 42+ messages in thread
From: Srikar Dronamraju @ 2020-07-20  6:02 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Nathan Lynch, Oliver OHalloran, Michael Neuling,
	Michael Ellerman, Anton Blanchard, linuxppc-dev, Nick Piggin

* Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-07-17 13:49:26]:

> On Tue, Jul 14, 2020 at 10:06:22AM +0530, Srikar Dronamraju wrote:
> > Add percpu coregroup maps and masks to create coregroup domain.
> > If a coregroup doesn't exist, the coregroup domain will be degenerated
> > in favour of SMT/CACHE domain.
> > 
> > Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> > Cc: Michael Ellerman <michaele@au1.ibm.com>
> > Cc: Nick Piggin <npiggin@au1.ibm.com>
> > Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> > Cc: Michael Neuling <mikey@linux.ibm.com>
> > Cc: Anton Blanchard <anton@au1.ibm.com>
> > Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> > Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/include/asm/topology.h | 10 ++++++++
> >  arch/powerpc/kernel/smp.c           | 37 +++++++++++++++++++++++++++++
> >  arch/powerpc/mm/numa.c              |  5 ++++
> >  3 files changed, 52 insertions(+)
> > 
> > @@ -950,6 +972,11 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
> >  	cpumask_set_cpu(boot_cpuid, cpu_l2_cache_mask(boot_cpuid));
> >  	cpumask_set_cpu(boot_cpuid, cpu_core_mask(boot_cpuid));
> > 
> > +	if (has_coregroup_support())
> > +		cpumask_set_cpu(boot_cpuid, cpu_coregroup_mask(boot_cpuid));
> > +	else
> > +		powerpc_topology[mc_idx].mask = cpu_bigcore_mask;
> > +
> 
> The else part could be moved to the common function where we are
> modifying the other attributes of the topology array.
> 

My intent is to make all the changes to the topology attributes in
smp_prepare_cpus. It makes more sense to change the attributes immediately
after we define/detect a particular topology feature.

The only thing left out currently is shared_cache. We should be able to
detect shared_cache around this time as well; it just needs more
cleanups. Once we do, we can update the topology attributes for
shared_cache here itself too.

> >  	init_big_cores();
> >  	if (has_big_cores) {
> >  		cpumask_set_cpu(boot_cpuid,
> > @@ -1241,6 +1268,8 @@ static void remove_cpu_from_masks(int cpu)
> >  		set_cpus_unrelated(cpu, i, cpu_sibling_mask);
> >  		if (has_big_cores)
> >  			set_cpus_unrelated(cpu, i, cpu_smallcore_mask);
> > +		if (has_coregroup_support())
> > +			set_cpus_unrelated(cpu, i, cpu_coregroup_mask);
> >  	}
> >  }
> >  #endif
> > @@ -1301,6 +1330,14 @@ static void add_cpu_to_masks(int cpu)
> >  	add_cpu_to_smallcore_masks(cpu);
> >  	update_mask_by_l2(cpu, cpu_l2_cache_mask);
> > 
> > +	if (has_coregroup_support()) {
> > +		cpumask_set_cpu(cpu, cpu_coregroup_mask(cpu));
> > +		for_each_cpu(i, cpu_online_mask) {
> > +			if (cpu_to_coregroup_id(cpu) == cpu_to_coregroup_id(i))
> > +				set_cpus_related(cpu, i, cpu_coregroup_mask);
> > +		}
> > +	}
> > +
> >  	if (pkg_id == -1) {
> >  		struct cpumask *(*mask)(int) = cpu_sibling_mask;
> > 
> > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> > index a43eab455be4..d9ab9da85eab 100644
> > --- a/arch/powerpc/mm/numa.c
> > +++ b/arch/powerpc/mm/numa.c
> > @@ -1695,6 +1695,11 @@ static const struct proc_ops topology_proc_ops = {
> >  	.proc_release	= single_release,
> >  };
> > 
> > +int cpu_to_coregroup_id(int cpu)
> > +{
> > +	return cpu_to_core_id(cpu);
> > +}
> 
> 
> So, if has_coregroup_support() returns true, then since the core_group
> identification is currently done through the core-id, the
> coregroup_mask is going to be the same as the
> cpu_core_mask/cpu_cpu_mask. Thus, we will be degenerating the DIE
> domain. Right ? Instead we could have kept the core-group to be a
> single bigcore by default, so that those domains can get degenerated
> preserving the legacy SMT, DIE, NUMA hierarchy.
> 
> 

I think you have confused cpu_core_mask with cpu_to_core_id.
cpu_to_core_id() returns a unique id for every SMT8 thread group.
If coregroup_support is true but the system doesn't support coregroups,
then we would not be degenerating the DIE domain but only the new MC
domain. This is because the previous domain (SHARED_CACHE/BIGCORE/SMT)
would match the MC domain.
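
To illustrate the distinction (hypothetical ids on an SMT8 system):

	/*
	 * cpus 0-7  -> cpu_to_core_id() == the id of core 0
	 * cpus 8-15 -> cpu_to_core_id() == the id of core 1
	 *
	 * whereas cpu_core_mask(cpu) spans every cpu on the same chip.
	 */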

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 06/11] powerpc/smp: Generalize 2nd sched domain
  2020-07-17  6:37   ` Gautham R Shenoy
@ 2020-07-20  6:19     ` Srikar Dronamraju
  2020-07-20  9:07       ` Gautham R Shenoy
  0 siblings, 1 reply; 42+ messages in thread
From: Srikar Dronamraju @ 2020-07-20  6:19 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Nathan Lynch, Oliver OHalloran, Michael Neuling,
	Michael Ellerman, Anton Blanchard, linuxppc-dev, Nick Piggin

* Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-07-17 12:07:55]:

> On Tue, Jul 14, 2020 at 10:06:19AM +0530, Srikar Dronamraju wrote:
> > Currently "CACHE" domain happens to be the 2nd sched domain as per
> > powerpc_topology. This domain will collapse if cpumask of l2-cache is
> > same as SMT domain. However we could generalize this domain such that it
> > could mean either be a "CACHE" domain or a "BIGCORE" domain.
> > 
> > While setting up the "CACHE" domain, check if shared_cache is already
> > set.
> > 
> > Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> > Cc: Michael Ellerman <michaele@au1.ibm.com>
> > Cc: Nick Piggin <npiggin@au1.ibm.com>
> > Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> > Cc: Michael Neuling <mikey@linux.ibm.com>
> > Cc: Anton Blanchard <anton@au1.ibm.com>
> > Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> > Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > ---
> > @@ -867,11 +869,16 @@ static const struct cpumask *smallcore_smt_mask(int cpu)
> >  }
> >  #endif
> > 
> > +static const struct cpumask *cpu_bigcore_mask(int cpu)
> > +{
> > +	return cpu_core_mask(cpu);
> 
> It should be cpu_smt_mask() if we want the redundant big-core to be
> degenerated in favour of the SMT level on P8, no? Because
> cpu_core_mask refers to all the CPUs that are in the same chip.
> 

Right, but it can't be cpu_smt_mask since cpu_smt_mask is only enabled under
CONFIG_SCHED_SMT. I was looking at using sibling_map, but we have to be
careful for power9 / PowerNV mode. Guess that should be fine.
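
For illustration, one CONFIG_SCHED_SMT-independent fallback would be to use
the sibling map directly; a minimal sketch, not the final patch:

static const struct cpumask *cpu_bigcore_mask(int cpu)
{
	/* cpu_sibling_mask() is available irrespective of
	 * CONFIG_SCHED_SMT, unlike cpu_smt_mask(). */
	return cpu_sibling_mask(cpu);
}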

> > +}
> > +
> >  static struct sched_domain_topology_level powerpc_topology[] = {
> >  #ifdef CONFIG_SCHED_SMT
> >  	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
> >  #endif
> > -	{ shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE) },
> > +	{ cpu_bigcore_mask, SD_INIT_NAME(BIGCORE) },
> >  	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
> >  	{ NULL, },
> >  };
> > @@ -1319,7 +1326,6 @@ static void add_cpu_to_masks(int cpu)
> >  void start_secondary(void *unused)
> >  {
> >  	unsigned int cpu = smp_processor_id();
> > -	struct cpumask *(*sibling_mask)(int) = cpu_sibling_mask;
> > 
> >  	mmgrab(&init_mm);
> >  	current->active_mm = &init_mm;
> > @@ -1345,14 +1351,20 @@ void start_secondary(void *unused)
> >  	/* Update topology CPU masks */
> >  	add_cpu_to_masks(cpu);
> > 
> > -	if (has_big_cores)
> > -		sibling_mask = cpu_smallcore_mask;
> >  	/*
> >  	 * Check for any shared caches. Note that this must be done on a
> >  	 * per-core basis because one core in the pair might be disabled.
> >  	 */
> > -	if (!cpumask_equal(cpu_l2_cache_mask(cpu), sibling_mask(cpu)))
> > -		shared_caches = true;
> > +	if (!shared_caches) {
> > +		struct cpumask *(*sibling_mask)(int) = cpu_sibling_mask;
> > +		struct cpumask *mask = cpu_l2_cache_mask(cpu);
> > +
> > +		if (has_big_cores)
> > +			sibling_mask = cpu_smallcore_mask;
> > +
> > +		if (cpumask_weight(mask) > cpumask_weight(sibling_mask(cpu)))
> > +			shared_caches = true;
> 
> Shouldn't we use cpumask_subset() here ?

Wouldn't cpumask_subset return 1 if both are the same?
We don't want shared_caches set if both the masks are equal.
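
For reference, cpumask_subset(A, B) also returns true when A equals B, so a
strict-subset test would have to exclude equality explicitly; a minimal
sketch:

	if (!cpumask_equal(sibling_mask(cpu), mask) &&
	    cpumask_subset(sibling_mask(cpu), mask))
		shared_caches = true;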

>   			
> > +	}
> > 
> >  	set_numa_node(numa_cpu_lookup_table[cpu]);
> >  	set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));
> > @@ -1390,6 +1402,14 @@ void __init smp_cpus_done(unsigned int max_cpus)
> >  		smp_ops->bringup_done();
> > 
> >  	dump_numa_cpu_topology();
> > +	if (shared_caches) {
> > +		pr_info("Using shared cache scheduler topology\n");
> > +		powerpc_topology[bigcore_idx].mask = shared_cache_mask;
> > +#ifdef CONFIG_SCHED_DEBUG
> > +		powerpc_topology[bigcore_idx].name = "CACHE";
> > +#endif
> > +		powerpc_topology[bigcore_idx].sd_flags = powerpc_shared_cache_flags;
> > +	}
> 
> 
> I would much rather that we have all the topology-fixups done in one
> function.
> 
> fixup_topology(void) {
>     if (has_big_cores)
>         powerpc_topology[smt_idx].mask = smallcore_smt_mask;
> 
>     if (shared_caches) {
>         powerpc_topology[bigcore_idx].mask = shared_cache_mask;
> #ifdef CONFIG_SCHED_DEBUG
>         /* .name is a const char *, so assign it rather than strlcpy() */
>         powerpc_topology[bigcore_idx].name = "CACHE";
> #endif
>         powerpc_topology[bigcore_idx].sd_flags = powerpc_shared_cache_flags;
>     }
> 
>     /* Any other changes to the topology structure here */

We could do this.

> 
> And also as an optimization, get rid of degenerate structures here
> itself so that we don't pay additional penalty while building the
> sched-domains each time.
> 

Yes, this is definitely planned, but slightly later in time.

Thanks for the review and comments.

-- 
Thanks and Regards
Srikar Dronamraju

* Re: [PATCH 05/11] powerpc/smp: Dont assume l2-cache to be superset of sibling
  2020-07-17  6:00   ` Gautham R Shenoy
@ 2020-07-20  6:45     ` Srikar Dronamraju
  2020-07-20  8:58       ` Gautham R Shenoy
  0 siblings, 1 reply; 42+ messages in thread
From: Srikar Dronamraju @ 2020-07-20  6:45 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Nathan Lynch, Oliver OHalloran, Michael Neuling,
	Michael Ellerman, Anton Blanchard, linuxppc-dev, Nick Piggin

* Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-07-17 11:30:11]:

> Hi Srikar,
> 
> On Tue, Jul 14, 2020 at 10:06:18AM +0530, Srikar Dronamraju wrote:
> > Current code assumes that the cpumask of cpus sharing an l2-cache will
> > always be a superset of cpu_sibling_mask.
> > 
> > Let's drop that assumption.
> > 
> > Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> > Cc: Michael Ellerman <michaele@au1.ibm.com>
> > Cc: Nick Piggin <npiggin@au1.ibm.com>
> > Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> > Cc: Michael Neuling <mikey@linux.ibm.com>
> > Cc: Anton Blanchard <anton@au1.ibm.com>
> > Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> > Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/kernel/smp.c | 28 +++++++++++++++-------------
> >  1 file changed, 15 insertions(+), 13 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> > index 7d430fc536cc..875f57e41355 100644
> > --- a/arch/powerpc/kernel/smp.c
> > +++ b/arch/powerpc/kernel/smp.c
> > @@ -1198,6 +1198,7 @@ static bool update_mask_by_l2(int cpu, struct cpumask *(*mask_fn)(int))
> >  	struct device_node *l2_cache, *np;
> >  	int i;
> > 
> > +	cpumask_set_cpu(cpu, mask_fn(cpu));
> 
> It would be good to comment on why we need to set the CPU in the
> l2-mask if we don't have an l2cache domain.
> 

Good catch. We should move this after the cpu_to_l2cache() call.
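
Concretely, the move would look something like this sketch of the same hunk,
with the cpumask_set_cpu() reordered after the NULL check:

	l2_cache = cpu_to_l2cache(cpu);
	if (!l2_cache)
		return false;

	/* Mark the CPU in its own l2 mask only when an l2-cache
	 * device-tree node actually exists. */
	cpumask_set_cpu(cpu, mask_fn(cpu));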

> >  	l2_cache = cpu_to_l2cache(cpu);
> >  	if (!l2_cache)
> >  		return false;
> > @@ -1284,29 +1285,30 @@ static void add_cpu_to_masks(int cpu)
> >  	 * add it to it's own thread sibling mask.
> >  	 */
> >  	cpumask_set_cpu(cpu, cpu_sibling_mask(cpu));
> > +	cpumask_set_cpu(cpu, cpu_core_mask(cpu));
> > 
> >  	for (i = first_thread; i < first_thread + threads_per_core; i++)
> >  		if (cpu_online(i))
> >  			set_cpus_related(i, cpu, cpu_sibling_mask);
> > 
> >  	add_cpu_to_smallcore_masks(cpu);
> > -	/*
> > -	 * Copy the thread sibling mask into the cache sibling mask
> > -	 * and mark any CPUs that share an L2 with this CPU.
> > -	 */
> > -	for_each_cpu(i, cpu_sibling_mask(cpu))
> > -		set_cpus_related(cpu, i, cpu_l2_cache_mask);
> >  	update_mask_by_l2(cpu, cpu_l2_cache_mask);
> > 
> > -	/*
> > -	 * Copy the cache sibling mask into core sibling mask and mark
> > -	 * any CPUs on the same chip as this CPU.
> > -	 */
> > -	for_each_cpu(i, cpu_l2_cache_mask(cpu))
> > -		set_cpus_related(cpu, i, cpu_core_mask);
> > +	if (pkg_id == -1) {
> > +		struct cpumask *(*mask)(int) = cpu_sibling_mask;
> > +
> > +		/*
> > +		 * Copy the sibling mask into core sibling mask and
> > +		 * mark any CPUs on the same chip as this CPU.
> > +		 */
> > +		if (shared_caches)
> > +			mask = cpu_l2_cache_mask;
> > +
> 
> 
> Now that we decoupling the containment relationship between
> sibling_mask and l2-cache mask, should we set all the CPUs that are
> both in cpu_sibling_mask(cpu) as well as cpu_l2_mask(cpu) in
> cpu_core_mask ? 
> 

Are you saying that instead of setting this cpu in cpu_core_mask, we can set
all the cpus in the mask in cpu_core_mask?
Currently we don't know if any of the cpus of the mask were already set or
not. Plus we anyway need to update the cpumask of all other cpus to say they
are related. So setting a mask instead of a cpu at a time will not change
anything on our side.
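
For reference, set_cpus_related() (as in the current
arch/powerpc/kernel/smp.c) marks the relation in both directions, which is
why each cpu has to be visited anyway:

static void set_cpus_related(int i, int j, struct cpumask *(*get_mask)(int))
{
	/* Record the relation symmetrically in both per-cpu masks. */
	cpumask_set_cpu(i, get_mask(j));
	cpumask_set_cpu(j, get_mask(i));
}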

> > +		for_each_cpu(i, mask(cpu))
> > +			set_cpus_related(cpu, i, cpu_core_mask);
> > 
> > -	if (pkg_id == -1)
> >  		return;
> > +	}
> > 
> >  	for_each_cpu(i, cpu_online_mask)
> >  		if (get_physical_package_id(i) == pkg_id)
> > -- 
> > 2.17.1
> > 
> --
> Thanks and Regards
> gautham.

-- 
Thanks and Regards
Srikar Dronamraju

* Re: [PATCH 04/11] powerpc/smp: Enable small core scheduling sooner
  2020-07-17  5:48   ` Gautham R Shenoy
@ 2020-07-20  7:20     ` Srikar Dronamraju
  0 siblings, 0 replies; 42+ messages in thread
From: Srikar Dronamraju @ 2020-07-20  7:20 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Nathan Lynch, Oliver OHalloran, Michael Neuling,
	Michael Ellerman, Anton Blanchard, linuxppc-dev, Nick Piggin

* Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-07-17 11:18:21]:

> On Tue, Jul 14, 2020 at 10:06:17AM +0530, Srikar Dronamraju wrote:
> > Enable small core scheduling as soon as we detect that we are in a
> > system that supports thread group. Doing so would avoid a redundant
> > check.
> > 
> > Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> > Cc: Michael Ellerman <michaele@au1.ibm.com>
> > Cc: Nick Piggin <npiggin@au1.ibm.com>
> > Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> > Cc: Michael Neuling <mikey@linux.ibm.com>
> > Cc: Anton Blanchard <anton@au1.ibm.com>
> > Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> > Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> 
> I don't see a problem with this.
> 
> However, since we are now going to be maintaining a single topology
> structure, wouldn't it be better to collate all the changes being made
> to the mask_functions/flags/names of this structure within a single
> function, so that it becomes easier to keep track of what changes are
> going into the topology and why we are making them?
> 

My intent was to move the topology updates earlier, as soon as they are
detected. Currently shared_cache cannot be detected early. But I think it's
possible to detect shared_cache early with some cleanups. And if we do, then
we should be able to call this pretty early.

> 
> > ---
> >  arch/powerpc/kernel/smp.c | 12 ++++++------
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> > index 24529f6134aa..7d430fc536cc 100644
> > --- a/arch/powerpc/kernel/smp.c
> > +++ b/arch/powerpc/kernel/smp.c
> > @@ -892,6 +892,12 @@ static int init_big_cores(void)
> >  	}
> > 
> >  	has_big_cores = true;
> > +
> > +#ifdef CONFIG_SCHED_SMT
> > +	pr_info("Big cores detected. Using small core scheduling\n");
> > +	powerpc_topology[0].mask = smallcore_smt_mask;
> > +#endif
> > +
> >  	return 0;
> >  }
> > 
> > @@ -1383,12 +1389,6 @@ void __init smp_cpus_done(unsigned int max_cpus)
> > 
> >  	dump_numa_cpu_topology();
> > 
> > -#ifdef CONFIG_SCHED_SMT
> > -	if (has_big_cores) {
> > -		pr_info("Big cores detected but using small core scheduling\n");
> > -		powerpc_topology[0].mask = smallcore_smt_mask;
> > -	}
> > -#endif
> >  	set_sched_topology(powerpc_topology);
> >  }
> > 
> > -- 
> > 2.17.1
> > 

-- 
Thanks and Regards
Srikar Dronamraju

* Re: [PATCH 04/11] powerpc/smp: Enable small core scheduling sooner
  2020-07-14  4:36 ` [PATCH 04/11] powerpc/smp: Enable small core scheduling sooner Srikar Dronamraju
  2020-07-17  5:48   ` Gautham R Shenoy
@ 2020-07-20  7:47   ` Jordan Niethe
  2020-07-20  8:52     ` Srikar Dronamraju
  1 sibling, 1 reply; 42+ messages in thread
From: Jordan Niethe @ 2020-07-20  7:47 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Michael Ellerman, Anton Blanchard, linuxppc-dev,
	Nick Piggin

On Tue, Jul 14, 2020 at 2:44 PM Srikar Dronamraju
<srikar@linux.vnet.ibm.com> wrote:
>
> Enable small core scheduling as soon as we detect that we are in a
> system that supports thread group. Doing so would avoid a redundant
> check.
>
> Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> Cc: Michael Ellerman <michaele@au1.ibm.com>
> Cc: Nick Piggin <npiggin@au1.ibm.com>
> Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> Cc: Michael Neuling <mikey@linux.ibm.com>
> Cc: Anton Blanchard <anton@au1.ibm.com>
> Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/smp.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 24529f6134aa..7d430fc536cc 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -892,6 +892,12 @@ static int init_big_cores(void)
>         }
>
>         has_big_cores = true;
> +
> +#ifdef CONFIG_SCHED_SMT
> +       pr_info("Big cores detected. Using small core scheduling\n");
Why change the wording from "Big cores detected but using small core
scheduling\n"?
> +       powerpc_topology[0].mask = smallcore_smt_mask;
> +#endif
> +
>         return 0;
>  }
>
> @@ -1383,12 +1389,6 @@ void __init smp_cpus_done(unsigned int max_cpus)
>
>         dump_numa_cpu_topology();
>
> -#ifdef CONFIG_SCHED_SMT
> -       if (has_big_cores) {
> -               pr_info("Big cores detected but using small core scheduling\n");
> -               powerpc_topology[0].mask = smallcore_smt_mask;
> -       }
> -#endif
>         set_sched_topology(powerpc_topology);
>  }
>
> --
> 2.17.1
>

* Re: [PATCH 02/11] powerpc/smp: Merge Power9 topology with Power topology
  2020-07-17  5:44   ` Gautham R Shenoy
@ 2020-07-20  8:10     ` Srikar Dronamraju
  0 siblings, 0 replies; 42+ messages in thread
From: Srikar Dronamraju @ 2020-07-20  8:10 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Nathan Lynch, Oliver OHalloran, Michael Neuling,
	Michael Ellerman, Anton Blanchard, linuxppc-dev, Nick Piggin

* Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-07-17 11:14:36]:

> Hi Srikar,
> 
> On Tue, Jul 14, 2020 at 10:06:15AM +0530, Srikar Dronamraju wrote:
> > A new sched_domain_topology_level was added just for Power9. However the
> > same can be achieved by merging powerpc_topology with power9_topology
> > and makes the code more simpler especially when adding a new sched
> > domain.
> > 
> > Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> > Cc: Michael Ellerman <michaele@au1.ibm.com>
> > Cc: Nick Piggin <npiggin@au1.ibm.com>
> > Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> > Cc: Michael Neuling <mikey@linux.ibm.com>
> > Cc: Anton Blanchard <anton@au1.ibm.com>
> > Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> > Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/kernel/smp.c | 33 ++++++++++-----------------------
> >  1 file changed, 10 insertions(+), 23 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> > index 680c0edcc59d..069ea4b21c6d 100644
> > --- a/arch/powerpc/kernel/smp.c
> > +++ b/arch/powerpc/kernel/smp.c
> > @@ -1315,7 +1315,7 @@ int setup_profiling_timer(unsigned int multiplier)
> >  }
> > 
> >  #ifdef CONFIG_SCHED_SMT
> > -/* cpumask of CPUs with asymetric SMT dependancy */
> > +/* cpumask of CPUs with asymmetric SMT dependency */
> >  static int powerpc_smt_flags(void)
> >  {
> >  	int flags = SD_SHARE_CPUCAPACITY | SD_SHARE_PKG_RESOURCES;
> > @@ -1328,14 +1328,6 @@ static int powerpc_smt_flags(void)
> >  }
> >  #endif
> > 
> > -static struct sched_domain_topology_level powerpc_topology[] = {
> > -#ifdef CONFIG_SCHED_SMT
> > -	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
> > -#endif
> > -	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
> > -	{ NULL, },
> > -};
> > -
> >  /*
> >   * P9 has a slightly odd architecture where pairs of cores share an L2 cache.
> >   * This topology makes it *much* cheaper to migrate tasks between adjacent cores
> > @@ -1353,7 +1345,13 @@ static int powerpc_shared_cache_flags(void)
> >   */
> >  static const struct cpumask *shared_cache_mask(int cpu)
> >  {
> > -	return cpu_l2_cache_mask(cpu);
> > +	if (shared_caches)
> > +		return cpu_l2_cache_mask(cpu);
> > +
> > +	if (has_big_cores)
> > +		return cpu_smallcore_mask(cpu);
> > +
> > +	return cpu_smt_mask(cpu);
> >  }
> > 
> >  #ifdef CONFIG_SCHED_SMT
> > @@ -1363,7 +1361,7 @@ static const struct cpumask *smallcore_smt_mask(int cpu)
> >  }
> >  #endif
> > 
> > -static struct sched_domain_topology_level power9_topology[] = {
> > +static struct sched_domain_topology_level powerpc_topology[] = {
> 
> 
> >  #ifdef CONFIG_SCHED_SMT
> >  	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
> >  #endif
> > @@ -1388,21 +1386,10 @@ void __init smp_cpus_done(unsigned int max_cpus)
> >  #ifdef CONFIG_SCHED_SMT
> >  	if (has_big_cores) {
> >  		pr_info("Big cores detected but using small core scheduling\n");
> > -		power9_topology[0].mask = smallcore_smt_mask;
> >  		powerpc_topology[0].mask = smallcore_smt_mask;
> >  	}
> >  #endif
> > -	/*
> > -	 * If any CPU detects that it's sharing a cache with another CPU then
> > -	 * use the deeper topology that is aware of this sharing.
> > -	 */
> > -	if (shared_caches) {
> > -		pr_info("Using shared cache scheduler topology\n");
> > -		set_sched_topology(power9_topology);
> > -	} else {
> > -		pr_info("Using standard scheduler topology\n");
> > -		set_sched_topology(powerpc_topology);
> 
> 
> Ok, so we will go with the three level topology by default (SMT,
> CACHE, DIE) and will rely on the sched-domain creation code to
> degenerate CACHE domain in case SMT and CACHE have the same set of
> CPUs (POWER8 for eg).
> 

Right.

> From a cleanup perspective this is better, since we won't have to
> worry about defining multiple topology structures, but from a
> performance point of view, wouldn't we now pay an extra penalty of
> degenerating the CACHE domains on POWER8 kind of systems, each time
> when a CPU comes online ?
> 

So we either end up adding a topology definition for each of the new
topologies we support, or we take the extra penalty. But going ahead we can
recover that cost (more on this below).

> Do we know how bad it is ? If the degeneration takes a few extra
> microseconds, that should be ok I suppose.
> 

It certainly will add to the penalty; I haven't captured per-degeneration
statistics. However I ran an experiment where I run ppc64_cpu --smt=8,
followed by ppc64_cpu --smt=1, in a loop of 100 iterations.

On a Power8 system with 256 cpus and 8 nodes:

Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              256
On-line CPU(s) list: 0-255
Thread(s) per core:  8
Core(s) per socket:  4
Socket(s):           8
NUMA node(s):        8
Model:               2.1 (pvr 004b 0201)
Model name:          POWER8 (architected), altivec supported
Hypervisor vendor:   pHyp
Virtualization type: para
L1d cache:           64K
L1i cache:           32K
L2 cache:            512K
L3 cache:            8192K
NUMA node0 CPU(s):   0-31
NUMA node1 CPU(s):   32-63
NUMA node2 CPU(s):   64-95
NUMA node3 CPU(s):   96-127
NUMA node4 CPU(s):   128-159
NUMA node5 CPU(s):   160-191
NUMA node6 CPU(s):   192-223
NUMA node7 CPU(s):   224-255

ppc64_cpu --smt=1
    N           Min           Max        Median           Avg        Stddev
x 100         38.17         53.78         46.81       46.6766     2.8421603

x 100         41.34         58.24         48.35       47.9649     3.6866087

ppc64_cpu --smt=8
    N           Min           Max        Median           Avg        Stddev
x 100         57.43         75.88         60.61       61.0246      2.418685

x 100         58.21         79.24         62.59       63.3326     3.4094558

But once we clean up, we could add ways to fix up the topologies so that we
reverse the overhead.

-- 
Thanks and Regards
Srikar Dronamraju

* Re: [PATCH 04/11] powerpc/smp: Enable small core scheduling sooner
  2020-07-20  7:47   ` Jordan Niethe
@ 2020-07-20  8:52     ` Srikar Dronamraju
  0 siblings, 0 replies; 42+ messages in thread
From: Srikar Dronamraju @ 2020-07-20  8:52 UTC (permalink / raw)
  To: Jordan Niethe
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Michael Ellerman, Anton Blanchard, linuxppc-dev,
	Nick Piggin

* Jordan Niethe <jniethe5@gmail.com> [2020-07-20 17:47:27]:

> On Tue, Jul 14, 2020 at 2:44 PM Srikar Dronamraju
> <srikar@linux.vnet.ibm.com> wrote:
> >
> > Enable small core scheduling as soon as we detect that we are in a
> > system that supports thread group. Doing so would avoid a redundant
> > check.
> >
> > Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> > Cc: Michael Ellerman <michaele@au1.ibm.com>
> > Cc: Nick Piggin <npiggin@au1.ibm.com>
> > Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> > Cc: Michael Neuling <mikey@linux.ibm.com>
> > Cc: Anton Blanchard <anton@au1.ibm.com>
> > Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> > Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/kernel/smp.c | 12 ++++++------
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> > index 24529f6134aa..7d430fc536cc 100644
> > --- a/arch/powerpc/kernel/smp.c
> > +++ b/arch/powerpc/kernel/smp.c
> > @@ -892,6 +892,12 @@ static int init_big_cores(void)
> >         }
> >
> >         has_big_cores = true;
> > +
> > +#ifdef CONFIG_SCHED_SMT
> > +       pr_info("Big cores detected. Using small core scheduling\n");
> Why change the wording from "Big cores detected but using small core
> scheduling\n"?
> > +       powerpc_topology[0].mask = smallcore_smt_mask;
> > +#endif
> > +

To me, "but" in the message sounded as if we detected Big cores *but* we want
to continue using smallcore scheduling. However the code was always meaning
to say, "Since we detected big core, i.e thread grouping within a core, System would
benefit by using small core scheduling".

If you think, "but" was adding more info and not misleading, then I can add it back.

> >         return 0;
> >  }
> >

-- 
Thanks and Regards
Srikar Dronamraju

* Re: [PATCH 05/11] powerpc/smp: Dont assume l2-cache to be superset of sibling
  2020-07-20  6:45     ` Srikar Dronamraju
@ 2020-07-20  8:58       ` Gautham R Shenoy
  0 siblings, 0 replies; 42+ messages in thread
From: Gautham R Shenoy @ 2020-07-20  8:58 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Michael Ellerman, Anton Blanchard, linuxppc-dev,
	Nick Piggin

Hi Srikar,

On Mon, Jul 20, 2020 at 12:15:04PM +0530, Srikar Dronamraju wrote:
> * Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-07-17 11:30:11]:
> 
> > Hi Srikar,
> > 
> > On Tue, Jul 14, 2020 at 10:06:18AM +0530, Srikar Dronamraju wrote:
> > > Current code assumes that the cpumask of cpus sharing an l2-cache will
> > > always be a superset of cpu_sibling_mask.
> > > 
> > > Let's drop that assumption.
> > > 
> > > Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> > > Cc: Michael Ellerman <michaele@au1.ibm.com>
> > > Cc: Nick Piggin <npiggin@au1.ibm.com>
> > > Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> > > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> > > Cc: Michael Neuling <mikey@linux.ibm.com>
> > > Cc: Anton Blanchard <anton@au1.ibm.com>
> > > Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> > > Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> > > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > > ---
> > >  arch/powerpc/kernel/smp.c | 28 +++++++++++++++-------------
> > >  1 file changed, 15 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> > > index 7d430fc536cc..875f57e41355 100644
> > > --- a/arch/powerpc/kernel/smp.c
> > > +++ b/arch/powerpc/kernel/smp.c
> > > @@ -1198,6 +1198,7 @@ static bool update_mask_by_l2(int cpu, struct cpumask *(*mask_fn)(int))
> > >  	struct device_node *l2_cache, *np;
> > >  	int i;
> > > 
> > > +	cpumask_set_cpu(cpu, mask_fn(cpu));
> > 
> > It would be good to comment on why we need to set the CPU in the
> > l2-mask if we don't have an l2cache domain.
> > 
> 
> Good catch. We should move this after the cpu_to_l2cache() call.
> 
> > >  	l2_cache = cpu_to_l2cache(cpu);
> > >  	if (!l2_cache)
> > >  		return false;
> > > @@ -1284,29 +1285,30 @@ static void add_cpu_to_masks(int cpu)
> > >  	 * add it to it's own thread sibling mask.
> > >  	 */
> > >  	cpumask_set_cpu(cpu, cpu_sibling_mask(cpu));
> > > +	cpumask_set_cpu(cpu, cpu_core_mask(cpu));
> > > 
> > >  	for (i = first_thread; i < first_thread + threads_per_core; i++)
> > >  		if (cpu_online(i))
> > >  			set_cpus_related(i, cpu, cpu_sibling_mask);
> > > 
> > >  	add_cpu_to_smallcore_masks(cpu);
> > > -	/*
> > > -	 * Copy the thread sibling mask into the cache sibling mask
> > > -	 * and mark any CPUs that share an L2 with this CPU.
> > > -	 */
> > > -	for_each_cpu(i, cpu_sibling_mask(cpu))
> > > -		set_cpus_related(cpu, i, cpu_l2_cache_mask);
> > >  	update_mask_by_l2(cpu, cpu_l2_cache_mask);
> > > 
> > > -	/*
> > > -	 * Copy the cache sibling mask into core sibling mask and mark
> > > -	 * any CPUs on the same chip as this CPU.
> > > -	 */
> > > -	for_each_cpu(i, cpu_l2_cache_mask(cpu))
> > > -		set_cpus_related(cpu, i, cpu_core_mask);
> > > +	if (pkg_id == -1) {
> > > +		struct cpumask *(*mask)(int) = cpu_sibling_mask;
> > > +
> > > +		/*
> > > +		 * Copy the sibling mask into core sibling mask and
> > > +		 * mark any CPUs on the same chip as this CPU.
> > > +		 */
> > > +		if (shared_caches)
> > > +			mask = cpu_l2_cache_mask;
> > > +
> > 
> > 
> > Now that we decoupling the containment relationship between
> > sibling_mask and l2-cache mask, should we set all the CPUs that are
> > both in cpu_sibling_mask(cpu) as well as cpu_l2_mask(cpu) in
> > cpu_core_mask ? 
> > 
> 
> Are you saying that instead of setting this cpu in cpu_core_mask, we can
> set all the cpus in the mask in cpu_core_mask?

No. What I am referring to is that in the for-loop below, you are setting
the CPUs that are set in mask(cpu) in the cpu_core_mask.

Now, the above code sets
mask(cpu) == cpu_sibling_mask(cpu) in the absence of shared_caches, and
          == cpu_l2_cache_mask(cpu) in the presence of shared_caches.

Since we have dropped the assumption that cpu_sibling_mask(cpu) is contained
within cpu_l2_cache_mask(cpu), in the presence of a shared cache, why are we
only picking the CPUs in cpu_l2_cache_mask(cpu) to be set in
cpu_core_mask(cpu)? It should ideally be the superset whose CPUs are set in
cpu_core_mask(cpu), and the correct cpuset is the union of
cpu_sibling_mask(cpu) and cpu_l2_cache_mask(cpu).
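
A minimal sketch of that suggestion, assuming placement inside
add_cpu_to_masks() and avoiding a temporary cpumask by simply walking both
masks:

	/* cpu_core_mask should cover the union of the sibling and
	 * l2-cache masks, since neither contains the other now. */
	for_each_cpu(i, cpu_sibling_mask(cpu))
		set_cpus_related(cpu, i, cpu_core_mask);
	for_each_cpu(i, cpu_l2_cache_mask(cpu))
		set_cpus_related(cpu, i, cpu_core_mask);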


> Currently we don't know if any of the cpus of the mask were already set or
> not. Plus we anyway need to update the cpumask of all other cpus to say
> they are related. So setting a mask instead of a cpu at a time will not
> change anything on our side.
> 
> > > +		for_each_cpu(i, mask(cpu))
> > > +			set_cpus_related(cpu, i, cpu_core_mask);
> > > 
> > > -	if (pkg_id == -1)
> > >  		return;
> > > +	}
> > > 
> > >  	for_each_cpu(i, cpu_online_mask)
> > >  		if (get_physical_package_id(i) == pkg_id)
> > > -- 
> > > 2.17.1
> > > 
> > --
> > Thanks and Regards
> > gautham.
> 
> -- 
> Thanks and Regards
> Srikar Dronamraju

* Re: [PATCH 06/11] powerpc/smp: Generalize 2nd sched domain
  2020-07-20  6:19     ` Srikar Dronamraju
@ 2020-07-20  9:07       ` Gautham R Shenoy
  0 siblings, 0 replies; 42+ messages in thread
From: Gautham R Shenoy @ 2020-07-20  9:07 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Michael Ellerman, Anton Blanchard, linuxppc-dev,
	Nick Piggin

On Mon, Jul 20, 2020 at 11:49:11AM +0530, Srikar Dronamraju wrote:
> * Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-07-17 12:07:55]:
> 
> > On Tue, Jul 14, 2020 at 10:06:19AM +0530, Srikar Dronamraju wrote:
> > > Currently the "CACHE" domain happens to be the 2nd sched domain as per
> > > powerpc_topology. This domain will collapse if the cpumask of the l2-cache
> > > is the same as that of the SMT domain. However we could generalize this
> > > domain such that it could be either a "CACHE" domain or a "BIGCORE" domain.
> > > 
> > > While setting up the "CACHE" domain, check if shared_cache is already
> > > set.
> > > 
> > > Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> > > Cc: Michael Ellerman <michaele@au1.ibm.com>
> > > Cc: Nick Piggin <npiggin@au1.ibm.com>
> > > Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> > > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> > > Cc: Michael Neuling <mikey@linux.ibm.com>
> > > Cc: Anton Blanchard <anton@au1.ibm.com>
> > > Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> > > Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> > > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > > ---
> > > @@ -867,11 +869,16 @@ static const struct cpumask *smallcore_smt_mask(int cpu)
> > >  }
> > >  #endif
> > > 
> > > +static const struct cpumask *cpu_bigcore_mask(int cpu)
> > > +{
> > > +	return cpu_core_mask(cpu);
> > 
> > It should be cpu_smt_mask() if we want the redundant big-core to be
> > degenerated in favour of the SMT level on P8, no? Because
> > cpu_core_mask refers to all the CPUs that are in the same chip.
> > 
> 
> Right, but it can't be cpu_smt_mask since cpu_smt_mask is only enabled under
> CONFIG_SCHED_SMT. I was looking at using sibling_map, but we have to be
> careful for power9 / PowerNV mode. Guess that should be fine.

Ok.

> 
> > > +}
> > > +
> > >  static struct sched_domain_topology_level powerpc_topology[] = {
> > >  #ifdef CONFIG_SCHED_SMT
> > >  	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
> > >  #endif
> > > -	{ shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE) },
> > > +	{ cpu_bigcore_mask, SD_INIT_NAME(BIGCORE) },
> > >  	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
> > >  	{ NULL, },
> > >  };
> > > @@ -1319,7 +1326,6 @@ static void add_cpu_to_masks(int cpu)
> > >  void start_secondary(void *unused)
> > >  {
> > >  	unsigned int cpu = smp_processor_id();
> > > -	struct cpumask *(*sibling_mask)(int) = cpu_sibling_mask;
> > > 
> > >  	mmgrab(&init_mm);
> > >  	current->active_mm = &init_mm;
> > > @@ -1345,14 +1351,20 @@ void start_secondary(void *unused)
> > >  	/* Update topology CPU masks */
> > >  	add_cpu_to_masks(cpu);
> > > 
> > > -	if (has_big_cores)
> > > -		sibling_mask = cpu_smallcore_mask;
> > >  	/*
> > >  	 * Check for any shared caches. Note that this must be done on a
> > >  	 * per-core basis because one core in the pair might be disabled.
> > >  	 */
> > > -	if (!cpumask_equal(cpu_l2_cache_mask(cpu), sibling_mask(cpu)))
> > > -		shared_caches = true;
> > > +	if (!shared_caches) {
> > > +		struct cpumask *(*sibling_mask)(int) = cpu_sibling_mask;
> > > +		struct cpumask *mask = cpu_l2_cache_mask(cpu);
> > > +
> > > +		if (has_big_cores)
> > > +			sibling_mask = cpu_smallcore_mask;
> > > +
> > > +		if (cpumask_weight(mask) > cpumask_weight(sibling_mask(cpu)))
> > > +			shared_caches = true;
> > 
> > Shouldn't we use cpumask_subset() here ?
> 
> Wouldn't cpumask_subset return 1 if both are the same?

When are caches shared? When sibling_mask(cpu) is a strict subset of
cpu_l2_cache_mask(cpu). cpumask_weight() only checks if the number of CPUs
in cpu_l2_cache_mask(cpu) is greater than in sibling_mask(cpu), but not if
the constituent CPUs of the former form a strict superset of the latter.

We are better off using
if (!cpumask_equal(sibling_mask(cpu), mask) &&
    cpumask_subset(sibling_mask(cpu), mask))

which is accurate.



> We don't want shared_caches set if both the masks are equal.


> 
> >   			
> > > +	}
> > > 
> > >  	set_numa_node(numa_cpu_lookup_table[cpu]);
> > >  	set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));
> > > @@ -1390,6 +1402,14 @@ void __init smp_cpus_done(unsigned int max_cpus)
> > >  		smp_ops->bringup_done();
> > > 
> > >  	dump_numa_cpu_topology();
> > > +	if (shared_caches) {
> > > +		pr_info("Using shared cache scheduler topology\n");
> > > +		powerpc_topology[bigcore_idx].mask = shared_cache_mask;
> > > +#ifdef CONFIG_SCHED_DEBUG
> > > +		powerpc_topology[bigcore_idx].name = "CACHE";
> > > +#endif
> > > +		powerpc_topology[bigcore_idx].sd_flags = powerpc_shared_cache_flags;
> > > +	}
> > 
> > 
> > I would much rather that we have all the topology-fixups done in one
> > function.
> > 
> > fixup_topology(void) {
> >     if (has_big_cores)
> >         powerpc_topology[smt_idx].mask = smallcore_smt_mask;
> > 
> >     if (shared_caches) {
> >         powerpc_topology[bigcore_idx].mask = shared_cache_mask;
> > #ifdef CONFIG_SCHED_DEBUG
> >         /* .name is a const char *, so assign it rather than strlcpy() */
> >         powerpc_topology[bigcore_idx].name = "CACHE";
> > #endif
> >         powerpc_topology[bigcore_idx].sd_flags = powerpc_shared_cache_flags;
> >     }
> > 
> >     /* Any other changes to the topology structure here */
> 
> We could do this.
> 
> > 
> > And also as an optimization, get rid of degenerate structures here
> > itself so that we don't pay additional penalty while building the
> > sched-domains each time.
> > 
> 
> Yes, this is definitely planned, but slightly later in time.
>

Ah, ok. Fine in that case.

> Thanks for the review and comments.
> 
> -- 
> Thanks and Regards
> Srikar Dronamraju

* Re: [PATCH 10/11] powerpc/smp: Implement cpu_to_coregroup_id
  2020-07-20  5:48     ` Srikar Dronamraju
@ 2020-07-20  9:10       ` Gautham R Shenoy
  2020-07-20 10:26         ` Srikar Dronamraju
  0 siblings, 1 reply; 42+ messages in thread
From: Gautham R Shenoy @ 2020-07-20  9:10 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Michael Ellerman, Anton Blanchard, linuxppc-dev,
	Nick Piggin

Hi Srikar,


On Mon, Jul 20, 2020 at 11:18:16AM +0530, Srikar Dronamraju wrote:
> * Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-07-17 13:56:53]:
> 
> > On Tue, Jul 14, 2020 at 10:06:23AM +0530, Srikar Dronamraju wrote:
> > > Lookup the coregroup id from the associativity array.
> > > 
> > > If unable to detect the coregroup id, fallback on the core id.
> > > This way, ensure sched_domain degenerates and an extra sched domain is
> > > not created.
> > > 
> > > Ideally this function should have been implemented in
> > > arch/powerpc/kernel/smp.c. However if it's implemented in mm/numa.c, we
> > > don't need to find the primary domain again.
> > > 
> > > If the device-tree mentions more than one coregroup, then kernel
> > > implements only the last or the smallest coregroup, which currently
> > > corresponds to the penultimate domain in the device-tree.
> > > 
> > > Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> > > Cc: Michael Ellerman <michaele@au1.ibm.com>
> > > Cc: Nick Piggin <npiggin@au1.ibm.com>
> > > Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> > > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> > > Cc: Michael Neuling <mikey@linux.ibm.com>
> > > Cc: Anton Blanchard <anton@au1.ibm.com>
> > > Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> > > Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> > > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > > ---
> > >  arch/powerpc/mm/numa.c | 17 +++++++++++++++++
> > >  1 file changed, 17 insertions(+)
> > > 
> > > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> > > index d9ab9da85eab..4e85564ef62a 100644
> > > --- a/arch/powerpc/mm/numa.c
> > > +++ b/arch/powerpc/mm/numa.c
> > > @@ -1697,6 +1697,23 @@ static const struct proc_ops topology_proc_ops = {
> > > 
> > >  int cpu_to_coregroup_id(int cpu)
> > >  {
> > > +	__be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
> > > +	int index;
> > > +
> > 
> > It would be good to have an assert here to ensure that we are calling
> > this function only when coregroups are enabled.
> > 
> > Else, we may end up returning the penultimate index which maps to the
> > chip-id.
> > 
> 
> We have a check below exactly for the same reason. Please look
> below.

I saw that. However, it would be better to assert within the function
so that we don't call it from any other context without ascertaining
first that core_groups are enabled. Or at least a comment in the
function saying that we should call this only after ascertaining that
core_groups are enabled.



> 
> > 
> > 
> > > +	if (cpu < 0 || cpu > nr_cpu_ids)
> > > +		return -1;
> > > +
> > > +	if (!firmware_has_feature(FW_FEATURE_VPHN))
> > > +		goto out;
> > > +
> > > +	if (vphn_get_associativity(cpu, associativity))
> > > +		goto out;
> > > +
> > > +	index = of_read_number(associativity, 1);
> > > +	if ((index > min_common_depth + 1) && coregroup_enabled)
> > > +		return of_read_number(&associativity[index - 1], 1);
> 
> See ^above.
> 
> index would be the number of domains in the associativity array, and
> min_common_depth would be where the primary domain, aka the chip-id, is
> defined. So we are reading the penultimate domain if and only if the
> penultimate domain isn't the primary domain aka chip-id.
> 
> What other checks/assertions can we add?
> 
> 
> > > +
> > > +out:
> > >  	return cpu_to_core_id(cpu);
> > >  }
> > > 
> > > -- 
> > > 2.17.1
> > > 
> 
> -- 
> Thanks and Regards
> Srikar Dronamraju

* Re: [PATCH 10/11] powerpc/smp: Implement cpu_to_coregroup_id
  2020-07-20  9:10       ` Gautham R Shenoy
@ 2020-07-20 10:26         ` Srikar Dronamraju
  0 siblings, 0 replies; 42+ messages in thread
From: Srikar Dronamraju @ 2020-07-20 10:26 UTC (permalink / raw)
  To: Gautham R Shenoy
  Cc: Nathan Lynch, Oliver OHalloran, Michael Neuling,
	Michael Ellerman, Anton Blanchard, linuxppc-dev, Nick Piggin

* Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-07-20 14:40:25]:

> Hi Srikar,
> 
> 
> On Mon, Jul 20, 2020 at 11:18:16AM +0530, Srikar Dronamraju wrote:
> > * Gautham R Shenoy <ego@linux.vnet.ibm.com> [2020-07-17 13:56:53]:
> > 
> > > On Tue, Jul 14, 2020 at 10:06:23AM +0530, Srikar Dronamraju wrote:
> > > > Lookup the coregroup id from the associativity array.
> > > > 
> > > > If unable to detect the coregroup id, fallback on the core id.
> > > > This way, ensure sched_domain degenerates and an extra sched domain is
> > > > not created.
> > > > 
> > > > Ideally this function should have been implemented in
> > > > arch/powerpc/kernel/smp.c. However if it's implemented in mm/numa.c, we
> > > > don't need to find the primary domain again.
> > > > 
> > > > If the device-tree mentions more than one coregroup, then kernel
> > > > implements only the last or the smallest coregroup, which currently
> > > > corresponds to the penultimate domain in the device-tree.
> > > > 
> > > > Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
> > > > Cc: Michael Ellerman <michaele@au1.ibm.com>
> > > > Cc: Nick Piggin <npiggin@au1.ibm.com>
> > > > Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
> > > > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> > > > Cc: Michael Neuling <mikey@linux.ibm.com>
> > > > Cc: Anton Blanchard <anton@au1.ibm.com>
> > > > Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> > > > Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
> > > > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > > > ---
> > > >  arch/powerpc/mm/numa.c | 17 +++++++++++++++++
> > > >  1 file changed, 17 insertions(+)
> > > > 
> > > > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> > > > index d9ab9da85eab..4e85564ef62a 100644
> > > > --- a/arch/powerpc/mm/numa.c
> > > > +++ b/arch/powerpc/mm/numa.c
> > > > @@ -1697,6 +1697,23 @@ static const struct proc_ops topology_proc_ops = {
> > > > 
> > > >  int cpu_to_coregroup_id(int cpu)
> > > >  {
> > > > +	__be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
> > > > +	int index;
> > > > +
> > > 
> > > It would be good to have an assert here to ensure that we are calling
> > > this function only when coregroups are enabled.
> > > 
> > > Else, we may end up returning the penultimate index which maps to the
> > > chip-id.
> > > 
> > 
> > We have a check below exactly for the same reason. Please look
> > below.
> 
> I saw that. However, it would be better to assert within the function
> so that we don't call it from any other context without ascertaining
> first that core_groups are enabled. Or at least a comment in the
> function saying that we should call this only after ascertaining that
> core_groups are enabled.
> 
> 

Okay, I can move the coregroup_enabled check above.
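
Something like the following sketch, which is the same logic as the hunk
above with the coregroup_enabled check hoisted to the top:

int cpu_to_coregroup_id(int cpu)
{
	__be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
	int index;

	if (cpu < 0 || cpu > nr_cpu_ids)
		return -1;

	/* Without coregroups, the penultimate domain is the
	 * chip-id, not a coregroup id, so bail out early. */
	if (!coregroup_enabled)
		goto out;

	if (!firmware_has_feature(FW_FEATURE_VPHN))
		goto out;

	if (vphn_get_associativity(cpu, associativity))
		goto out;

	index = of_read_number(associativity, 1);
	if (index > min_common_depth + 1)
		return of_read_number(&associativity[index - 1], 1);

out:
	return cpu_to_core_id(cpu);
}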

> 
> > 
> > > 
> > > 
> > > > +	if (cpu < 0 || cpu > nr_cpu_ids)
> > > > +		return -1;
> > > > +
> > > > +	if (!firmware_has_feature(FW_FEATURE_VPHN))
> > > > +		goto out;
> > > > +
> > > > +	if (vphn_get_associativity(cpu, associativity))
> > > > +		goto out;
> > > > +
> > > > +	index = of_read_number(associativity, 1);
> > > > +	if ((index > min_common_depth + 1) && coregroup_enabled)
> > > > +		return of_read_number(&associativity[index - 1], 1);
> > 
> > See ^above.
> > 
> > index would be the number of domains in the associativity array, and
> > min_common_depth would be where the primary domain, aka the chip-id, is
> > defined. So we are reading the penultimate domain if and only if the
> > penultimate domain isn't the primary domain aka chip-id.
> > 
> > What other checks/assertions can we add?
> > 
> > 
> > > > +
> > > > +out:
> > > >  	return cpu_to_core_id(cpu);
> > > >  }
> > > > 
> > > > -- 
> > > > 2.17.1
> > > > 
> > 
> > -- 
> > Thanks and Regards
> > Srikar Dronamraju

-- 
Thanks and Regards
Srikar Dronamraju

* Re: [PATCH 07/11] Powerpc/numa: Detect support for coregroup
  2020-07-14  4:36 ` [PATCH 07/11] Powerpc/numa: Detect support for coregroup Srikar Dronamraju
  2020-07-17  8:08   ` Gautham R Shenoy
@ 2020-07-20 13:56   ` Michael Ellerman
  2020-07-21  2:57     ` Srikar Dronamraju
  1 sibling, 1 reply; 42+ messages in thread
From: Michael Ellerman @ 2020-07-20 13:56 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Srikar Dronamraju, Anton Blanchard,
	linuxppc-dev, Nick Piggin

Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:

> Add support for grouping cores based on the device-tree classification.
> - The last domain in the associativity domains always refers to the
> core.
> - If primary reference domain happens to be the penultimate domain in
> the associativity domains device-tree property, then there are no
> coregroups. However if it's not a penultimate domain, then there are
> coregroups. There can be more than one coregroup. For now we would be
> interested in the last or the smallest coregroups.

Should I know what a "coregroup" is? It's not a term I'm familiar with.

When you repost can you expand the Cc list to include lkml and
scheduler/topology folks please.

cheers

* Re: [PATCH 11/11] powerpc/smp: Provide an ability to disable coregroup
  2020-07-14  4:36 ` [PATCH 11/11] powerpc/smp: Provide an ability to disable coregroup Srikar Dronamraju
  2020-07-17  8:28   ` Gautham R Shenoy
@ 2020-07-20 13:57   ` Michael Ellerman
  1 sibling, 0 replies; 42+ messages in thread
From: Michael Ellerman @ 2020-07-20 13:57 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Srikar Dronamraju, Anton Blanchard,
	linuxppc-dev, Nick Piggin

Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> If the user wants to enable the coregroup sched_domain, they can boot with
> the kernel parameter "coregroup_support=on"

Why would they want to do that?

Adding options like this increases our test matrix by 2x (though in
reality the non-default case will never be tested).

cheers

* Re: [PATCH 07/11] Powerpc/numa: Detect support for coregroup
  2020-07-20 13:56   ` Michael Ellerman
@ 2020-07-21  2:57     ` Srikar Dronamraju
  0 siblings, 0 replies; 42+ messages in thread
From: Srikar Dronamraju @ 2020-07-21  2:57 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Oliver OHalloran,
	Michael Neuling, Anton Blanchard, linuxppc-dev, Nick Piggin

* Michael Ellerman <mpe@ellerman.id.au> [2020-07-20 23:56:19]:

> Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> 
> > Add support for grouping cores based on the device-tree classification.
> > - The last domain in the associativity domains always refers to the
> > core.
> > - If primary reference domain happens to be the penultimate domain in
> > the associativity domains device-tree property, then there are no
> > coregroups. However if it's not a penultimate domain, then there are
> > coregroups. There can be more than one coregroup. For now we would be
> > interested in the last or the smallest coregroups.
> 
> Should I know what a "coregroup" is? It's not a term I'm familiar with.
> 

A coregroup is a group or subset of cores from the same chip/DIE that share
some resources. It is very similar to the MC domain, aka the Multi-Core
Cache domain (kernel/sched/topology.c), where all the cores of that domain
share the last level of cache.
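
For context, with patch 09 of this series the coregroup becomes an extra
level in powerpc_topology, roughly like the sketch below (cpu_mc_mask here
is illustrative; see patch 09 for the actual mask function):

static struct sched_domain_topology_level powerpc_topology[] = {
#ifdef CONFIG_SCHED_SMT
	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
#endif
	{ shared_cache_mask, powerpc_shared_cache_flags, SD_INIT_NAME(CACHE) },
	{ cpu_mc_mask, SD_INIT_NAME(MC) },
	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
	{ NULL, },
};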

Will add the description to the commit message.

> When you repost can you expand the Cc list to include lkml and
> scheduler/topology folks please.
> 

Okay, will add them. I shall copy LKML, Peter Zijlstra and Ingo Molnar.
Do let me know if you want anyone else added.

> cheers

-- 
Thanks and Regards
Srikar Dronamraju

end of thread

Thread overview: 42+ messages
2020-07-14  4:36 [PATCH 00/11] Support for grouping cores Srikar Dronamraju
2020-07-14  4:36 ` [PATCH 01/11] powerpc/smp: Cache node for reuse Srikar Dronamraju
2020-07-17  4:51   ` Gautham R Shenoy
2020-07-14  4:36 ` [PATCH 02/11] powerpc/smp: Merge Power9 topology with Power topology Srikar Dronamraju
2020-07-17  5:44   ` Gautham R Shenoy
2020-07-20  8:10     ` Srikar Dronamraju
2020-07-14  4:36 ` [PATCH 03/11] powerpc/smp: Move powerpc_topology above Srikar Dronamraju
2020-07-17  5:45   ` Gautham R Shenoy
2020-07-14  4:36 ` [PATCH 04/11] powerpc/smp: Enable small core scheduling sooner Srikar Dronamraju
2020-07-17  5:48   ` Gautham R Shenoy
2020-07-20  7:20     ` Srikar Dronamraju
2020-07-20  7:47   ` Jordan Niethe
2020-07-20  8:52     ` Srikar Dronamraju
2020-07-14  4:36 ` [PATCH 05/11] powerpc/smp: Dont assume l2-cache to be superset of sibling Srikar Dronamraju
2020-07-14  5:40   ` Oliver O'Halloran
2020-07-14  6:30     ` Srikar Dronamraju
2020-07-17  6:00   ` Gautham R Shenoy
2020-07-20  6:45     ` Srikar Dronamraju
2020-07-20  8:58       ` Gautham R Shenoy
2020-07-14  4:36 ` [PATCH 06/11] powerpc/smp: Generalize 2nd sched domain Srikar Dronamraju
2020-07-17  6:37   ` Gautham R Shenoy
2020-07-20  6:19     ` Srikar Dronamraju
2020-07-20  9:07       ` Gautham R Shenoy
2020-07-14  4:36 ` [PATCH 07/11] Powerpc/numa: Detect support for coregroup Srikar Dronamraju
2020-07-17  8:08   ` Gautham R Shenoy
2020-07-20 13:56   ` Michael Ellerman
2020-07-21  2:57     ` Srikar Dronamraju
2020-07-14  4:36 ` [PATCH 08/11] powerpc/smp: Allocate cpumask only after searching thread group Srikar Dronamraju
2020-07-17  8:08   ` Gautham R Shenoy
2020-07-14  4:36 ` [PATCH 09/11] Powerpc/smp: Create coregroup domain Srikar Dronamraju
2020-07-17  8:19   ` Gautham R Shenoy
2020-07-17  8:23     ` Gautham R Shenoy
2020-07-20  6:02     ` Srikar Dronamraju
2020-07-14  4:36 ` [PATCH 10/11] powerpc/smp: Implement cpu_to_coregroup_id Srikar Dronamraju
2020-07-17  8:26   ` Gautham R Shenoy
2020-07-20  5:48     ` Srikar Dronamraju
2020-07-20  9:10       ` Gautham R Shenoy
2020-07-20 10:26         ` Srikar Dronamraju
2020-07-14  4:36 ` [PATCH 11/11] powerpc/smp: Provide an ability to disable coregroup Srikar Dronamraju
2020-07-17  8:28   ` Gautham R Shenoy
2020-07-20 13:57   ` Michael Ellerman
2020-07-14  5:06 ` [PATCH 00/11] Support for grouping cores Srikar Dronamraju
