linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv2 0/3] Subject: [PATCHv2 0/3] Make cache-object aware of L3 siblings by parsing "ibm, thread-groups" property
@ 2021-07-28 17:56 Parth Shah
  2021-07-28 17:56 ` [PATCHv2 1/3] powerpc/cacheinfo: Lookup cache by dt node and thread-group id Parth Shah
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Parth Shah @ 2021-07-28 17:56 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: ego, mikey, srikar, parths1229, svaidy

Changes from v1 -> v2:
- Based on Gautham's comments, use a separate thread_group_l3_cache_map
  and modify parsing code to build cache_map for L3. This makes the
  cache_map building code isolated from the parsing code.
v1 can be found at:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-June/230680.html

On POWER10 big-core system, the L3 cache reflected by sysfs contains all
the CPUs in the big-core.

grep . /sys/devices/system/cpu/cpu0/cache/index*/shared_cpu_list
/sys/devices/system/cpu/cpu0/cache/index0/shared_cpu_list:0,2,4,6
/sys/devices/system/cpu/cpu0/cache/index1/shared_cpu_list:0,2,4,6
/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list:0,2,4,6
/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list:0-7

In the above example, CPU-0 observes CPU 0-7 in L3 (index3) cache, which
is not correct as only the CPUs in small core share the L3 cache.

The "ibm,thread-groups" contains property "2" to indicate that the CPUs
share both the L2 and L3 caches. This patch-set uses this property to
reflect correct L3 topology to a cache-object.

After applying this patch-set, the topology looks like:
$> ppc64_cpu --smt=8
$> grep . /sys/devices/system/cpu/cpu[89]/cache/*/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:9,11,13,15


$> ppc64_cpu --smt=4
$> grep . /sys/devices/system/cpu/cpu[89]/cache/*/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:9,11

$> ppc64_cpu --smt=2
$> grep . /sys/devices/system/cpu/cpu[89]/cache/*/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:9

$> ppc64_cpu --smt=1
grep . /sys/devices/system/cpu/cpu[89]/cache/*/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8

Patches Organization:
=====================
This patch-set series is based on top of v5.14-rc2

- Patch 1-2: Add functionality to introduce awareness for
"ibm,thread-groups". Original (not merged) posted version can be found at:
https://lore.kernel.org/linuxppc-dev/1611041780-8640-1-git-send-email-ego@linux.vnet.ibm.co
- Patch 3: Use existing L2 cache_map to detect L3 cache siblings


Gautham R. Shenoy (2):
  powerpc/cacheinfo: Lookup cache by dt node and thread-group id
  powerpc/cacheinfo: Remove the redundant get_shared_cpu_map()

Parth Shah (1):
  powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache
    siblings

 arch/powerpc/include/asm/smp.h  |   6 ++
 arch/powerpc/kernel/cacheinfo.c | 124 ++++++++++++++++----------------
 arch/powerpc/kernel/smp.c       |  70 ++++++++++++------
 3 files changed, 115 insertions(+), 85 deletions(-)

-- 
2.26.3


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCHv2 1/3] powerpc/cacheinfo: Lookup cache by dt node and thread-group id
  2021-07-28 17:56 [PATCHv2 0/3] Subject: [PATCHv2 0/3] Make cache-object aware of L3 siblings by parsing "ibm, thread-groups" property Parth Shah
@ 2021-07-28 17:56 ` Parth Shah
  2021-08-06  5:43   ` Srikar Dronamraju
  2021-07-28 17:56 ` [PATCHv2 2/3] powerpc/cacheinfo: Remove the redundant get_shared_cpu_map() Parth Shah
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Parth Shah @ 2021-07-28 17:56 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: ego, mikey, srikar, parths1229, svaidy

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

Currently the cacheinfo code on powerpc indexes the "cache" objects
(modelling the L1/L2/L3 caches) where the key is device-tree node
corresponding to that cache. On some of the POWER server platforms
thread-groups within the core share different sets of caches (Eg: On
SMT8 POWER9 systems, threads 0,2,4,6 of a core share L1 cache and
threads 1,3,5,7 of the same core share another L1 cache). On such
platforms, there is a single device-tree node corresponding to that
cache and the cache-configuration within the threads of the core is
indicated via "ibm,thread-groups" device-tree property.

Since the current code is not aware of the "ibm,thread-groups"
property, on the aforementoined systems, cacheinfo code still treats
all the threads in the core to be sharing the cache because of the
single device-tree node (In the earlier example, the cacheinfo code
would says CPUs 0-7 share L1 cache).

In this patch, we make the powerpc cacheinfo code aware of the
"ibm,thread-groups" property. We indexe the "cache" objects by the
key-pair (device-tree node, thread-group id). For any CPUX, for a
given level of cache, the thread-group id is defined to be the first
CPU in the "ibm,thread-groups" cache-group containing CPUX. For levels
of cache which are not represented in "ibm,thread-groups" property,
the thread-group id is -1.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
[parth: Remove "static" keyword for the definition of "thread_group_l1_cache_map"
and "thread_group_l2_cache_map" to get rid of the compile error.]
Signed-off-by: Parth Shah <parth@linux.ibm.com>
---
 arch/powerpc/include/asm/smp.h  |  3 ++
 arch/powerpc/kernel/cacheinfo.c | 80 ++++++++++++++++++++++++---------
 arch/powerpc/kernel/smp.c       |  4 +-
 3 files changed, 63 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 03b3d010cbab..1259040cc3a4 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -33,6 +33,9 @@ extern bool coregroup_enabled;
 extern int cpu_to_chip_id(int cpu);
 extern int *chip_id_lookup_table;
 
+DECLARE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map);
+DECLARE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
+
 #ifdef CONFIG_SMP
 
 struct smp_ops_t {
diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
index 6f903e9aa20b..5a6925d87424 100644
--- a/arch/powerpc/kernel/cacheinfo.c
+++ b/arch/powerpc/kernel/cacheinfo.c
@@ -120,6 +120,7 @@ struct cache {
 	struct cpumask shared_cpu_map; /* online CPUs using this cache */
 	int type;                      /* split cache disambiguation */
 	int level;                     /* level not explicit in device tree */
+	int group_id;                  /* id of the group of threads that share this cache */
 	struct list_head list;         /* global list of cache objects */
 	struct cache *next_local;      /* next cache of >= level */
 };
@@ -142,22 +143,24 @@ static const char *cache_type_string(const struct cache *cache)
 }
 
 static void cache_init(struct cache *cache, int type, int level,
-		       struct device_node *ofnode)
+		       struct device_node *ofnode, int group_id)
 {
 	cache->type = type;
 	cache->level = level;
 	cache->ofnode = of_node_get(ofnode);
+	cache->group_id = group_id;
 	INIT_LIST_HEAD(&cache->list);
 	list_add(&cache->list, &cache_list);
 }
 
-static struct cache *new_cache(int type, int level, struct device_node *ofnode)
+static struct cache *new_cache(int type, int level,
+			       struct device_node *ofnode, int group_id)
 {
 	struct cache *cache;
 
 	cache = kzalloc(sizeof(*cache), GFP_KERNEL);
 	if (cache)
-		cache_init(cache, type, level, ofnode);
+		cache_init(cache, type, level, ofnode, group_id);
 
 	return cache;
 }
@@ -309,20 +312,24 @@ static struct cache *cache_find_first_sibling(struct cache *cache)
 		return cache;
 
 	list_for_each_entry(iter, &cache_list, list)
-		if (iter->ofnode == cache->ofnode && iter->next_local == cache)
+		if (iter->ofnode == cache->ofnode &&
+		    iter->group_id == cache->group_id &&
+		    iter->next_local == cache)
 			return iter;
 
 	return cache;
 }
 
-/* return the first cache on a local list matching node */
-static struct cache *cache_lookup_by_node(const struct device_node *node)
+/* return the first cache on a local list matching node and thread-group id */
+static struct cache *cache_lookup_by_node_group(const struct device_node *node,
+						int group_id)
 {
 	struct cache *cache = NULL;
 	struct cache *iter;
 
 	list_for_each_entry(iter, &cache_list, list) {
-		if (iter->ofnode != node)
+		if (iter->ofnode != node ||
+		    iter->group_id != group_id)
 			continue;
 		cache = cache_find_first_sibling(iter);
 		break;
@@ -352,14 +359,15 @@ static int cache_is_unified_d(const struct device_node *np)
 		CACHE_TYPE_UNIFIED_D : CACHE_TYPE_UNIFIED;
 }
 
-static struct cache *cache_do_one_devnode_unified(struct device_node *node, int level)
+static struct cache *cache_do_one_devnode_unified(struct device_node *node, int group_id,
+						  int level)
 {
 	pr_debug("creating L%d ucache for %pOFP\n", level, node);
 
-	return new_cache(cache_is_unified_d(node), level, node);
+	return new_cache(cache_is_unified_d(node), level, node, group_id);
 }
 
-static struct cache *cache_do_one_devnode_split(struct device_node *node,
+static struct cache *cache_do_one_devnode_split(struct device_node *node, int group_id,
 						int level)
 {
 	struct cache *dcache, *icache;
@@ -367,8 +375,8 @@ static struct cache *cache_do_one_devnode_split(struct device_node *node,
 	pr_debug("creating L%d dcache and icache for %pOFP\n", level,
 		 node);
 
-	dcache = new_cache(CACHE_TYPE_DATA, level, node);
-	icache = new_cache(CACHE_TYPE_INSTRUCTION, level, node);
+	dcache = new_cache(CACHE_TYPE_DATA, level, node, group_id);
+	icache = new_cache(CACHE_TYPE_INSTRUCTION, level, node, group_id);
 
 	if (!dcache || !icache)
 		goto err;
@@ -382,31 +390,32 @@ static struct cache *cache_do_one_devnode_split(struct device_node *node,
 	return NULL;
 }
 
-static struct cache *cache_do_one_devnode(struct device_node *node, int level)
+static struct cache *cache_do_one_devnode(struct device_node *node, int group_id, int level)
 {
 	struct cache *cache;
 
 	if (cache_node_is_unified(node))
-		cache = cache_do_one_devnode_unified(node, level);
+		cache = cache_do_one_devnode_unified(node, group_id, level);
 	else
-		cache = cache_do_one_devnode_split(node, level);
+		cache = cache_do_one_devnode_split(node, group_id, level);
 
 	return cache;
 }
 
 static struct cache *cache_lookup_or_instantiate(struct device_node *node,
+						 int group_id,
 						 int level)
 {
 	struct cache *cache;
 
-	cache = cache_lookup_by_node(node);
+	cache = cache_lookup_by_node_group(node, group_id);
 
 	WARN_ONCE(cache && cache->level != level,
 		  "cache level mismatch on lookup (got %d, expected %d)\n",
 		  cache->level, level);
 
 	if (!cache)
-		cache = cache_do_one_devnode(node, level);
+		cache = cache_do_one_devnode(node, group_id, level);
 
 	return cache;
 }
@@ -443,7 +452,27 @@ static void do_subsidiary_caches_debugcheck(struct cache *cache)
 		  of_node_get_device_type(cache->ofnode));
 }
 
-static void do_subsidiary_caches(struct cache *cache)
+/*
+ * If sub-groups of threads in a core containing @cpu_id share the
+ * L@level-cache (information obtained via "ibm,thread-groups"
+ * device-tree property), then we identify the group by the first
+ * thread-sibling in the group. We define this to be the group-id.
+ *
+ * In the absence of any thread-group information for L@level-cache,
+ * this function returns -1.
+ */
+static int get_group_id(unsigned int cpu_id, int level)
+{
+	if (has_big_cores && level == 1)
+		return cpumask_first(per_cpu(thread_group_l1_cache_map,
+					     cpu_id));
+	else if (thread_group_shares_l2 && level == 2)
+		return cpumask_first(per_cpu(thread_group_l2_cache_map,
+					     cpu_id));
+	return -1;
+}
+
+static void do_subsidiary_caches(struct cache *cache, unsigned int cpu_id)
 {
 	struct device_node *subcache_node;
 	int level = cache->level;
@@ -452,9 +481,11 @@ static void do_subsidiary_caches(struct cache *cache)
 
 	while ((subcache_node = of_find_next_cache_node(cache->ofnode))) {
 		struct cache *subcache;
+		int group_id;
 
 		level++;
-		subcache = cache_lookup_or_instantiate(subcache_node, level);
+		group_id = get_group_id(cpu_id, level);
+		subcache = cache_lookup_or_instantiate(subcache_node, group_id, level);
 		of_node_put(subcache_node);
 		if (!subcache)
 			break;
@@ -468,6 +499,7 @@ static struct cache *cache_chain_instantiate(unsigned int cpu_id)
 {
 	struct device_node *cpu_node;
 	struct cache *cpu_cache = NULL;
+	int group_id;
 
 	pr_debug("creating cache object(s) for CPU %i\n", cpu_id);
 
@@ -476,11 +508,13 @@ static struct cache *cache_chain_instantiate(unsigned int cpu_id)
 	if (!cpu_node)
 		goto out;
 
-	cpu_cache = cache_lookup_or_instantiate(cpu_node, 1);
+	group_id = get_group_id(cpu_id, 1);
+
+	cpu_cache = cache_lookup_or_instantiate(cpu_node, group_id, 1);
 	if (!cpu_cache)
 		goto out;
 
-	do_subsidiary_caches(cpu_cache);
+	do_subsidiary_caches(cpu_cache, cpu_id);
 
 	cache_cpu_set(cpu_cache, cpu_id);
 out:
@@ -848,13 +882,15 @@ static struct cache *cache_lookup_by_cpu(unsigned int cpu_id)
 {
 	struct device_node *cpu_node;
 	struct cache *cache;
+	int group_id;
 
 	cpu_node = of_get_cpu_node(cpu_id, NULL);
 	WARN_ONCE(!cpu_node, "no OF node found for CPU %i\n", cpu_id);
 	if (!cpu_node)
 		return NULL;
 
-	cache = cache_lookup_by_node(cpu_node);
+	group_id = get_group_id(cpu_id, 1);
+	cache = cache_lookup_by_node_group(cpu_node, group_id);
 	of_node_put(cpu_node);
 
 	return cache;
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 447b78a87c8f..a7fcac44a8e2 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -122,14 +122,14 @@ static struct thread_groups_list tgl[NR_CPUS] __initdata;
  * On big-cores system, thread_group_l1_cache_map for each CPU corresponds to
  * the set its siblings that share the L1-cache.
  */
-static DEFINE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map);
+DEFINE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map);
 
 /*
  * On some big-cores system, thread_group_l2_cache_map for each CPU
  * corresponds to the set its siblings within the core that share the
  * L2-cache.
  */
-static DEFINE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
+DEFINE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
 
 /* SMP operations for this machine */
 struct smp_ops_t *smp_ops;
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCHv2 2/3] powerpc/cacheinfo: Remove the redundant get_shared_cpu_map()
  2021-07-28 17:56 [PATCHv2 0/3] Subject: [PATCHv2 0/3] Make cache-object aware of L3 siblings by parsing "ibm, thread-groups" property Parth Shah
  2021-07-28 17:56 ` [PATCHv2 1/3] powerpc/cacheinfo: Lookup cache by dt node and thread-group id Parth Shah
@ 2021-07-28 17:56 ` Parth Shah
  2021-08-06  5:44   ` Srikar Dronamraju
  2021-07-28 17:56 ` [PATCHv2 3/3] powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings Parth Shah
  2021-08-18 13:38 ` [PATCHv2 0/3] Subject: [PATCHv2 0/3] Make cache-object aware of L3 siblings by parsing "ibm, thread-groups" property Michael Ellerman
  3 siblings, 1 reply; 8+ messages in thread
From: Parth Shah @ 2021-07-28 17:56 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: ego, mikey, srikar, parths1229, svaidy

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

The helper function get_shared_cpu_map() was added in

'commit 500fe5f550ec ("powerpc/cacheinfo: Report the correct
shared_cpu_map on big-cores")'

and subsequently expanded upon in

'commit 0be47634db0b ("powerpc/cacheinfo: Print correct cache-sibling
map/list for L2 cache")'

in order to help report the correct groups of threads sharing these caches
on big-core systems where groups of threads within a core can share
different sets of caches.

Now that powerpc/cacheinfo is aware of "ibm,thread-groups" property,
cache->shared_cpu_map contains the correct set of thread-siblings
sharing the cache. Hence we no longer need the functions
get_shared_cpu_map(). This patch removes this function. We also remove
the helper function index_dir_to_cpu() which was only called by
get_shared_cpu_map().

With these functions removed, we can still see the correct
cache-sibling map/list for L1 and L2 caches on systems with L1 and L2
caches distributed among groups of threads in a core.

With this patch, on a SMT8 POWER10 system where the L1 and L2 caches
are split between the two groups of threads in a core, for CPUs 8,9,
the L1-Data, L1-Instruction, L2, L3 cache CPU sibling list is as
follows:

$ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-15
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-15

$ ppc64_cpu --smt=4
$ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-11
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-11

$ ppc64_cpu --smt=2
$ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-9
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-9

$ ppc64_cpu --smt=1
$ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/cacheinfo.c | 41 +--------------------------------
 1 file changed, 1 insertion(+), 40 deletions(-)

diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
index 5a6925d87424..20d91693eac1 100644
--- a/arch/powerpc/kernel/cacheinfo.c
+++ b/arch/powerpc/kernel/cacheinfo.c
@@ -675,45 +675,6 @@ static ssize_t level_show(struct kobject *k, struct kobj_attribute *attr, char *
 static struct kobj_attribute cache_level_attr =
 	__ATTR(level, 0444, level_show, NULL);
 
-static unsigned int index_dir_to_cpu(struct cache_index_dir *index)
-{
-	struct kobject *index_dir_kobj = &index->kobj;
-	struct kobject *cache_dir_kobj = index_dir_kobj->parent;
-	struct kobject *cpu_dev_kobj = cache_dir_kobj->parent;
-	struct device *dev = kobj_to_dev(cpu_dev_kobj);
-
-	return dev->id;
-}
-
-/*
- * On big-core systems, each core has two groups of CPUs each of which
- * has its own L1-cache. The thread-siblings which share l1-cache with
- * @cpu can be obtained via cpu_smallcore_mask().
- *
- * On some big-core systems, the L2 cache is shared only between some
- * groups of siblings. This is already parsed and encoded in
- * cpu_l2_cache_mask().
- *
- * TODO: cache_lookup_or_instantiate() needs to be made aware of the
- *       "ibm,thread-groups" property so that cache->shared_cpu_map
- *       reflects the correct siblings on platforms that have this
- *       device-tree property. This helper function is only a stop-gap
- *       solution so that we report the correct siblings to the
- *       userspace via sysfs.
- */
-static const struct cpumask *get_shared_cpu_map(struct cache_index_dir *index, struct cache *cache)
-{
-	if (has_big_cores) {
-		int cpu = index_dir_to_cpu(index);
-		if (cache->level == 1)
-			return cpu_smallcore_mask(cpu);
-		if (cache->level == 2 && thread_group_shares_l2)
-			return cpu_l2_cache_mask(cpu);
-	}
-
-	return &cache->shared_cpu_map;
-}
-
 static ssize_t
 show_shared_cpumap(struct kobject *k, struct kobj_attribute *attr, char *buf, bool list)
 {
@@ -724,7 +685,7 @@ show_shared_cpumap(struct kobject *k, struct kobj_attribute *attr, char *buf, bo
 	index = kobj_to_cache_index_dir(k);
 	cache = index->cache;
 
-	mask = get_shared_cpu_map(index, cache);
+	mask = &cache->shared_cpu_map;
 
 	return cpumap_print_to_pagebuf(list, buf, mask);
 }
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCHv2 3/3] powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings
  2021-07-28 17:56 [PATCHv2 0/3] Subject: [PATCHv2 0/3] Make cache-object aware of L3 siblings by parsing "ibm, thread-groups" property Parth Shah
  2021-07-28 17:56 ` [PATCHv2 1/3] powerpc/cacheinfo: Lookup cache by dt node and thread-group id Parth Shah
  2021-07-28 17:56 ` [PATCHv2 2/3] powerpc/cacheinfo: Remove the redundant get_shared_cpu_map() Parth Shah
@ 2021-07-28 17:56 ` Parth Shah
  2021-07-30 10:08   ` Gautham R Shenoy
  2021-08-18 13:38 ` [PATCHv2 0/3] Subject: [PATCHv2 0/3] Make cache-object aware of L3 siblings by parsing "ibm, thread-groups" property Michael Ellerman
  3 siblings, 1 reply; 8+ messages in thread
From: Parth Shah @ 2021-07-28 17:56 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: ego, mikey, srikar, parths1229, svaidy

On POWER10 systems, the "ibm,thread-groups" property "2" indicates the cpus
in thread-group share both L2 and L3 caches. Hence, use cache_property = 2
itself to find both the L2 and L3 cache siblings.
Hence, create a new thread_group_l3_cache_map to keep list of L3 siblings,
but fill the mask using same property "2" array.

Signed-off-by: Parth Shah <parth@linux.ibm.com>
---
 arch/powerpc/include/asm/smp.h  |  3 ++
 arch/powerpc/kernel/cacheinfo.c |  3 ++
 arch/powerpc/kernel/smp.c       | 66 ++++++++++++++++++++++-----------
 3 files changed, 51 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 1259040cc3a4..7ef1cd8168a0 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -35,6 +35,7 @@ extern int *chip_id_lookup_table;
 
 DECLARE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map);
 DECLARE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
+DECLARE_PER_CPU(cpumask_var_t, thread_group_l3_cache_map);
 
 #ifdef CONFIG_SMP
 
@@ -144,6 +145,7 @@ extern int cpu_to_core_id(int cpu);
 
 extern bool has_big_cores;
 extern bool thread_group_shares_l2;
+extern bool thread_group_shares_l3;
 
 #define cpu_smt_mask cpu_smt_mask
 #ifdef CONFIG_SCHED_SMT
@@ -198,6 +200,7 @@ extern void __cpu_die(unsigned int cpu);
 #define hard_smp_processor_id()		get_hard_smp_processor_id(0)
 #define smp_setup_cpu_maps()
 #define thread_group_shares_l2  0
+#define thread_group_shares_l3	0
 static inline void inhibit_secondary_onlining(void) {}
 static inline void uninhibit_secondary_onlining(void) {}
 static inline const struct cpumask *cpu_sibling_mask(int cpu)
diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
index 20d91693eac1..cf1be75b7833 100644
--- a/arch/powerpc/kernel/cacheinfo.c
+++ b/arch/powerpc/kernel/cacheinfo.c
@@ -469,6 +469,9 @@ static int get_group_id(unsigned int cpu_id, int level)
 	else if (thread_group_shares_l2 && level == 2)
 		return cpumask_first(per_cpu(thread_group_l2_cache_map,
 					     cpu_id));
+	else if (thread_group_shares_l3 && level == 3)
+		return cpumask_first(per_cpu(thread_group_l3_cache_map,
+					     cpu_id));
 	return -1;
 }
 
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index a7fcac44a8e2..f2abd88e0c25 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -78,6 +78,7 @@ struct task_struct *secondary_current;
 bool has_big_cores;
 bool coregroup_enabled;
 bool thread_group_shares_l2;
+bool thread_group_shares_l3;
 
 DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_map);
@@ -101,7 +102,7 @@ enum {
 
 #define MAX_THREAD_LIST_SIZE	8
 #define THREAD_GROUP_SHARE_L1   1
-#define THREAD_GROUP_SHARE_L2   2
+#define THREAD_GROUP_SHARE_L2_L3 2
 struct thread_groups {
 	unsigned int property;
 	unsigned int nr_groups;
@@ -131,6 +132,12 @@ DEFINE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map);
  */
 DEFINE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
 
+/*
+ * On P10, thread_group_l3_cache_map for each CPU is equal to the
+ * thread_group_l2_cache_map
+ */
+DEFINE_PER_CPU(cpumask_var_t, thread_group_l3_cache_map);
+
 /* SMP operations for this machine */
 struct smp_ops_t *smp_ops;
 
@@ -889,19 +896,41 @@ static struct thread_groups *__init get_thread_groups(int cpu,
 	return tg;
 }
 
+static int update_mask_from_threadgroup(cpumask_var_t *mask, struct thread_groups *tg, int cpu, int cpu_group_start)
+{
+	int first_thread = cpu_first_thread_sibling(cpu);
+	int i;
+
+	zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu));
+
+	for (i = first_thread; i < first_thread + threads_per_core; i++) {
+		int i_group_start = get_cpu_thread_group_start(i, tg);
+
+		if (unlikely(i_group_start == -1)) {
+			WARN_ON_ONCE(1);
+			return -ENODATA;
+		}
+
+		if (i_group_start == cpu_group_start)
+			cpumask_set_cpu(i, *mask);
+	}
+
+	return 0;
+}
+
 static int __init init_thread_group_cache_map(int cpu, int cache_property)
 
 {
-	int first_thread = cpu_first_thread_sibling(cpu);
-	int i, cpu_group_start = -1, err = 0;
+	int cpu_group_start = -1, err = 0;
 	struct thread_groups *tg = NULL;
 	cpumask_var_t *mask = NULL;
 
 	if (cache_property != THREAD_GROUP_SHARE_L1 &&
-	    cache_property != THREAD_GROUP_SHARE_L2)
+	    cache_property != THREAD_GROUP_SHARE_L2_L3)
 		return -EINVAL;
 
 	tg = get_thread_groups(cpu, cache_property, &err);
+
 	if (!tg)
 		return err;
 
@@ -912,25 +941,18 @@ static int __init init_thread_group_cache_map(int cpu, int cache_property)
 		return -ENODATA;
 	}
 
-	if (cache_property == THREAD_GROUP_SHARE_L1)
+	if (cache_property == THREAD_GROUP_SHARE_L1) {
 		mask = &per_cpu(thread_group_l1_cache_map, cpu);
-	else if (cache_property == THREAD_GROUP_SHARE_L2)
+		update_mask_from_threadgroup(mask, tg, cpu, cpu_group_start);
+	}
+	else if (cache_property == THREAD_GROUP_SHARE_L2_L3) {
 		mask = &per_cpu(thread_group_l2_cache_map, cpu);
-
-	zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu));
-
-	for (i = first_thread; i < first_thread + threads_per_core; i++) {
-		int i_group_start = get_cpu_thread_group_start(i, tg);
-
-		if (unlikely(i_group_start == -1)) {
-			WARN_ON_ONCE(1);
-			return -ENODATA;
-		}
-
-		if (i_group_start == cpu_group_start)
-			cpumask_set_cpu(i, *mask);
+		update_mask_from_threadgroup(mask, tg, cpu, cpu_group_start);
+		mask = &per_cpu(thread_group_l3_cache_map, cpu);
+		update_mask_from_threadgroup(mask, tg, cpu, cpu_group_start);
 	}
 
+
 	return 0;
 }
 
@@ -1020,14 +1042,16 @@ static int __init init_big_cores(void)
 	has_big_cores = true;
 
 	for_each_possible_cpu(cpu) {
-		int err = init_thread_group_cache_map(cpu, THREAD_GROUP_SHARE_L2);
+		int err = init_thread_group_cache_map(cpu, THREAD_GROUP_SHARE_L2_L3);
 
 		if (err)
 			return err;
 	}
 
 	thread_group_shares_l2 = true;
-	pr_debug("L2 cache only shared by the threads in the small core\n");
+	thread_group_shares_l3 = true;
+	pr_debug("L2/L3 cache only shared by the threads in the small core\n");
+
 	return 0;
 }
 
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCHv2 3/3] powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings
  2021-07-28 17:56 ` [PATCHv2 3/3] powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings Parth Shah
@ 2021-07-30 10:08   ` Gautham R Shenoy
  0 siblings, 0 replies; 8+ messages in thread
From: Gautham R Shenoy @ 2021-07-30 10:08 UTC (permalink / raw)
  To: Parth Shah; +Cc: ego, mikey, srikar, parths1229, svaidy, linuxppc-dev

On Wed, Jul 28, 2021 at 11:26:07PM +0530, Parth Shah wrote:
> On POWER10 systems, the "ibm,thread-groups" property "2" indicates the cpus
> in thread-group share both L2 and L3 caches. Hence, use cache_property = 2
> itself to find both the L2 and L3 cache siblings.
> Hence, create a new thread_group_l3_cache_map to keep list of L3 siblings,
> but fill the mask using same property "2" array.

This version looks good to me.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

> 
> Signed-off-by: Parth Shah <parth@linux.ibm.com>

> ---
>  arch/powerpc/include/asm/smp.h  |  3 ++
>  arch/powerpc/kernel/cacheinfo.c |  3 ++
>  arch/powerpc/kernel/smp.c       | 66 ++++++++++++++++++++++-----------
>  3 files changed, 51 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
> index 1259040cc3a4..7ef1cd8168a0 100644
> --- a/arch/powerpc/include/asm/smp.h
> +++ b/arch/powerpc/include/asm/smp.h
> @@ -35,6 +35,7 @@ extern int *chip_id_lookup_table;
> 
>  DECLARE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map);
>  DECLARE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
> +DECLARE_PER_CPU(cpumask_var_t, thread_group_l3_cache_map);
> 
>  #ifdef CONFIG_SMP
> 
> @@ -144,6 +145,7 @@ extern int cpu_to_core_id(int cpu);
> 
>  extern bool has_big_cores;
>  extern bool thread_group_shares_l2;
> +extern bool thread_group_shares_l3;
> 
>  #define cpu_smt_mask cpu_smt_mask
>  #ifdef CONFIG_SCHED_SMT
> @@ -198,6 +200,7 @@ extern void __cpu_die(unsigned int cpu);
>  #define hard_smp_processor_id()		get_hard_smp_processor_id(0)
>  #define smp_setup_cpu_maps()
>  #define thread_group_shares_l2  0
> +#define thread_group_shares_l3	0
>  static inline void inhibit_secondary_onlining(void) {}
>  static inline void uninhibit_secondary_onlining(void) {}
>  static inline const struct cpumask *cpu_sibling_mask(int cpu)
> diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
> index 20d91693eac1..cf1be75b7833 100644
> --- a/arch/powerpc/kernel/cacheinfo.c
> +++ b/arch/powerpc/kernel/cacheinfo.c
> @@ -469,6 +469,9 @@ static int get_group_id(unsigned int cpu_id, int level)
>  	else if (thread_group_shares_l2 && level == 2)
>  		return cpumask_first(per_cpu(thread_group_l2_cache_map,
>  					     cpu_id));
> +	else if (thread_group_shares_l3 && level == 3)
> +		return cpumask_first(per_cpu(thread_group_l3_cache_map,
> +					     cpu_id));
>  	return -1;
>  }
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index a7fcac44a8e2..f2abd88e0c25 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -78,6 +78,7 @@ struct task_struct *secondary_current;
>  bool has_big_cores;
>  bool coregroup_enabled;
>  bool thread_group_shares_l2;
> +bool thread_group_shares_l3;
> 
>  DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
>  DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_map);
> @@ -101,7 +102,7 @@ enum {
> 
>  #define MAX_THREAD_LIST_SIZE	8
>  #define THREAD_GROUP_SHARE_L1   1
> -#define THREAD_GROUP_SHARE_L2   2
> +#define THREAD_GROUP_SHARE_L2_L3 2
>  struct thread_groups {
>  	unsigned int property;
>  	unsigned int nr_groups;
> @@ -131,6 +132,12 @@ DEFINE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map);
>   */
>  DEFINE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
> 
> +/*
> + * On P10, thread_group_l3_cache_map for each CPU is equal to the
> + * thread_group_l2_cache_map
> + */
> +DEFINE_PER_CPU(cpumask_var_t, thread_group_l3_cache_map);
> +
>  /* SMP operations for this machine */
>  struct smp_ops_t *smp_ops;
> 
> @@ -889,19 +896,41 @@ static struct thread_groups *__init get_thread_groups(int cpu,
>  	return tg;
>  }
> 
> +static int update_mask_from_threadgroup(cpumask_var_t *mask, struct thread_groups *tg, int cpu, int cpu_group_start)
> +{
> +	int first_thread = cpu_first_thread_sibling(cpu);
> +	int i;
> +
> +	zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu));
> +
> +	for (i = first_thread; i < first_thread + threads_per_core; i++) {
> +		int i_group_start = get_cpu_thread_group_start(i, tg);
> +
> +		if (unlikely(i_group_start == -1)) {
> +			WARN_ON_ONCE(1);
> +			return -ENODATA;
> +		}
> +
> +		if (i_group_start == cpu_group_start)
> +			cpumask_set_cpu(i, *mask);
> +	}
> +
> +	return 0;
> +}
> +
>  static int __init init_thread_group_cache_map(int cpu, int cache_property)
> 
>  {
> -	int first_thread = cpu_first_thread_sibling(cpu);
> -	int i, cpu_group_start = -1, err = 0;
> +	int cpu_group_start = -1, err = 0;
>  	struct thread_groups *tg = NULL;
>  	cpumask_var_t *mask = NULL;
> 
>  	if (cache_property != THREAD_GROUP_SHARE_L1 &&
> -	    cache_property != THREAD_GROUP_SHARE_L2)
> +	    cache_property != THREAD_GROUP_SHARE_L2_L3)
>  		return -EINVAL;
> 
>  	tg = get_thread_groups(cpu, cache_property, &err);
> +
>  	if (!tg)
>  		return err;
> 
> @@ -912,25 +941,18 @@ static int __init init_thread_group_cache_map(int cpu, int cache_property)
>  		return -ENODATA;
>  	}
> 
> -	if (cache_property == THREAD_GROUP_SHARE_L1)
> +	if (cache_property == THREAD_GROUP_SHARE_L1) {
>  		mask = &per_cpu(thread_group_l1_cache_map, cpu);
> -	else if (cache_property == THREAD_GROUP_SHARE_L2)
> +		update_mask_from_threadgroup(mask, tg, cpu, cpu_group_start);
> +	}
> +	else if (cache_property == THREAD_GROUP_SHARE_L2_L3) {
>  		mask = &per_cpu(thread_group_l2_cache_map, cpu);
> -
> -	zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu));
> -
> -	for (i = first_thread; i < first_thread + threads_per_core; i++) {
> -		int i_group_start = get_cpu_thread_group_start(i, tg);
> -
> -		if (unlikely(i_group_start == -1)) {
> -			WARN_ON_ONCE(1);
> -			return -ENODATA;
> -		}
> -
> -		if (i_group_start == cpu_group_start)
> -			cpumask_set_cpu(i, *mask);
> +		update_mask_from_threadgroup(mask, tg, cpu, cpu_group_start);
> +		mask = &per_cpu(thread_group_l3_cache_map, cpu);
> +		update_mask_from_threadgroup(mask, tg, cpu, cpu_group_start);
>  	}
> 
> +
>  	return 0;
>  }
> 
> @@ -1020,14 +1042,16 @@ static int __init init_big_cores(void)
>  	has_big_cores = true;
> 
>  	for_each_possible_cpu(cpu) {
> -		int err = init_thread_group_cache_map(cpu, THREAD_GROUP_SHARE_L2);
> +		int err = init_thread_group_cache_map(cpu, THREAD_GROUP_SHARE_L2_L3);
> 
>  		if (err)
>  			return err;
>  	}
> 
>  	thread_group_shares_l2 = true;
> -	pr_debug("L2 cache only shared by the threads in the small core\n");
> +	thread_group_shares_l3 = true;
> +	pr_debug("L2/L3 cache only shared by the threads in the small core\n");
> +
>  	return 0;
>  }
> 
> -- 
> 2.26.3
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCHv2 1/3] powerpc/cacheinfo: Lookup cache by dt node and thread-group id
  2021-07-28 17:56 ` [PATCHv2 1/3] powerpc/cacheinfo: Lookup cache by dt node and thread-group id Parth Shah
@ 2021-08-06  5:43   ` Srikar Dronamraju
  0 siblings, 0 replies; 8+ messages in thread
From: Srikar Dronamraju @ 2021-08-06  5:43 UTC (permalink / raw)
  To: Parth Shah; +Cc: ego, mikey, parths1229, svaidy, linuxppc-dev

* Parth Shah <parth@linux.ibm.com> [2021-07-28 23:26:05]:

> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> 
> Currently the cacheinfo code on powerpc indexes the "cache" objects
> (modelling the L1/L2/L3 caches) where the key is device-tree node
> corresponding to that cache. On some of the POWER server platforms
> thread-groups within the core share different sets of caches (Eg: On
> SMT8 POWER9 systems, threads 0,2,4,6 of a core share L1 cache and
> threads 1,3,5,7 of the same core share another L1 cache). On such
> platforms, there is a single device-tree node corresponding to that
> cache and the cache-configuration within the threads of the core is
> indicated via "ibm,thread-groups" device-tree property.
> 
> Since the current code is not aware of the "ibm,thread-groups"
> property, on the aforementoined systems, cacheinfo code still treats
> all the threads in the core to be sharing the cache because of the
> single device-tree node (In the earlier example, the cacheinfo code
> would says CPUs 0-7 share L1 cache).
> 
> In this patch, we make the powerpc cacheinfo code aware of the
> "ibm,thread-groups" property. We indexe the "cache" objects by the
> key-pair (device-tree node, thread-group id). For any CPUX, for a
> given level of cache, the thread-group id is defined to be the first
> CPU in the "ibm,thread-groups" cache-group containing CPUX. For levels
> of cache which are not represented in "ibm,thread-groups" property,
> the thread-group id is -1.
> 
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> [parth: Remove "static" keyword for the definition of "thread_group_l1_cache_map"
> and "thread_group_l2_cache_map" to get rid of the compile error.]
> Signed-off-by: Parth Shah <parth@linux.ibm.com>


Looks good to me.

Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

> ---
>  arch/powerpc/include/asm/smp.h  |  3 ++
>  arch/powerpc/kernel/cacheinfo.c | 80 ++++++++++++++++++++++++---------
>  arch/powerpc/kernel/smp.c       |  4 +-
>  3 files changed, 63 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
> index 03b3d010cbab..1259040cc3a4 100644
> --- a/arch/powerpc/include/asm/smp.h
> +++ b/arch/powerpc/include/asm/smp.h
> @@ -33,6 +33,9 @@ extern bool coregroup_enabled;
>  extern int cpu_to_chip_id(int cpu);
>  extern int *chip_id_lookup_table;
> 
> +DECLARE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map);
> +DECLARE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
> +
>  #ifdef CONFIG_SMP
> 
>  struct smp_ops_t {
> diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
> index 6f903e9aa20b..5a6925d87424 100644
> --- a/arch/powerpc/kernel/cacheinfo.c
> +++ b/arch/powerpc/kernel/cacheinfo.c
> @@ -120,6 +120,7 @@ struct cache {
>  	struct cpumask shared_cpu_map; /* online CPUs using this cache */
>  	int type;                      /* split cache disambiguation */
>  	int level;                     /* level not explicit in device tree */
> +	int group_id;                  /* id of the group of threads that share this cache */
>  	struct list_head list;         /* global list of cache objects */
>  	struct cache *next_local;      /* next cache of >= level */
>  };
> @@ -142,22 +143,24 @@ static const char *cache_type_string(const struct cache *cache)
>  }
> 
>  static void cache_init(struct cache *cache, int type, int level,
> -		       struct device_node *ofnode)
> +		       struct device_node *ofnode, int group_id)
>  {
>  	cache->type = type;
>  	cache->level = level;
>  	cache->ofnode = of_node_get(ofnode);
> +	cache->group_id = group_id;
>  	INIT_LIST_HEAD(&cache->list);
>  	list_add(&cache->list, &cache_list);
>  }
> 
> -static struct cache *new_cache(int type, int level, struct device_node *ofnode)
> +static struct cache *new_cache(int type, int level,
> +			       struct device_node *ofnode, int group_id)
>  {
>  	struct cache *cache;
> 
>  	cache = kzalloc(sizeof(*cache), GFP_KERNEL);
>  	if (cache)
> -		cache_init(cache, type, level, ofnode);
> +		cache_init(cache, type, level, ofnode, group_id);
> 
>  	return cache;
>  }
> @@ -309,20 +312,24 @@ static struct cache *cache_find_first_sibling(struct cache *cache)
>  		return cache;
> 
>  	list_for_each_entry(iter, &cache_list, list)
> -		if (iter->ofnode == cache->ofnode && iter->next_local == cache)
> +		if (iter->ofnode == cache->ofnode &&
> +		    iter->group_id == cache->group_id &&
> +		    iter->next_local == cache)
>  			return iter;
> 
>  	return cache;
>  }
> 
> -/* return the first cache on a local list matching node */
> -static struct cache *cache_lookup_by_node(const struct device_node *node)
> +/* return the first cache on a local list matching node and thread-group id */
> +static struct cache *cache_lookup_by_node_group(const struct device_node *node,
> +						int group_id)
>  {
>  	struct cache *cache = NULL;
>  	struct cache *iter;
> 
>  	list_for_each_entry(iter, &cache_list, list) {
> -		if (iter->ofnode != node)
> +		if (iter->ofnode != node ||
> +		    iter->group_id != group_id)
>  			continue;
>  		cache = cache_find_first_sibling(iter);
>  		break;
> @@ -352,14 +359,15 @@ static int cache_is_unified_d(const struct device_node *np)
>  		CACHE_TYPE_UNIFIED_D : CACHE_TYPE_UNIFIED;
>  }
> 
> -static struct cache *cache_do_one_devnode_unified(struct device_node *node, int level)
> +static struct cache *cache_do_one_devnode_unified(struct device_node *node, int group_id,
> +						  int level)
>  {
>  	pr_debug("creating L%d ucache for %pOFP\n", level, node);
> 
> -	return new_cache(cache_is_unified_d(node), level, node);
> +	return new_cache(cache_is_unified_d(node), level, node, group_id);
>  }
> 
> -static struct cache *cache_do_one_devnode_split(struct device_node *node,
> +static struct cache *cache_do_one_devnode_split(struct device_node *node, int group_id,
>  						int level)
>  {
>  	struct cache *dcache, *icache;
> @@ -367,8 +375,8 @@ static struct cache *cache_do_one_devnode_split(struct device_node *node,
>  	pr_debug("creating L%d dcache and icache for %pOFP\n", level,
>  		 node);
> 
> -	dcache = new_cache(CACHE_TYPE_DATA, level, node);
> -	icache = new_cache(CACHE_TYPE_INSTRUCTION, level, node);
> +	dcache = new_cache(CACHE_TYPE_DATA, level, node, group_id);
> +	icache = new_cache(CACHE_TYPE_INSTRUCTION, level, node, group_id);
> 
>  	if (!dcache || !icache)
>  		goto err;
> @@ -382,31 +390,32 @@ static struct cache *cache_do_one_devnode_split(struct device_node *node,
>  	return NULL;
>  }
> 
> -static struct cache *cache_do_one_devnode(struct device_node *node, int level)
> +static struct cache *cache_do_one_devnode(struct device_node *node, int group_id, int level)
>  {
>  	struct cache *cache;
> 
>  	if (cache_node_is_unified(node))
> -		cache = cache_do_one_devnode_unified(node, level);
> +		cache = cache_do_one_devnode_unified(node, group_id, level);
>  	else
> -		cache = cache_do_one_devnode_split(node, level);
> +		cache = cache_do_one_devnode_split(node, group_id, level);
> 
>  	return cache;
>  }
> 
>  static struct cache *cache_lookup_or_instantiate(struct device_node *node,
> +						 int group_id,
>  						 int level)
>  {
>  	struct cache *cache;
> 
> -	cache = cache_lookup_by_node(node);
> +	cache = cache_lookup_by_node_group(node, group_id);
> 
>  	WARN_ONCE(cache && cache->level != level,
>  		  "cache level mismatch on lookup (got %d, expected %d)\n",
>  		  cache->level, level);
> 
>  	if (!cache)
> -		cache = cache_do_one_devnode(node, level);
> +		cache = cache_do_one_devnode(node, group_id, level);
> 
>  	return cache;
>  }
> @@ -443,7 +452,27 @@ static void do_subsidiary_caches_debugcheck(struct cache *cache)
>  		  of_node_get_device_type(cache->ofnode));
>  }
> 
> -static void do_subsidiary_caches(struct cache *cache)
> +/*
> + * If sub-groups of threads in a core containing @cpu_id share the
> + * L@level-cache (information obtained via "ibm,thread-groups"
> + * device-tree property), then we identify the group by the first
> + * thread-sibling in the group. We define this to be the group-id.
> + *
> + * In the absence of any thread-group information for L@level-cache,
> + * this function returns -1.
> + */
> +static int get_group_id(unsigned int cpu_id, int level)
> +{
> +	if (has_big_cores && level == 1)
> +		return cpumask_first(per_cpu(thread_group_l1_cache_map,
> +					     cpu_id));
> +	else if (thread_group_shares_l2 && level == 2)
> +		return cpumask_first(per_cpu(thread_group_l2_cache_map,
> +					     cpu_id));
> +	return -1;
> +}
> +
> +static void do_subsidiary_caches(struct cache *cache, unsigned int cpu_id)
>  {
>  	struct device_node *subcache_node;
>  	int level = cache->level;
> @@ -452,9 +481,11 @@ static void do_subsidiary_caches(struct cache *cache)
> 
>  	while ((subcache_node = of_find_next_cache_node(cache->ofnode))) {
>  		struct cache *subcache;
> +		int group_id;
> 
>  		level++;
> -		subcache = cache_lookup_or_instantiate(subcache_node, level);
> +		group_id = get_group_id(cpu_id, level);
> +		subcache = cache_lookup_or_instantiate(subcache_node, group_id, level);
>  		of_node_put(subcache_node);
>  		if (!subcache)
>  			break;
> @@ -468,6 +499,7 @@ static struct cache *cache_chain_instantiate(unsigned int cpu_id)
>  {
>  	struct device_node *cpu_node;
>  	struct cache *cpu_cache = NULL;
> +	int group_id;
> 
>  	pr_debug("creating cache object(s) for CPU %i\n", cpu_id);
> 
> @@ -476,11 +508,13 @@ static struct cache *cache_chain_instantiate(unsigned int cpu_id)
>  	if (!cpu_node)
>  		goto out;
> 
> -	cpu_cache = cache_lookup_or_instantiate(cpu_node, 1);
> +	group_id = get_group_id(cpu_id, 1);
> +
> +	cpu_cache = cache_lookup_or_instantiate(cpu_node, group_id, 1);
>  	if (!cpu_cache)
>  		goto out;
> 
> -	do_subsidiary_caches(cpu_cache);
> +	do_subsidiary_caches(cpu_cache, cpu_id);
> 
>  	cache_cpu_set(cpu_cache, cpu_id);
>  out:
> @@ -848,13 +882,15 @@ static struct cache *cache_lookup_by_cpu(unsigned int cpu_id)
>  {
>  	struct device_node *cpu_node;
>  	struct cache *cache;
> +	int group_id;
> 
>  	cpu_node = of_get_cpu_node(cpu_id, NULL);
>  	WARN_ONCE(!cpu_node, "no OF node found for CPU %i\n", cpu_id);
>  	if (!cpu_node)
>  		return NULL;
> 
> -	cache = cache_lookup_by_node(cpu_node);
> +	group_id = get_group_id(cpu_id, 1);
> +	cache = cache_lookup_by_node_group(cpu_node, group_id);
>  	of_node_put(cpu_node);
> 
>  	return cache;
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 447b78a87c8f..a7fcac44a8e2 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -122,14 +122,14 @@ static struct thread_groups_list tgl[NR_CPUS] __initdata;
>   * On big-cores system, thread_group_l1_cache_map for each CPU corresponds to
>   * the set its siblings that share the L1-cache.
>   */
> -static DEFINE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map);
> +DEFINE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map);
> 
>  /*
>   * On some big-cores system, thread_group_l2_cache_map for each CPU
>   * corresponds to the set its siblings within the core that share the
>   * L2-cache.
>   */
> -static DEFINE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
> +DEFINE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
> 
>  /* SMP operations for this machine */
>  struct smp_ops_t *smp_ops;
> -- 
> 2.26.3
> 

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCHv2 2/3] powerpc/cacheinfo: Remove the redundant get_shared_cpu_map()
  2021-07-28 17:56 ` [PATCHv2 2/3] powerpc/cacheinfo: Remove the redundant get_shared_cpu_map() Parth Shah
@ 2021-08-06  5:44   ` Srikar Dronamraju
  0 siblings, 0 replies; 8+ messages in thread
From: Srikar Dronamraju @ 2021-08-06  5:44 UTC (permalink / raw)
  To: Parth Shah; +Cc: ego, mikey, parths1229, svaidy, linuxppc-dev

* Parth Shah <parth@linux.ibm.com> [2021-07-28 23:26:06]:

> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> 
> The helper function get_shared_cpu_map() was added in
> 
> 'commit 500fe5f550ec ("powerpc/cacheinfo: Report the correct
> shared_cpu_map on big-cores")'
> 
> and subsequently expanded upon in
> 
> 'commit 0be47634db0b ("powerpc/cacheinfo: Print correct cache-sibling
> map/list for L2 cache")'
> 
> in order to help report the correct groups of threads sharing these caches
> on big-core systems where groups of threads within a core can share
> different sets of caches.
> 
> Now that powerpc/cacheinfo is aware of "ibm,thread-groups" property,
> cache->shared_cpu_map contains the correct set of thread-siblings
> sharing the cache. Hence we no longer need the functions
> get_shared_cpu_map(). This patch removes this function. We also remove
> the helper function index_dir_to_cpu() which was only called by
> get_shared_cpu_map().
> 
> With these functions removed, we can still see the correct
> cache-sibling map/list for L1 and L2 caches on systems with L1 and L2
> caches distributed among groups of threads in a core.
> 
> With this patch, on a SMT8 POWER10 system where the L1 and L2 caches
> are split between the two groups of threads in a core, for CPUs 8,9,
> the L1-Data, L1-Instruction, L2, L3 cache CPU sibling list is as
> follows:
> 
> $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
> /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10,12,14
> /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10,12,14
> /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10,12,14
> /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-15
> /sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11,13,15
> /sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11,13,15
> /sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11,13,15
> /sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-15
> 
> $ ppc64_cpu --smt=4
> $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
> /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10
> /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10
> /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10
> /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-11
> /sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11
> /sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11
> /sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11
> /sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-11
> 
> $ ppc64_cpu --smt=2
> $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
> /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
> /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
> /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
> /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-9
> /sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9
> /sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9
> /sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9
> /sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-9
> 
> $ ppc64_cpu --smt=1
> $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
> /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
> /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
> /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
> /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8
> 
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

Looks good to me.

Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

> ---
>  arch/powerpc/kernel/cacheinfo.c | 41 +--------------------------------
>  1 file changed, 1 insertion(+), 40 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
> index 5a6925d87424..20d91693eac1 100644
> --- a/arch/powerpc/kernel/cacheinfo.c
> +++ b/arch/powerpc/kernel/cacheinfo.c
> @@ -675,45 +675,6 @@ static ssize_t level_show(struct kobject *k, struct kobj_attribute *attr, char *
>  static struct kobj_attribute cache_level_attr =
>  	__ATTR(level, 0444, level_show, NULL);
> 
> -static unsigned int index_dir_to_cpu(struct cache_index_dir *index)
> -{
> -	struct kobject *index_dir_kobj = &index->kobj;
> -	struct kobject *cache_dir_kobj = index_dir_kobj->parent;
> -	struct kobject *cpu_dev_kobj = cache_dir_kobj->parent;
> -	struct device *dev = kobj_to_dev(cpu_dev_kobj);
> -
> -	return dev->id;
> -}
> -
> -/*
> - * On big-core systems, each core has two groups of CPUs each of which
> - * has its own L1-cache. The thread-siblings which share l1-cache with
> - * @cpu can be obtained via cpu_smallcore_mask().
> - *
> - * On some big-core systems, the L2 cache is shared only between some
> - * groups of siblings. This is already parsed and encoded in
> - * cpu_l2_cache_mask().
> - *
> - * TODO: cache_lookup_or_instantiate() needs to be made aware of the
> - *       "ibm,thread-groups" property so that cache->shared_cpu_map
> - *       reflects the correct siblings on platforms that have this
> - *       device-tree property. This helper function is only a stop-gap
> - *       solution so that we report the correct siblings to the
> - *       userspace via sysfs.
> - */
> -static const struct cpumask *get_shared_cpu_map(struct cache_index_dir *index, struct cache *cache)
> -{
> -	if (has_big_cores) {
> -		int cpu = index_dir_to_cpu(index);
> -		if (cache->level == 1)
> -			return cpu_smallcore_mask(cpu);
> -		if (cache->level == 2 && thread_group_shares_l2)
> -			return cpu_l2_cache_mask(cpu);
> -	}
> -
> -	return &cache->shared_cpu_map;
> -}
> -
>  static ssize_t
>  show_shared_cpumap(struct kobject *k, struct kobj_attribute *attr, char *buf, bool list)
>  {
> @@ -724,7 +685,7 @@ show_shared_cpumap(struct kobject *k, struct kobj_attribute *attr, char *buf, bo
>  	index = kobj_to_cache_index_dir(k);
>  	cache = index->cache;
> 
> -	mask = get_shared_cpu_map(index, cache);
> +	mask = &cache->shared_cpu_map;
> 
>  	return cpumap_print_to_pagebuf(list, buf, mask);
>  }
> -- 
> 2.26.3
> 

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCHv2 0/3] Subject: [PATCHv2 0/3] Make cache-object aware of L3 siblings by parsing "ibm, thread-groups" property
  2021-07-28 17:56 [PATCHv2 0/3] Subject: [PATCHv2 0/3] Make cache-object aware of L3 siblings by parsing "ibm, thread-groups" property Parth Shah
                   ` (2 preceding siblings ...)
  2021-07-28 17:56 ` [PATCHv2 3/3] powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings Parth Shah
@ 2021-08-18 13:38 ` Michael Ellerman
  3 siblings, 0 replies; 8+ messages in thread
From: Michael Ellerman @ 2021-08-18 13:38 UTC (permalink / raw)
  To: linuxppc-dev, Parth Shah; +Cc: parths1229, mikey, svaidy, srikar, ego

On Wed, 28 Jul 2021 23:26:04 +0530, Parth Shah wrote:
> Changes from v1 -> v2:
> - Based on Gautham's comments, use a separate thread_group_l3_cache_map
>   and modify parsing code to build cache_map for L3. This makes the
>   cache_map building code isolated from the parsing code.
> v1 can be found at:
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-June/230680.html
> 
> [...]

Applied to powerpc/next.

[1/3] powerpc/cacheinfo: Lookup cache by dt node and thread-group id
      https://git.kernel.org/powerpc/c/a4bec516b9c0823d7e2bb8c8928c98b535cf9adf
[2/3] powerpc/cacheinfo: Remove the redundant get_shared_cpu_map()
      https://git.kernel.org/powerpc/c/69aa8e078545bc14d84a8b4b3cb914ac8f9f280e
[3/3] powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings
      https://git.kernel.org/powerpc/c/e9ef81e1079b0c4c374fba0f9affa7129c7c913b

cheers

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-08-18 13:53 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-28 17:56 [PATCHv2 0/3] Subject: [PATCHv2 0/3] Make cache-object aware of L3 siblings by parsing "ibm, thread-groups" property Parth Shah
2021-07-28 17:56 ` [PATCHv2 1/3] powerpc/cacheinfo: Lookup cache by dt node and thread-group id Parth Shah
2021-08-06  5:43   ` Srikar Dronamraju
2021-07-28 17:56 ` [PATCHv2 2/3] powerpc/cacheinfo: Remove the redundant get_shared_cpu_map() Parth Shah
2021-08-06  5:44   ` Srikar Dronamraju
2021-07-28 17:56 ` [PATCHv2 3/3] powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings Parth Shah
2021-07-30 10:08   ` Gautham R Shenoy
2021-08-18 13:38 ` [PATCHv2 0/3] Subject: [PATCHv2 0/3] Make cache-object aware of L3 siblings by parsing "ibm, thread-groups" property Michael Ellerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).