* [PATCHv2 0/3] Make cache-object aware of L3 siblings by parsing "ibm,thread-groups" property
@ 2021-07-28 17:56 Parth Shah
  2021-07-28 17:56 ` [PATCHv2 1/3] powerpc/cacheinfo: Lookup cache by dt node and thread-group id Parth Shah
  ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Parth Shah @ 2021-07-28 17:56 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: ego, mikey, srikar, parths1229, svaidy

Changes from v1 -> v2:
- Based on Gautham's comments, use a separate thread_group_l3_cache_map
  and modify the parsing code to build the cache_map for L3. This keeps
  the cache_map building code isolated from the parsing code.

v1 can be found at:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-June/230680.html

On a POWER10 big-core system, the L3 cache reflected by sysfs contains
all the CPUs in the big core:

grep . /sys/devices/system/cpu/cpu0/cache/index*/shared_cpu_list
/sys/devices/system/cpu/cpu0/cache/index0/shared_cpu_list:0,2,4,6
/sys/devices/system/cpu/cpu0/cache/index1/shared_cpu_list:0,2,4,6
/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list:0,2,4,6
/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list:0-7

In the above example, CPU 0 observes CPUs 0-7 in its L3 (index3) cache,
which is not correct: only the CPUs in the small core share the L3
cache. The "ibm,thread-groups" property contains the value "2" to
indicate that the CPUs share both the L2 and L3 caches. This patch set
uses that property to reflect the correct L3 topology in the
cache-object.

After applying this patch set, the topology looks like:

$> ppc64_cpu --smt=8
$> grep . /sys/devices/system/cpu/cpu[89]/cache/*/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:9,11,13,15

$> ppc64_cpu --smt=4
$> grep . /sys/devices/system/cpu/cpu[89]/cache/*/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:9,11

$> ppc64_cpu --smt=2
$> grep . /sys/devices/system/cpu/cpu[89]/cache/*/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:9

$> ppc64_cpu --smt=1
$> grep . /sys/devices/system/cpu/cpu[89]/cache/*/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8

Patches Organization:
=====================
This patch-set series is based on top of v5.14-rc2

- Patch 1-2: Add functionality to introduce awareness of the
  "ibm,thread-groups" property. The original (not merged) posted
  version can be found at:
  https://lore.kernel.org/linuxppc-dev/1611041780-8640-1-git-send-email-ego@linux.vnet.ibm.com
- Patch 3: Use the existing L2 cache_map to detect L3 cache siblings

Gautham R. Shenoy (2):
  powerpc/cacheinfo: Lookup cache by dt node and thread-group id
  powerpc/cacheinfo: Remove the redundant get_shared_cpu_map()

Parth Shah (1):
  powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings

 arch/powerpc/include/asm/smp.h  |   6 ++
 arch/powerpc/kernel/cacheinfo.c | 124 ++++++++++++++++----------------
 arch/powerpc/kernel/smp.c       |  70 ++++++++++++------
 3 files changed, 115 insertions(+), 85 deletions(-)

--
2.26.3

^ permalink raw reply	[flat|nested] 8+ messages in thread
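[Editorial note: the cover letter above leans on the layout of the "ibm,thread-groups" cell array. As a rough sketch of how such an array decodes — following the record layout described in the kernel's smp.c comments, `[property, nr_groups, threads_per_group, <thread ids...>]` repeated — here is a toy decoder. The sample cell values are hypothetical, not dumped from real firmware.]

```python
# Toy decoder for an "ibm,thread-groups"-style cell array.
# Each record: [property, nr_groups, threads_per_group, ids...],
# where property 1 describes L1/thread grouping and property 2
# describes threads sharing both L2 and L3 (per the cover letter).

def decode_thread_groups(cells):
    """Return {property: [list of thread-id groups]}."""
    out = {}
    i = 0
    while i < len(cells):
        prop, nr_groups, per_group = cells[i:i + 3]
        i += 3
        groups = []
        for _ in range(nr_groups):
            groups.append(cells[i:i + per_group])
            i += per_group
        out[prop] = groups
    return out

# Hypothetical sample: property 2, two groups of four threads in one
# SMT8 big core (even siblings 8,10,12,14 vs odd siblings 9,11,13,15).
sample = [2, 2, 4, 8, 10, 12, 14, 9, 11, 13, 15]
print(decode_thread_groups(sample))
```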
* [PATCHv2 1/3] powerpc/cacheinfo: Lookup cache by dt node and thread-group id
  2021-07-28 17:56 [PATCHv2 0/3] Make cache-object aware of L3 siblings by parsing "ibm,thread-groups" property Parth Shah
@ 2021-07-28 17:56 ` Parth Shah
  2021-08-06  5:43   ` Srikar Dronamraju
  2021-07-28 17:56 ` [PATCHv2 2/3] powerpc/cacheinfo: Remove the redundant get_shared_cpu_map() Parth Shah
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Parth Shah @ 2021-07-28 17:56 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: ego, mikey, srikar, parths1229, svaidy

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

Currently the cacheinfo code on powerpc indexes the "cache" objects
(modelling the L1/L2/L3 caches) where the key is the device-tree node
corresponding to that cache. On some of the POWER server platforms,
thread-groups within a core share different sets of caches (e.g. on
SMT8 POWER9 systems, threads 0,2,4,6 of a core share one L1 cache and
threads 1,3,5,7 of the same core share another L1 cache). On such
platforms there is a single device-tree node corresponding to that
cache, and the cache configuration within the threads of the core is
indicated via the "ibm,thread-groups" device-tree property.

Since the current code is not aware of the "ibm,thread-groups"
property, on the aforementioned systems the cacheinfo code still
treats all the threads in the core as sharing the cache because of the
single device-tree node (in the earlier example, the cacheinfo code
would say that CPUs 0-7 share the L1 cache).

In this patch, we make the powerpc cacheinfo code aware of the
"ibm,thread-groups" property. We index the "cache" objects by the
key-pair (device-tree node, thread-group id). For any CPU X, for a
given level of cache, the thread-group id is defined to be the first
CPU in the "ibm,thread-groups" cache-group containing CPU X. For
levels of cache which are not represented in the "ibm,thread-groups"
property, the thread-group id is -1.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
[parth: Remove "static" keyword for the definition of "thread_group_l1_cache_map"
and "thread_group_l2_cache_map" to get rid of the compile error.]
Signed-off-by: Parth Shah <parth@linux.ibm.com>
---
 arch/powerpc/include/asm/smp.h  |  3 ++
 arch/powerpc/kernel/cacheinfo.c | 80 ++++++++++++++++++++++++---------
 arch/powerpc/kernel/smp.c       |  4 +-
 3 files changed, 63 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 03b3d010cbab..1259040cc3a4 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -33,6 +33,9 @@ extern bool coregroup_enabled;
 extern int cpu_to_chip_id(int cpu);
 extern int *chip_id_lookup_table;
 
+DECLARE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map);
+DECLARE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
+
 #ifdef CONFIG_SMP
 
 struct smp_ops_t {
diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
index 6f903e9aa20b..5a6925d87424 100644
--- a/arch/powerpc/kernel/cacheinfo.c
+++ b/arch/powerpc/kernel/cacheinfo.c
@@ -120,6 +120,7 @@ struct cache {
 	struct cpumask shared_cpu_map; /* online CPUs using this cache */
 	int type;                      /* split cache disambiguation */
 	int level;                     /* level not explicit in device tree */
+	int group_id;                  /* id of the group of threads that share this cache */
 	struct list_head list;         /* global list of cache objects */
 	struct cache *next_local;      /* next cache of >= level */
 };
@@ -142,22 +143,24 @@ static const char *cache_type_string(const struct cache *cache)
 }
 
 static void cache_init(struct cache *cache, int type, int level,
-		       struct device_node *ofnode)
+		       struct device_node *ofnode, int group_id)
 {
 	cache->type = type;
 	cache->level = level;
 	cache->ofnode = of_node_get(ofnode);
+	cache->group_id = group_id;
 	INIT_LIST_HEAD(&cache->list);
 	list_add(&cache->list, &cache_list);
 }
 
-static struct cache *new_cache(int type, int level, struct device_node *ofnode)
+static struct cache *new_cache(int type, int level,
+			       struct device_node *ofnode, int group_id)
 {
 	struct cache *cache;
 
 	cache = kzalloc(sizeof(*cache), GFP_KERNEL);
 	if (cache)
-		cache_init(cache, type, level, ofnode);
+		cache_init(cache, type, level, ofnode, group_id);
 
 	return cache;
 }
@@ -309,20 +312,24 @@ static struct cache *cache_find_first_sibling(struct cache *cache)
 		return cache;
 
 	list_for_each_entry(iter, &cache_list, list)
-		if (iter->ofnode == cache->ofnode && iter->next_local == cache)
+		if (iter->ofnode == cache->ofnode &&
+		    iter->group_id == cache->group_id &&
+		    iter->next_local == cache)
 			return iter;
 
 	return cache;
 }
 
-/* return the first cache on a local list matching node */
-static struct cache *cache_lookup_by_node(const struct device_node *node)
+/* return the first cache on a local list matching node and thread-group id */
+static struct cache *cache_lookup_by_node_group(const struct device_node *node,
+						int group_id)
 {
 	struct cache *cache = NULL;
 	struct cache *iter;
 
 	list_for_each_entry(iter, &cache_list, list) {
-		if (iter->ofnode != node)
+		if (iter->ofnode != node ||
+		    iter->group_id != group_id)
 			continue;
 		cache = cache_find_first_sibling(iter);
 		break;
@@ -352,14 +359,15 @@ static int cache_is_unified_d(const struct device_node *np)
 		CACHE_TYPE_UNIFIED_D : CACHE_TYPE_UNIFIED;
 }
 
-static struct cache *cache_do_one_devnode_unified(struct device_node *node, int level)
+static struct cache *cache_do_one_devnode_unified(struct device_node *node, int group_id,
+						  int level)
 {
 	pr_debug("creating L%d ucache for %pOFP\n", level, node);
 
-	return new_cache(cache_is_unified_d(node), level, node);
+	return new_cache(cache_is_unified_d(node), level, node, group_id);
 }
 
-static struct cache *cache_do_one_devnode_split(struct device_node *node,
+static struct cache *cache_do_one_devnode_split(struct device_node *node, int group_id,
 						int level)
 {
 	struct cache *dcache, *icache;
@@ -367,8 +375,8 @@ static struct cache *cache_do_one_devnode_split(struct device_node *node,
 	pr_debug("creating L%d dcache and icache for %pOFP\n", level,
 		 node);
 
-	dcache = new_cache(CACHE_TYPE_DATA, level, node);
-	icache = new_cache(CACHE_TYPE_INSTRUCTION, level, node);
+	dcache = new_cache(CACHE_TYPE_DATA, level, node, group_id);
+	icache = new_cache(CACHE_TYPE_INSTRUCTION, level, node, group_id);
 
 	if (!dcache || !icache)
 		goto err;
@@ -382,31 +390,32 @@ static struct cache *cache_do_one_devnode_split(struct device_node *node,
 	return NULL;
 }
 
-static struct cache *cache_do_one_devnode(struct device_node *node, int level)
+static struct cache *cache_do_one_devnode(struct device_node *node, int group_id, int level)
 {
 	struct cache *cache;
 
 	if (cache_node_is_unified(node))
-		cache = cache_do_one_devnode_unified(node, level);
+		cache = cache_do_one_devnode_unified(node, group_id, level);
 	else
-		cache = cache_do_one_devnode_split(node, level);
+		cache = cache_do_one_devnode_split(node, group_id, level);
 
 	return cache;
 }
 
 static struct cache *cache_lookup_or_instantiate(struct device_node *node,
+						 int group_id,
 						 int level)
 {
 	struct cache *cache;
 
-	cache = cache_lookup_by_node(node);
+	cache = cache_lookup_by_node_group(node, group_id);
 
 	WARN_ONCE(cache && cache->level != level,
 		  "cache level mismatch on lookup (got %d, expected %d)\n",
 		  cache->level, level);
 
 	if (!cache)
-		cache = cache_do_one_devnode(node, level);
+		cache = cache_do_one_devnode(node, group_id, level);
 
 	return cache;
 }
@@ -443,7 +452,27 @@ static void do_subsidiary_caches_debugcheck(struct cache *cache)
 		of_node_get_device_type(cache->ofnode));
 }
 
-static void do_subsidiary_caches(struct cache *cache)
+/*
+ * If sub-groups of threads in a core containing @cpu_id share the
+ * L@level-cache (information obtained via "ibm,thread-groups"
+ * device-tree property), then we identify the group by the first
+ * thread-sibling in the group. We define this to be the group-id.
+ *
+ * In the absence of any thread-group information for L@level-cache,
+ * this function returns -1.
+ */
+static int get_group_id(unsigned int cpu_id, int level)
+{
+	if (has_big_cores && level == 1)
+		return cpumask_first(per_cpu(thread_group_l1_cache_map,
+					     cpu_id));
+	else if (thread_group_shares_l2 && level == 2)
+		return cpumask_first(per_cpu(thread_group_l2_cache_map,
+					     cpu_id));
+	return -1;
+}
+
+static void do_subsidiary_caches(struct cache *cache, unsigned int cpu_id)
 {
 	struct device_node *subcache_node;
 	int level = cache->level;
@@ -452,9 +481,11 @@ static void do_subsidiary_caches(struct cache *cache)
 
 	while ((subcache_node = of_find_next_cache_node(cache->ofnode))) {
 		struct cache *subcache;
+		int group_id;
 
 		level++;
-		subcache = cache_lookup_or_instantiate(subcache_node, level);
+		group_id = get_group_id(cpu_id, level);
+		subcache = cache_lookup_or_instantiate(subcache_node, group_id, level);
 		of_node_put(subcache_node);
 		if (!subcache)
 			break;
@@ -468,6 +499,7 @@ static struct cache *cache_chain_instantiate(unsigned int cpu_id)
 {
 	struct device_node *cpu_node;
 	struct cache *cpu_cache = NULL;
+	int group_id;
 
 	pr_debug("creating cache object(s) for CPU %i\n", cpu_id);
 
@@ -476,11 +508,13 @@ static struct cache *cache_chain_instantiate(unsigned int cpu_id)
 	if (!cpu_node)
 		goto out;
 
-	cpu_cache = cache_lookup_or_instantiate(cpu_node, 1);
+	group_id = get_group_id(cpu_id, 1);
+
+	cpu_cache = cache_lookup_or_instantiate(cpu_node, group_id, 1);
 	if (!cpu_cache)
 		goto out;
 
-	do_subsidiary_caches(cpu_cache);
+	do_subsidiary_caches(cpu_cache, cpu_id);
 
 	cache_cpu_set(cpu_cache, cpu_id);
 out:
@@ -848,13 +882,15 @@ static struct cache *cache_lookup_by_cpu(unsigned int cpu_id)
 {
 	struct device_node *cpu_node;
 	struct cache *cache;
+	int group_id;
 
 	cpu_node = of_get_cpu_node(cpu_id, NULL);
 	WARN_ONCE(!cpu_node, "no OF node found for CPU %i\n", cpu_id);
 	if (!cpu_node)
 		return NULL;
 
-	cache = cache_lookup_by_node(cpu_node);
+	group_id = get_group_id(cpu_id, 1);
+	cache = cache_lookup_by_node_group(cpu_node, group_id);
 	of_node_put(cpu_node);
 
 	return cache;
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 447b78a87c8f..a7fcac44a8e2 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -122,14 +122,14 @@ static struct thread_groups_list tgl[NR_CPUS] __initdata;
  * On big-cores system, thread_group_l1_cache_map for each CPU corresponds to
  * the set its siblings that share the L1-cache.
  */
-static DEFINE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map);
+DEFINE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map);
 
 /*
  * On some big-cores system, thread_group_l2_cache_map for each CPU
  * corresponds to the set its siblings within the core that share the
  * L2-cache.
  */
-static DEFINE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
+DEFINE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
 
 /* SMP operations for this machine */
 struct smp_ops_t *smp_ops;
-- 
2.26.3

^ permalink raw reply	related [flat|nested] 8+ messages in thread
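[Editorial note: the core idea of the patch above — keying cache objects by the pair (device-tree node, thread-group id) rather than by node alone — can be illustrated with a small Python model. This is a toy sketch, not kernel code; the node name and CPU/group values below are illustrative.]

```python
# Toy model of cache_lookup_or_instantiate() keyed by (ofnode, group_id):
# two thread groups within one big core map to the SAME device-tree node
# but get DISTINCT cache objects, so their shared_cpu_map stays separate.

cache_list = []

def cache_lookup_or_instantiate(ofnode, group_id, level):
    for c in cache_list:
        if c["ofnode"] == ofnode and c["group_id"] == group_id:
            return c
    c = {"ofnode": ofnode, "group_id": group_id, "level": level, "cpus": set()}
    cache_list.append(c)
    return c

# CPUs 8,10 belong to the group identified by its first sibling (8);
# CPUs 9,11 to the group identified by 9. One shared "l2-cache@0" node.
for cpu, gid in [(8, 8), (10, 8), (9, 9), (11, 9)]:
    cache_lookup_or_instantiate("l2-cache@0", gid, 2)["cpus"].add(cpu)

print(len(cache_list))  # two distinct L2 cache objects for one DT node
```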
* Re: [PATCHv2 1/3] powerpc/cacheinfo: Lookup cache by dt node and thread-group id
  2021-07-28 17:56 ` [PATCHv2 1/3] powerpc/cacheinfo: Lookup cache by dt node and thread-group id Parth Shah
@ 2021-08-06  5:43   ` Srikar Dronamraju
  0 siblings, 0 replies; 8+ messages in thread
From: Srikar Dronamraju @ 2021-08-06  5:43 UTC (permalink / raw)
  To: Parth Shah; +Cc: ego, mikey, parths1229, svaidy, linuxppc-dev

* Parth Shah <parth@linux.ibm.com> [2021-07-28 23:26:05]:

> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
>
> Currently the cacheinfo code on powerpc indexes the "cache" objects
> (modelling the L1/L2/L3 caches) where the key is the device-tree node
> corresponding to that cache.
>
> [...]
>
> In this patch, we make the powerpc cacheinfo code aware of the
> "ibm,thread-groups" property. We index the "cache" objects by the
> key-pair (device-tree node, thread-group id).
>
> [...]
>
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
> [parth: Remove "static" keyword for the definition of "thread_group_l1_cache_map"
> and "thread_group_l2_cache_map" to get rid of the compile error.]
> Signed-off-by: Parth Shah <parth@linux.ibm.com>

Looks good to me.

Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

> [...]

--
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 8+ messages in thread
* [PATCHv2 2/3] powerpc/cacheinfo: Remove the redundant get_shared_cpu_map() 2021-07-28 17:56 [PATCHv2 0/3] Subject: [PATCHv2 0/3] Make cache-object aware of L3 siblings by parsing "ibm, thread-groups" property Parth Shah 2021-07-28 17:56 ` [PATCHv2 1/3] powerpc/cacheinfo: Lookup cache by dt node and thread-group id Parth Shah @ 2021-07-28 17:56 ` Parth Shah 2021-08-06 5:44 ` Srikar Dronamraju 2021-07-28 17:56 ` [PATCHv2 3/3] powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings Parth Shah 2021-08-18 13:38 ` [PATCHv2 0/3] Subject: [PATCHv2 0/3] Make cache-object aware of L3 siblings by parsing "ibm, thread-groups" property Michael Ellerman 3 siblings, 1 reply; 8+ messages in thread From: Parth Shah @ 2021-07-28 17:56 UTC (permalink / raw) To: linuxppc-dev; +Cc: ego, mikey, srikar, parths1229, svaidy From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> The helper function get_shared_cpu_map() was added in 'commit 500fe5f550ec ("powerpc/cacheinfo: Report the correct shared_cpu_map on big-cores")' and subsequently expanded upon in 'commit 0be47634db0b ("powerpc/cacheinfo: Print correct cache-sibling map/list for L2 cache")' in order to help report the correct groups of threads sharing these caches on big-core systems where groups of threads within a core can share different sets of caches. Now that powerpc/cacheinfo is aware of "ibm,thread-groups" property, cache->shared_cpu_map contains the correct set of thread-siblings sharing the cache. Hence we no longer need the functions get_shared_cpu_map(). This patch removes this function. We also remove the helper function index_dir_to_cpu() which was only called by get_shared_cpu_map(). With these functions removed, we can still see the correct cache-sibling map/list for L1 and L2 caches on systems with L1 and L2 caches distributed among groups of threads in a core. 
With this patch, on a SMT8 POWER10 system where the L1 and L2 caches
are split between the two groups of threads in a core, for CPUs 8,9,
the L1-Data, L1-Instruction, L2, L3 cache CPU sibling list is as
follows:

$ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10,12,14
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-15
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11,13,15
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-15

$ ppc64_cpu --smt=4
$ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-11
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-11

$ ppc64_cpu --smt=2
$ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-9
/sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9
/sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-9

$ ppc64_cpu --smt=1
$ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
/sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
/sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/cacheinfo.c | 41 +--------------------------------
 1 file changed, 1 insertion(+), 40 deletions(-)

diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
index 5a6925d87424..20d91693eac1 100644
--- a/arch/powerpc/kernel/cacheinfo.c
+++ b/arch/powerpc/kernel/cacheinfo.c
@@ -675,45 +675,6 @@ static ssize_t level_show(struct kobject *k, struct kobj_attribute *attr, char *
 static struct kobj_attribute cache_level_attr =
 	__ATTR(level, 0444, level_show, NULL);
 
-static unsigned int index_dir_to_cpu(struct cache_index_dir *index)
-{
-	struct kobject *index_dir_kobj = &index->kobj;
-	struct kobject *cache_dir_kobj = index_dir_kobj->parent;
-	struct kobject *cpu_dev_kobj = cache_dir_kobj->parent;
-	struct device *dev = kobj_to_dev(cpu_dev_kobj);
-
-	return dev->id;
-}
-
-/*
- * On big-core systems, each core has two groups of CPUs each of which
- * has its own L1-cache. The thread-siblings which share l1-cache with
- * @cpu can be obtained via cpu_smallcore_mask().
- *
- * On some big-core systems, the L2 cache is shared only between some
- * groups of siblings. This is already parsed and encoded in
- * cpu_l2_cache_mask().
- *
- * TODO: cache_lookup_or_instantiate() needs to be made aware of the
- *       "ibm,thread-groups" property so that cache->shared_cpu_map
- *       reflects the correct siblings on platforms that have this
- *       device-tree property. This helper function is only a stop-gap
- *       solution so that we report the correct siblings to the
- *       userspace via sysfs.
- */
-static const struct cpumask *get_shared_cpu_map(struct cache_index_dir *index, struct cache *cache)
-{
-	if (has_big_cores) {
-		int cpu = index_dir_to_cpu(index);
-		if (cache->level == 1)
-			return cpu_smallcore_mask(cpu);
-		if (cache->level == 2 && thread_group_shares_l2)
-			return cpu_l2_cache_mask(cpu);
-	}
-
-	return &cache->shared_cpu_map;
-}
-
 static ssize_t
 show_shared_cpumap(struct kobject *k, struct kobj_attribute *attr, char *buf, bool list)
 {
@@ -724,7 +685,7 @@ show_shared_cpumap(struct kobject *k, struct kobj_attribute *attr, char *buf, bo
 	index = kobj_to_cache_index_dir(k);
 	cache = index->cache;
 
-	mask = get_shared_cpu_map(index, cache);
+	mask = &cache->shared_cpu_map;
 
 	return cpumap_print_to_pagebuf(list, buf, mask);
 }
-- 
2.26.3
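Why can cache->shared_cpu_map now be trusted directly? Patch 1 of this series keys each cache object by both the device-tree node and the thread-group id. A minimal standalone model (Python, purely illustrative; the names mirror but are not the kernel's code) shows why that split matters on big cores, where both small cores point at the same device-tree cache node:

```python
# Illustrative model (not kernel code): within a big core, the two small
# cores reference the same L2/L3 device-tree node. Keying the cache object
# by the node alone would merge their shared_cpu_maps into one; keying by
# (node, thread-group id) keeps one cache object per small core.
caches = {}

def cache_lookup_or_instantiate(node, group_id):
    """Return the cache object for (node, group_id), creating it on demand."""
    return caches.setdefault((node, group_id), set())

# Hypothetical SMT4 big core: CPUs 8-11, small cores {8,10} and {9,11}.
# As in the series, the group id is the first CPU of the thread-group.
for cpu, gid in [(8, 8), (10, 8), (9, 9), (11, 9)]:
    cache_lookup_or_instantiate("l2-cache@0", gid).add(cpu)

print(caches[("l2-cache@0", 8)])  # {8, 10}
print(caches[("l2-cache@0", 9)])  # {9, 11}
```

With one cache object per (node, group) pair, show_shared_cpumap() can print the object's own mask with no per-level special casing, which is exactly what the diff above reduces it to.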
* Re: [PATCHv2 2/3] powerpc/cacheinfo: Remove the redundant get_shared_cpu_map()
  2021-07-28 17:56 ` [PATCHv2 2/3] powerpc/cacheinfo: Remove the redundant get_shared_cpu_map() Parth Shah
@ 2021-08-06  5:44   ` Srikar Dronamraju
  0 siblings, 0 replies; 8+ messages in thread
From: Srikar Dronamraju @ 2021-08-06 5:44 UTC (permalink / raw)
To: Parth Shah; +Cc: ego, mikey, parths1229, svaidy, linuxppc-dev

* Parth Shah <parth@linux.ibm.com> [2021-07-28 23:26:06]:

> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
>
> The helper function get_shared_cpu_map() was added in
>
> 'commit 500fe5f550ec ("powerpc/cacheinfo: Report the correct
> shared_cpu_map on big-cores")'
>
> and subsequently expanded upon in
>
> 'commit 0be47634db0b ("powerpc/cacheinfo: Print correct cache-sibling
> map/list for L2 cache")'
>
> in order to help report the correct groups of threads sharing these caches
> on big-core systems where groups of threads within a core can share
> different sets of caches.
>
> Now that powerpc/cacheinfo is aware of "ibm,thread-groups" property,
> cache->shared_cpu_map contains the correct set of thread-siblings
> sharing the cache. Hence we no longer need the function
> get_shared_cpu_map(). This patch removes this function. We also remove
> the helper function index_dir_to_cpu() which was only called by
> get_shared_cpu_map().
>
> With these functions removed, we can still see the correct
> cache-sibling map/list for L1 and L2 caches on systems with L1 and L2
> caches distributed among groups of threads in a core.
>
> With this patch, on a SMT8 POWER10 system where the L1 and L2 caches
> are split between the two groups of threads in a core, for CPUs 8,9,
> the L1-Data, L1-Instruction, L2, L3 cache CPU sibling list is as
> follows:
>
> $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
> /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10,12,14
> /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10,12,14
> /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10,12,14
> /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-15
> /sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11,13,15
> /sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11,13,15
> /sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11,13,15
> /sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-15
>
> $ ppc64_cpu --smt=4
> $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
> /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8,10
> /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8,10
> /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8,10
> /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-11
> /sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9,11
> /sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9,11
> /sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9,11
> /sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-11
>
> $ ppc64_cpu --smt=2
> $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
> /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
> /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
> /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
> /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8-9
> /sys/devices/system/cpu/cpu9/cache/index0/shared_cpu_list:9
> /sys/devices/system/cpu/cpu9/cache/index1/shared_cpu_list:9
> /sys/devices/system/cpu/cpu9/cache/index2/shared_cpu_list:9
> /sys/devices/system/cpu/cpu9/cache/index3/shared_cpu_list:8-9
>
> $ ppc64_cpu --smt=1
> $ grep . /sys/devices/system/cpu/cpu[89]/cache/index[0123]/shared_cpu_list
> /sys/devices/system/cpu/cpu8/cache/index0/shared_cpu_list:8
> /sys/devices/system/cpu/cpu8/cache/index1/shared_cpu_list:8
> /sys/devices/system/cpu/cpu8/cache/index2/shared_cpu_list:8
> /sys/devices/system/cpu/cpu8/cache/index3/shared_cpu_list:8
>
> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

Looks good to me.

Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

> ---
>  arch/powerpc/kernel/cacheinfo.c | 41 +--------------------------------
>  1 file changed, 1 insertion(+), 40 deletions(-)
>
> diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
> index 5a6925d87424..20d91693eac1 100644
> --- a/arch/powerpc/kernel/cacheinfo.c
> +++ b/arch/powerpc/kernel/cacheinfo.c
> @@ -675,45 +675,6 @@ static ssize_t level_show(struct kobject *k, struct kobj_attribute *attr, char *
>  static struct kobj_attribute cache_level_attr =
>  	__ATTR(level, 0444, level_show, NULL);
>
> -static unsigned int index_dir_to_cpu(struct cache_index_dir *index)
> -{
> -	struct kobject *index_dir_kobj = &index->kobj;
> -	struct kobject *cache_dir_kobj = index_dir_kobj->parent;
> -	struct kobject *cpu_dev_kobj = cache_dir_kobj->parent;
> -	struct device *dev = kobj_to_dev(cpu_dev_kobj);
> -
> -	return dev->id;
> -}
> -
> -/*
> - * On big-core systems, each core has two groups of CPUs each of which
> - * has its own L1-cache. The thread-siblings which share l1-cache with
> - * @cpu can be obtained via cpu_smallcore_mask().
> - *
> - * On some big-core systems, the L2 cache is shared only between some
> - * groups of siblings. This is already parsed and encoded in
> - * cpu_l2_cache_mask().
> - *
> - * TODO: cache_lookup_or_instantiate() needs to be made aware of the
> - *       "ibm,thread-groups" property so that cache->shared_cpu_map
> - *       reflects the correct siblings on platforms that have this
> - *       device-tree property. This helper function is only a stop-gap
> - *       solution so that we report the correct siblings to the
> - *       userspace via sysfs.
> - */
> -static const struct cpumask *get_shared_cpu_map(struct cache_index_dir *index, struct cache *cache)
> -{
> -	if (has_big_cores) {
> -		int cpu = index_dir_to_cpu(index);
> -		if (cache->level == 1)
> -			return cpu_smallcore_mask(cpu);
> -		if (cache->level == 2 && thread_group_shares_l2)
> -			return cpu_l2_cache_mask(cpu);
> -	}
> -
> -	return &cache->shared_cpu_map;
> -}
> -
>  static ssize_t
>  show_shared_cpumap(struct kobject *k, struct kobj_attribute *attr, char *buf, bool list)
>  {
> @@ -724,7 +685,7 @@ show_shared_cpumap(struct kobject *k, struct kobj_attribute *attr, char *buf, bo
>  	index = kobj_to_cache_index_dir(k);
>  	cache = index->cache;
>
> -	mask = get_shared_cpu_map(index, cache);
> +	mask = &cache->shared_cpu_map;
>
>  	return cpumap_print_to_pagebuf(list, buf, mask);
>  }
> -- 
> 2.26.3
>

-- 
Thanks and Regards
Srikar Dronamraju
* [PATCHv2 3/3] powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings
  2021-07-28 17:56 [PATCHv2 0/3] Subject: [PATCHv2 0/3] Make cache-object aware of L3 siblings by parsing "ibm,thread-groups" property Parth Shah
  2021-07-28 17:56 ` [PATCHv2 1/3] powerpc/cacheinfo: Lookup cache by dt node and thread-group id Parth Shah
  2021-07-28 17:56 ` [PATCHv2 2/3] powerpc/cacheinfo: Remove the redundant get_shared_cpu_map() Parth Shah
@ 2021-07-28 17:56 ` Parth Shah
  2021-07-30 10:08   ` Gautham R Shenoy
  2021-08-18 13:38 ` [PATCHv2 0/3] Subject: [PATCHv2 0/3] Make cache-object aware of L3 siblings by parsing "ibm,thread-groups" property Michael Ellerman
  3 siblings, 1 reply; 8+ messages in thread
From: Parth Shah @ 2021-07-28 17:56 UTC (permalink / raw)
To: linuxppc-dev; +Cc: ego, mikey, srikar, parths1229, svaidy

On POWER10 systems, the "ibm,thread-groups" property value "2" indicates
that the CPUs in a thread-group share both the L2 and the L3 cache.
Hence, use cache_property = 2 itself to find both the L2 and L3 cache
siblings: create a new thread_group_l3_cache_map to keep the list of L3
siblings, but fill the mask using the same property-"2" array.

Signed-off-by: Parth Shah <parth@linux.ibm.com>
---
 arch/powerpc/include/asm/smp.h  |  3 ++
 arch/powerpc/kernel/cacheinfo.c |  3 ++
 arch/powerpc/kernel/smp.c       | 66 ++++++++++++++++++++++-----------
 3 files changed, 51 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 1259040cc3a4..7ef1cd8168a0 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -35,6 +35,7 @@ extern int *chip_id_lookup_table;
 
 DECLARE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map);
 DECLARE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
+DECLARE_PER_CPU(cpumask_var_t, thread_group_l3_cache_map);
 
 #ifdef CONFIG_SMP
 
@@ -144,6 +145,7 @@ extern int cpu_to_core_id(int cpu);
 
 extern bool has_big_cores;
 extern bool thread_group_shares_l2;
+extern bool thread_group_shares_l3;
 
 #define cpu_smt_mask cpu_smt_mask
 #ifdef CONFIG_SCHED_SMT
@@ -198,6 +200,7 @@ extern void __cpu_die(unsigned int cpu);
 #define hard_smp_processor_id()		get_hard_smp_processor_id(0)
 #define smp_setup_cpu_maps()
 #define thread_group_shares_l2 0
+#define thread_group_shares_l3 0
 static inline void inhibit_secondary_onlining(void) {}
 static inline void uninhibit_secondary_onlining(void) {}
 static inline const struct cpumask *cpu_sibling_mask(int cpu)
diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
index 20d91693eac1..cf1be75b7833 100644
--- a/arch/powerpc/kernel/cacheinfo.c
+++ b/arch/powerpc/kernel/cacheinfo.c
@@ -469,6 +469,9 @@ static int get_group_id(unsigned int cpu_id, int level)
 	else if (thread_group_shares_l2 && level == 2)
 		return cpumask_first(per_cpu(thread_group_l2_cache_map,
 					     cpu_id));
+	else if (thread_group_shares_l3 && level == 3)
+		return cpumask_first(per_cpu(thread_group_l3_cache_map,
+					     cpu_id));
 	return -1;
 }
 
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index a7fcac44a8e2..f2abd88e0c25 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -78,6 +78,7 @@ struct task_struct *secondary_current;
 bool has_big_cores;
 bool coregroup_enabled;
 bool thread_group_shares_l2;
+bool thread_group_shares_l3;
 
 DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_map);
@@ -101,7 +102,7 @@ enum {
 
 #define MAX_THREAD_LIST_SIZE	8
 #define THREAD_GROUP_SHARE_L1   1
-#define THREAD_GROUP_SHARE_L2   2
+#define THREAD_GROUP_SHARE_L2_L3 2
 struct thread_groups {
 	unsigned int property;
 	unsigned int nr_groups;
@@ -131,6 +132,12 @@ DEFINE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map);
  */
 DEFINE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
 
+/*
+ * On P10, thread_group_l3_cache_map for each CPU is equal to the
+ * thread_group_l2_cache_map
+ */
+DEFINE_PER_CPU(cpumask_var_t, thread_group_l3_cache_map);
+
 /* SMP operations for this machine */
 struct smp_ops_t *smp_ops;
 
@@ -889,19 +896,41 @@ static struct thread_groups *__init get_thread_groups(int cpu,
 	return tg;
 }
 
+static int update_mask_from_threadgroup(cpumask_var_t *mask, struct thread_groups *tg, int cpu, int cpu_group_start)
+{
+	int first_thread = cpu_first_thread_sibling(cpu);
+	int i;
+
+	zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu));
+
+	for (i = first_thread; i < first_thread + threads_per_core; i++) {
+		int i_group_start = get_cpu_thread_group_start(i, tg);
+
+		if (unlikely(i_group_start == -1)) {
+			WARN_ON_ONCE(1);
+			return -ENODATA;
+		}
+
+		if (i_group_start == cpu_group_start)
+			cpumask_set_cpu(i, *mask);
+	}
+
+	return 0;
+}
+
 static int __init init_thread_group_cache_map(int cpu, int cache_property)
 {
-	int first_thread = cpu_first_thread_sibling(cpu);
-	int i, cpu_group_start = -1, err = 0;
+	int cpu_group_start = -1, err = 0;
 	struct thread_groups *tg = NULL;
 	cpumask_var_t *mask = NULL;
 
 	if (cache_property != THREAD_GROUP_SHARE_L1 &&
-	    cache_property != THREAD_GROUP_SHARE_L2)
+	    cache_property != THREAD_GROUP_SHARE_L2_L3)
 		return -EINVAL;
 
 	tg = get_thread_groups(cpu, cache_property, &err);
+
 	if (!tg)
 		return err;
 
@@ -912,25 +941,18 @@ static int __init init_thread_group_cache_map(int cpu, int cache_property)
 		return -ENODATA;
 	}
 
-	if (cache_property == THREAD_GROUP_SHARE_L1)
+	if (cache_property == THREAD_GROUP_SHARE_L1) {
 		mask = &per_cpu(thread_group_l1_cache_map, cpu);
-	else if (cache_property == THREAD_GROUP_SHARE_L2)
+		update_mask_from_threadgroup(mask, tg, cpu, cpu_group_start);
+	}
+	else if (cache_property == THREAD_GROUP_SHARE_L2_L3) {
 		mask = &per_cpu(thread_group_l2_cache_map, cpu);
-
-	zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu));
-
-	for (i = first_thread; i < first_thread + threads_per_core; i++) {
-		int i_group_start = get_cpu_thread_group_start(i, tg);
-
-		if (unlikely(i_group_start == -1)) {
-			WARN_ON_ONCE(1);
-			return -ENODATA;
-		}
-
-		if (i_group_start == cpu_group_start)
-			cpumask_set_cpu(i, *mask);
+		update_mask_from_threadgroup(mask, tg, cpu, cpu_group_start);
+		mask = &per_cpu(thread_group_l3_cache_map, cpu);
+		update_mask_from_threadgroup(mask, tg, cpu, cpu_group_start);
 	}
 
+
 	return 0;
 }
 
@@ -1020,14 +1042,16 @@ static int __init init_big_cores(void)
 	has_big_cores = true;
 
 	for_each_possible_cpu(cpu) {
-		int err = init_thread_group_cache_map(cpu, THREAD_GROUP_SHARE_L2);
+		int err = init_thread_group_cache_map(cpu, THREAD_GROUP_SHARE_L2_L3);
 
 		if (err)
 			return err;
 	}
 
 	thread_group_shares_l2 = true;
-	pr_debug("L2 cache only shared by the threads in the small core\n");
+	thread_group_shares_l3 = true;
+	pr_debug("L2/L3 cache only shared by the threads in the small core\n");
+
 	return 0;
 }
-- 
2.26.3
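The per-CPU mask construction that the diff above performs can be modeled outside the kernel. The sketch below (Python, illustrative only; the cell layout follows the "ibm,thread-groups" format documented in arch/powerpc/kernel/smp.c: property, nr_groups, threads_per_group, then the thread lists) shows why a single pass over the property-"2" array can populate both the L2 and the L3 sibling maps:

```python
# Illustrative parser (not kernel code) for a flattened "ibm,thread-groups"
# cell array. Property 1 describes L1 sharing; on POWER10, property 2
# describes both L2 and L3 sharing, so one parse fills both maps.
THREAD_GROUP_SHARE_L1 = 1
THREAD_GROUP_SHARE_L2_L3 = 2

def parse_thread_groups(cells):
    """Map each property to its list of thread-groups."""
    groups, i = {}, 0
    while i < len(cells):
        prop, nr_groups, per_group = cells[i], cells[i + 1], cells[i + 2]
        i += 3
        lists = []
        for _ in range(nr_groups):
            lists.append(cells[i:i + per_group])
            i += per_group
        groups[prop] = lists
    return groups

def cache_siblings(cells, prop, cpu):
    """CPUs sharing the cache described by `prop` with `cpu`."""
    for group in parse_thread_groups(cells)[prop]:
        if cpu in group:
            return set(group)
    return set()

# Hypothetical SMT8 big core, CPUs 8-15, split into two small cores.
cells = [1, 2, 4, 8, 10, 12, 14, 9, 11, 13, 15,   # property 1: L1 groups
         2, 2, 4, 8, 10, 12, 14, 9, 11, 13, 15]   # property 2: L2 (and L3)

l2_map = cache_siblings(cells, THREAD_GROUP_SHARE_L2_L3, 8)
l3_map = l2_map  # this patch: the same property-2 groups fill the L3 map
```

This mirrors the design choice in the patch: rather than parsing a separate L3 property, init_thread_group_cache_map() calls the same mask-builder twice for property 2, once for each of the L2 and L3 per-CPU masks.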
* Re: [PATCHv2 3/3] powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings
  2021-07-28 17:56 ` [PATCHv2 3/3] powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings Parth Shah
@ 2021-07-30 10:08   ` Gautham R Shenoy
  0 siblings, 0 replies; 8+ messages in thread
From: Gautham R Shenoy @ 2021-07-30 10:08 UTC (permalink / raw)
To: Parth Shah; +Cc: ego, mikey, srikar, parths1229, svaidy, linuxppc-dev

On Wed, Jul 28, 2021 at 11:26:07PM +0530, Parth Shah wrote:
> On POWER10 systems, the "ibm,thread-groups" property "2" indicates the cpus
> in thread-group share both L2 and L3 caches. Hence, use cache_property = 2
> itself to find both the L2 and L3 cache siblings.
> Hence, create a new thread_group_l3_cache_map to keep list of L3 siblings,
> but fill the mask using same property "2" array.

This version looks good to me.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

>
> Signed-off-by: Parth Shah <parth@linux.ibm.com>
> ---
>  arch/powerpc/include/asm/smp.h  |  3 ++
>  arch/powerpc/kernel/cacheinfo.c |  3 ++
>  arch/powerpc/kernel/smp.c       | 66 ++++++++++++++++++++++-----------
>  3 files changed, 51 insertions(+), 21 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
> index 1259040cc3a4..7ef1cd8168a0 100644
> --- a/arch/powerpc/include/asm/smp.h
> +++ b/arch/powerpc/include/asm/smp.h
> @@ -35,6 +35,7 @@ extern int *chip_id_lookup_table;
>
>  DECLARE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map);
>  DECLARE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
> +DECLARE_PER_CPU(cpumask_var_t, thread_group_l3_cache_map);
>
>  #ifdef CONFIG_SMP
>
> @@ -144,6 +145,7 @@ extern int cpu_to_core_id(int cpu);
>
>  extern bool has_big_cores;
>  extern bool thread_group_shares_l2;
> +extern bool thread_group_shares_l3;
>
>  #define cpu_smt_mask cpu_smt_mask
>  #ifdef CONFIG_SCHED_SMT
> @@ -198,6 +200,7 @@ extern void __cpu_die(unsigned int cpu);
>  #define hard_smp_processor_id()		get_hard_smp_processor_id(0)
>  #define smp_setup_cpu_maps()
>  #define thread_group_shares_l2 0
> +#define thread_group_shares_l3 0
>  static inline void inhibit_secondary_onlining(void) {}
>  static inline void uninhibit_secondary_onlining(void) {}
>  static inline const struct cpumask *cpu_sibling_mask(int cpu)
> diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
> index 20d91693eac1..cf1be75b7833 100644
> --- a/arch/powerpc/kernel/cacheinfo.c
> +++ b/arch/powerpc/kernel/cacheinfo.c
> @@ -469,6 +469,9 @@ static int get_group_id(unsigned int cpu_id, int level)
>  	else if (thread_group_shares_l2 && level == 2)
>  		return cpumask_first(per_cpu(thread_group_l2_cache_map,
>  					     cpu_id));
> +	else if (thread_group_shares_l3 && level == 3)
> +		return cpumask_first(per_cpu(thread_group_l3_cache_map,
> +					     cpu_id));
>  	return -1;
>  }
>
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index a7fcac44a8e2..f2abd88e0c25 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -78,6 +78,7 @@ struct task_struct *secondary_current;
>  bool has_big_cores;
>  bool coregroup_enabled;
>  bool thread_group_shares_l2;
> +bool thread_group_shares_l3;
>
>  DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
>  DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_map);
> @@ -101,7 +102,7 @@ enum {
>
>  #define MAX_THREAD_LIST_SIZE	8
>  #define THREAD_GROUP_SHARE_L1   1
> -#define THREAD_GROUP_SHARE_L2   2
> +#define THREAD_GROUP_SHARE_L2_L3 2
>  struct thread_groups {
>  	unsigned int property;
>  	unsigned int nr_groups;
> @@ -131,6 +132,12 @@ DEFINE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map);
>   */
>  DEFINE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
>
> +/*
> + * On P10, thread_group_l3_cache_map for each CPU is equal to the
> + * thread_group_l2_cache_map
> + */
> +DEFINE_PER_CPU(cpumask_var_t, thread_group_l3_cache_map);
> +
>  /* SMP operations for this machine */
>  struct smp_ops_t *smp_ops;
>
> @@ -889,19 +896,41 @@ static struct thread_groups *__init get_thread_groups(int cpu,
>  	return tg;
>  }
>
> +static int update_mask_from_threadgroup(cpumask_var_t *mask, struct thread_groups *tg, int cpu, int cpu_group_start)
> +{
> +	int first_thread = cpu_first_thread_sibling(cpu);
> +	int i;
> +
> +	zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu));
> +
> +	for (i = first_thread; i < first_thread + threads_per_core; i++) {
> +		int i_group_start = get_cpu_thread_group_start(i, tg);
> +
> +		if (unlikely(i_group_start == -1)) {
> +			WARN_ON_ONCE(1);
> +			return -ENODATA;
> +		}
> +
> +		if (i_group_start == cpu_group_start)
> +			cpumask_set_cpu(i, *mask);
> +	}
> +
> +	return 0;
> +}
> +
>  static int __init init_thread_group_cache_map(int cpu, int cache_property)
>  {
> -	int first_thread = cpu_first_thread_sibling(cpu);
> -	int i, cpu_group_start = -1, err = 0;
> +	int cpu_group_start = -1, err = 0;
>  	struct thread_groups *tg = NULL;
>  	cpumask_var_t *mask = NULL;
>
>  	if (cache_property != THREAD_GROUP_SHARE_L1 &&
> -	    cache_property != THREAD_GROUP_SHARE_L2)
> +	    cache_property != THREAD_GROUP_SHARE_L2_L3)
>  		return -EINVAL;
>
>  	tg = get_thread_groups(cpu, cache_property, &err);
> +
>  	if (!tg)
>  		return err;
>
> @@ -912,25 +941,18 @@ static int __init init_thread_group_cache_map(int cpu, int cache_property)
>  		return -ENODATA;
>  	}
>
> -	if (cache_property == THREAD_GROUP_SHARE_L1)
> +	if (cache_property == THREAD_GROUP_SHARE_L1) {
>  		mask = &per_cpu(thread_group_l1_cache_map, cpu);
> -	else if (cache_property == THREAD_GROUP_SHARE_L2)
> +		update_mask_from_threadgroup(mask, tg, cpu, cpu_group_start);
> +	}
> +	else if (cache_property == THREAD_GROUP_SHARE_L2_L3) {
>  		mask = &per_cpu(thread_group_l2_cache_map, cpu);
> -
> -	zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu));
> -
> -	for (i = first_thread; i < first_thread + threads_per_core; i++) {
> -		int i_group_start = get_cpu_thread_group_start(i, tg);
> -
> -		if (unlikely(i_group_start == -1)) {
> -			WARN_ON_ONCE(1);
> -			return -ENODATA;
> -		}
> -
> -		if (i_group_start == cpu_group_start)
> -			cpumask_set_cpu(i, *mask);
> +		update_mask_from_threadgroup(mask, tg, cpu, cpu_group_start);
> +		mask = &per_cpu(thread_group_l3_cache_map, cpu);
> +		update_mask_from_threadgroup(mask, tg, cpu, cpu_group_start);
>  	}
>
> +
>  	return 0;
>  }
>
> @@ -1020,14 +1042,16 @@ static int __init init_big_cores(void)
>  	has_big_cores = true;
>
>  	for_each_possible_cpu(cpu) {
> -		int err = init_thread_group_cache_map(cpu, THREAD_GROUP_SHARE_L2);
> +		int err = init_thread_group_cache_map(cpu, THREAD_GROUP_SHARE_L2_L3);
>
>  		if (err)
>  			return err;
>  	}
>
>  	thread_group_shares_l2 = true;
> -	pr_debug("L2 cache only shared by the threads in the small core\n");
> +	thread_group_shares_l3 = true;
> +	pr_debug("L2/L3 cache only shared by the threads in the small core\n");
> +
>  	return 0;
>  }
>
> -- 
> 2.26.3
>
* Re: [PATCHv2 0/3] Subject: [PATCHv2 0/3] Make cache-object aware of L3 siblings by parsing "ibm,thread-groups" property
  2021-07-28 17:56 [PATCHv2 0/3] Subject: [PATCHv2 0/3] Make cache-object aware of L3 siblings by parsing "ibm,thread-groups" property Parth Shah
                   ` (2 preceding siblings ...)
  2021-07-28 17:56 ` [PATCHv2 3/3] powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings Parth Shah
@ 2021-08-18 13:38 ` Michael Ellerman
  3 siblings, 0 replies; 8+ messages in thread
From: Michael Ellerman @ 2021-08-18 13:38 UTC (permalink / raw)
To: linuxppc-dev, Parth Shah; +Cc: parths1229, mikey, svaidy, srikar, ego

On Wed, 28 Jul 2021 23:26:04 +0530, Parth Shah wrote:
> Changes from v1 -> v2:
> - Based on Gautham's comments, use a separate thread_group_l3_cache_map
>   and modify parsing code to build cache_map for L3. This makes the
>   cache_map building code isolated from the parsing code.
> v1 can be found at:
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2021-June/230680.html
>
> [...]

Applied to powerpc/next.

[1/3] powerpc/cacheinfo: Lookup cache by dt node and thread-group id
      https://git.kernel.org/powerpc/c/a4bec516b9c0823d7e2bb8c8928c98b535cf9adf
[2/3] powerpc/cacheinfo: Remove the redundant get_shared_cpu_map()
      https://git.kernel.org/powerpc/c/69aa8e078545bc14d84a8b4b3cb914ac8f9f280e
[3/3] powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings
      https://git.kernel.org/powerpc/c/e9ef81e1079b0c4c374fba0f9affa7129c7c913b

cheers
end of thread, other threads:[~2021-08-18 13:53 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-28 17:56 [PATCHv2 0/3] Subject: [PATCHv2 0/3] Make cache-object aware of L3 siblings by parsing "ibm,thread-groups" property Parth Shah
2021-07-28 17:56 ` [PATCHv2 1/3] powerpc/cacheinfo: Lookup cache by dt node and thread-group id Parth Shah
2021-08-06  5:43   ` Srikar Dronamraju
2021-07-28 17:56 ` [PATCHv2 2/3] powerpc/cacheinfo: Remove the redundant get_shared_cpu_map() Parth Shah
2021-08-06  5:44   ` Srikar Dronamraju
2021-07-28 17:56 ` [PATCHv2 3/3] powerpc/smp: Use existing L2 cache_map cpumask to find L3 cache siblings Parth Shah
2021-07-30 10:08   ` Gautham R Shenoy
2021-08-18 13:38 ` [PATCHv2 0/3] Subject: [PATCHv2 0/3] Make cache-object aware of L3 siblings by parsing "ibm,thread-groups" property Michael Ellerman