* [PATCH v2 0/5] Extend Parsing "ibm,thread-groups" for Shared-L2 information
From: Gautham R. Shenoy @ 2020-12-09 17:08 UTC
  To: Srikar Dronamraju, Anton Blanchard, Vaidyanathan Srinivasan,
      Michael Ellerman, Michael Neuling, Nicholas Piggin, Nathan Lynch,
      Peter Zijlstra, Valentin Schneider
  Cc: Gautham R. Shenoy, linuxppc-dev, linux-kernel

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

Hi,

This is v2 of the patchset to extend parsing of the "ibm,thread-groups"
property to discover the Shared-L2 cache information.

The v1 can be found here:
https://lore.kernel.org/linuxppc-dev/1607057327-29822-1-git-send-email-ego@linux.vnet.ibm.com/T/#m0fabffa1ea1a2807b362f25c849bb19415216520

The key changes from v1, which incorporate the review comments from
Srikar and fix a build error on !PPC64 configs reported by the kernel
bot, are as follows:

 * Split Patch 1 into three patches
   * The first patch makes parse_thread_groups() generic so that it
     supports more than one property.
   * The second patch renames cpu_l1_cache_map as
     thread_group_l1_cache_map for consistency. No functional impact.
   * The third patch makes init_thread_group_l1_cache_map()
     generic. No functional impact.

 * Patch 2 (now patch 4): Incorporates the review comments from
   Srikar, simplifying the changes to update_mask_by_l2().

 * Patch 3 (now patch 5): Fixes a build error for 32-bit configs
   reported by the kernel build bot.

Description of the Patchset
===========================
The "ibm,thread-groups" device-tree property is an array that is used
to indicate if groups of threads within a core share certain
properties. It provides details of which property is being shared by
which groups of threads.
This array can encode information about multiple properties being
shared by different thread-groups within the core.

Example: Suppose,
"ibm,thread-groups" = [1,2,4,8,10,12,14,9,11,13,15,2,2,4,8,10,12,14,9,11,13,15]

This can be decomposed into two consecutive arrays:

a) [1,2,4,8,10,12,14,9,11,13,15]
b) [2,2,4,8,10,12,14,9,11,13,15]

wherein,

a) provides information of Property "1" being shared by "2" groups,
   each with "4" threads. The "ibm,ppc-interrupt-server#s" of the
   first group is {8,10,12,14} and the "ibm,ppc-interrupt-server#s"
   of the second group is {9,11,13,15}. Property "1" indicates that
   the threads in each group share the L1 cache, translation cache
   and Instruction Data flow.

b) provides information of Property "2" being shared by "2" groups,
   each with "4" threads. The "ibm,ppc-interrupt-server#s" of the
   first group is {8,10,12,14} and the "ibm,ppc-interrupt-server#s"
   of the second group is {9,11,13,15}. Property "2" indicates that
   the threads in each group share the L2-cache.

The existing code assumes that "ibm,thread-groups" encodes
information about only one property. Hence, even on platforms which
encode information about multiple properties being shared by the
corresponding groups of threads, the current code picks only the
first one. (In the above example, it considers only
[1,2,4,8,10,12,14,9,11,13,15] but not [2,2,4,8,10,12,14,9,11,13,15]).

Furthermore, currently on platforms where groups of threads share L2
cache, we incorrectly create an extra CACHE level sched-domain that
maps to all the threads of the core.
For example, if "ibm,thread-groups" is

    00000001 00000002 00000004 00000000
    00000002 00000004 00000006 00000001
    00000003 00000005 00000007 00000002
    00000002 00000004 00000000 00000002
    00000004 00000006 00000001 00000003
    00000005 00000007

then, the sub-array

    [00000002 00000002 00000004
     00000000 00000002 00000004 00000006
     00000001 00000003 00000005 00000007]

indicates that L2 (Property "2") is shared only between the threads
of a single group. There are "2" groups of threads, each containing
"4" threads. The groups are {0,2,4,6} and {1,3,5,7}.

However, the sched-domain hierarchy for CPUs 0,1 is

    CPU0 attaching sched-domain(s):
    domain-0: span=0,2,4,6 level=SMT
     domain-1: span=0-7 level=CACHE
      domain-2: span=0-15,24-39,48-55 level=MC
       domain-3: span=0-55 level=DIE

    CPU1 attaching sched-domain(s):
    domain-0: span=1,3,5,7 level=SMT
     domain-1: span=0-7 level=CACHE
      domain-2: span=0-15,24-39,48-55 level=MC
       domain-3: span=0-55 level=DIE

where the CACHE domain reports that L2 is shared across the entire
core, which is incorrect on such platforms.

This patchset remedies these issues by extending the parsing support
for "ibm,thread-groups" to discover information about multiple
properties being shared by the corresponding groups of threads. In
particular, we can now detect if the groups of threads within a core
share the L2-cache. On such platforms, we populate the
cpu_l2_cache_mask of every CPU with the core-siblings which share L2
with the CPU, as specified by the "ibm,thread-groups" property array.

With the patchset, the sched-domain hierarchy is correctly
reported. For example, for CPUs 0,1, with the patchset

    CPU0 attaching sched-domain(s):
    domain-0: span=0,2,4,6 level=SMT
     domain-1: span=0-15,24-39,48-55 level=MC
      domain-2: span=0-55 level=DIE

    CPU1 attaching sched-domain(s):
    domain-0: span=1,3,5,7 level=SMT
     domain-1: span=0-15,24-39,48-55 level=MC
      domain-2: span=0-55 level=DIE

The CACHE domain with span=0,2,4,6 for CPU 0 (span=1,3,5,7 for CPU 1 resp.)
gets degenerated into the SMT domain. Furthermore, the
last-level-cache domain gets correctly set to the SMT sched-domain.

Testing
==========
The patchset was tested with the producer-consumer testcase
(https://github.com/gautshen/misc/tree/master/producer_consumer), in
which the producer thread performs writes to 4096 random locations
and the consumer thread subsequently reads from those 4096 random
locations. We measure the time taken by the consumer to finish the
4096 reads (called an iteration of the consumer). Thus, the lower the
value, the better the result.

The best case occurs when the producer and consumer are affined to
the same L2 cache domain (e.g. CPU0, CPU2).

On the platform with the thread-groups sharing L2,

|-----------------------------------------------|
|                Without Patch                  |
|-----------|-----------|-----------------------|
| Producer  | Consumer  | Avg time per Consumer |
| Affinity  | Affinity  |       Iteration       |
|-----------|-----------|-----------------------|
| CPU0      | CPU2      |        235us          |
|-----------|-----------|-----------------------|
|Not affined|Not affined|        347us          |
|-----------------------------------------------|

We see that out-of-box, the average time per consumer iteration is
higher, since the tasks can be placed anywhere within the core
without being in the same L2 domain.

|-----------------------------------------------|
|                  With Patch                   |
|-----------|-----------|-----------------------|
| Producer  | Consumer  | Avg time per Consumer |
| Affinity  | Affinity  |       Iteration       |
|-----------|-----------|-----------------------|
| CPU0      | CPU2      |        235us          |
|-----------|-----------|-----------------------|
|Not affined|Not affined|        236us          |
|-----------------------------------------------|

With the patch, since the L2 domain is correctly identified, the
scheduler does the right thing by co-locating the producer and
consumer on the same L2 domain, thereby yielding out-of-box
performance matching the best case.
Finally, this patchset reports the correct shared_cpu_map/list in
sysfs for the L2 cache on such platforms. With the patchset, for
CPUs 0 and 1, we see the correct shared_cpu_map/list for the L2 cache:

/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list:0,2,4,6
/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_map:000000,00000055

/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_list:1,3,5,7
/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map:000000,000000aa

The patchset has been tested on older platforms which encode only the
L1 sharing information via "ibm,thread-groups", and no regression
was found.

Gautham R. Shenoy (5):
  powerpc/smp: Parse ibm,thread-groups with multiple properties
  powerpc/smp: Rename cpu_l1_cache_map as thread_group_l1_cache_map
  powerpc/smp: Rename init_thread_group_l1_cache_map() to make it generic
  powerpc/smp: Add support detecting thread-groups sharing L2 cache
  powerpc/cacheinfo: Print correct cache-sibling map/list for L2 cache

 arch/powerpc/include/asm/smp.h  |   1 +
 arch/powerpc/kernel/cacheinfo.c |  34 ++++--
 arch/powerpc/kernel/smp.c       | 241 ++++++++++++++++++++++++++++------------
 3 files changed, 197 insertions(+), 79 deletions(-)

-- 
1.9.4
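The shared_cpu_list and shared_cpu_map values quoted in the cover
letter are two views of the same CPU set. The following is a minimal
user-space sketch, not kernel code, that converts a list string into a
bitmask so the quoted values can be cross-checked. The helper name
cpu_list_to_mask() is hypothetical, and the sketch assumes a plain
comma-separated list of CPU numbers below 64 (ranges such as "0-3"
are rejected rather than parsed).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: turn a list such as "0,2,4,6" into a bitmask.
 * Returns 0 for malformed input, ranges, or CPU numbers >= 64.
 */
static uint64_t cpu_list_to_mask(const char *list)
{
	uint64_t mask = 0;
	const char *p = list;

	assert(list != NULL);

	while (*p) {
		char *end;
		unsigned long cpu = strtoul(p, &end, 10);

		/* No digits consumed, or CPU number does not fit the mask */
		if (end == p || cpu >= 64)
			return 0;
		/* Only a comma or the end of the string may follow a number */
		if (*end != ',' && *end != '\0')
			return 0;

		mask |= UINT64_C(1) << cpu;
		p = (*end == ',') ? end + 1 : end;
	}

	return mask;
}
```

Under these assumptions, "0,2,4,6" maps to 0x55 and "1,3,5,7" maps to
0xaa, which agrees with the low word of the shared_cpu_map values
shown for cpu0 and cpu1.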
* [PATCH v2 1/5] powerpc/smp: Parse ibm,thread-groups with multiple properties
From: Gautham R. Shenoy @ 2020-12-09 17:08 UTC
  To: Srikar Dronamraju, Anton Blanchard, Vaidyanathan Srinivasan,
      Michael Ellerman, Michael Neuling, Nicholas Piggin, Nathan Lynch,
      Peter Zijlstra, Valentin Schneider
  Cc: Gautham R. Shenoy, linuxppc-dev, linux-kernel

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

The "ibm,thread-groups" device-tree property is an array that is used
to indicate if groups of threads within a core share certain
properties. It provides details of which property is being shared by
which groups of threads. This array can encode information about
multiple properties being shared by different thread-groups within
the core.

Example: Suppose,
"ibm,thread-groups" = [1,2,4,8,10,12,14,9,11,13,15,2,2,4,8,10,12,14,9,11,13,15]

This can be decomposed into two consecutive arrays:

a) [1,2,4,8,10,12,14,9,11,13,15]
b) [2,2,4,8,10,12,14,9,11,13,15]

wherein,

a) provides information of Property "1" being shared by "2" groups,
   each with "4" threads. The "ibm,ppc-interrupt-server#s" of the
   first group is {8,10,12,14} and the "ibm,ppc-interrupt-server#s"
   of the second group is {9,11,13,15}. Property "1" indicates that
   the threads in each group share the L1 cache, translation cache
   and Instruction Data flow.

b) provides information of Property "2" being shared by "2" groups,
   each with "4" threads. The "ibm,ppc-interrupt-server#s" of the
   first group is {8,10,12,14} and the "ibm,ppc-interrupt-server#s"
   of the second group is {9,11,13,15}. Property "2" indicates that
   the threads in each group share the L2-cache.
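
As a rough illustration of the decomposition described above, a
minimal user-space C sketch could walk the flat array as follows. This
is not the kernel code from this patch: struct tg_record,
split_thread_groups(), MAX_THREADS and MAX_PROPERTIES are hypothetical
names chosen for the example, and the bounds checks are an assumption
about how malformed input might be rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_THREADS	8	/* same value as MAX_THREAD_LIST_SIZE in smp.c */
#define MAX_PROPERTIES	2	/* illustrative limit for this sketch */

/* One decoded sub-array: [property, nr_groups, threads_per_group, threads...] */
struct tg_record {
	uint32_t property;
	uint32_t nr_groups;
	uint32_t threads_per_group;
	uint32_t thread_list[MAX_THREADS];
};

/*
 * Split a flat "ibm,thread-groups"-style array into per-property records.
 * Returns the number of records written, or -1 if a sub-array is malformed.
 */
static int split_thread_groups(const uint32_t *arr, size_t count,
			       struct tg_record *out, size_t max_out)
{
	size_t i = 0, n = 0;

	assert(arr != NULL && out != NULL);

	while (i < count && n < max_out) {
		uint64_t total;
		size_t j;

		/* Each sub-array needs at least its three header words */
		if (count - i < 3)
			return -1;
		if (arr[i + 1] < 1 || arr[i + 2] < 1)
			return -1;

		total = (uint64_t)arr[i + 1] * arr[i + 2];
		/* Thread list must fit the record and the remaining input */
		if (total > MAX_THREADS || total > count - i - 3)
			return -1;

		out[n].property = arr[i];
		out[n].nr_groups = arr[i + 1];
		out[n].threads_per_group = arr[i + 2];
		for (j = 0; j < total; j++)
			out[n].thread_list[j] = arr[i + 3 + j];

		/* Advance past the header and the thread list */
		i += 3 + (size_t)total;
		n++;
	}

	return (int)n;
}
```

With the 22-element example array, the sketch yields two records, one
for Property "1" and one for Property "2", each listing the threads
8,10,12,14 followed by 9,11,13,15.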

The existing code assumes that "ibm,thread-groups" encodes
information about only one property. Hence, even on platforms which
encode information about multiple properties being shared by the
corresponding groups of threads, the current code picks only the
first one. (In the above example, it considers only
[1,2,4,8,10,12,14,9,11,13,15] but not [2,2,4,8,10,12,14,9,11,13,15]).

This patch extends the parsing support on platforms which encode
information about multiple properties being shared by the
corresponding groups of threads.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/smp.c | 174 ++++++++++++++++++++++++++++++----------------
 1 file changed, 113 insertions(+), 61 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 8c2857c..88d88ad 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -106,6 +106,15 @@ struct thread_groups {
 	unsigned int thread_list[MAX_THREAD_LIST_SIZE];
 };
 
+/* Maximum number of properties that groups of threads within a core can share */
+#define MAX_THREAD_GROUP_PROPERTIES 1
+
+struct thread_groups_list {
+	unsigned int nr_properties;
+	struct thread_groups property_tgs[MAX_THREAD_GROUP_PROPERTIES];
+};
+
+static struct thread_groups_list tgl[NR_CPUS] __initdata;
 /*
  * On big-cores system, cpu_l1_cache_map for each CPU corresponds to
  * the set its siblings that share the L1-cache.
@@ -695,81 +704,98 @@ static void or_cpumasks_related(int i, int j, struct cpumask *(*srcmask)(int),
 /*
  * parse_thread_groups: Parses the "ibm,thread-groups" device tree
  *                      property for the CPU device node @dn and stores
- *                      the parsed output in the thread_groups
- *                      structure @tg if the ibm,thread-groups[0]
- *                      matches @property.
+ *                      the parsed output in the thread_groups_list
+ *                      structure @tglp.
  *
  * @dn: The device node of the CPU device.
- * @tg: Pointer to a thread group structure into which the parsed
+ * @tglp: Pointer to a thread group list structure into which the parsed
  *      output of "ibm,thread-groups" is stored.
- * @property: The property of the thread-group that the caller is
- *            interested in.
  *
  * ibm,thread-groups[0..N-1] array defines which group of threads in
  * the CPU-device node can be grouped together based on the property.
  *
- * ibm,thread-groups[0] tells us the property based on which the
+ * This array can represent thread groupings for multiple properties.
+ *
+ * ibm,thread-groups[i + 0] tells us the property based on which the
  * threads are being grouped together. If this value is 1, it implies
  * that the threads in the same group share L1, translation cache.
  *
- * ibm,thread-groups[1] tells us how many such thread groups exist.
+ * ibm,thread-groups[i+1] tells us how many such thread groups exist for the
+ * property ibm,thread-groups[i]
  *
- * ibm,thread-groups[2] tells us the number of threads in each such
+ * ibm,thread-groups[i+2] tells us the number of threads in each such
  * group.
+ * Suppose k = (ibm,thread-groups[i+1] * ibm,thread-groups[i+2]), then,
  *
- * ibm,thread-groups[3..N-1] is the list of threads identified by
+ * ibm,thread-groups[i+3..i+k+2] (is the list of threads identified by
  * "ibm,ppc-interrupt-server#s" arranged as per their membership in
  * the grouping.
  *
- * Example: If ibm,thread-groups = [1,2,4,5,6,7,8,9,10,11,12] it
- * implies that there are 2 groups of 4 threads each, where each group
- * of threads share L1, translation cache.
+ * Example:
+ * If "ibm,thread-groups" = [1,2,4,8,10,12,14,9,11,13,15,2,2,4,8,10,12,14,9,11,13,15]
+ * This can be decomposed up into two consecutive arrays:
+ * a) [1,2,4,8,10,12,14,9,11,13,15]
+ * b) [2,2,4,8,10,12,14,9,11,13,15]
+ *
+ * where in,
+ *
+ * a) provides information of Property "1" being shared by "2" groups,
+ *    each with "4" threads each. The "ibm,ppc-interrupt-server#s" of
+ *    the first group is {8,10,12,14} and the
+ *    "ibm,ppc-interrupt-server#s" of the second group is
+ *    {9,11,13,15}. Property "1" is indicative of the thread in the
+ *    group sharing L1 cache, translation cache and Instruction Data
+ *    flow.
  *
- * The "ibm,ppc-interrupt-server#s" of the first group is {5,6,7,8}
- * and the "ibm,ppc-interrupt-server#s" of the second group is {9, 10,
- * 11, 12} structure
+ * b) provides information of Property "2" being shared by "2" groups,
+ *    each group with "4" threads. The "ibm,ppc-interrupt-server#s" of
+ *    the first group is {8,10,12,14} and the
+ *    "ibm,ppc-interrupt-server#s" of the second group is
+ *    {9,11,13,15}. Property "2" indicates that the threads in each
+ *    group share the L2-cache.
  *
  * Returns 0 on success, -EINVAL if the property does not exist,
  * -ENODATA if property does not have a value, and -EOVERFLOW if the
  * property data isn't large enough.
  */
 static int parse_thread_groups(struct device_node *dn,
-			       struct thread_groups *tg,
-			       unsigned int property)
+			       struct thread_groups_list *tglp)
 {
-	int i;
-	u32 thread_group_array[3 + MAX_THREAD_LIST_SIZE];
-	u32 *thread_list;
+	unsigned int property_idx = 0;
+	u32 *thread_group_array;
 	size_t total_threads;
-	int ret;
+	int ret = 0, count;
+	u32 *thread_list;
+	int i = 0;
 
+	count = of_property_count_u32_elems(dn, "ibm,thread-groups");
+	thread_group_array = kcalloc(count, sizeof(u32), GFP_KERNEL);
 	ret = of_property_read_u32_array(dn, "ibm,thread-groups",
-					 thread_group_array, 3);
+					 thread_group_array, count);
 	if (ret)
-		return ret;
-
-	tg->property = thread_group_array[0];
-	tg->nr_groups = thread_group_array[1];
-	tg->threads_per_group = thread_group_array[2];
-	if (tg->property != property ||
-	    tg->nr_groups < 1 ||
-	    tg->threads_per_group < 1)
-		return -ENODATA;
+		goto out_free;
 
-	total_threads = tg->nr_groups * tg->threads_per_group;
+	while (i < count && property_idx < MAX_THREAD_GROUP_PROPERTIES) {
+		int j;
+		struct thread_groups *tg = &tglp->property_tgs[property_idx++];
 
-	ret = of_property_read_u32_array(dn, "ibm,thread-groups",
-					 thread_group_array,
-					 3 + total_threads);
-	if (ret)
-		return ret;
+		tg->property = thread_group_array[i];
+		tg->nr_groups = thread_group_array[i + 1];
+		tg->threads_per_group = thread_group_array[i + 2];
+		total_threads = tg->nr_groups * tg->threads_per_group;
 
-	thread_list = &thread_group_array[3];
+		thread_list = &thread_group_array[i + 3];
 
-	for (i = 0 ; i < total_threads; i++)
-		tg->thread_list[i] = thread_list[i];
+		for (j = 0; j < total_threads; j++)
+			tg->thread_list[j] = thread_list[j];
+		i = i + 3 + total_threads;
+	}
 
-	return 0;
+	tglp->nr_properties = property_idx;
+
+out_free:
+	kfree(thread_group_array);
+	return ret;
 }
 
 /*
@@ -805,50 +831,76 @@ static int get_cpu_thread_group_start(int cpu, struct thread_groups *tg)
 	return -1;
 }
 
+static struct thread_groups *__init get_thread_groups(int cpu,
+						      int group_property,
+						      int *err)
+{
+	struct device_node *dn = of_get_cpu_node(cpu, NULL);
+	struct thread_groups_list *cpu_tgl = &tgl[cpu];
+	struct thread_groups *tg = NULL;
+	int i;
+	*err = 0;
+
+	if (!dn) {
+		*err = -ENODATA;
+		return NULL;
+	}
+
+	if (!cpu_tgl->nr_properties) {
+		*err = parse_thread_groups(dn, cpu_tgl);
+		if (*err)
+			goto out;
+	}
+
+	for (i = 0; i < cpu_tgl->nr_properties; i++) {
+		if (cpu_tgl->property_tgs[i].property == group_property) {
+			tg = &cpu_tgl->property_tgs[i];
+			break;
+		}
+	}
+
+	if (!tg)
+		*err = -EINVAL;
+out:
+	of_node_put(dn);
+	return tg;
+}
+
 static int init_cpu_l1_cache_map(int cpu)
 
 {
-	struct device_node *dn = of_get_cpu_node(cpu, NULL);
-	struct thread_groups tg = {.property = 0,
-				   .nr_groups = 0,
-				   .threads_per_group = 0};
 	int first_thread = cpu_first_thread_sibling(cpu);
 	int i, cpu_group_start = -1, err = 0;
+	struct thread_groups *tg = NULL;
 
-	if (!dn)
-		return -ENODATA;
-
-	err = parse_thread_groups(dn, &tg, THREAD_GROUP_SHARE_L1);
-	if (err)
-		goto out;
+	tg = get_thread_groups(cpu, THREAD_GROUP_SHARE_L1,
+			       &err);
+	if (!tg)
+		return err;
 
-	cpu_group_start = get_cpu_thread_group_start(cpu, &tg);
+	cpu_group_start = get_cpu_thread_group_start(cpu, tg);
 
 	if (unlikely(cpu_group_start == -1)) {
 		WARN_ON_ONCE(1);
-		err = -ENODATA;
-		goto out;
+		return -ENODATA;
 	}
 
 	zalloc_cpumask_var_node(&per_cpu(cpu_l1_cache_map, cpu),
 				GFP_KERNEL, cpu_to_node(cpu));
 
 	for (i = first_thread; i < first_thread + threads_per_core; i++) {
-		int i_group_start = get_cpu_thread_group_start(i, &tg);
+		int i_group_start = get_cpu_thread_group_start(i, tg);
 
 		if (unlikely(i_group_start == -1)) {
 			WARN_ON_ONCE(1);
-			err = -ENODATA;
-			goto out;
+			return -ENODATA;
 		}
 
 		if (i_group_start == cpu_group_start)
 			cpumask_set_cpu(i, per_cpu(cpu_l1_cache_map, cpu));
 	}
 
-out:
-	of_node_put(dn);
-	return err;
+	return 0;
 }
 
 static bool shared_caches;
-- 
1.9.4

* [PATCH v2 2/5] powerpc/smp: Rename cpu_l1_cache_map as thread_group_l1_cache_map
From: Gautham R. Shenoy @ 2020-12-09 17:08 UTC
  To: Srikar Dronamraju, Anton Blanchard, Vaidyanathan Srinivasan,
      Michael Ellerman, Michael Neuling, Nicholas Piggin, Nathan Lynch,
      Peter Zijlstra, Valentin Schneider
  Cc: Gautham R. Shenoy, linuxppc-dev, linux-kernel

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

On platforms which have the "ibm,thread-groups" property, the per-cpu
variable cpu_l1_cache_map keeps track of which group of threads
within the same core share the L1 cache, Instruction and Data flow.

This patch renames the variable to "thread_group_l1_cache_map" to make
it consistent with a subsequent patch which will introduce
thread_group_l2_cache_map.

This patch introduces no functional change.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/smp.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 88d88ad..f3290d5 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -116,10 +116,10 @@ struct thread_groups_list {
 
 static struct thread_groups_list tgl[NR_CPUS] __initdata;
 /*
- * On big-cores system, cpu_l1_cache_map for each CPU corresponds to
+ * On big-cores system, thread_group_l1_cache_map for each CPU corresponds to
  * the set its siblings that share the L1-cache.
  */
-DEFINE_PER_CPU(cpumask_var_t, cpu_l1_cache_map);
+DEFINE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map);
 
 /* SMP operations for this machine */
 struct smp_ops_t *smp_ops;
@@ -866,7 +866,7 @@ static struct thread_groups *__init get_thread_groups(int cpu,
 	return tg;
 }
 
-static int init_cpu_l1_cache_map(int cpu)
+static int init_thread_group_l1_cache_map(int cpu)
 
 {
 	int first_thread = cpu_first_thread_sibling(cpu);
@@ -885,7 +885,7 @@ static int init_cpu_l1_cache_map(int cpu)
 		return -ENODATA;
 	}
 
-	zalloc_cpumask_var_node(&per_cpu(cpu_l1_cache_map, cpu),
+	zalloc_cpumask_var_node(&per_cpu(thread_group_l1_cache_map, cpu),
 				GFP_KERNEL, cpu_to_node(cpu));
 
 	for (i = first_thread; i < first_thread + threads_per_core; i++) {
@@ -897,7 +897,7 @@ static int init_cpu_l1_cache_map(int cpu)
 		}
 
 		if (i_group_start == cpu_group_start)
-			cpumask_set_cpu(i, per_cpu(cpu_l1_cache_map, cpu));
+			cpumask_set_cpu(i, per_cpu(thread_group_l1_cache_map, cpu));
 	}
 
 	return 0;
@@ -976,7 +976,7 @@ static int init_big_cores(void)
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
-		int err = init_cpu_l1_cache_map(cpu);
+		int err = init_thread_group_l1_cache_map(cpu);
 
 		if (err)
 			return err;
@@ -1372,7 +1372,7 @@ static inline void add_cpu_to_smallcore_masks(int cpu)
 
 	cpumask_set_cpu(cpu, cpu_smallcore_mask(cpu));
 
-	for_each_cpu(i, per_cpu(cpu_l1_cache_map, cpu)) {
+	for_each_cpu(i, per_cpu(thread_group_l1_cache_map, cpu)) {
 		if (cpu_online(i))
 			set_cpus_related(i, cpu, cpu_smallcore_mask);
 	}
-- 
1.9.4

* [PATCH v2 3/5] powerpc/smp: Rename init_thread_group_l1_cache_map() to make it generic
From: Gautham R. Shenoy @ 2020-12-09 17:08 UTC
  To: Srikar Dronamraju, Anton Blanchard, Vaidyanathan Srinivasan,
      Michael Ellerman, Michael Neuling, Nicholas Piggin, Nathan Lynch,
      Peter Zijlstra, Valentin Schneider
  Cc: Gautham R. Shenoy, linuxppc-dev, linux-kernel

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

init_thread_group_l1_cache_map() initializes the per-cpu cpumask
thread_group_l1_cache_map with the core-siblings which share L1 cache
with the CPU. Make this function generic to the cache-property (L1 or
L2) and update a suitable mask. This is a preparatory patch for the
next patch where we will introduce discovery of thread-groups that
share L2-cache.

No functional change.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/smp.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index f3290d5..9078b5b5 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -866,15 +866,18 @@ static struct thread_groups *__init get_thread_groups(int cpu,
 	return tg;
 }
 
-static int init_thread_group_l1_cache_map(int cpu)
+static int __init init_thread_group_cache_map(int cpu, int cache_property)
 
 {
 	int first_thread = cpu_first_thread_sibling(cpu);
 	int i, cpu_group_start = -1, err = 0;
 	struct thread_groups *tg = NULL;
+	cpumask_var_t *mask;
 
-	tg = get_thread_groups(cpu, THREAD_GROUP_SHARE_L1,
-			       &err);
+	if (cache_property != THREAD_GROUP_SHARE_L1)
+		return -EINVAL;
+
+	tg = get_thread_groups(cpu, cache_property, &err);
 	if (!tg)
 		return err;
 
@@ -885,8 +888,8 @@ static int init_thread_group_l1_cache_map(int cpu)
 		return -ENODATA;
 	}
 
-	zalloc_cpumask_var_node(&per_cpu(thread_group_l1_cache_map, cpu),
-				GFP_KERNEL, cpu_to_node(cpu));
+	mask = &per_cpu(thread_group_l1_cache_map, cpu);
+	zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu));
 
 	for (i = first_thread; i < first_thread + threads_per_core; i++) {
 		int i_group_start = get_cpu_thread_group_start(i, tg);
@@ -897,7 +900,7 @@ static int init_thread_group_l1_cache_map(int cpu)
 		}
 
 		if (i_group_start == cpu_group_start)
-			cpumask_set_cpu(i, per_cpu(thread_group_l1_cache_map, cpu));
+			cpumask_set_cpu(i, *mask);
 	}
 
 	return 0;
@@ -976,7 +979,7 @@ static int init_big_cores(void)
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
-		int err = init_thread_group_l1_cache_map(cpu);
+		int err = init_thread_group_cache_map(cpu, THREAD_GROUP_SHARE_L1);
 
 		if (err)
 			return err;
-- 
1.9.4

* [PATCH v2 4/5] powerpc/smp: Add support detecting thread-groups sharing L2 cache 2020-12-09 17:08 [PATCH v2 0/5] Extend Parsing "ibm, thread-groups" for Shared-L2 information Gautham R. Shenoy ` (2 preceding siblings ...) 2020-12-09 17:08 ` [PATCH v2 3/5] powerpc/smp: Rename init_thread_group_l1_cache_map() to make it generic Gautham R. Shenoy @ 2020-12-09 17:08 ` Gautham R. Shenoy 2020-12-10 0:57 ` kernel test robot 2020-12-10 4:30 ` kernel test robot 2020-12-09 17:08 ` [PATCH v2 5/5] powerpc/cacheinfo: Print correct cache-sibling map/list for " Gautham R. Shenoy 4 siblings, 2 replies; 9+ messages in thread From: Gautham R. Shenoy @ 2020-12-09 17:08 UTC (permalink / raw) To: Srikar Dronamraju, Anton Blanchard, Vaidyanathan Srinivasan, Michael Ellerman, Michael Neuling, Nicholas Piggin, Nathan Lynch, Peter Zijlstra, Valentin Schneider Cc: Gautham R. Shenoy, linuxppc-dev, linux-kernel From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> On POWER systems, groups of threads within a core sharing the L2-cache can be indicated by the "ibm,thread-groups" property array with the identifier "2". This patch adds support for detecting this, and when present, populate the populating the cpu_l2_cache_mask of every CPU to the core-siblings which share L2 with the CPU as specified in the by the "ibm,thread-groups" property array. 
On a platform with the following "ibm,thread-group" configuration 00000001 00000002 00000004 00000000 00000002 00000004 00000006 00000001 00000003 00000005 00000007 00000002 00000002 00000004 00000000 00000002 00000004 00000006 00000001 00000003 00000005 00000007 Without this patch, the sched-domain hierarchy for CPUs 0,1 would be CPU0 attaching sched-domain(s): domain-0: span=0,2,4,6 level=SMT domain-1: span=0-7 level=CACHE domain-2: span=0-15,24-39,48-55 level=MC domain-3: span=0-55 level=DIE CPU1 attaching sched-domain(s): domain-0: span=1,3,5,7 level=SMT domain-1: span=0-7 level=CACHE domain-2: span=0-15,24-39,48-55 level=MC domain-3: span=0-55 level=DIE The CACHE domain at 0-7 is incorrect since the ibm,thread-groups sub-array [00000002 00000002 00000004 00000000 00000002 00000004 00000006 00000001 00000003 00000005 00000007] indicates that L2 (Property "2") is shared only between the threads of a single group. There are "2" groups of threads where each group contains "4" threads each. The groups being {0,2,4,6} and {1,3,5,7}. With this patch, the sched-domain hierarchy for CPUs 0,1 would be CPU0 attaching sched-domain(s): domain-0: span=0,2,4,6 level=SMT domain-1: span=0-15,24-39,48-55 level=MC domain-2: span=0-55 level=DIE CPU1 attaching sched-domain(s): domain-0: span=1,3,5,7 level=SMT domain-1: span=0-15,24-39,48-55 level=MC domain-2: span=0-55 level=DIE The CACHE domain with span=0,2,4,6 for CPU 0 (span=1,3,5,7 for CPU 1 resp.) gets degenerated into the SMT domain. Furthermore, the last-level-cache domain gets correctly set to the SMT sched-domain. Signed-off-by: Gautham R. 
Shenoy <ego@linux.vnet.ibm.com> --- arch/powerpc/include/asm/smp.h | 1 + arch/powerpc/kernel/smp.c | 56 +++++++++++++++++++++++++++++++++++++++--- 2 files changed, 53 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h index b2035b2..8d3d081 100644 --- a/arch/powerpc/include/asm/smp.h +++ b/arch/powerpc/include/asm/smp.h @@ -134,6 +134,7 @@ static inline struct cpumask *cpu_smallcore_mask(int cpu) extern int cpu_to_core_id(int cpu); extern bool has_big_cores; +extern bool thread_group_shares_l2; #define cpu_smt_mask cpu_smt_mask #ifdef CONFIG_SCHED_SMT diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c index 9078b5b5..a46cf3f 100644 --- a/arch/powerpc/kernel/smp.c +++ b/arch/powerpc/kernel/smp.c @@ -76,6 +76,7 @@ struct task_struct *secondary_current; bool has_big_cores; bool coregroup_enabled; +bool thread_group_shares_l2; DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map); DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_map); @@ -99,6 +100,7 @@ enum { #define MAX_THREAD_LIST_SIZE 8 #define THREAD_GROUP_SHARE_L1 1 +#define THREAD_GROUP_SHARE_L2 2 struct thread_groups { unsigned int property; unsigned int nr_groups; @@ -107,7 +109,7 @@ struct thread_groups { }; /* Maximum number of properties that groups of threads within a core can share */ -#define MAX_THREAD_GROUP_PROPERTIES 1 +#define MAX_THREAD_GROUP_PROPERTIES 2 struct thread_groups_list { unsigned int nr_properties; @@ -121,6 +123,13 @@ struct thread_groups_list { */ DEFINE_PER_CPU(cpumask_var_t, thread_group_l1_cache_map); +/* + * On some big-cores system, thread_group_l2_cache_map for each CPU + * corresponds to the set its siblings within the core that share the + * L2-cache. 
+ */ +DEFINE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map); + /* SMP operations for this machine */ struct smp_ops_t *smp_ops; @@ -718,7 +727,9 @@ static void or_cpumasks_related(int i, int j, struct cpumask *(*srcmask)(int), * * ibm,thread-groups[i + 0] tells us the property based on which the * threads are being grouped together. If this value is 1, it implies - * that the threads in the same group share L1, translation cache. + * that the threads in the same group share L1, translation cache. If + * the value is 2, it implies that the threads in the same group share + * the same L2 cache. * * ibm,thread-groups[i+1] tells us how many such thread groups exist for the * property ibm,thread-groups[i] @@ -874,7 +885,8 @@ static int __init init_thread_group_cache_map(int cpu, int cache_property) struct thread_groups *tg = NULL; cpumask_var_t *mask; - if (cache_property != THREAD_GROUP_SHARE_L1) + if (cache_property != THREAD_GROUP_SHARE_L1 && + cache_property != THREAD_GROUP_SHARE_L2) return -EINVAL; tg = get_thread_groups(cpu, cache_property, &err); @@ -888,7 +900,11 @@ static int __init init_thread_group_cache_map(int cpu, int cache_property) return -ENODATA; } - mask = &per_cpu(thread_group_l1_cache_map, cpu); + if (cache_property == THREAD_GROUP_SHARE_L1) + mask = &per_cpu(thread_group_l1_cache_map, cpu); + else if (cache_property == THREAD_GROUP_SHARE_L2) + mask = &per_cpu(thread_group_l2_cache_map, cpu); + zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu)); for (i = first_thread; i < first_thread + threads_per_core; i++) { @@ -990,6 +1006,16 @@ static int init_big_cores(void) } has_big_cores = true; + + for_each_possible_cpu(cpu) { + int err = init_thread_group_cache_map(cpu, THREAD_GROUP_SHARE_L2); + + if (err) + return err; + } + + thread_group_shares_l2 = true; + pr_debug("L2 cache only shared by the threads in the small core\n"); return 0; } @@ -1304,6 +1330,28 @@ static bool update_mask_by_l2(int cpu, cpumask_var_t *mask) if (has_big_cores) 
 		submask_fn = cpu_smallcore_mask;
 
+	/*
+	 * If the threads in a thread-group share L2 cache, then
+	 * the L2-mask can be obtained from thread_group_l2_cache_map.
+	 */
+	if (thread_group_shares_l2) {
+		cpumask_set_cpu(cpu, cpu_l2_cache_mask(cpu));
+
+		for_each_cpu(i, per_cpu(thread_group_l2_cache_map, cpu)) {
+			if (cpu_online(i))
+				set_cpus_related(i, cpu, cpu_l2_cache_mask);
+		}
+
+		/* Verify that L1-cache siblings are a subset of L2 cache-siblings */
+		if (!cpumask_equal(submask_fn(cpu), cpu_l2_cache_mask(cpu)) &&
+		    !cpumask_subset(submask_fn(cpu), cpu_l2_cache_mask(cpu))) {
+			pr_warn_once("CPU %d : Inconsistent L1 and L2 cache siblings\n",
+				     cpu);
+		}
+
+		return true;
+	}
+
 	l2_cache = cpu_to_l2cache(cpu);
 	if (!l2_cache || !*mask) {
 		/* Assume only core siblings share cache with this CPU */
-- 
1.9.4
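The update_mask_by_l2() hunk above builds the L2 mask from the CPU itself plus the online members of its thread group, then checks that the L1 siblings are contained in it. A minimal userspace sketch of that set logic follows, assuming CPU ids below 64 so a uint64_t can stand in for a cpumask. The names l2_mask_from_group and mask_subset are hypothetical, and the sketch only models the mask of one CPU, whereas set_cpus_related() in the patch updates both CPUs' masks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the loop in update_mask_by_l2():
 * the CPU itself, plus every member of its L2 thread group that is online.
 * Bit n of each mask represents CPU n.
 */
static uint64_t l2_mask_from_group(uint64_t group_map, uint64_t online,
				   unsigned int cpu)
{
	assert(cpu < 64);	/* sketch-only limit of the uint64_t "cpumask" */
	return (UINT64_C(1) << cpu) | (group_map & online);
}

/* True when every bit set in a is also set in b (a is a subset of b). */
static bool mask_subset(uint64_t a, uint64_t b)
{
	return (a & ~b) == 0;
}
```

Because two equal masks are always subsets of each other, the subset test on its own covers the condition that the patch expresses as a combined cpumask_equal() and cpumask_subset() check.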
* Re: [PATCH v2 4/5] powerpc/smp: Add support detecting thread-groups sharing L2 cache 2020-12-09 17:08 ` [PATCH v2 4/5] powerpc/smp: Add support detecting thread-groups sharing L2 cache Gautham R. Shenoy @ 2020-12-10 0:57 ` kernel test robot 2020-12-10 4:30 ` kernel test robot 1 sibling, 0 replies; 9+ messages in thread From: kernel test robot @ 2020-12-10 0:57 UTC (permalink / raw) To: Gautham R. Shenoy, Srikar Dronamraju, Anton Blanchard, Vaidyanathan Srinivasan, Michael Ellerman, Michael Neuling, Nicholas Piggin, Nathan Lynch, Peter Zijlstra, Valentin Schneider Cc: clang-built-linux, kbuild-all, linuxppc-dev [-- Attachment #1: Type: text/plain, Size: 10510 bytes --] Hi "Gautham, Thank you for the patch! Yet something to improve: [auto build test ERROR on powerpc/next] [also build test ERROR on v5.10-rc7 next-20201209] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Gautham-R-Shenoy/Extend-Parsing-ibm-thread-groups-for-Shared-L2-information/20201210-011226 base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next config: powerpc64-randconfig-r003-20201209 (attached as .config) compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 1968804ac726e7674d5de22bc2204b45857da344) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # install powerpc64 cross compiling tool for clang build # apt-get install binutils-powerpc64-linux-gnu # https://github.com/0day-ci/linux/commit/61bc65c11bf36fdc3827c6d6f4f555fba5306bd9 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Gautham-R-Shenoy/Extend-Parsing-ibm-thread-groups-for-Shared-L2-information/20201210-011226 git checkout 61bc65c11bf36fdc3827c6d6f4f555fba5306bd9 # 
save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc64 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:164:1: note: expanded from here __do_insw ^ arch/powerpc/include/asm/io.h:557:56: note: expanded from macro '__do_insw' #define __do_insw(p, b, n) readsw((PCI_IO_ADDR)_IO_BASE+(p), (b), (n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from arch/powerpc/kernel/smp.c:22: In file included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:10: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:47:1: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(insl, (unsigned long p, void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:166:1: note: expanded from here __do_insl ^ arch/powerpc/include/asm/io.h:558:56: note: expanded from macro '__do_insl' #define __do_insl(p, b, n) readsl((PCI_IO_ADDR)_IO_BASE+(p), (b), (n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from arch/powerpc/kernel/smp.c:22: In file included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:10: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included 
from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:49:1: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(outsb, (unsigned long p, const void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:168:1: note: expanded from here __do_outsb ^ arch/powerpc/include/asm/io.h:559:58: note: expanded from macro '__do_outsb' #define __do_outsb(p, b, n) writesb((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from arch/powerpc/kernel/smp.c:22: In file included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:10: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:51:1: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(outsw, (unsigned long p, const void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:170:1: note: expanded from here __do_outsw ^ arch/powerpc/include/asm/io.h:560:58: note: expanded from macro '__do_outsw' #define __do_outsw(p, b, n) writesw((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from arch/powerpc/kernel/smp.c:22: In file included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:10: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from 
include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:53:1: error: performing pointer arithmetic on a null pointer has undefined behavior [-Werror,-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(outsl, (unsigned long p, const void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:172:1: note: expanded from here __do_outsl ^ arch/powerpc/include/asm/io.h:561:58: note: expanded from macro '__do_outsl' #define __do_outsl(p, b, n) writesl((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) ~~~~~~~~~~~~~~~~~~~~~^ arch/powerpc/kernel/smp.c:569:6: error: no previous prototype for function 'tick_broadcast' [-Werror,-Wmissing-prototypes] void tick_broadcast(const struct cpumask *mask) ^ arch/powerpc/kernel/smp.c:569:1: note: declare 'static' if the function is not intended to be used outside of this translation unit void tick_broadcast(const struct cpumask *mask) ^ static arch/powerpc/kernel/smp.c:579:6: error: no previous prototype for function 'debugger_ipi_callback' [-Werror,-Wmissing-prototypes] void debugger_ipi_callback(struct pt_regs *regs) ^ arch/powerpc/kernel/smp.c:579:1: note: declare 'static' if the function is not intended to be used outside of this translation unit void debugger_ipi_callback(struct pt_regs *regs) ^ static >> arch/powerpc/kernel/smp.c:905:11: error: variable 'mask' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] else if (cache_property == THREAD_GROUP_SHARE_L2) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/kernel/smp.c:908:26: note: uninitialized use occurs here zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu)); ^~~~ arch/powerpc/kernel/smp.c:905:7: note: remove the 'if' if its condition is always true else if (cache_property == THREAD_GROUP_SHARE_L2) 
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/kernel/smp.c:886:21: note: initialize the variable 'mask' to silence this warning cpumask_var_t *mask; ^ = NULL arch/powerpc/kernel/smp.c:1560:5: error: no previous prototype for function 'setup_profiling_timer' [-Werror,-Wmissing-prototypes] int setup_profiling_timer(unsigned int multiplier) ^ arch/powerpc/kernel/smp.c:1560:1: note: declare 'static' if the function is not intended to be used outside of this translation unit int setup_profiling_timer(unsigned int multiplier) ^ static 16 errors generated. vim +905 arch/powerpc/kernel/smp.c 881 882 { 883 int first_thread = cpu_first_thread_sibling(cpu); 884 int i, cpu_group_start = -1, err = 0; 885 struct thread_groups *tg = NULL; 886 cpumask_var_t *mask; 887 888 if (cache_property != THREAD_GROUP_SHARE_L1 && 889 cache_property != THREAD_GROUP_SHARE_L2) 890 return -EINVAL; 891 892 tg = get_thread_groups(cpu, cache_property, &err); 893 if (!tg) 894 return err; 895 896 cpu_group_start = get_cpu_thread_group_start(cpu, tg); 897 898 if (unlikely(cpu_group_start == -1)) { 899 WARN_ON_ONCE(1); 900 return -ENODATA; 901 } 902 903 if (cache_property == THREAD_GROUP_SHARE_L1) 904 mask = &per_cpu(thread_group_l1_cache_map, cpu); > 905 else if (cache_property == THREAD_GROUP_SHARE_L2) 906 mask = &per_cpu(thread_group_l2_cache_map, cpu); 907 908 zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu)); 909 910 for (i = first_thread; i < first_thread + threads_per_core; i++) { 911 int i_group_start = get_cpu_thread_group_start(i, tg); 912 913 if (unlikely(i_group_start == -1)) { 914 WARN_ON_ONCE(1); 915 return -ENODATA; 916 } 917 918 if (i_group_start == cpu_group_start) 919 cpumask_set_cpu(i, *mask); 920 } 921 922 return 0; 923 } 924 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org [-- Attachment #2: .config.gz --] [-- Type: application/gzip, Size: 28213 bytes --] ^ permalink raw reply [flat|nested] 9+ 
messages in thread
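The new diagnostic in the report above is aimed at the if / else-if chain at smp.c:903-906. The earlier -EINVAL check rejects every other cache_property value, so the uninitialized path should not be reachable at run time, but clang judges the chain on its own. Below is a minimal userspace sketch of one way to shape the selection so that every path assigns the pointer. The function select_cache_map and the *_sketch variables are hypothetical names rather than kernel symbols, and this is not necessarily the change made in a later revision of the series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define THREAD_GROUP_SHARE_L1 1
#define THREAD_GROUP_SHARE_L2 2

/* Hypothetical stand-ins for the per-CPU L1 and L2 thread-group maps. */
static unsigned long l1_map_sketch;
static unsigned long l2_map_sketch;

/*
 * Pick the map for cache_property. Every branch writes *mask, so the
 * compiler never sees a path on which the pointer is left unset.
 */
static int select_cache_map(int cache_property, unsigned long **mask)
{
	assert(mask != NULL);

	switch (cache_property) {
	case THREAD_GROUP_SHARE_L1:
		*mask = &l1_map_sketch;
		return 0;
	case THREAD_GROUP_SHARE_L2:
		*mask = &l2_map_sketch;
		return 0;
	default:
		*mask = NULL;
		return -EINVAL;
	}
}
```

Replacing the trailing "else if" with a plain "else", or initializing the pointer to NULL as the compiler note suggests, would also quiet the diagnostic; the switch form simply keeps the validity check and the selection in one place.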
* Re: [PATCH v2 4/5] powerpc/smp: Add support detecting thread-groups sharing L2 cache 2020-12-09 17:08 ` [PATCH v2 4/5] powerpc/smp: Add support detecting thread-groups sharing L2 cache Gautham R. Shenoy 2020-12-10 0:57 ` kernel test robot @ 2020-12-10 4:30 ` kernel test robot 1 sibling, 0 replies; 9+ messages in thread From: kernel test robot @ 2020-12-10 4:30 UTC (permalink / raw) To: Gautham R. Shenoy, Srikar Dronamraju, Anton Blanchard, Vaidyanathan Srinivasan, Michael Ellerman, Michael Neuling, Nicholas Piggin, Nathan Lynch, Peter Zijlstra, Valentin Schneider Cc: clang-built-linux, kbuild-all, linuxppc-dev [-- Attachment #1: Type: text/plain, Size: 10071 bytes --] Hi "Gautham, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on powerpc/next] [also build test WARNING on v5.10-rc7 next-20201209] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Gautham-R-Shenoy/Extend-Parsing-ibm-thread-groups-for-Shared-L2-information/20201210-011226 base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next config: powerpc64-randconfig-r035-20201209 (attached as .config) compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 1968804ac726e7674d5de22bc2204b45857da344) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # install powerpc64 cross compiling tool for clang build # apt-get install binutils-powerpc64-linux-gnu # https://github.com/0day-ci/linux/commit/61bc65c11bf36fdc3827c6d6f4f555fba5306bd9 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Gautham-R-Shenoy/Extend-Parsing-ibm-thread-groups-for-Shared-L2-information/20201210-011226 git checkout 
61bc65c11bf36fdc3827c6d6f4f555fba5306bd9 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc64 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All warnings (new ones prefixed by >>): ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:233:1: note: expanded from here __do_insw ^ arch/powerpc/include/asm/io.h:557:56: note: expanded from macro '__do_insw' #define __do_insw(p, b, n) readsw((PCI_IO_ADDR)_IO_BASE+(p), (b), (n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from arch/powerpc/kernel/smp.c:22: In file included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:10: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:47:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(insl, (unsigned long p, void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:235:1: note: expanded from here __do_insl ^ arch/powerpc/include/asm/io.h:558:56: note: expanded from macro '__do_insl' #define __do_insl(p, b, n) readsl((PCI_IO_ADDR)_IO_BASE+(p), (b), (n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from arch/powerpc/kernel/smp.c:22: In file included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:10: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from 
include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:49:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(outsb, (unsigned long p, const void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:2:1: note: expanded from here __do_outsb ^ arch/powerpc/include/asm/io.h:559:58: note: expanded from macro '__do_outsb' #define __do_outsb(p, b, n) writesb((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from arch/powerpc/kernel/smp.c:22: In file included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:10: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:51:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(outsw, (unsigned long p, const void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:4:1: note: expanded from here __do_outsw ^ arch/powerpc/include/asm/io.h:560:58: note: expanded from macro '__do_outsw' #define __do_outsw(p, b, n) writesw((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from arch/powerpc/kernel/smp.c:22: In file included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:10: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file 
included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:53:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(outsl, (unsigned long p, const void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:6:1: note: expanded from here __do_outsl ^ arch/powerpc/include/asm/io.h:561:58: note: expanded from macro '__do_outsl' #define __do_outsl(p, b, n) writesl((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) ~~~~~~~~~~~~~~~~~~~~~^ arch/powerpc/kernel/smp.c:569:6: warning: no previous prototype for function 'tick_broadcast' [-Wmissing-prototypes] void tick_broadcast(const struct cpumask *mask) ^ arch/powerpc/kernel/smp.c:569:1: note: declare 'static' if the function is not intended to be used outside of this translation unit void tick_broadcast(const struct cpumask *mask) ^ static arch/powerpc/kernel/smp.c:579:6: warning: no previous prototype for function 'debugger_ipi_callback' [-Wmissing-prototypes] void debugger_ipi_callback(struct pt_regs *regs) ^ arch/powerpc/kernel/smp.c:579:1: note: declare 'static' if the function is not intended to be used outside of this translation unit void debugger_ipi_callback(struct pt_regs *regs) ^ static >> arch/powerpc/kernel/smp.c:905:11: warning: variable 'mask' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] else if (cache_property == THREAD_GROUP_SHARE_L2) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/kernel/smp.c:908:26: note: uninitialized use occurs here zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu)); ^~~~ arch/powerpc/kernel/smp.c:905:7: note: remove the 'if' if its condition is always true else if (cache_property == THREAD_GROUP_SHARE_L2) 
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/kernel/smp.c:886:21: note: initialize the variable 'mask' to silence this warning cpumask_var_t *mask; ^ = NULL 15 warnings generated. vim +905 arch/powerpc/kernel/smp.c 881 882 { 883 int first_thread = cpu_first_thread_sibling(cpu); 884 int i, cpu_group_start = -1, err = 0; 885 struct thread_groups *tg = NULL; 886 cpumask_var_t *mask; 887 888 if (cache_property != THREAD_GROUP_SHARE_L1 && 889 cache_property != THREAD_GROUP_SHARE_L2) 890 return -EINVAL; 891 892 tg = get_thread_groups(cpu, cache_property, &err); 893 if (!tg) 894 return err; 895 896 cpu_group_start = get_cpu_thread_group_start(cpu, tg); 897 898 if (unlikely(cpu_group_start == -1)) { 899 WARN_ON_ONCE(1); 900 return -ENODATA; 901 } 902 903 if (cache_property == THREAD_GROUP_SHARE_L1) 904 mask = &per_cpu(thread_group_l1_cache_map, cpu); > 905 else if (cache_property == THREAD_GROUP_SHARE_L2) 906 mask = &per_cpu(thread_group_l2_cache_map, cpu); 907 908 zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu)); 909 910 for (i = first_thread; i < first_thread + threads_per_core; i++) { 911 int i_group_start = get_cpu_thread_group_start(i, tg); 912 913 if (unlikely(i_group_start == -1)) { 914 WARN_ON_ONCE(1); 915 return -ENODATA; 916 } 917 918 if (i_group_start == cpu_group_start) 919 cpumask_set_cpu(i, *mask); 920 } 921 922 return 0; 923 } 924 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org [-- Attachment #2: .config.gz --] [-- Type: application/gzip, Size: 27079 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v2 5/5] powerpc/cacheinfo: Print correct cache-sibling map/list for L2 cache
From: Gautham R. Shenoy @ 2020-12-09 17:08 UTC
To: Srikar Dronamraju, Anton Blanchard, Vaidyanathan Srinivasan, Michael Ellerman, Michael Neuling, Nicholas Piggin, Nathan Lynch, Peter Zijlstra, Valentin Schneider
Cc: Gautham R. Shenoy, linuxppc-dev, linux-kernel

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

On POWER platforms where only some groups of threads within a core
share the L2-cache (as indicated by the "ibm,thread-groups" device-tree
property), sysfs currently reports an incorrect shared_cpu_map/list
for the L2 cache.

This patch reports the correct shared_cpu_map/list on such platforms.

Example:
On a platform with "ibm,thread-groups" set to
	00000001 00000002 00000004 00000000 00000002 00000004 00000006 00000001 00000003 00000005 00000007
	00000002 00000002 00000004 00000000 00000002 00000004 00000006 00000001 00000003 00000005 00000007

This indicates that threads {0,2,4,6} in the core share an L2 cache,
as do threads {1,3,5,7}.

However, without the patch, the shared_cpu_map/list for L2 for CPUs 0,
1 is reported in sysfs as follows:

/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list:0-7
/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_map:000000,000000ff

/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_list:0-7
/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map:000000,000000ff

With the patch, the shared_cpu_map/list for L2 cache for CPUs 0, 1 is
correctly reported as follows:

/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list:0,2,4,6
/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_map:000000,00000055

/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_list:1,3,5,7
/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map:000000,000000aa

This patch also adds #ifdef CONFIG_PPC64 guards around the affected
code so that 32-bit configs build correctly.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/cacheinfo.c | 34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
index 65ab9fc..cb87b68 100644
--- a/arch/powerpc/kernel/cacheinfo.c
+++ b/arch/powerpc/kernel/cacheinfo.c
@@ -641,6 +641,7 @@ static ssize_t level_show(struct kobject *k, struct kobj_attribute *attr, char *
 static struct kobj_attribute cache_level_attr =
 	__ATTR(level, 0444, level_show, NULL);
 
+#ifdef CONFIG_PPC64
 static unsigned int index_dir_to_cpu(struct cache_index_dir *index)
 {
 	struct kobject *index_dir_kobj = &index->kobj;
@@ -650,16 +651,35 @@ static unsigned int index_dir_to_cpu(struct cache_index_dir *index)
 
 	return dev->id;
 }
+#endif
 
 /*
  * On big-core systems, each core has two groups of CPUs each of which
  * has its own L1-cache. The thread-siblings which share l1-cache with
  * @cpu can be obtained via cpu_smallcore_mask().
+ *
+ * On some big-core systems, the L2 cache is shared only between some
+ * groups of siblings. This is already parsed and encoded in
+ * cpu_l2_cache_mask().
+ *
+ * TODO: cache_lookup_or_instantiate() needs to be made aware of the
+ *       "ibm,thread-groups" property so that cache->shared_cpu_map
+ *       reflects the correct siblings on platforms that have this
+ *       device-tree property. This helper function is only a stop-gap
+ *       solution so that we report the correct siblings to the
+ *       userspace via sysfs.
  */
-static const struct cpumask *get_big_core_shared_cpu_map(int cpu, struct cache *cache)
+static const struct cpumask *get_shared_cpu_map(struct cache_index_dir *index, struct cache *cache)
 {
-	if (cache->level == 1)
-		return cpu_smallcore_mask(cpu);
+#ifdef CONFIG_PPC64
+	if (has_big_cores) {
+		int cpu = index_dir_to_cpu(index);
+		if (cache->level == 1)
+			return cpu_smallcore_mask(cpu);
+		if (cache->level == 2 && thread_group_shares_l2)
+			return cpu_l2_cache_mask(cpu);
+	}
+#endif
 
 	return &cache->shared_cpu_map;
 }
@@ -670,17 +690,11 @@ static const struct cpumask *get_big_core_shared_cpu_map(int cpu, struct cache *
 	struct cache_index_dir *index;
 	struct cache *cache;
 	const struct cpumask *mask;
-	int cpu;
 
 	index = kobj_to_cache_index_dir(k);
 	cache = index->cache;
 
-	if (has_big_cores) {
-		cpu = index_dir_to_cpu(index);
-		mask = get_big_core_shared_cpu_map(cpu, cache);
-	} else {
-		mask = &cache->shared_cpu_map;
-	}
+	mask = get_shared_cpu_map(index, cache);
 
 	return cpumap_print_to_pagebuf(list, buf, mask);
 }
-- 
1.9.4
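The shared_cpu_map values 55 and aa quoted in the commit message for this patch follow directly from the "ibm,thread-groups" cells. As a rough userspace illustration, the sketch below walks the flat cell array (property, number of groups, threads per group, then the thread ids) and returns the bitmask of the group that contains a given CPU. The function name thread_groups_mask is hypothetical, thread ids are assumed to be below 64, and this is not the kernel's parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Walk a flat "ibm,thread-groups" cell array and return the bitmask
 * (bit n == thread n) of the group that shares `property` with `cpu`.
 * Returns 0 if the property or the CPU is not found, or if the array
 * is truncated or lists a thread id of 64 or more.
 */
static uint64_t thread_groups_mask(const uint32_t *cells, size_t n,
				   uint32_t property, uint32_t cpu)
{
	size_t i = 0;

	assert(cells != NULL || n == 0);

	while (i + 3 <= n) {
		uint32_t prop = cells[i];
		uint32_t nr_groups = cells[i + 1];
		uint32_t per_group = cells[i + 2];
		size_t total = (size_t)nr_groups * per_group;
		const uint32_t *list = &cells[i + 3];
		uint32_t g, t;

		if (total > n - (i + 3))
			return 0;	/* block runs past the end of the array */

		if (prop == property) {
			for (g = 0; g < nr_groups; g++) {
				const uint32_t *grp = list + (size_t)g * per_group;
				uint64_t mask = 0;
				int found = 0;

				for (t = 0; t < per_group; t++) {
					if (grp[t] >= 64)
						return 0;
					mask |= UINT64_C(1) << grp[t];
					if (grp[t] == cpu)
						found = 1;
				}
				if (found)
					return mask;
			}
			return 0;	/* property present, CPU not listed */
		}
		i += 3 + total;		/* skip to the next property block */
	}
	return 0;
}
```

Feeding it the 22 cells from the commit message with property 2 gives 0x55 for CPU 0 and 0xaa for CPU 1, matching the sysfs output that the patch is expected to produce.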
* Re: [PATCH v2 5/5] powerpc/cacheinfo: Print correct cache-sibling map/list for L2 cache 2020-12-09 17:08 ` [PATCH v2 5/5] powerpc/cacheinfo: Print correct cache-sibling map/list for " Gautham R. Shenoy @ 2020-12-09 23:58 ` kernel test robot 0 siblings, 0 replies; 9+ messages in thread From: kernel test robot @ 2020-12-09 23:58 UTC (permalink / raw) To: Gautham R. Shenoy, Srikar Dronamraju, Anton Blanchard, Vaidyanathan Srinivasan, Michael Ellerman, Michael Neuling, Nicholas Piggin, Nathan Lynch, Peter Zijlstra, Valentin Schneider Cc: clang-built-linux, kbuild-all, linuxppc-dev [-- Attachment #1: Type: text/plain, Size: 10557 bytes --] Hi "Gautham, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on powerpc/next] [also build test WARNING on v5.10-rc7 next-20201209] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Gautham-R-Shenoy/Extend-Parsing-ibm-thread-groups-for-Shared-L2-information/20201210-011226 base: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next config: powerpc64-randconfig-r031-20201209 (attached as .config) compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 1968804ac726e7674d5de22bc2204b45857da344) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # install powerpc64 cross compiling tool for clang build # apt-get install binutils-powerpc64-linux-gnu # https://github.com/0day-ci/linux/commit/61bd9b188793d5009b5cdf310149e498264e6d57 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Gautham-R-Shenoy/Extend-Parsing-ibm-thread-groups-for-Shared-L2-information/20201210-011226 git checkout 61bd9b188793d5009b5cdf310149e498264e6d57 # save the attached .config to 
linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc64 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All warnings (new ones prefixed by >>): __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:125:1: note: expanded from here __do_insb ^ arch/powerpc/include/asm/io.h:556:56: note: expanded from macro '__do_insb' #define __do_insb(p, b, n) readsb((PCI_IO_ADDR)_IO_BASE+(p), (b), (n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from arch/powerpc/kernel/cacheinfo.c:21: In file included from arch/powerpc/include/asm/prom.h:21: In file included from include/linux/of_address.h:7: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:45:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(insw, (unsigned long p, void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:127:1: note: expanded from here __do_insw ^ arch/powerpc/include/asm/io.h:557:56: note: expanded from macro '__do_insw' #define __do_insw(p, b, n) readsw((PCI_IO_ADDR)_IO_BASE+(p), (b), (n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from arch/powerpc/kernel/cacheinfo.c:21: In file included from arch/powerpc/include/asm/prom.h:21: In file included from include/linux/of_address.h:7: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:47:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(insl, (unsigned long p, void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: 
note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:129:1: note: expanded from here __do_insl ^ arch/powerpc/include/asm/io.h:558:56: note: expanded from macro '__do_insl' #define __do_insl(p, b, n) readsl((PCI_IO_ADDR)_IO_BASE+(p), (b), (n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from arch/powerpc/kernel/cacheinfo.c:21: In file included from arch/powerpc/include/asm/prom.h:21: In file included from include/linux/of_address.h:7: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:49:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(outsb, (unsigned long p, const void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:131:1: note: expanded from here __do_outsb ^ arch/powerpc/include/asm/io.h:559:58: note: expanded from macro '__do_outsb' #define __do_outsb(p, b, n) writesb((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from arch/powerpc/kernel/cacheinfo.c:21: In file included from arch/powerpc/include/asm/prom.h:21: In file included from include/linux/of_address.h:7: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:51:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(outsw, (unsigned long p, const void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:133:1: note: expanded from here __do_outsw ^ 
arch/powerpc/include/asm/io.h:560:58: note: expanded from macro '__do_outsw' #define __do_outsw(p, b, n) writesw((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from arch/powerpc/kernel/cacheinfo.c:21: In file included from arch/powerpc/include/asm/prom.h:21: In file included from include/linux/of_address.h:7: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:53:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(outsl, (unsigned long p, const void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:135:1: note: expanded from here __do_outsl ^ arch/powerpc/include/asm/io.h:561:58: note: expanded from macro '__do_outsl' #define __do_outsl(p, b, n) writesl((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) ~~~~~~~~~~~~~~~~~~~~~^ arch/powerpc/kernel/cacheinfo.c:679:28: error: use of undeclared identifier 'thread_group_shares_l2'; did you mean 'thread_group_leader'? 
        if (cache->level == 2 && thread_group_shares_l2)
                                 ^~~~~~~~~~~~~~~~~~~~~~
                                 thread_group_leader
include/linux/sched/signal.h:652:20: note: 'thread_group_leader' declared here
static inline bool thread_group_leader(struct task_struct *p)
                   ^
>> arch/powerpc/kernel/cacheinfo.c:679:28: warning: address of function 'thread_group_leader' will always evaluate to 'true' [-Wpointer-bool-conversion]
        if (cache->level == 2 && thread_group_shares_l2)
                              ~~ ^~~~~~~~~~~~~~~~~~~~~~
arch/powerpc/kernel/cacheinfo.c:679:28: note: prefix with the address-of operator to silence this warning
        if (cache->level == 2 && thread_group_shares_l2)
                                 ^
                                 &
arch/powerpc/kernel/cacheinfo.c:680:11: error: implicit declaration of function 'cpu_l2_cache_mask' [-Werror,-Wimplicit-function-declaration]
        return cpu_l2_cache_mask(cpu);
               ^
>> arch/powerpc/kernel/cacheinfo.c:680:11: warning: incompatible integer to pointer conversion returning 'int' from a function with result type 'const struct cpumask *' [-Wint-conversion]
        return cpu_l2_cache_mask(cpu);
               ^~~~~~~~~~~~~~~~~~~~~~
14 warnings and 2 errors generated.

vim +679 arch/powerpc/kernel/cacheinfo.c

   655
   656  /*
   657   * On big-core systems, each core has two groups of CPUs each of which
   658   * has its own L1-cache. The thread-siblings which share l1-cache with
   659   * @cpu can be obtained via cpu_smallcore_mask().
   660   *
   661   * On some big-core systems, the L2 cache is shared only between some
   662   * groups of siblings. This is already parsed and encoded in
   663   * cpu_l2_cache_mask().
   664   *
   665   * TODO: cache_lookup_or_instantiate() needs to be made aware of the
   666   *       "ibm,thread-groups" property so that cache->shared_cpu_map
   667   *       reflects the correct siblings on platforms that have this
   668   *       device-tree property. This helper function is only a stop-gap
   669   *       solution so that we report the correct siblings to the
   670   *       userspace via sysfs.
   671   */
   672  static const struct cpumask *get_shared_cpu_map(struct cache_index_dir *index, struct cache *cache)
   673  {
   674  #ifdef CONFIG_PPC64
   675          if (has_big_cores) {
   676                  int cpu = index_dir_to_cpu(index);
   677                  if (cache->level == 1)
   678                          return cpu_smallcore_mask(cpu);
 > 679                  if (cache->level == 2 && thread_group_shares_l2)
 > 680                          return cpu_l2_cache_mask(cpu);
   681          }
   682  #endif
   683
   684          return &cache->shared_cpu_map;
   685  }
   686

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 37110 bytes --]
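The selection logic in get_shared_cpu_map() quoted in the report can be followed without the kernel headers whose declarations failed to resolve. Below is a minimal user-space sketch of that control flow only. Everything in it is a hypothetical stand-in and not the kernel's API: a cpumask is modeled as a plain unsigned long bitmask, and struct topo_model, struct cache_model and model_get_shared_cpu_map() are illustrative names. It does not attempt to fix the build error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one bit per CPU instead of a real struct cpumask. */
typedef unsigned long cpumask_model_t;

/* Stand-in for the fields of struct cache that the helper reads. */
struct cache_model {
	int level;
	cpumask_model_t shared_cpu_map;
};

/* Stand-in for the global topology state consulted by the helper. */
struct topo_model {
	bool has_big_cores;
	bool thread_group_shares_l2;
	cpumask_model_t smallcore_mask;	/* models cpu_smallcore_mask(cpu) */
	cpumask_model_t l2_cache_mask;	/* models cpu_l2_cache_mask(cpu) */
};

/*
 * Mirrors the branch order of the quoted function: on big-core systems
 * L1 reports the small-core siblings, L2 reports the thread-group L2
 * siblings when that property was parsed, and every other case falls
 * back to the map stored in the cache object.
 */
const cpumask_model_t *model_get_shared_cpu_map(const struct topo_model *topo,
						const struct cache_model *cache)
{
	assert(topo != NULL && cache != NULL);

	if (topo->has_big_cores) {
		if (cache->level == 1)
			return &topo->smallcore_mask;
		if (cache->level == 2 && topo->thread_group_shares_l2)
			return &topo->l2_cache_mask;
	}

	return &cache->shared_cpu_map;
}
```

With the thread numbering from the cover letter, the group {8,10,12,14} corresponds to the bitmask 0x5500 and the whole core {8..15} to 0xFF00. A level-2 cache then reports 0x5500 when thread_group_shares_l2 is set and 0xFF00 otherwise.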
end of thread, other threads:[~2020-12-10  4:32 UTC | newest]

Thread overview: 9+ messages
2020-12-09 17:08 [PATCH v2 0/5] Extend Parsing "ibm, thread-groups" for Shared-L2 information Gautham R. Shenoy
2020-12-09 17:08 ` [PATCH v2 1/5] powerpc/smp: Parse ibm, thread-groups with multiple properties Gautham R. Shenoy
2020-12-09 17:08 ` [PATCH v2 2/5] powerpc/smp: Rename cpu_l1_cache_map as thread_group_l1_cache_map Gautham R. Shenoy
2020-12-09 17:08 ` [PATCH v2 3/5] powerpc/smp: Rename init_thread_group_l1_cache_map() to make it generic Gautham R. Shenoy
2020-12-09 17:08 ` [PATCH v2 4/5] powerpc/smp: Add support detecting thread-groups sharing L2 cache Gautham R. Shenoy
2020-12-10  0:57   ` kernel test robot
2020-12-10  4:30   ` kernel test robot
2020-12-09 17:08 ` [PATCH v2 5/5] powerpc/cacheinfo: Print correct cache-sibling map/list for " Gautham R. Shenoy
2020-12-09 23:58   ` kernel test robot