* [RFC PATCH 0/3] Rework CPU capacity asymmetry detection
From: Beata Michalska @ 2021-04-16 13:01 UTC
To: linux-kernel
Cc: peterz, mingo, juri.lelli, vincent.guittot, valentin.schneider,
    dietmar.eggemann, corbet, linux-doc

As of now, asym_cpu_capacity_level() tries to locate the lowest topology
level at which the highest available CPU capacity is visible to all CPUs.
This works fine for most existing asymmetric designs. For some possible and
completely valid setups that combine different CPU microarchitectures within
clusters, though, it might not be the best approach: it can end up pointing
at a level at which some of the domains do not see any asymmetry at all.
This could be problematic for misfit migration and/or energy aware
placement, and on affected platforms it might result in custom changes to
the wake-up and CPU selection paths.

The following patches rework how the asymmetry detection is carried out,
pinning the asymmetric topology level to the lowest one at which the full
range of CPU capacities is visible to all CPUs within a given sched domain.
The asym_cpu_capacity_level will also keep track of those levels where any
degree of asymmetry is observed, in order to mark the corresponding sched
domains with the SD_ASYM_CPUCAPACITY flag and to enable misfit migration for
them.

In order to distinguish sched domains with a partial vs full range of CPU
capacity asymmetry, a new sched domain flag has been introduced:
SD_ASYM_CPUCAPACITY_FULL.
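
The partial vs full distinction described above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the code from the series: the ASYM_* constants, count_unique() and classify_span() are hypothetical names, and the flag values are arbitrary stand-ins for SD_ASYM_CPUCAPACITY and SD_ASYM_CPUCAPACITY_FULL. A domain span is classified by comparing the number of unique capacities it covers with the number of unique capacities in the whole system.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel flags; the values are illustrative. */
#define ASYM_NONE	0x0u
#define ASYM_PARTIAL	0x1u		/* ~ SD_ASYM_CPUCAPACITY */
#define ASYM_FULL	(0x1u | 0x2u)	/* ~ SD_ASYM_CPUCAPACITY | SD_ASYM_CPUCAPACITY_FULL */

/* Count distinct values in caps[0..n). Quadratic, which is fine for a sketch. */
static size_t count_unique(const unsigned long *caps, size_t n)
{
	size_t unique = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j;

		/* caps[i] is new only if no earlier element has the same value. */
		for (j = 0; j < i; j++)
			if (caps[j] == caps[i])
				break;
		if (j == i)
			unique++;
	}
	return unique;
}

/*
 * Classify one domain span, given the capacities of the CPUs it covers and
 * the number of unique capacities present in the whole system.
 */
static unsigned int classify_span(const unsigned long *span_caps, size_t n,
				  size_t system_unique)
{
	size_t local = count_unique(span_caps, n);

	/* A span cannot see more capacity values than the system has. */
	assert(local <= system_unique);

	if (local <= 1)
		return ASYM_NONE;
	return local == system_unique ? ASYM_FULL : ASYM_PARTIAL;
}
```

With three capacity values in the system, a span covering all three would be classified as full, a span covering two of them as partial, and a span of identical CPUs as having no asymmetry.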

The overall idea of changing the asymmetry detection has been suggested
earlier by Valentin Schneider <valentin.schneider@arm.com>

Verified on (mostly):
 - QEMU (version 4.2.1) with variants of possible asymmetric topologies
   - machine: virt
   - modifying the device-tree 'cpus' node for the virt machine:

     qemu-system-aarch64 -kernel $KERNEL_IMG \
         -drive format=qcow2,file=$IMAGE \
         -append 'root=/dev/vda earlycon console=ttyAMA0 sched_debug loglevel=15 kmemleak=on' \
         -m 2G --nographic -cpu cortex-a57 -machine virt -smp cores=6 \
         -machine dumpdtb=$CUSTOM_DTB.dtb

     $KERNEL_PATH/scripts/dtc/dtc -I dtb -O dts $CUSTOM_DTB.dtb > $CUSTOM_DTB.dts

     (modify the dts)

     $KERNEL_PATH/scripts/dtc/dtc -I dts -O dtb $CUSTOM_DTB.dts > $CUSTOM_DTB.dtb

     qemu-system-aarch64 -kernel $KERNEL_IMG \
         -drive format=qcow2,file=$IMAGE \
         -append 'root=/dev/vda earlycon console=ttyAMA0 sched_debug loglevel=15 kmemleak=on' \
         -m 2G --nographic -cpu cortex-a57 -machine virt -smp cores=6 \
         -machine dtb=$CUSTOM_DTB.dtb

Beata Michalska (3):
  sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag
  sched/topology: Rework CPU capacity asymmetry detection
  sched/doc: Update the CPU capacity asymmetry bits

 Documentation/scheduler/sched-capacity.rst |   6 +-
 Documentation/scheduler/sched-energy.rst   |   2 +-
 include/linux/sched/sd_flags.h             |  10 +
 kernel/sched/topology.c                    | 339 +++++++++++++++++++++++++----
 4 files changed, 314 insertions(+), 43 deletions(-)

-- 
2.7.4

* [RFC PATCH 1/3] sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag
From: Beata Michalska @ 2021-04-16 13:01 UTC
To: linux-kernel
Cc: peterz, mingo, juri.lelli, vincent.guittot, valentin.schneider,
    dietmar.eggemann, corbet, linux-doc

Introduce a new sched_domain topology flag, complementary to
SD_ASYM_CPUCAPACITY, to distinguish between sched_domains where any CPU
capacity asymmetry is detected (SD_ASYM_CPUCAPACITY) and ones where the full
range of CPU capacities is visible to all domain members
(SD_ASYM_CPUCAPACITY_FULL).

With the distinction between full and partial CPU capacity asymmetry, the
scope of the original SD_ASYM_CPUCAPACITY flag gets shifted. It still
maintains the existing behaviour when asymmetry is detected on a given sched
domain, allowing misfit migrations within sched domains that do not observe
the full range of CPU capacities but still have members with different
capacity values. It loses its meaning, though, when it comes to the per-cpu
pointer to the lowest CPU asymmetry sched_domain level, which is now to be
denoted by the SD_ASYM_CPUCAPACITY_FULL flag.

Signed-off-by: Beata Michalska <beata.michalska@arm.com>
---
 include/linux/sched/sd_flags.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index 34b21e9..57bde66 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -91,6 +91,16 @@ SD_FLAG(SD_WAKE_AFFINE, SDF_SHARED_CHILD)
 SD_FLAG(SD_ASYM_CPUCAPACITY, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
 
 /*
+ * Domain members have different CPU capacities spanning all unique CPU
+ * capacity values.
+ *
+ * SHARED_PARENT: Set from the topmost domain down to the first domain where
+ *		  all available CPU capacities are visible
+ * NEEDS_GROUPS: Per-CPU capacity is asymmetric between groups.
+ */
+SD_FLAG(SD_ASYM_CPUCAPACITY_FULL, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
+
+/*
  * Domain members share CPU capacity (i.e. SMT)
  *
  * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer share
-- 
2.7.4

* [RFC PATCH 2/3] sched/topology: Rework CPU capacity asymmetry detection
From: Beata Michalska @ 2021-04-16 13:01 UTC
To: linux-kernel
Cc: peterz, mingo, juri.lelli, vincent.guittot, valentin.schneider,
    dietmar.eggemann, corbet, linux-doc

Currently the CPU capacity asymmetry detection, performed through
asym_cpu_capacity_level, tries to identify the lowest topology level
at which the highest CPU capacity is being observed, not necessarily
finding the level at which all possible capacity values are visible
to all CPUs, which might be a bit problematic for some possible/valid
asymmetric topologies, e.g.:

DIE      [                                ]
MC       [                       ][       ]

CPU       [0] [1] [2] [3] [4] [5]  [6] [7]
Capacity  |.....| |.....| |.....|  |.....|
             L       M       B        B

Where:
 arch_scale_cpu_capacity(L) = 512
 arch_scale_cpu_capacity(M) = 871
 arch_scale_cpu_capacity(B) = 1024

In this particular case, the asymmetric topology level will point
at MC, as all possible CPU masks for that level do cover the CPU
with the highest capacity. It will work just fine for the first
cluster, not so much for the second one though (consider
find_energy_efficient_cpu(), which might end up attempting the energy
aware wake-up for a domain that does not see any asymmetry at all).

Rework the way the capacity asymmetry levels are being detected,
to point to the lowest topology level (for a given CPU) where the full
range of available CPU capacities is visible to all CPUs within a given
domain.
As a result, the per-cpu sd_asym_cpucapacity might differ across the
domains. This will have an impact on EAS wake-up placement, in that it
might see a different range of CPUs to be considered, depending on the
given current and target CPUs.

Additionally, those levels where any range of asymmetry (not
necessarily full) is detected will get identified as well.
The selected asymmetric topology level will be denoted by the
SD_ASYM_CPUCAPACITY_FULL sched domain flag, whereas the 'sub-levels'
will receive the already used SD_ASYM_CPUCAPACITY flag. This allows
maintaining the current behaviour for asymmetric topologies, with
misfit migration operating correctly on lower levels, if applicable,
as any asymmetry is enough to trigger the misfit migration.
The logic there relies on the SD_ASYM_CPUCAPACITY flag and does not
relate to the full asymmetry level denoted by the sd_asym_cpucapacity
pointer.

Signed-off-by: Beata Michalska <beata.michalska@arm.com>
---
 kernel/sched/topology.c | 339 ++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 299 insertions(+), 40 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 09d3504..9dfa66b 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -675,7 +675,7 @@ static void update_top_cache_domain(int cpu)
 	sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
 	rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd);
 
-	sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY);
+	sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY_FULL);
 	rcu_assign_pointer(per_cpu(sd_asym_cpucapacity, cpu), sd);
 }
 
@@ -1958,65 +1958,322 @@ static bool topology_span_sane(struct sched_domain_topology_level *tl,
 
 	return true;
 }
+/**
+ * Asym capacity bits
+ */
+
+/**
+ * Cached cpu masks for those sched domains, at a given topology level,
+ * that do represent CPUs with asymmetric capacities.
+ * + * Each topology level will get the cached data assigned, + * with asym cap sched_flags (SD_ASYM_CPUCAPACITY and SD_ASYM_CPUCAPACITY_FULL + * accordingly) and the corresponding cpumask for: + * - domains that do span CPUs with different capacities + * - domains where all CPU capacities are visible for all CPUs within + * the domain + * + * Within a single topology level there might be domains + * with different scope of asymmetry: + * none -> . + * partial -> SD_ASYM_CPUCAPACITY + * full -> SD_ASYM_CPUCAPACITY|SD_ASYM_CPUCAPACITY_FULL + */ +struct asym_cache_data { + + struct sched_domain_topology_level *tl; + unsigned int sched_flags; + struct cpumask *asym_mask; + struct cpumask *asym_full_mask; +}; + +static inline int asym_cpu_capacity_verify(struct asym_cache_data *data, + struct sched_domain_topology_level *tl, int cpu) +{ + int flags = 0; + + if (!data) + goto leave; + + while (data->tl) { + if (data->tl != tl) { + ++data; + continue; + } + + if (!data->sched_flags) + break; + /* + * For topology levels above one, where all CPUs observe + * all available capacities, CPUs mask is not being + * cached for optimization reasons, assuming, that at this + * point, all possible CPUs are being concerned. + * Those levels will have both: + * SD_ASYM_CPUCAPACITY and SD_ASYM_CPUCAPACITY_FULL + * flags set. + */ + if (data->sched_flags & SD_ASYM_CPUCAPACITY_FULL && + !data->asym_full_mask) { + flags = data->sched_flags; + break; + } + + if (data->asym_full_mask && + cpumask_test_cpu(cpu, data->asym_full_mask)) { + flags = data->sched_flags; + break; + } + /* + * A given topology level might be marked with + * SD_ASYM_CPUCAPACITY_FULL mask but only for a certain subset + * of CPUs. 
+ * Consider the following: + * #1 + * + * DIE [ ] + * MC [ ][ ] + * [0] [1] [2] [3] [4] [5] [6] [7] + * |.....| |.....| |.....| |.....| + * L M B B + * + * where: + * arch_scale_cpu_capacity(L) = 512 + * arch_scale_cpu_capacity(M) = 871 + * arch_scale_cpu_capacity(B) = 1024 + * + * MC topology level will be marked with both + * SD_ASYM_CPUCAPACITY flags, but the relevant masks will be: + * asym_full_mask = [0-5] + * asym_mask empty (no other asymmetry apart from + * already covered [0-5]) + * + * #2 + * + * DIE [ ] + * MC [ ][ ] + * [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] + * |.....| |.....| |.....| |.....| |.....| + * L M B L B + * + * MC topology level will be marked with both + * SD_ASYM_CPUCAPACITY flags, but the relevant masks will be: + * asym_full_mask = [0-5] + * asym_mask = [6-9] + */ + if (data->asym_mask && cpumask_test_cpu(cpu, data->asym_mask)) + flags = SD_ASYM_CPUCAPACITY; + break; + + } +leave: + return flags; +} + + +static inline void asym_cpu_capacity_release_data(struct asym_cache_data *data) +{ + struct asym_cache_data *__data = data; + + if (data) { + while (data->tl) { + if (!data->sched_flags) + goto next; + if (data->sched_flags & SD_ASYM_CPUCAPACITY_FULL) + kfree(data->asym_full_mask); + kfree(data->asym_mask); +next: + ++data; + }; + kfree(__data); + } +} + +static inline void asym_cpu_capacity_cache_data(struct asym_cache_data *data, + unsigned int flags, const struct cpumask *cpumask) +{ + struct cpumask **__mask; + + if (!data) + return; + __mask = flags & SD_ASYM_CPUCAPACITY_FULL ? &data->asym_full_mask + : &data->asym_mask; + + if (!(*__mask)) + *__mask = kzalloc(cpumask_size(), GFP_KERNEL); + if (*__mask) + cpumask_or(*__mask, *__mask, cpumask); + data->sched_flags |= flags; +} /* * Find the sched_domain_topology_level where all CPU capacities are visible * for all CPUs. 
*/ -static struct sched_domain_topology_level -*asym_cpu_capacity_level(const struct cpumask *cpu_map) +static struct asym_cache_data +*asym_cpu_capacity_scan(const struct cpumask *cpu_map) { - int i, j, asym_level = 0; - bool asym = false; + /* + * Simple data structure to record all available CPU capacities. + * Additional scan level allows tracking unique capacities per each + * topology level and each separate topology level CPU mask. + * During each scan phase, the scan level will allow to determine, + * whether given capacity has been already accounted for, by syncing + * it with the scan stage id. + */ + struct capacity_entry { + struct list_head link; + unsigned long capacity; + unsigned int scan_level; + }; + struct sched_domain_topology_level *tl, *asym_tl = NULL; - unsigned long cap; + struct asym_cache_data *scan_data = NULL; + struct capacity_entry *entry = NULL, *tmp; + unsigned int level_count = 0; + unsigned int cap_count = 0; + unsigned int scan_id = 0; + LIST_HEAD(capacity_set); + unsigned long capacity; + cpumask_var_t cpu_mask; + int cpu; - /* Is there any asymmetry? 
*/ - cap = arch_scale_cpu_capacity(cpumask_first(cpu_map)); + /* Build-up a list of all CPU capacities, verifying on the way + * if there is any asymmetry at all + */ + for_each_cpu(cpu, cpu_map) { + unsigned long capacity = arch_scale_cpu_capacity(cpu); - for_each_cpu(i, cpu_map) { - if (arch_scale_cpu_capacity(i) != cap) { - asym = true; - break; + if (entry && capacity == entry->capacity) + goto next; + + list_for_each_entry(entry, &capacity_set, link) { + if (capacity == entry->capacity) + goto next; + } + + entry = kzalloc(sizeof(*entry), GFP_KERNEL); + if (entry) { + entry->capacity = capacity; + list_add(&entry->link, &capacity_set); } + ++cap_count; +next: + ; } - if (!asym) - return NULL; + /* No asymmetry detected so skip the rest */ + if (!(cap_count > 1)) + goto leave; + if (!alloc_cpumask_var(&cpu_mask, GFP_KERNEL)) + goto leave; + + /* Get the number of topology levels */ + for_each_sd_topology(tl) level_count++; /* - * Examine topology from all CPU's point of views to detect the lowest - * sched_domain_topology_level where a highest capacity CPU is visible - * to everyone. 
+ * Allocate an array to store cached data + * per each topology level + sentinel */ - for_each_cpu(i, cpu_map) { - unsigned long max_capacity = arch_scale_cpu_capacity(i); - int tl_id = 0; + scan_data = kcalloc(level_count + 1, sizeof(*scan_data), GFP_KERNEL); + if (!scan_data) { + free_cpumask_var(cpu_mask); + goto leave; + } - for_each_sd_topology(tl) { - if (tl_id < asym_level) - goto next_level; + level_count = 0; + + for_each_sd_topology(tl) { + unsigned int local_cap_count; + bool full_asym = true; + const struct cpumask *mask; + struct asym_cache_data *data = &scan_data[level_count++]; - for_each_cpu_and(j, tl->mask(i), cpu_map) { - unsigned long capacity; +#ifdef CONFIG_NUMA + /* + * For NUMA we might end-up in a sched domain + * that spans numa nodes with cpus with + * different capacities which would not be caught + * by the above scan as those will have + * separate cpumasks - subject to numa level + * @see: sched_domains_curr_level & sd_numa_mask + * Considered to be a no-go + */ + if (WARN_ON_ONCE(tl->numa_level && !full_asym)) + goto leave; +#endif + data->tl = tl; - capacity = arch_scale_cpu_capacity(j); + if (asym_tl) { + data->sched_flags = SD_ASYM_CPUCAPACITY | + SD_ASYM_CPUCAPACITY_FULL; + continue; + } - if (capacity <= max_capacity) - continue; + cpumask_copy(cpu_mask, cpu_map); + cpu = cpumask_first(cpu_mask); + + while (cpu < nr_cpu_ids) { + int i; + + /* + * Tracking each CPU capacity 'scan' id + * to distinguish discovered capacity sets + * between different CPU masks at each topology level: + * capturing unique capacity values at each scan stage + */ + ++scan_id; + local_cap_count = 0; + + mask = tl->mask(cpu); + for_each_cpu_and(i, mask, cpu_map) { + + capacity = arch_scale_cpu_capacity(i); - max_capacity = capacity; - asym_level = tl_id; - asym_tl = tl; + list_for_each_entry(entry, &capacity_set, link) { + if (entry->capacity == capacity + && entry->scan_level < scan_id) { + entry->scan_level = scan_id; + ++local_cap_count; + } + } + 
__cpumask_clear_cpu(i, cpu_mask); + } + if (cap_count != local_cap_count) + full_asym = false; + if (local_cap_count > 1) { + int flags = (cap_count != local_cap_count) + ? SD_ASYM_CPUCAPACITY + : SD_ASYM_CPUCAPACITY + | SD_ASYM_CPUCAPACITY_FULL; + + asym_cpu_capacity_cache_data(data, flags, mask); } -next_level: - tl_id++; + cpu = cpumask_first(cpu_mask); + } + /* + * Clear the cached masks from CPUs that are not present + * in cpu_map + */ + if (data->asym_mask) + cpumask_and(data->asym_mask, data->asym_mask, cpu_map); + if (data->asym_full_mask) + cpumask_and(data->asym_full_mask, data->asym_full_mask, + cpu_map); + + if (full_asym) + asym_tl = tl; } + free_cpumask_var(cpu_mask); - return asym_tl; -} +leave: + list_for_each_entry_safe(entry, tmp, &capacity_set, link) { + list_del(&entry->link); + kfree(entry); + } + return scan_data; +} /* * Build sched domains for a given set of CPUs and attach the sched domains @@ -2025,12 +2282,12 @@ static struct sched_domain_topology_level static int build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *attr) { + struct asym_cache_data *asym_scan_data; enum s_alloc alloc_state = sa_none; struct sched_domain *sd; struct s_data d; struct rq *rq = NULL; int i, ret = -ENOMEM; - struct sched_domain_topology_level *tl_asym; bool has_asym = false; if (WARN_ON(cpumask_empty(cpu_map))) @@ -2040,7 +2297,7 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att if (alloc_state != sa_rootdomain) goto error; - tl_asym = asym_cpu_capacity_level(cpu_map); + asym_scan_data = asym_cpu_capacity_scan(cpu_map); /* Set up domains for CPUs specified by the cpu_map: */ for_each_cpu(i, cpu_map) { @@ -2049,9 +2306,10 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att sd = NULL; for_each_sd_topology(tl) { - if (tl == tl_asym) { - dflags |= SD_ASYM_CPUCAPACITY; - has_asym = true; + if (!(dflags & SD_ASYM_CPUCAPACITY_FULL)) { + dflags |= 
asym_cpu_capacity_verify(asym_scan_data,
+							tl, i);
+				has_asym = dflags & SD_ASYM_CPUCAPACITY;
 			}
 
 			if (WARN_ON(!topology_span_sane(tl, cpu_map, i)))
@@ -2068,6 +2326,7 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 		}
 	}
 
+	asym_cpu_capacity_release_data(asym_scan_data);
 	/* Build the groups for the domains */
 	for_each_cpu(i, cpu_map) {
 		for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent) {
-- 
2.7.4

* Re: [RFC PATCH 2/3] sched/topology: Rework CPU capacity asymmetry detection
From: Valentin Schneider @ 2021-04-21 19:58 UTC
To: Beata Michalska, linux-kernel
Cc: peterz, mingo, juri.lelli, vincent.guittot, dietmar.eggemann, corbet,
    linux-doc

Hi Beata,

On 16/04/21 14:01, Beata Michalska wrote:
> Currently the CPU capacity asymmetry detection, performed through
> asym_cpu_capacity_level, tries to identify the lowest topology level
> at which the highest CPU capacity is being observed, not necessarily
> finding the level at which all possible capacity values are visible
> to all CPUs

Despite the latter being what it says on the tin! (see comment atop
asym_cpu_capacity_level())

>, which might be bit problematic for some possible/valid
> asymmetric topologies i.e.:
>
> DIE      [                                ]
> MC       [                       ][       ]
>
> CPU       [0] [1] [2] [3] [4] [5]  [6] [7]
> Capacity  |.....| |.....| |.....|  |.....|
>              L       M       B        B
>
> Where:
>  arch_scale_cpu_capacity(L) = 512
>  arch_scale_cpu_capacity(M) = 871
>  arch_scale_cpu_capacity(B) = 1024
>
> In this particular case, the asymmetric topology level will point
> at MC, as all possible CPU masks for that level do cover the CPU
> with the highest capacity. It will work just fine for the first
> cluster, not so much for the second one though (consider the
> find_energy_efficient_cpu which might end up attempting the energy
> aware wake-up for a domain that does not see any asymmetry at all)
>

Another problematic topology is something the likes of

  DIE [                     ]
  MC  [         ][          ]
         L    M     B    B

Because here the asymmetric tl will be DIE, so we won't properly recognize
that MC domain with L+M as having CPU asymmetry. That means no misfit
upmigration from L to M, for one.
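
Both outcomes discussed above (MC for the changelog topology, DIE for the L+M / B+B one) can be reproduced with a small userspace rendition of the pre-patch asym_cpu_capacity_level() loop. This is a sketch under assumptions, not kernel code: struct topo, make_topo() and old_asym_level() are hypothetical names, the topology is hard-coded to 8 CPUs with two levels (0 = MC, 1 = DIE), and two CPUs per capacity value are assumed for the second diagram, which does not state CPU counts.

```c
#include <assert.h>

#define NR_CPUS		8
#define NR_LEVELS	2	/* 0 = MC, 1 = DIE */

/* Hypothetical flattened topology: span[level][cpu] is a bitmask of CPUs. */
struct topo {
	unsigned long cap[NR_CPUS];
	unsigned int span[NR_LEVELS][NR_CPUS];
};

/* Build a two-cluster topology; the first cluster holds CPUs [0, split). */
static struct topo make_topo(int split)
{
	struct topo t = { .cap = { 512, 512, 871, 871, 1024, 1024, 1024, 1024 } };
	unsigned int lo, all = (1u << NR_CPUS) - 1;

	assert(split > 0 && split < NR_CPUS);
	lo = (1u << split) - 1;

	for (int c = 0; c < NR_CPUS; c++) {
		t.span[0][c] = c < split ? lo : all & ~lo;	/* MC */
		t.span[1][c] = all;				/* DIE */
	}
	return t;
}

/*
 * Mirrors the removed loop: from every CPU's point of view, find the lowest
 * level at which a higher capacity shows up, never going below the level
 * already selected. Returns the level index, or -1 without any asymmetry.
 */
static int old_asym_level(const struct topo *t)
{
	int asym_level = 0, asym = 0;

	for (int i = 1; i < NR_CPUS; i++)
		if (t->cap[i] != t->cap[0])
			asym = 1;
	if (!asym)
		return -1;

	for (int i = 0; i < NR_CPUS; i++) {
		unsigned long max_capacity = t->cap[i];

		/* Levels below asym_level are skipped, as in the original. */
		for (int tl = asym_level; tl < NR_LEVELS; tl++) {
			for (int j = 0; j < NR_CPUS; j++) {
				if (!(t->span[tl][i] & (1u << j)))
					continue;
				if (t->cap[j] <= max_capacity)
					continue;
				max_capacity = t->cap[j];
				asym_level = tl;
			}
		}
	}
	return asym_level;
}
```

Under these assumptions, make_topo(6) models the changelog topology (MC spans [0-5] and [6-7]) and make_topo(4) models the L+M / B+B one; the old loop selects level 0 (MC) for the former and level 1 (DIE) for the latter.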

The Exynos-based Galaxy S10 *almost* matches that topology - from what I've
been able to scrounge, all CPUs are hooked up to the same LLC *but* the big
CPUs have exclusive access to some part of it. From the devicetree files
I've been able to see, the big cores are actually described as having their
own LLC.

Regardless, the topology you describe in this changelog is something that's
achievable by cobbling two DynamIQ clusters (each with their own LLC) to an
interconnect, which the architecture supports (IIRC up to something like 32
clusters).

> Rework the way the capacity asymmetry levels are being detected,
> to point to the lowest topology level( for a given CPU), where full
> range of available CPU capacities is visible to all CPUs within given
> domain. As a result, the per-cpu sd_asym_cpucapacity might differ
> across the domains. This will have an impact on EAS wake-up placement
> in a way that it might see different range of CPUs to be considered,
> depending on the given current and target CPUs.
>
> Additionally, those levels, where any range of asymmetry (not
> necessarily full) is being detected will get identified as well.
> The selected asymmetric topology level will be denoted by
> SD_ASYM_CPUCAPACITY_FULL sched domain flag whereas the 'sub-levels'
> would receive the already used SD_ASYM_CPUCAPACITY flag. This allows
> maintaining the current behaviour for asymmetric topologies, with
> misfit migration operating correctly on lower levels, if applicable,
> as any asymmetry is enough to trigger the misfit migration.
> The logic there relies on the SD_ASYM_CPUCAPACITY flag and does not
> relate to the full asymmetry level denoted by the sd_asym_cpucapacity
> pointer.
>
> Signed-off-by: Beata Michalska <beata.michalska@arm.com>

Most of this looks OK to me, I have a few comments below but nothing
major. FWIW I've appended a diff at the tail of this email which covers
(most of) said comments.

> --- > kernel/sched/topology.c | 339 ++++++++++++++++++++++++++++++++++++++++++------ > 1 file changed, 299 insertions(+), 40 deletions(-) > > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c > index 09d3504..9dfa66b 100644 > --- a/kernel/sched/topology.c > +++ b/kernel/sched/topology.c > @@ -675,7 +675,7 @@ static void update_top_cache_domain(int cpu) > sd = highest_flag_domain(cpu, SD_ASYM_PACKING); > rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd); > > - sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY); > + sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY_FULL); > rcu_assign_pointer(per_cpu(sd_asym_cpucapacity, cpu), sd); > } > > @@ -1958,65 +1958,322 @@ static bool topology_span_sane(struct sched_domain_topology_level *tl, > > return true; > } > +/** > + * Asym capacity bits > + */ > + > +/** > + * Cached cpu masks for those sched domains, at a given topology level, > + * that do represent CPUs with asymmetric capacities. > + * > + * Each topology level will get the cached data assigned, > + * with asym cap sched_flags (SD_ASYM_CPUCAPACITY and SD_ASYM_CPUCAPACITY_FULL > + * accordingly) and the corresponding cpumask for: > + * - domains that do span CPUs with different capacities > + * - domains where all CPU capacities are visible for all CPUs within > + * the domain > + * > + * Within a single topology level there might be domains > + * with different scope of asymmetry: > + * none -> . > + * partial -> SD_ASYM_CPUCAPACITY > + * full -> SD_ASYM_CPUCAPACITY|SD_ASYM_CPUCAPACITY_FULL > + */ > +struct asym_cache_data { > + > + struct sched_domain_topology_level *tl; > + unsigned int sched_flags; > + struct cpumask *asym_mask; > + struct cpumask *asym_full_mask; > +}; > + I'll dump this here because I think it can be useful to whoever else will stare at this: This is pretty much an extension of struct sched_domain_topology_level, providing a new / modified tl->sd_flags() output. 
Unfortunately, said output requires either a cpumask per flag per topology
level or a flag accumulator per topology level per CPU.

In light of this, it's preferable to keep this extra data outside of the
sched_domain_topology_level struct and have its lifespan limited to the
domain build, which is what's being done here.

> +*asym_cpu_capacity_scan(const struct cpumask *cpu_map)
>  {
> -	int i, j, asym_level = 0;
> -	bool asym = false;
> +	/*
> +	 * Simple data structure to record all available CPU capacities.
> +	 * Additional scan level allows tracking unique capacities per each
> +	 * topology level and each separate topology level CPU mask.
> +	 * During each scan phase, the scan level will allow to determine,
> +	 * whether given capacity has been already accounted for, by syncing
> +	 * it with the scan stage id.
> +	 */
> +	struct capacity_entry {
> +		struct list_head link;
> +		unsigned long capacity;
> +		unsigned int scan_level;
> +	};
> +
>  	struct sched_domain_topology_level *tl, *asym_tl = NULL;
> -	unsigned long cap;
> +	struct asym_cache_data *scan_data = NULL;
> +	struct capacity_entry *entry = NULL, *tmp;
> +	unsigned int level_count = 0;
> +	unsigned int cap_count = 0;
> +	unsigned int scan_id = 0;
> +	LIST_HEAD(capacity_set);
> +	unsigned long capacity;
> +	cpumask_var_t cpu_mask;
> +	int cpu;
>
> -	/* Is there any asymmetry? */
> -	cap = arch_scale_cpu_capacity(cpumask_first(cpu_map));
> +	/* Build-up a list of all CPU capacities, verifying on the way
> +	 * if there is any asymmetry at all

That's a wrong comment style.

>  	/*
> -	 * Examine topology from all CPU's point of views to detect the lowest
> -	 * sched_domain_topology_level where a highest capacity CPU is visible
> -	 * to everyone.
> + * Allocate an array to store cached data > + * per each topology level + sentinel > */ > - for_each_cpu(i, cpu_map) { > - unsigned long max_capacity = arch_scale_cpu_capacity(i); > - int tl_id = 0; > + scan_data = kcalloc(level_count + 1, sizeof(*scan_data), GFP_KERNEL); Given we have one cache per tl, do we need the sentinel? I gave that a shot and it didn't explode, also further simplified asym_cpu_capacity_verify(), see appended diff. > + if (!scan_data) { > + free_cpumask_var(cpu_mask); > + goto leave; > + } [...] > + for_each_cpu_and(i, mask, cpu_map) { > + > + capacity = arch_scale_cpu_capacity(i); > > - max_capacity = capacity; > - asym_level = tl_id; > - asym_tl = tl; > + list_for_each_entry(entry, &capacity_set, link) { > + if (entry->capacity == capacity > + && entry->scan_level < scan_id) { ^^ Operand should be at EOL. > + entry->scan_level = scan_id; > + ++local_cap_count; > + } > + } > + __cpumask_clear_cpu(i, cpu_mask); > + } > @@ -2049,9 +2306,10 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att > > sd = NULL; > for_each_sd_topology(tl) { > - if (tl == tl_asym) { > - dflags |= SD_ASYM_CPUCAPACITY; > - has_asym = true; > + if (!(dflags & SD_ASYM_CPUCAPACITY_FULL)) { > + dflags |= asym_cpu_capacity_verify(asym_scan_data, > + tl, i); > + has_asym = dflags & SD_ASYM_CPUCAPACITY; > } Given this dflags & SD_ASYM_CPUCAPACITY_FULL check, is the maskless optimization thing actually required? AIUI, for any CPU, the first topology level where we'll set SD_ASYM_CPUCAPACITY_FULL should have a matching asym_scan_data[tlid]->asym_full_mask, and all subsequent tl's will see that in dflags and not call into asym_cpu_capacity_verify(). 
>
>  		if (WARN_ON(!topology_span_sane(tl, cpu_map, i)))
> @@ -2068,6 +2326,7 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
>  		}
>  	}
>
> +	asym_cpu_capacity_release_data(asym_scan_data);
>  	/* Build the groups for the domains */
>  	for_each_cpu(i, cpu_map) {
>  		for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent) {
> --
> 2.7.4

---
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 1f965293cc7e..31d89868f208 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2011,102 +2011,80 @@ static bool topology_span_sane(struct sched_domain_topology_level *tl,
  *	full    -> SD_ASYM_CPUCAPACITY|SD_ASYM_CPUCAPACITY_FULL
  */
 struct asym_cache_data {
-
-	struct sched_domain_topology_level *tl;
 	unsigned int sched_flags;
 	struct cpumask *asym_mask;
 	struct cpumask *asym_full_mask;
 };
 
-static inline int asym_cpu_capacity_verify(struct asym_cache_data *data,
-	struct sched_domain_topology_level *tl, int cpu)
+static inline int asym_cpu_capacity_verify(struct asym_cache_data *data, int cpu)
 {
-	int flags = 0;
-
 	if (!data)
 		goto leave;
 
-	while (data->tl) {
-		if (data->tl != tl) {
-			++data;
-			continue;
-		}
-
-		if (!data->sched_flags)
-			break;
-		/*
-		 * For topology levels above one, where all CPUs observe
-		 * all available capacities, CPUs mask is not being
-		 * cached for optimization reasons, assuming, that at this
-		 * point, all possible CPUs are being concerned.
-		 * Those levels will have both:
-		 * SD_ASYM_CPUCAPACITY and SD_ASYM_CPUCAPACITY_FULL
-		 * flags set.
-		 */
-		if (data->sched_flags & SD_ASYM_CPUCAPACITY_FULL &&
-		    !data->asym_full_mask) {
-			flags = data->sched_flags;
-			break;
-		}
-
-		if (data->asym_full_mask &&
-		    cpumask_test_cpu(cpu, data->asym_full_mask)) {
-			flags = data->sched_flags;
-			break;
-		}
-		/*
-		 * A given topology level might be marked with
-		 * SD_ASYM_CPUCAPACITY_FULL mask but only for a certain subset
-		 * of CPUs.
-		 * Consider the following:
-		 * #1
-		 *
-		 * DIE [                               ]
-		 * MC  [                      ][       ]
-		 *      [0] [1] [2] [3] [4] [5] [6] [7]
-		 *      |.....| |.....| |.....| |.....|
-		 *         L       M       B       B
-		 *
-		 * where:
-		 *  arch_scale_cpu_capacity(L) = 512
-		 *  arch_scale_cpu_capacity(M) = 871
-		 *  arch_scale_cpu_capacity(B) = 1024
-		 *
-		 * MC topology level will be marked with both
-		 * SD_ASYM_CPUCAPACITY flags, but the relevant masks will be:
-		 *  asym_full_mask = [0-5]
-		 *  asym_mask empty (no other asymmetry apart from
-		 *  already covered [0-5])
-		 *
-		 * #2
-		 *
-		 * DIE [                                       ]
-		 * MC  [                      ][               ]
-		 *      [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
-		 *      |.....| |.....| |.....| |.....| |.....|
-		 *         L       M       B       L       B
-		 *
-		 * MC topology level will be marked with both
-		 * SD_ASYM_CPUCAPACITY flags, but the relevant masks will be:
-		 *  asym_full_mask = [0-5]
-		 *  asym_mask = [6-9]
-		 */
-		if (data->asym_mask && cpumask_test_cpu(cpu, data->asym_mask))
-			flags = SD_ASYM_CPUCAPACITY;
-		break;
+	if (!data->sched_flags)
+		goto leave;
+	/*
+	 * For topology levels above one, where all CPUs observe all available
+	 * capacities, CPUs mask is not being cached for optimization reasons,
+	 * assuming, that at this point, all possible CPUs are being concerned.
+	 * Those levels will have both: SD_ASYM_CPUCAPACITY and
+	 * SD_ASYM_CPUCAPACITY_FULL flags set.
+	 */
+	if (data->sched_flags & SD_ASYM_CPUCAPACITY_FULL && !data->asym_full_mask)
+		return data->sched_flags;
 
-	}
+	if (data->asym_full_mask && cpumask_test_cpu(cpu, data->asym_full_mask))
+		return data->sched_flags;
+	/*
+	 * A given topology level might be marked with SD_ASYM_CPUCAPACITY_FULL
+	 * mask but only for a certain subset of CPUs.
+	 * Consider the following:
+	 * #1
+	 *
+	 * DIE [                               ]
+	 * MC  [                      ][       ]
+	 *      [0] [1] [2] [3] [4] [5] [6] [7]
+	 *      |.....| |.....| |.....| |.....|
+	 *         L       M       B       B
+	 *
+	 * where:
+	 *  arch_scale_cpu_capacity(L) = 512
+	 *  arch_scale_cpu_capacity(M) = 871
+	 *  arch_scale_cpu_capacity(B) = 1024
+	 *
+	 * MC topology level will be marked with both SD_ASYM_CPUCAPACITY flags,
+	 * but the relevant masks will be:
+	 *  asym_full_mask = [0-5]
+	 *  asym_mask empty (no other asymmetry apart from
+	 *  already covered [0-5])
+	 *
+	 * #2
+	 *
+	 * DIE [                                       ]
+	 * MC  [                      ][               ]
+	 *      [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
+	 *      |.....| |.....| |.....| |.....| |.....|
+	 *         L       M       B       L       B
+	 *
+	 * MC topology level will be marked with both SD_ASYM_CPUCAPACITY flags,
+	 * but the relevant masks will be:
+	 *  asym_full_mask = [0-5]
+	 *  asym_mask = [6-9]
+	 */
+	if (data->asym_mask && cpumask_test_cpu(cpu, data->asym_mask))
+		return SD_ASYM_CPUCAPACITY;
 leave:
-	return flags;
+	return 0;
 }
 
 
 static inline void asym_cpu_capacity_release_data(struct asym_cache_data *data)
 {
+	struct sched_domain_topology_level *tl;
 	struct asym_cache_data *__data = data;
 
 	if (data) {
-		while (data->tl) {
+		for_each_sd_topology(tl) {
 			if (!data->sched_flags)
 				goto next;
 			if (data->sched_flags & SD_ASYM_CPUCAPACITY_FULL)
@@ -2114,7 +2092,7 @@ static inline void asym_cpu_capacity_release_data(struct asym_cache_data *data)
 			kfree(data->asym_mask);
 next:
 			++data;
-		};
+		}
 		kfree(__data);
 	}
 }
@@ -2168,7 +2146,8 @@ static struct asym_cache_data
 	cpumask_var_t cpu_mask;
 	int cpu;
 
-	/* Build-up a list of all CPU capacities, verifying on the way
+	/*
+	 * Build-up a list of all CPU capacities, verifying on the way
 	 * if there is any asymmetry at all
 	 */
 	for_each_cpu(cpu, cpu_map) {
@@ -2201,11 +2180,8 @@ static struct asym_cache_data
 
 	/* Get the number of topology levels */
 	for_each_sd_topology(tl) level_count++;
-	/*
-	 * Allocate an array to store cached data
-	 * per each topology level + sentinel
-	 */
-	scan_data = kcalloc(level_count + 1, sizeof(*scan_data), GFP_KERNEL);
+	/* Allocate an array to store cached data per each topology level */
+	scan_data = kcalloc(level_count, sizeof(*scan_data), GFP_KERNEL);
 	if (!scan_data) {
 		free_cpumask_var(cpu_mask);
 		goto leave;
@@ -2221,19 +2197,16 @@ static struct asym_cache_data
 
 #ifdef CONFIG_NUMA
 		/*
-		 * For NUMA we might end-up in a sched domain
-		 * that spans numa nodes with cpus with
-		 * different capacities which would not be caught
-		 * by the above scan as those will have
-		 * separate cpumasks - subject to numa level
+		 * For NUMA we might end-up in a sched domain that spans numa
+		 * nodes with cpus with different capacities which would not be
+		 * caught by the above scan as those will have separate cpumasks
+		 * - subject to numa level
 		 * @see: sched_domains_curr_level & sd_numa_mask
 		 * Considered to be a no-go
 		 */
 		if (WARN_ON_ONCE(tl->numa_level && !full_asym))
 			goto leave;
 #endif
-		data->tl = tl;
-
 		if (asym_tl) {
 			data->sched_flags = SD_ASYM_CPUCAPACITY |
 				SD_ASYM_CPUCAPACITY_FULL;
@@ -2247,10 +2220,10 @@ static struct asym_cache_data
 			int i;
 
 			/*
-			 * Tracking each CPU capacity 'scan' id
-			 * to distinguish discovered capacity sets
-			 * between different CPU masks at each topology level:
-			 * capturing unique capacity values at each scan stage
+			 * Tracking each CPU capacity 'scan' id to distinguish
+			 * discovered capacity sets between different CPU masks
+			 * at each topology level: capturing unique capacity
+			 * values at each scan stage
 			 */
 			++scan_id;
 			local_cap_count = 0;
@@ -2261,8 +2234,8 @@ static struct asym_cache_data
 				capacity = arch_scale_cpu_capacity(i);
 
 				list_for_each_entry(entry, &capacity_set, link) {
-					if (entry->capacity == capacity
-						&& entry->scan_level < scan_id) {
+					if (entry->capacity == capacity &&
+					    entry->scan_level < scan_id) {
 						entry->scan_level = scan_id;
 						++local_cap_count;
 					}
@@ -2334,12 +2307,12 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 	for_each_cpu(i, cpu_map) {
 		struct sched_domain_topology_level *tl;
 		int dflags = 0;
+		int tlid = 0;
 
 		sd = NULL;
 		for_each_sd_topology(tl) {
 			if (!(dflags & SD_ASYM_CPUCAPACITY_FULL)) {
-				dflags |= asym_cpu_capacity_verify(asym_scan_data,
-							tl, i);
+				dflags |= asym_cpu_capacity_verify(&asym_scan_data[tlid], i);
 				has_asym = dflags & SD_ASYM_CPUCAPACITY;
 			}
 
@@ -2354,6 +2327,8 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 			sd->flags |= SD_OVERLAP;
 			if (cpumask_equal(cpu_map, sched_domain_span(sd)))
 				break;
+
+			tlid++;
 		}
 	}
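For readers following the 'scan id' comment in the diff above, here is a minimal user-space sketch of that bookkeeping. It is not the kernel code: it assumes a plain array instead of the kernel's linked list, and the names cap_entry and count_caps_in_span are hypothetical. Each known capacity is stamped with the id of the scan that last counted it, so a capacity shared by several CPUs of one span is counted once, and moving to the next span only needs a higher id rather than a reset of the entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the patch's capacity_entry (array, not list). */
struct cap_entry {
	unsigned long capacity;
	unsigned int scan_level;	/* id of the last scan that counted this entry */
};

/*
 * Count how many of the known capacities in set[] occur in one span.
 * scan_id must be greater than every id used in earlier calls, so that
 * stamps left behind by previous spans do not suppress the count.
 */
static size_t count_caps_in_span(struct cap_entry *set, size_t set_n,
				 const unsigned long *span_caps, size_t span_n,
				 unsigned int scan_id)
{
	size_t local_cap_count = 0;

	assert(scan_id > 0);

	for (size_t i = 0; i < span_n; i++) {
		for (size_t e = 0; e < set_n; e++) {
			if (set[e].capacity == span_caps[i] &&
			    set[e].scan_level < scan_id) {
				/* First sighting in this scan: stamp and count. */
				set[e].scan_level = scan_id;
				++local_cap_count;
			}
		}
	}
	return local_cap_count;
}
```

With the L/M/B/B topology from this thread, scanning the first MC span (two CPUs each of 512, 871 and 1024) with id 1 counts three capacities, and scanning the second span (two CPUs of 1024) with id 2 counts one.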
* Re: [RFC PATCH 2/3] sched/topology: Rework CPU capacity asymmetry detection
  2021-04-21 19:58   ` Valentin Schneider
@ 2021-04-22 10:27     ` Beata Michalska
  0 siblings, 0 replies; 7+ messages in thread
From: Beata Michalska @ 2021-04-22 10:27 UTC
To: Valentin Schneider
Cc: linux-kernel, peterz, mingo, juri.lelli, vincent.guittot, dietmar.eggemann, corbet, linux-doc

On Wed, Apr 21, 2021 at 08:58:01PM +0100, Valentin Schneider wrote:

Hi Valentin,
>
> Hi Beata,
>
> On 16/04/21 14:01, Beata Michalska wrote:
> > Currently the CPU capacity asymmetry detection, performed through
> > asym_cpu_capacity_level, tries to identify the lowest topology level
> > at which the highest CPU capacity is being observed, not necessarily
> > finding the level at which all possible capacity values are visible
> > to all CPUs
>
> Despite the latter being what it says on the tin! (see comment atop
> asym_cpu_capacity_level())
There are some labels on the tins out there that can be deceiving ...
>
> >, which might be bit problematic for some possible/valid
> > asymmetric topologies i.e.:
> >
> > DIE      [                               ]
> > MC       [                      ][       ]
> >
> > CPU       [0] [1] [2] [3] [4] [5] [6] [7]
> > Capacity  |.....| |.....| |.....| |.....|
> >              L       M       B       B
> >
> > Where:
> >  arch_scale_cpu_capacity(L) = 512
> >  arch_scale_cpu_capacity(M) = 871
> >  arch_scale_cpu_capacity(B) = 1024
> >
> > In this particular case, the asymmetric topology level will point
> > at MC, as all possible CPU masks for that level do cover the CPU
> > with the highest capacity. It will work just fine for the first
> > cluster, not so much for the second one though (consider the
> > find_energy_efficient_cpu which might end up attempting the energy
> > aware wake-up for a domain that does not see any asymmetry at all)
> >
>
> Another problematic topology is something the likes of
>
> DIE [          ]
> MC  [    ][    ]
>      L  M  B  B
>
> Because here the asymmetric tl will be DIE, so we won't properly recognize
> that MC domain with L+M as having CPU asymmetry. That means no misfit
> upmigration from L to M, for one.
>
> The Exynos-based Galaxy S10 *almost* matches that topology - from what I've
> been able to scrounge, all CPUs are hooked up to the same LLC *but* the big
> CPUs have exclusive access to some part of it. From the devicetree files
> I've been able to see, the big cores are actually described as having their
> own LLC.
>
That's also an example of a modification to the wake-up path made to work
around the current problem (judging by the available kernel for that platform).
> Regardless, the topology you describe in this changelog is something that's
> achievable by cobbling two DynamIQ clusters (each with their own LLC) to an
> interconnect, which the architecture supports (IIRC up to something like 32
> clusters).
>
> > Rework the way the capacity asymmetry levels are being detected,
> > to point to the lowest topology level( for a given CPU), where full
> > range of available CPU capacities is visible to all CPUs within given
> > domain. As a result, the per-cpu sd_asym_cpucapacity might differ
> > across the domains. This will have an impact on EAS wake-up placement
> > in a way that it might see different range of CPUs to be considered,
> > depending on the given current and target CPUs.
> >
> > Additionally, those levels, where any range of asymmetry (not
> > necessarily full) is being detected will get identified as well.
> > The selected asymmetric topology level will be denoted by
> > SD_ASYM_CPUCAPACITY_FULL sched domain flag whereas the 'sub-levels'
> > would receive the already used SD_ASYM_CPUCAPACITY flag. This allows
> > maintaining the current behaviour for asymmetric topologies, with
> > misfit migration operating correctly on lower levels, if applicable,
> > as any asymmetry is enough to trigger the misfit migration.
> > The logic there relies on the SD_ASYM_CPUCAPACITY flag and does not
> > relate to the full asymmetry level denoted by the sd_asym_cpucapacity
> > pointer.
> >
> > Signed-off-by: Beata Michalska <beata.michalska@arm.com>
>
> Most of this looks OK to me, I have a few comments below but nothing
> major. FWIW I've appended a diff at the tail of that email which covers
> (most) said comments.
>
> > ---
> >  kernel/sched/topology.c | 339 ++++++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 299 insertions(+), 40 deletions(-)
> >
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 09d3504..9dfa66b 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -675,7 +675,7 @@ static void update_top_cache_domain(int cpu)
> >  	sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
> >  	rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd);
> >
> > -	sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY);
> > +	sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY_FULL);
> >  	rcu_assign_pointer(per_cpu(sd_asym_cpucapacity, cpu), sd);
> >  }
> >
> > @@ -1958,65 +1958,322 @@ static bool topology_span_sane(struct sched_domain_topology_level *tl,
> >
> >  	return true;
> >  }
> > +/**
> > + * Asym capacity bits
> > + */
> > +
> > +/**
> > + * Cached cpu masks for those sched domains, at a given topology level,
> > + * that do represent CPUs with asymmetric capacities.
> > + *
> > + * Each topology level will get the cached data assigned,
> > + * with asym cap sched_flags (SD_ASYM_CPUCAPACITY and SD_ASYM_CPUCAPACITY_FULL
> > + * accordingly) and the corresponding cpumask for:
> > + * - domains that do span CPUs with different capacities
> > + * - domains where all CPU capacities are visible for all CPUs within
> > + *   the domain
> > + *
> > + * Within a single topology level there might be domains
> > + * with different scope of asymmetry:
> > + *	none     -> .
> > + *	partial  -> SD_ASYM_CPUCAPACITY
> > + *	full     -> SD_ASYM_CPUCAPACITY|SD_ASYM_CPUCAPACITY_FULL
> > + */
> > +struct asym_cache_data {
> > +
> > +	struct sched_domain_topology_level *tl;
> > +	unsigned int sched_flags;
> > +	struct cpumask *asym_mask;
> > +	struct cpumask *asym_full_mask;
> > +};
> > +
>
> I'll dump this here because I think it can be useful to whoever else will
> stare at this:
>
> This is pretty much an extension of struct sched_domain_topology_level,
> providing a new / modified tl->sd_flags() output. Unfortunately, said
> output requires either a cpumask per flag per topology level or a flag
> accumulator per topology level per CPU.
>
> In light of this, it's preferable to keep this extra data outside of the
> sched_domain_topology_level struct and have its lifespan limited to the
> domain build, which is what's being done here.
>
> > +*asym_cpu_capacity_scan(const struct cpumask *cpu_map)
> >  {
> > -	int i, j, asym_level = 0;
> > -	bool asym = false;
> > +	/*
> > +	 * Simple data structure to record all available CPU capacities.
> > +	 * Additional scan level allows tracking unique capacities per each
> > +	 * topology level and each separate topology level CPU mask.
> > +	 * During each scan phase, the scan level will allow to determine,
> > +	 * whether given capacity has been already accounted for, by syncing
> > +	 * it with the scan stage id.
> > +	 */
> > +	struct capacity_entry {
> > +		struct list_head link;
> > +		unsigned long capacity;
> > +		unsigned int scan_level;
> > +	};
> > +
> >  	struct sched_domain_topology_level *tl, *asym_tl = NULL;
> > -	unsigned long cap;
> > +	struct asym_cache_data *scan_data = NULL;
> > +	struct capacity_entry *entry = NULL, *tmp;
> > +	unsigned int level_count = 0;
> > +	unsigned int cap_count = 0;
> > +	unsigned int scan_id = 0;
> > +	LIST_HEAD(capacity_set);
> > +	unsigned long capacity;
> > +	cpumask_var_t cpu_mask;
> > +	int cpu;
> >
> > -	/* Is there any asymmetry? */
> > -	cap = arch_scale_cpu_capacity(cpumask_first(cpu_map));
> > +	/* Build-up a list of all CPU capacities, verifying on the way
> > +	 * if there is any asymmetry at all
>
> That's a wrong comment style.
>
> >  	/*
> > -	 * Examine topology from all CPU's point of views to detect the lowest
> > -	 * sched_domain_topology_level where a highest capacity CPU is visible
> > -	 * to everyone.
> > +	 * Allocate an array to store cached data
> > +	 * per each topology level + sentinel
> >  	 */
> > -	for_each_cpu(i, cpu_map) {
> > -		unsigned long max_capacity = arch_scale_cpu_capacity(i);
> > -		int tl_id = 0;
> > +	scan_data = kcalloc(level_count + 1, sizeof(*scan_data), GFP_KERNEL);
>
> Given we have one cache per tl, do we need the sentinel? I gave that a shot
> and it didn't explode, also further simplified asym_cpu_capacity_verify(),
> see appended diff.
>
It won't explode now, indeed, though the sentinel is there to make the code
bulletproof against potential future changes around it.
> > +	if (!scan_data) {
> > +		free_cpumask_var(cpu_mask);
> > +		goto leave;
> > +	}
> [...]
> > +			for_each_cpu_and(i, mask, cpu_map) {
> > +
> > +				capacity = arch_scale_cpu_capacity(i);
> >
> > -			max_capacity = capacity;
> > -			asym_level = tl_id;
> > -			asym_tl = tl;
> > +				list_for_each_entry(entry, &capacity_set, link) {
> > +					if (entry->capacity == capacity
> > +						&& entry->scan_level < scan_id) {
>                                                 ^^
> Operand should be at EOL.
>
> > +						entry->scan_level = scan_id;
> > +						++local_cap_count;
> > +					}
> > +				}
> > +				__cpumask_clear_cpu(i, cpu_mask);
> > +			}
> > @@ -2049,9 +2306,10 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
> >
> >  		sd = NULL;
> >  		for_each_sd_topology(tl) {
> > -			if (tl == tl_asym) {
> > -				dflags |= SD_ASYM_CPUCAPACITY;
> > -				has_asym = true;
> > +			if (!(dflags & SD_ASYM_CPUCAPACITY_FULL)) {
> > +				dflags |= asym_cpu_capacity_verify(asym_scan_data,
> > +							tl, i);
> > +				has_asym = dflags & SD_ASYM_CPUCAPACITY;
> >  			}
>
> Given this dflags & SD_ASYM_CPUCAPACITY_FULL check, is the maskless
> optimization thing actually required?
>
> AIUI, for any CPU, the first topology level where we'll set
> SD_ASYM_CPUCAPACITY_FULL should have a matching
> asym_scan_data[tlid]->asym_full_mask, and all subsequent tl's will see that
> in dflags and not call into asym_cpu_capacity_verify().
>
That's the point here, yes. It doesn't bring a huge benefit, but I still
prefer to skip unnecessary iterations through the cache table if possible.
Although with the changes proposed that might go away.
> >
> >  		if (WARN_ON(!topology_span_sane(tl, cpu_map, i)))
> > @@ -2068,6 +2326,7 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
> >  		}
> >  	}
> >
> > +	asym_cpu_capacity_release_data(asym_scan_data);
> >  	/* Build the groups for the domains */
> >  	for_each_cpu(i, cpu_map) {
> >  		for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent) {
> > --
> > 2.7.4
>
> ---
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 1f965293cc7e..31d89868f208 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -2011,102 +2011,80 @@ static bool topology_span_sane(struct sched_domain_topology_level *tl,
>   *	full    -> SD_ASYM_CPUCAPACITY|SD_ASYM_CPUCAPACITY_FULL
>   */
>  struct asym_cache_data {
> -
> -	struct sched_domain_topology_level *tl;
>  	unsigned int sched_flags;
>  	struct cpumask *asym_mask;
>  	struct cpumask *asym_full_mask;
>  };
>
> -static inline int asym_cpu_capacity_verify(struct asym_cache_data *data,
> -	struct sched_domain_topology_level *tl, int cpu)
> +static inline int asym_cpu_capacity_verify(struct asym_cache_data *data, int cpu)
>  {
> -	int flags = 0;
> -
>  	if (!data)
>  		goto leave;
>
> -	while (data->tl) {
> -		if (data->tl != tl) {
> -			++data;
> -			continue;
> -		}
> -
> -		if (!data->sched_flags)
> -			break;
> -		/*
> -		 * For topology levels above one, where all CPUs observe
> -		 * all available capacities, CPUs mask is not being
> -		 * cached for optimization reasons, assuming, that at this
> -		 * point, all possible CPUs are being concerned.
> -		 * Those levels will have both:
> -		 * SD_ASYM_CPUCAPACITY and SD_ASYM_CPUCAPACITY_FULL
> -		 * flags set.
> -		 */
> -		if (data->sched_flags & SD_ASYM_CPUCAPACITY_FULL &&
> -		    !data->asym_full_mask) {
> -			flags = data->sched_flags;
> -			break;
> -		}
> -
> -		if (data->asym_full_mask &&
> -		    cpumask_test_cpu(cpu, data->asym_full_mask)) {
> -			flags = data->sched_flags;
> -			break;
> -		}
> -		/*
> -		 * A given topology level might be marked with
> -		 * SD_ASYM_CPUCAPACITY_FULL mask but only for a certain subset
> -		 * of CPUs.
> -		 * Consider the following:
> -		 * #1
> -		 *
> -		 * DIE [                               ]
> -		 * MC  [                      ][       ]
> -		 *      [0] [1] [2] [3] [4] [5] [6] [7]
> -		 *      |.....| |.....| |.....| |.....|
> -		 *         L       M       B       B
> -		 *
> -		 * where:
> -		 *  arch_scale_cpu_capacity(L) = 512
> -		 *  arch_scale_cpu_capacity(M) = 871
> -		 *  arch_scale_cpu_capacity(B) = 1024
> -		 *
> -		 * MC topology level will be marked with both
> -		 * SD_ASYM_CPUCAPACITY flags, but the relevant masks will be:
> -		 *  asym_full_mask = [0-5]
> -		 *  asym_mask empty (no other asymmetry apart from
> -		 *  already covered [0-5])
> -		 *
> -		 * #2
> -		 *
> -		 * DIE [                                       ]
> -		 * MC  [                      ][               ]
> -		 *      [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
> -		 *      |.....| |.....| |.....| |.....| |.....|
> -		 *         L       M       B       L       B
> -		 *
> -		 * MC topology level will be marked with both
> -		 * SD_ASYM_CPUCAPACITY flags, but the relevant masks will be:
> -		 *  asym_full_mask = [0-5]
> -		 *  asym_mask = [6-9]
> -		 */
> -		if (data->asym_mask && cpumask_test_cpu(cpu, data->asym_mask))
> -			flags = SD_ASYM_CPUCAPACITY;
> -		break;
> +	if (!data->sched_flags)
> +		goto leave;
> +	/*
> +	 * For topology levels above one, where all CPUs observe all available
> +	 * capacities, CPUs mask is not being cached for optimization reasons,
> +	 * assuming, that at this point, all possible CPUs are being concerned.
> +	 * Those levels will have both: SD_ASYM_CPUCAPACITY and
> +	 * SD_ASYM_CPUCAPACITY_FULL flags set.
> +	 */
> +	if (data->sched_flags & SD_ASYM_CPUCAPACITY_FULL && !data->asym_full_mask)
> +		return data->sched_flags;
>
> -	}
> +	if (data->asym_full_mask && cpumask_test_cpu(cpu, data->asym_full_mask))
> +		return data->sched_flags;
> +	/*
> +	 * A given topology level might be marked with SD_ASYM_CPUCAPACITY_FULL
> +	 * mask but only for a certain subset of CPUs.
> +	 * Consider the following:
> +	 * #1
> +	 *
> +	 * DIE [                               ]
> +	 * MC  [                      ][       ]
> +	 *      [0] [1] [2] [3] [4] [5] [6] [7]
> +	 *      |.....| |.....| |.....| |.....|
> +	 *         L       M       B       B
> +	 *
> +	 * where:
> +	 *  arch_scale_cpu_capacity(L) = 512
> +	 *  arch_scale_cpu_capacity(M) = 871
> +	 *  arch_scale_cpu_capacity(B) = 1024
> +	 *
> +	 * MC topology level will be marked with both SD_ASYM_CPUCAPACITY flags,
> +	 * but the relevant masks will be:
> +	 *  asym_full_mask = [0-5]
> +	 *  asym_mask empty (no other asymmetry apart from
> +	 *  already covered [0-5])
> +	 *
> +	 * #2
> +	 *
> +	 * DIE [                                       ]
> +	 * MC  [                      ][               ]
> +	 *      [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
> +	 *      |.....| |.....| |.....| |.....| |.....|
> +	 *         L       M       B       L       B
> +	 *
> +	 * MC topology level will be marked with both SD_ASYM_CPUCAPACITY flags,
> +	 * but the relevant masks will be:
> +	 *  asym_full_mask = [0-5]
> +	 *  asym_mask = [6-9]
> +	 */
> +	if (data->asym_mask && cpumask_test_cpu(cpu, data->asym_mask))
> +		return SD_ASYM_CPUCAPACITY;
>  leave:
> -	return flags;
> +	return 0;
>  }
>
>
>  static inline void asym_cpu_capacity_release_data(struct asym_cache_data *data)
>  {
> +	struct sched_domain_topology_level *tl;
>  	struct asym_cache_data *__data = data;
>
>  	if (data) {
> -		while (data->tl) {
> +		for_each_sd_topology(tl) {
>  			if (!data->sched_flags)
>  				goto next;
>  			if (data->sched_flags & SD_ASYM_CPUCAPACITY_FULL)
> @@ -2114,7 +2092,7 @@ static inline void asym_cpu_capacity_release_data(struct asym_cache_data *data)
>  			kfree(data->asym_mask);
>  next:
>  			++data;
> -		};
> +		}
>  		kfree(__data);
>  	}
>  }
> @@ -2168,7 +2146,8 @@ static struct asym_cache_data
>  	cpumask_var_t cpu_mask;
>  	int cpu;
>
> -	/* Build-up a list of all CPU capacities, verifying on the way
> +	/*
> +	 * Build-up a list of all CPU capacities, verifying on the way
>  	 * if there is any asymmetry at all
>  	 */
>  	for_each_cpu(cpu, cpu_map) {
> @@ -2201,11 +2180,8 @@ static struct asym_cache_data
>
>  	/* Get the number of topology levels */
>  	for_each_sd_topology(tl) level_count++;
> -	/*
> -	 * Allocate an array to store cached data
> -	 * per each topology level + sentinel
> -	 */
> -	scan_data = kcalloc(level_count + 1, sizeof(*scan_data), GFP_KERNEL);
> +	/* Allocate an array to store cached data per each topology level */
> +	scan_data = kcalloc(level_count, sizeof(*scan_data), GFP_KERNEL);
>  	if (!scan_data) {
>  		free_cpumask_var(cpu_mask);
>  		goto leave;
> @@ -2221,19 +2197,16 @@ static struct asym_cache_data
>
>  #ifdef CONFIG_NUMA
>  		/*
> -		 * For NUMA we might end-up in a sched domain
> -		 * that spans numa nodes with cpus with
> -		 * different capacities which would not be caught
> -		 * by the above scan as those will have
> -		 * separate cpumasks - subject to numa level
> +		 * For NUMA we might end-up in a sched domain that spans numa
> +		 * nodes with cpus with different capacities which would not be
> +		 * caught by the above scan as those will have separate cpumasks
> +		 * - subject to numa level
>  		 * @see: sched_domains_curr_level & sd_numa_mask
>  		 * Considered to be a no-go
>  		 */
>  		if (WARN_ON_ONCE(tl->numa_level && !full_asym))
>  			goto leave;
>  #endif
> -		data->tl = tl;
> -
>  		if (asym_tl) {
>  			data->sched_flags = SD_ASYM_CPUCAPACITY |
>  				SD_ASYM_CPUCAPACITY_FULL;
> @@ -2247,10 +2220,10 @@ static struct asym_cache_data
>  			int i;
>
>  			/*
> -			 * Tracking each CPU capacity 'scan' id
> -			 * to distinguish discovered capacity sets
> -			 * between different CPU masks at each topology level:
> -			 * capturing unique capacity values at each scan stage
> +			 * Tracking each CPU capacity 'scan' id to distinguish
> +			 * discovered capacity sets between different CPU masks
> +			 * at each topology level: capturing unique capacity
> +			 * values at each scan stage
>  			 */
>  			++scan_id;
>  			local_cap_count = 0;
> @@ -2261,8 +2234,8 @@ static struct asym_cache_data
>  				capacity = arch_scale_cpu_capacity(i);
>
>  				list_for_each_entry(entry, &capacity_set, link) {
> -					if (entry->capacity == capacity
> -						&& entry->scan_level < scan_id) {
> +					if (entry->capacity == capacity &&
> +					    entry->scan_level < scan_id) {
>  						entry->scan_level = scan_id;
>  						++local_cap_count;
>  					}
> @@ -2334,12 +2307,12 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
>  	for_each_cpu(i, cpu_map) {
>  		struct sched_domain_topology_level *tl;
>  		int dflags = 0;
> +		int tlid = 0;
>
>  		sd = NULL;
>  		for_each_sd_topology(tl) {
>  			if (!(dflags & SD_ASYM_CPUCAPACITY_FULL)) {
> -				dflags |= asym_cpu_capacity_verify(asym_scan_data,
> -							tl, i);
> +				dflags |= asym_cpu_capacity_verify(&asym_scan_data[tlid], i);
>  				has_asym = dflags & SD_ASYM_CPUCAPACITY;
>  			}
>
> @@ -2354,6 +2327,8 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
>  			sd->flags |= SD_OVERLAP;
>  			if (cpumask_equal(cpu_map, sched_domain_span(sd)))
>  				break;
> +
> +			tlid++;
>  		}
>  	}
>
This might indeed simplify things a bit.
I've tried to keep the changes self-contained and not 'expose' the internals
of the caching mechanism to its users, but I might drop that idea in favour of
making it less cumbersome.

I will let it sit for a day or two before sending off v2, hoping for more
input on the changes.

Thanks for your feedback.

---
BR
B.
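The per-level lookup discussed in this exchange (one cache slot per topology level, indexed by tlid, with no sentinel) can be modelled outside the kernel. The following is only a sketch under stated assumptions: 64-bit bitmasks stand in for struct cpumask, a has_full_mask field stands in for the NULL check on asym_full_mask, and level_cache, level_flags, flags_at_level and the ASYM_* values are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for the sched domain flags; values are arbitrary. */
#define ASYM_PARTIAL	0x1u	/* models SD_ASYM_CPUCAPACITY */
#define ASYM_FULL	0x2u	/* models SD_ASYM_CPUCAPACITY_FULL */

/* One slot per topology level; bit n of a mask represents CPU n. */
struct level_cache {
	unsigned int flags;	/* 0 when the level shows no asymmetry */
	int has_full_mask;	/* 0 models the "maskless" all-CPUs case */
	uint64_t full_mask;	/* CPUs whose domain sees every capacity */
	uint64_t partial_mask;	/* CPUs whose domain sees only some asymmetry */
};

/* Flags contributed by one level for one CPU. */
static unsigned int level_flags(const struct level_cache *lc, unsigned int cpu)
{
	assert(cpu < 64);

	if (!lc || !lc->flags)
		return 0;
	/* Full asymmetry with no mask cached: applies to every CPU. */
	if ((lc->flags & ASYM_FULL) && !lc->has_full_mask)
		return lc->flags;
	if (lc->has_full_mask && (lc->full_mask & (UINT64_C(1) << cpu)))
		return lc->flags;
	if (lc->partial_mask & (UINT64_C(1) << cpu))
		return ASYM_PARTIAL;
	return 0;
}

/*
 * Accumulate flags for a CPU from level 0 up to and including 'level'.
 * Once ASYM_FULL is set, higher levels are no longer consulted.
 */
static unsigned int flags_at_level(const struct level_cache *levels,
				   size_t level, unsigned int cpu)
{
	unsigned int dflags = 0;

	for (size_t tlid = 0; tlid <= level; tlid++) {
		if (!(dflags & ASYM_FULL))
			dflags |= level_flags(&levels[tlid], cpu);
	}
	return dflags;
}
```

Fed with example #2 from the code comment (MC: full mask covering CPUs 0-5, partial mask covering CPUs 6-9; DIE: full with no mask), CPU 3 already gets both flags at MC, while CPU 7 gets only the partial flag at MC and picks up the full flag at DIE.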
* [RFC PATCH 3/3] sched/doc: Update the CPU capacity asymmetry bits
  2021-04-16 13:01 [RFC PATCH 0/3] Rework CPU capacity asymmetry detection Beata Michalska
  2021-04-16 13:01 ` [RFC PATCH 1/3] sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag Beata Michalska
  2021-04-16 13:01 ` [RFC PATCH 2/3] sched/topology: Rework CPU capacity asymmetry detection Beata Michalska
@ 2021-04-16 13:01 ` Beata Michalska
  2021-04-21 19:58 ` [RFC PATCH 0/3] Rework CPU capacity asymmetry detection Valentin Schneider
  3 siblings, 0 replies; 7+ messages in thread
From: Beata Michalska @ 2021-04-16 13:01 UTC
To: linux-kernel
Cc: peterz, mingo, juri.lelli, vincent.guittot, valentin.schneider, dietmar.eggemann, corbet, linux-doc

Update the documentation bits referring to capacity aware scheduling
with regard to the newly introduced SD_ASYM_CPUCAPACITY_FULL sched_domain
flag.

Signed-off-by: Beata Michalska <beata.michalska@arm.com>
---
 Documentation/scheduler/sched-capacity.rst | 6 ++++--
 Documentation/scheduler/sched-energy.rst   | 2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/Documentation/scheduler/sched-capacity.rst b/Documentation/scheduler/sched-capacity.rst
index 9b7cbe4..92d16e7 100644
--- a/Documentation/scheduler/sched-capacity.rst
+++ b/Documentation/scheduler/sched-capacity.rst
@@ -284,8 +284,10 @@ whether the system exhibits asymmetric CPU capacities. Should that be the case:
 
 - The sched_asym_cpucapacity static key will be enabled.
-- The SD_ASYM_CPUCAPACITY flag will be set at the lowest sched_domain level that
-  spans all unique CPU capacity values.
+- The SD_ASYM_CPUCAPACITY_FULL flag will be set at the lowest sched_domain
+  level that spans all unique CPU capacity values.
+- The SD_ASYM_CPUCAPACITY flag will be set for any sched_domain that spans
+  cpus with any range of asymmetry.
 
 The sched_asym_cpucapacity static key is intended to guard sections of code that
 cater to asymmetric CPU capacity systems. Do note however that said key is
diff --git a/Documentation/scheduler/sched-energy.rst b/Documentation/scheduler/sched-energy.rst
index afe02d3..8fbce5e 100644
--- a/Documentation/scheduler/sched-energy.rst
+++ b/Documentation/scheduler/sched-energy.rst
@@ -328,7 +328,7 @@ section lists these dependencies and provides hints as to how they can be met.
 
 As mentioned in the introduction, EAS is only supported on platforms with
 asymmetric CPU topologies for now. This requirement is checked at run-time by
-looking for the presence of the SD_ASYM_CPUCAPACITY flag when the scheduling
+looking for the presence of the SD_ASYM_CPUCAPACITY_FULL flag when the scheduling
 domains are built.
 
 See Documentation/scheduler/sched-capacity.rst for requirements to be met for this
-- 
2.7.4
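As an illustration of the two documented rules (any asymmetry within a domain gives SD_ASYM_CPUCAPACITY; seeing every unique capacity value additionally gives SD_ASYM_CPUCAPACITY_FULL), here is a small user-space sketch. It is an assumption-laden model rather than the scheduler's implementation: classify_span, count_unique and the ASYM_* values are hypothetical, and the span's capacities are assumed to be a subset of the system-wide set.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the sched domain flags; values are arbitrary. */
#define ASYM_PARTIAL	0x1u	/* models SD_ASYM_CPUCAPACITY */
#define ASYM_FULL	0x2u	/* models SD_ASYM_CPUCAPACITY_FULL */

/* Count distinct values in caps[0..n). Quadratic, fine for a few CPUs. */
static size_t count_unique(const unsigned long *caps, size_t n)
{
	size_t unique = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j;

		for (j = 0; j < i; j++)
			if (caps[j] == caps[i])
				break;
		if (j == i)	/* no earlier duplicate found */
			unique++;
	}
	return unique;
}

/*
 * Classify one domain span given the capacities of its CPUs and the number
 * of unique capacities in the whole system (system_unique).
 */
static unsigned int classify_span(const unsigned long *span_caps, size_t span_n,
				  size_t system_unique)
{
	size_t local;

	assert(span_n > 0 && system_unique > 0);

	local = count_unique(span_caps, span_n);
	if (local <= 1)
		return 0;				/* no asymmetry */
	if (local == system_unique)
		return ASYM_PARTIAL | ASYM_FULL;	/* full range visible */
	return ASYM_PARTIAL;				/* some asymmetry only */
}
```

For a system with capacities 512, 871 and 1024, a span holding all three is classified as full, a span holding only 512 and 871 as partial, and a span of two 1024 CPUs as having no asymmetry.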
* Re: [RFC PATCH 0/3] Rework CPU capacity asymmetry detection
  2021-04-16 13:01 [RFC PATCH 0/3] Rework CPU capacity asymmetry detection Beata Michalska
                   ` (2 preceding siblings ...)
  2021-04-16 13:01 ` [RFC PATCH 3/3] sched/doc: Update the CPU capacity asymmetry bits Beata Michalska
@ 2021-04-21 19:58 ` Valentin Schneider
  3 siblings, 0 replies; 7+ messages in thread
From: Valentin Schneider @ 2021-04-21 19:58 UTC
To: Beata Michalska, linux-kernel
Cc: peterz, mingo, juri.lelli, vincent.guittot, dietmar.eggemann, corbet, linux-doc

On 16/04/21 14:01, Beata Michalska wrote:
> Verified on (mostly):
>  - QEMU (version 4.2.1) with variants of possible asymmetric topologies
>       - machine: virt
>       - modifying the device-tree 'cpus' node for virt machine:
>
>       qemu-system-aarch64 -kernel $KERNEL_IMG
>           -drive format=qcow2,file=$IMAGE
>           -append 'root=/dev/vda earlycon console=ttyAMA0 sched_debug
>            loglevel=15 kmemleak=on' -m 2G --nographic -cpu cortex-a57
>           -machine virt -smp cores=6 -machine dumpdtb=$CUSTOM_DTB.dtb
>
>       $KERNEL_PATH/scripts/dtc/dtc -I dtb -O dts $CUSTOM_DTB.dts >
>       $CUSTOM_DTB.dtb
>
>       (modify the dts)
>
>       $KERNEL_PATH/scripts/dtc/dtc -I dts -O dtb $CUSTOM_DTB.dts >
>       $CUSTOM_DTB.dtb
>
>       qemu-system-aarch64 -kernel $KERNEL_IMG
>           -drive format=qcow2,file=$IMAGE
>           -append 'root=/dev/vda earlycon console=ttyAMA0 sched_debug
>            loglevel=15 kmemleak=on' -m 2G --nographic -cpu cortex-a57
>           -machine virt -smp cores=6 -machine dtb=$CUSTOM_DTB.dtb
>

Thanks to your QEMU wizardry, I've also tested this on a few funky
topologies such as:

  DIE [ ]
  MC  [ ][ ]
       0 1 2 3 4 5 6 7
       L L M L L M B B

and

  DIE [ ]
  MC  [ ][ ]
       0 1 2 3 4 5
       L L M M B B

+ some hotplug ops, and the resulting SD_ flags all made sense to me.

Tested-by: Valentin Schneider <valentin.schneider@arm.com>

For patches 1, 3:

Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>

>
> Beata Michalska (3):
>   sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag
>   sched/topology: Rework CPU capacity asymmetry detection
>   sched/doc: Update the CPU capacity asymmetry bits
>
>  Documentation/scheduler/sched-capacity.rst |   6 +-
>  Documentation/scheduler/sched-energy.rst   |   2 +-
>  include/linux/sched/sd_flags.h             |  10 +
>  kernel/sched/topology.c                    | 339 +++++++++++++++++++++++++----
>  4 files changed, 314 insertions(+), 43 deletions(-)
>
> --
> 2.7.4
end of thread, other threads:[~2021-04-22 10:27 UTC | newest]

Thread overview: 7+ messages
2021-04-16 13:01 [RFC PATCH 0/3] Rework CPU capacity asymmetry detection Beata Michalska
2021-04-16 13:01 ` [RFC PATCH 1/3] sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag Beata Michalska
2021-04-16 13:01 ` [RFC PATCH 2/3] sched/topology: Rework CPU capacity asymmetry detection Beata Michalska
2021-04-21 19:58   ` Valentin Schneider
2021-04-22 10:27     ` Beata Michalska
2021-04-16 13:01 ` [RFC PATCH 3/3] sched/doc: Update the CPU capacity asymmetry bits Beata Michalska
2021-04-21 19:58 ` [RFC PATCH 0/3] Rework CPU capacity asymmetry detection Valentin Schneider