From: Lukasz Luba <lukasz.luba@arm.com>
To: Ionela Voinescu <ionela.voinescu@arm.com>
Cc: catalin.marinas@arm.com, will@kernel.org, mark.rutland@arm.com,
    maz@kernel.org, suzuki.poulose@arm.com, sudeep.holla@arm.com,
    dietmar.eggemann@arm.com, peterz@infradead.org, mingo@redhat.com,
    ggherdovich@suse.cz, vincent.guittot@linaro.org,
    linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 6/6] arm64: use activity monitors for frequency invariance
Date: Fri, 24 Jan 2020 15:17:48 +0000
Message-ID: <d541c4ae-8419-0204-f399-7f0f0a18eb38@arm.com>
In-Reply-To: <20200124131134.GA15164@arm.com>

On 1/24/20 1:12 PM, Ionela Voinescu wrote:
> Hi Lukasz,
>
> On Friday 24 Jan 2020 at 01:19:31 (+0000), Lukasz Luba wrote:
>>
>>
>> On 1/23/20 5:07 PM, Ionela Voinescu wrote:
>>> Hi Lukasz,
>>>
>>> Thank you for taking a look over the patches.
>>>
>>> On Thursday 23 Jan 2020 at 11:49:29 (+0000), Lukasz Luba wrote:
>>>> Hi Ionela,
>>>>
>>>> Please find my few comments below.
>>>>
>>>> On 12/18/19 6:26 PM, Ionela Voinescu wrote:
>>>>> The Frequency Invariance Engine (FIE) is providing a frequency
>>>>> scaling correction factor that helps achieve more accurate
>>>>> load-tracking.
>>>>>
>>>>> So far, for arm and arm64 platforms, this scale factor has been
>>>>> obtained based on the ratio between the current frequency and the
>>>>> maximum supported frequency recorded by the cpufreq policy. The
>>>>> setting of this scale factor is triggered from cpufreq drivers by
>>>>> calling arch_set_freq_scale. The current frequency used in computation
>>>>> is the frequency requested by a governor, but it may not be the
>>>>> frequency that was implemented by the platform.
>>>>>
>>>>> This correction factor can also be obtained using a core counter and a
>>>>> constant counter to get information on the performance (frequency based
>>>>> only) obtained in a period of time.
>>>>> This will more accurately reflect
>>>>> the actual current frequency of the CPU, compared with the alternative
>>>>> implementation that reflects the request of a performance level from
>>>>> the OS.
>>>>>
>>>>> Therefore, implement arch_scale_freq_tick to use activity monitors, if
>>>>> present, for the computation of the frequency scale factor.
>>>>>
>>>>> The use of AMU counters depends on:
>>>>>  - CONFIG_ARM64_AMU_EXTN - depends on the AMU extension being present
>>>>>  - CONFIG_CPU_FREQ - the current frequency obtained using counter
>>>>>    information is divided by the maximum frequency obtained from the
>>>>>    cpufreq policy.
>>>>>
>>>>> While it is possible to have a combination of CPUs in the system with
>>>>> and without support for activity monitors, the use of counters for
>>>>> frequency invariance is only enabled for a CPU, if all related CPUs
>>>>> (CPUs in the same frequency domain) support and have enabled the core
>>>>
>>>> This looks like an edge case scenario, for which we are designing the
>>>> whole machinery with workqueues. AFAIU we cannot run the code in
>>>> arch_set_freq_scale() and you want to check all CPUs upfront.
>>>>
>>>
>>> Unfortunately, I don't believe it to be an edge-case. Given that this
>>> is an optional feature, I do believe that people might skip on
>>> implementing it on some CPUs(LITTLEs) while keeping it for CPUs(bigs)
>>> where power and thermal mitigation is more probable to happen in firmware.
>>> This is the main reason to be conservative in the validation of CPUs and
>>> cpufreq policies.
>>>
>>> In regards to arch_set_freq_scale, I want to be able to tell, when that
>>> function is called, if I should return a scale factor based on cpufreq
>>> for the current policy. If activity monitors are useable for the CPUs in
>>> the full policy, then I'm bailing out and leave the AMU FIE machinery
>>> set the scale factor. Unfortunately this works at policy granularity.
>>>
>>> This could be done in a nicer way by setting the scale factor per cpu
>>> and not for all CPUs in a policy in this arch_set_freq_scale function.
>>> But this would require some rewriting for the full frequency invariance
>>> support in drivers which we've talked about for a while but it was not
>>> the purpose of this patch set. But it would eliminate the policy
>>> verification I do with the second workqueue.
>>>
>>>> Maybe you can just wait till all CPUs boot and then set the proper
>>>> flags and finish initialization. Something like:
>>>> per_cpu(s8, amu_feat) /* from the patch 1/6 */
>>>> OR
>>>> per_cpu(u8, amu_scale_freq) /* from this patch */
>>>> with maybe some values:
>>>>   0 - not checked yet
>>>>   1 - checked and present
>>>>   -1 - checked and not available
>>>>   -2 - checked but in conflict with others in the freq domain
>>>>   -3..-k - other odd configurations
>>>>
>>>> could potentially eliminate the need of workqueues.
>>>>
>>>> Then, if we could trigger this from i.e. late_initcall, the CPUs
>>>> should be online and you can validate them.
>>>>
>>>
>>> I did initially give such a state machine a try but it proved to be
>>> quite messy. A big reason for this is that the activity monitors unit
>>> has multiple counters that can be used for different purposes.
>>>
>>> The amu_feat per_cpu variable only flags that you have the AMU present
>>> for potential users (in this case FIE) to validate the counters they
>>> need for their respective usecase. For this reason I don't want to
>>> overload the meaning of amu_feat. For the same reason I'm not doing the
>>> validation of the counters in a generic way, but I'm tying it to the
>>> usecase for particular counters. For example, it would not matter if
>>> the instructions retired counter is not enabled from firmware for the
>>> usecase of FIE. For frequency invariance we only need the core and
>>> constant cycle counters and I'm making it the job of the user (arm64
>>> topology code) to do the checking.
>>>
>>> Secondly, for amu_scale_freq I could have added such a state machine,
>>> but I did not think it was useful. The only thing it would change is
>>> that I would not have to use the cpu_amu_fie variable in the data
>>> structure that gets passed to the work functions. The only way I would
>>> eliminate the second workqueue was if I did not do a check of all CPUs
>>> in a policy, as described above, and rewrite frequency invariance to
>>> work at CPU granularity and not policy granularity. This would eliminate
>>> the dependency on cpufreq policy altogether, so it would be worth
>>> doing if only for this reason alone :).
>>>
>>> But even in that case, it's probably not needed to have more than two
>>> states for amu_freq_scale.
>>>
>>> What do you think?
>>
>> I think currently we are the only users for this AMU and if there will
>> be another in the future, then we can start thinking about his proposed
>> changes. Let's cross that bridge when we come to it.
>>
>> Regarding the code, in the arch/arm64/cpufeature.c you can already
>> read the cycle registers. All the CPUs are going through that code
>> during start. If you use this fact in the late_initcall() all CPUs
>> should be checked and you can just ask for cpufreq policy, calculate the
>> max_freq ratio, set the per cpu config value to 'ready' state.
>>
>> Something like in the code below, it is on top of your patch set.
>>
>> ------------------------>8-------------------------------------
>>
>>
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index c639b3e052d7..837ea46d8867 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -1168,19 +1168,26 @@ static bool has_hw_dbm(const struct
>> arm64_cpu_capabilities *cap,
>>   * from the current cpu.
>>   * - cpu_has_amu_feat()
>>   */
>> -static DEFINE_PER_CPU_READ_MOSTLY(u8, amu_feat);
>> -
>> -inline bool cpu_has_amu_feat(void)
>> -{
>> -        return !!this_cpu_read(amu_feat);
>> -}
>> +DECLARE_PER_CPU(u64, arch_const_cycles_prev);
>> +DECLARE_PER_CPU(u64, arch_core_cycles_prev);
>> +DECLARE_PER_CPU(u8, amu_scale_freq);
>>
>>  static void cpu_amu_enable(struct arm64_cpu_capabilities const *cap)
>>  {
>> +        u64 core_cnt, const_cnt;
>> +
>>          if (has_cpuid_feature(cap, SCOPE_LOCAL_CPU)) {
>>                  pr_info("detected CPU%d: Activity Monitors Unit (AMU)\n",
>>                          smp_processor_id());
>> -                this_cpu_write(amu_feat, 1);
>> +                core_cnt = read_sysreg_s(SYS_AMEVCNTR0_CORE_EL0);
>> +                const_cnt = read_sysreg_s(SYS_AMEVCNTR0_CONST_EL0);
>> +
>> +                this_cpu_write(arch_core_cycles_prev, core_cnt);
>> +                this_cpu_write(arch_const_cycles_prev, const_cnt);
>> +
>> +                this_cpu_write(amu_scale_freq, 1);
>> +        } else {
>> +                this_cpu_write(amu_scale_freq, 2);
>>          }
>>  }
>
>
> Yes, functionally this can be done here (it would need some extra checks
> on the initial values of core_cnt and const_cnt), but what I was saying
> in my previous comment is that I don't want to mix generic feature
> detection, which should happen here, with counter validation for
> frequency invariance. As you see, this would already bring here per-cpu
> variables for counters and amu_scale_freq flag, and I only see this
> getting more messy with the future use of more counters. I don't believe
> this code belongs here.
>
> Looking a bit more over the code and checking against the new frequency
> invariance code for x86, there is a case of either doing this CPU
> validation in smp_prepare_cpus (separately for arm64 and x86) or calling
> an arch_init_freq_invariance() maybe in sched_init_smp to be defined with
> the proper frequency invariance counter initialisation code separately
> for x86 and arm64. I'll have to look more over the details to make sure
> this is feasible.

I have found that we could simply draw on Mark's solution to a similar
problem. In commit:

commit df857416a13734ed9356f6e4f0152d55e4fb748a
Author: Mark Rutland <mark.rutland@arm.com>
Date:   Wed Jul 16 16:32:44 2014 +0100

    arm64: cpuinfo: record cpu system register values

    Several kernel subsystems need to know details about CPU system register
    values, sometimes for CPUs other than that they are executing on. Rather
    than hard-coding system register accesses and cross-calls for these
    cases, this patch adds logic to record various system register values at
    boot-time. This may be used for feature reporting, firmware bug
    detection, etc.

    Separate hooks are added for the boot and hotplug paths to enable
    one-time intialisation and cold/warm boot value mismatch detection in
    later patches.

    Signed-off-by: Mark Rutland <mark.rutland@arm.com>
    Reviewed-by: Will Deacon <will.deacon@arm.com>
    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

He added a cpuinfo_store_cpu() call in secondary_start_kernel() [in arm64
smp.c]. Please check the file:
arch/arm64/kernel/cpuinfo.c

We can probably add our read-amu-regs-and-setup-invariance call just below
his cpuinfo_store_cpu(). Then the arm64 cpufeature.c would stay clean, we
would be called for each cpu, and a late_initcall() would finish the setup
with the edge case policy check, like in the init_amu_feature() code below.

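For illustration only, the flow described above (record each CPU's AMU
status as it boots, then let a late init pass validate whole frequency
domains) can be sketched in plain user-space C. Everything here is a
stand-in: the enum names, NR_CPUS, amu_fie_record_cpu(),
amu_fie_validate_policy() and amu_tick_scale() are hypothetical, plain
arrays replace the per-CPU variables, and a policy is just a list of CPU
ids. The ratio follows the calculation in init_amu_feature(); how
amu_tick_scale() consumes it is an assumption about the tick path, not
something taken from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT 10
#define SCHED_CAPACITY_SCALE (1UL << SCHED_CAPACITY_SHIFT)
#define NR_CPUS 8 /* hypothetical CPU count for this sketch */

/* Hypothetical names for the flag values 0..4 used in the diff. */
enum amu_fie_state {
	AMU_FIE_UNCHECKED = 0,
	AMU_FIE_PRESENT,   /* counters seen on this CPU at boot */
	AMU_FIE_ABSENT,    /* no usable counters on this CPU */
	AMU_FIE_READY,     /* whole frequency domain validated */
	AMU_FIE_CONFLICT,  /* some CPU in the domain lacks counters */
};

/* Stand-ins for the per-CPU variables. */
static uint8_t amu_scale_freq[NR_CPUS];
static uint64_t arch_max_freq_scale[NR_CPUS];

/* Boot-time hook: would run once on each CPU as it comes up. */
static void amu_fie_record_cpu(int cpu, int has_counters)
{
	amu_scale_freq[cpu] = has_counters ? AMU_FIE_PRESENT : AMU_FIE_ABSENT;
}

/* Late pass: validate one frequency domain and store its ratio. */
static enum amu_fie_state amu_fie_validate_policy(const int *cpus, size_t nr,
						  uint64_t timer_rate_hz,
						  uint64_t max_freq_khz)
{
	enum amu_fie_state state;
	uint64_t ratio;
	size_t i, present = 0;

	assert(max_freq_khz != 0);

	for (i = 0; i < nr; i++)
		if (amu_scale_freq[cpus[i]] == AMU_FIE_PRESENT)
			present++;

	state = (present == nr) ? AMU_FIE_READY : AMU_FIE_CONFLICT;

	/* const_rate / max_freq, kept with 2 * SCHED_CAPACITY_SHIFT fraction bits */
	ratio = (timer_rate_hz << (2 * SCHED_CAPACITY_SHIFT)) /
		(max_freq_khz * 1000);

	for (i = 0; i < nr; i++) {
		arch_max_freq_scale[cpus[i]] = ratio;
		amu_scale_freq[cpus[i]] = (uint8_t)state;
	}

	return state;
}

/* Assumed tick-side use: (delta_core / delta_const) * ratio, capped at 1024. */
static uint64_t amu_tick_scale(uint64_t delta_core, uint64_t delta_const,
			       uint64_t ratio)
{
	uint64_t scale;

	assert(delta_const != 0);

	scale = ((delta_core * ratio) >> SCHED_CAPACITY_SHIFT) / delta_const;

	return scale < SCHED_CAPACITY_SCALE ? scale : SCHED_CAPACITY_SCALE;
}
```

Like the diff, the validation pass is one-shot: it overwrites the
per-CPU flag with the domain-wide result, so it is meant to run once
after every CPU has been recorded.
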
>
>>
>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>> index 61f8264afec9..95b34085ae64 100644
>> --- a/arch/arm64/kernel/topology.c
>> +++ b/arch/arm64/kernel/topology.c
>> @@ -144,8 +144,8 @@ static struct cpu_amu_work __percpu *works;
>>  static cpumask_var_t cpus_to_visit;
>>
>>  static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, arch_max_freq_scale);
>> -static DEFINE_PER_CPU(u64, arch_const_cycles_prev);
>> -static DEFINE_PER_CPU(u64, arch_core_cycles_prev);
>> +DEFINE_PER_CPU(u64, arch_const_cycles_prev);
>> +DEFINE_PER_CPU(u64, arch_core_cycles_prev);
>>  DECLARE_PER_CPU(u8, amu_scale_freq);
>>
>>  static void cpu_amu_fie_init_workfn(struct work_struct *work)
>> @@ -323,12 +323,64 @@ static int __init
>> register_fie_counters_cpufreq_notifier(void)
>>  }
>>  core_initcall(register_fie_counters_cpufreq_notifier);
>>
>> +static int __init init_amu_feature(void)
>> +{
>> +        struct cpufreq_policy *policy;
>> +        struct cpumask *checked_cpus;
>> +        int count, total;
>> +        int cpu, i;
>> +        s8 amu_config;
>> +        u64 ratio;
>> +
>> +        checked_cpus = kzalloc(cpumask_size(), GFP_KERNEL);
>> +        if (!checked_cpus)
>> +                return -ENOMEM;
>> +
>> +        for_each_possible_cpu(cpu) {
>> +                if (cpumask_test_cpu(cpu, checked_cpus))
>> +                        continue;
>> +
>> +                policy = cpufreq_cpu_get(cpu);
>> +                if (!policy) {
>> +                        pr_warn("No cpufreq policy found for CPU%d\n", cpu);
>> +                        continue;
>> +                }
>> +
>> +                count = total = 0;
>> +
>> +                for_each_cpu(i, policy->related_cpus) {
>> +                        amu_config = per_cpu(amu_scale_freq, i);
>> +                        if (amu_config == 1)
>> +                                count++;
>> +                        total++;
>> +                }
>> +
>> +                amu_config = (total == count) ? 3 : 4;
>> +
>> +                ratio = (u64)arch_timer_get_rate() << (2 * SCHED_CAPACITY_SHIFT);
>> +                ratio = div64_u64(ratio, policy->cpuinfo.max_freq * 1000);
>> +
>> +                for_each_cpu(i, policy->related_cpus) {
>> +                        per_cpu(arch_max_freq_scale, i) = (unsigned long)ratio;
>> +                        per_cpu(amu_scale_freq, i) = amu_config;
>> +                        cpumask_set_cpu(i, checked_cpus);
>> +                }
>> +
>> +                cpufreq_cpu_put(policy);
>> +        }
>> +
>> +        kfree(checked_cpus);
>> +
>> +        return 0;
>> +}
>> +late_initcall(init_amu_feature);
>> +
>
> Yes, with the design I mentioned above, this CPU policy validation could
> move to a late_initcall and I could drop the workqueues and the extra
> data structure. Thanks for this!
>
> Let me know what you think!
>

One thing is still open: the file drivers/base/arch_topology.c and the
#ifdef in function arch_set_freq_scale().

Generally, if there is such a need, it's better to put such stuff into a
header and provide a dual implementation, rather than polluting generic
code with:

#if defined(CONFIG_ARM64_XZY)
#endif
#if defined(CONFIG_POWERPC_ABC)
#endif
#if defined(CONFIG_x86_QAZ)
#endif
...

In our case we would need i.e. linux/topology.h because it includes
asm/topology.h, which might provide the needed symbol.
At the end of linux/topology.h we can have:

#ifndef arch_cpu_auto_scaling
static __always_inline bool arch_cpu_auto_scaling(void) { return false; }
#endif

Then, when the symbol is missing and we get the default one, the call
should be easily optimized away by the compiler.

We could have a much cleaner function arch_set_freq_scale() in
drivers/base/ and all architectures will deal with their specific
#ifdef CONFIG in their <asm/topology.h> implementations or use the default.

Example:

arch_set_freq_scale()
{
        unsigned long scale;
        int i;

        if (arch_cpu_auto_scaling(cpu))
                return;

        scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;

        for_each_cpu(i, cpus)
                per_cpu(freq_scale, i) = scale;
}

Regards,
Lukasz
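
A minimal user-space sketch of the header-default pattern described
above, assuming the usual kernel convention that an architecture
overriding the hook also defines a macro of the same name so that the
#ifndef can see it. The int cpu parameter, the NR_CPUS value, the
freq_scale[] array and the (cpus, nr_cpus) argument pair are
placeholders chosen for this sketch; they are not the in-tree
signatures.

```c
#include <assert.h>
#include <stdbool.h>

#define SCHED_CAPACITY_SHIFT 10
#define NR_CPUS 4 /* hypothetical CPU count for this sketch */

/*
 * Generic fallback, as it could sit at the end of a common header.
 * An architecture that provides its own version would also do
 * "#define arch_cpu_auto_scaling arch_cpu_auto_scaling" (assumed
 * convention), which makes this block compile out.
 */
#ifndef arch_cpu_auto_scaling
static inline bool arch_cpu_auto_scaling(int cpu)
{
	(void)cpu;
	return false;
}
#endif

/* Stand-in for the per-CPU freq_scale variable. */
static unsigned long freq_scale[NR_CPUS];

/* cpufreq-driven path: skipped when the architecture scales by itself. */
static void arch_set_freq_scale(const int *cpus, int nr_cpus,
				unsigned long cur_freq, unsigned long max_freq)
{
	unsigned long scale;
	int i;

	assert(max_freq != 0);

	if (nr_cpus <= 0 || arch_cpu_auto_scaling(cpus[0]))
		return;

	scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;

	for (i = 0; i < nr_cpus; i++)
		freq_scale[cpus[i]] = scale;
}
```

With the fallback in place the generic function carries no
architecture #ifdef at all; the check collapses to a constant false
and only the cpufreq-based ratio remains.
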