From: Lukasz Luba <lukasz.luba@arm.com>
To: Ionela Voinescu <ionela.voinescu@arm.com>
Cc: Rob Herring <robh@kernel.org>,
	daniel.lezcano@linaro.org, devicetree@vger.kernel.org,
	vireshk@kernel.org, linux-pm@vger.kernel.org, rjw@rjwysocki.net,
	linux-kernel@vger.kernel.org, sudeep.holla@arm.com,
	Nicola Mazzucato <nicola.mazzucato@arm.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	chris.redpath@arm.com, morten.rasmussen@arm.com,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v2 2/2] [RFC] CPUFreq: Add support for cpu-perf-dependencies
Date: Mon, 12 Oct 2020 14:48:20 +0100
Message-ID: <500510b9-58f3-90b3-8c95-0ac481d468b5@arm.com>
In-Reply-To: <20201012105945.GA9219@arm.com>



On 10/12/20 11:59 AM, Ionela Voinescu wrote:
> On Monday 12 Oct 2020 at 11:22:57 (+0100), Lukasz Luba wrote:
> [..]
>>>> I thought about it and looked for other platforms' DT to see if I can reuse
>>>> existing opp information. Unfortunately I don't think it is optimal. The reason
>>>> being that, because cpus have the same opp table it does not necessarily mean
>>>> that they share a clock wire. It just tells us that they have the same
>>>> capabilities (literally just tells us they have the same V/f op points).
>>>> Unless I am missing something?
>>>>
>>>> When comparing with ACPI/_PSD it becomes more intuitive that there is no
>>>> equivalent way to reveal "perf-dependencies" in DT.
>>>
>>> You should be able to by examining the clock tree. But perhaps SCMI
>>> abstracts all that and just presents virtual clocks without parent
>>> clocks available to determine what clocks are shared? Fix SCMI if that's
>>> the case.
>>
>> True, the SCMI clock does not support discovery of clock tree:
>> (from 4.6.1 Clock management protocol background)
>> 'The protocol does not cover discovery of the clock tree, which must be
>> described through firmware tables instead.' [1]
>>
>> In this situation, would it make sense, instead of this binding from
>> patch 1/2, create a binding for internal firmware/scmi node?
>>
>> Something like:
>>
>> firmware {
>> 	scmi {
>> 	...		
>> 		scmi-perf-dep {
>> 			compatible = "arm,scmi-perf-dependencies";
>> 			cpu-perf-dep0 {
>> 				cpu-perf-affinity = <&CPU0>, <&CPU1>;
>> 			};
>> 			cpu-perf-dep1 {
>> 				cpu-perf-affinity = <&CPU3>, <&CPU4>;
>> 			};
>> 			cpu-perf-dep2 {
>> 				cpu-perf-affinity = <&CPU7>;
>> 			};
>> 		};
>> 	};
>> };
>>
>> The code which is going to parse the binding would be inside the
>> scmi perf protocol code and used via API by scmi-cpufreq.c.
>>
> 
> While SCMI cpufreq would be able to benefit from the functionality that
> Nicola is trying to introduce, it's not the only driver, and more
> importantly, it's not *going* to be the only driver benefiting from
> this.
> 
> Currently there is also qcom-cpufreq-hw.c and the future
> mediatek-cpufreq-hw.c that is currently under review [1]. They both do
> their frequency setting by interacting with HW/FW, and could either take
> or update their OPP tables from there. Therefore, if the platform would
> require it, they could also expose different controls for frequency
> setting and could benefit from additional information about clock
> domains (either through opp-shared or the new entries in Nicola's patch),
> without driver changes.
> 
> Another point to be made is that I strongly believe this is going to be
> the norm in the future. Directly setting PLLs and regulator voltages
> has been proven unsafe and insecure.
> 
> Therefore, I see this as support for a generic cpufreq feature (a
> hardware coordination type), rather than support for a specific driver.
> 
> [1] https://lkml.org/lkml/2020/9/10/11
> 
>>
>> Now regarding the 'dependent_cpus' mask.
>>
>> We could avoid adding a new field 'dependent_cpus' in policy
>> struct, but I am not sure of one bit - Frequency Invariant Engine,
>> (which is also not fixed by just adding a new cpumask).
>    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>    Let's take it step by step..
>>
>> We have 3 subsystems to fix:
>> 1. EAS - EM has API function which takes custom cpumask, so no issue,
>             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 	   keep in mind that EAS is using the max aggregation method
> 	   that schedutil is using. So if we are to describe the
> 	   functionality correctly, it needs both a cpumask describing
> 	   the frequency domains and an aggregation method.

EAS does not use schedutil's max aggregation; it calculates max_util
internally.

compute_energy() loops through the CPUs in the domain and takes the
utilization of each via schedutil_cpu_util(cpu_rq(cpu)). It figures out
max_util, and then em_cpu_energy() maps it to the next frequency for the
cluster. It just needs proper utilization from the CPUs, which is taken
from the run-queues and is the sum of the utilization of the tasks
there. This leads to the problem of how we account the utilization of a
task, which is where the FIE is involved. EAS assumes the utilization
is calculated properly.
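
To make the flow above concrete, here is a minimal user-space sketch. It
is not the kernel code: struct perf_state, pick_state() and
domain_energy() are hypothetical names, and the model leaves out details
such as the frequency margin applied when mapping utilization to a
frequency. It only shows that the frequency is chosen from max_util in
the domain, while the energy estimate depends on the per-CPU
utilization, so wrong utilization accounting skews both.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified performance state (not struct em_perf_state). */
struct perf_state {
	unsigned long freq_khz;	/* frequency of this operating point */
	unsigned long capacity;	/* CPU capacity at this frequency, 0..1024 */
	unsigned long power_mw;	/* power of one fully busy CPU at this point */
};

/* Pick the lowest state whose capacity covers max_util (table is ascending). */
static const struct perf_state *
pick_state(const struct perf_state *tbl, size_t nr, unsigned long max_util)
{
	assert(nr > 0);
	for (size_t i = 0; i < nr; i++)
		if (tbl[i].capacity >= max_util)
			return &tbl[i];
	return &tbl[nr - 1];	/* saturate at the highest state */
}

/*
 * Estimate the energy of one frequency domain: the domain runs at the
 * state selected by the maximum utilization, and each CPU contributes
 * power scaled by its own utilization relative to that state's capacity.
 */
static unsigned long
domain_energy(const struct perf_state *tbl, size_t nr_states,
	      const unsigned long *cpu_util, size_t nr_cpus)
{
	const struct perf_state *ps;
	unsigned long max_util = 0, sum_util = 0;

	for (size_t i = 0; i < nr_cpus; i++) {
		sum_util += cpu_util[i];
		if (cpu_util[i] > max_util)
			max_util = cpu_util[i];
	}

	ps = pick_state(tbl, nr_states, max_util);
	return ps->power_mw * sum_util / ps->capacity;
}
```

With a made-up table of three states and two CPUs at utilization 100 and
400, the domain would be placed at the middle state, and both CPUs would
be charged at that state's power.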

> 
>>    fix would be to use it via the scmi-cpufreq.c
> 
>> 2. IPA (for calculating the power of a cluster, not whole thermal needs
>>    this knowledge about 'dependent cpus') - this can be fixed internally
> 
>> 3. Frequency Invariant Engine (FIE) - currently it relies on schedutil
>>    filtering and providing max freq of all cpus in the cluster into the
>>    FIE; this info is then populated to all 'related_cpus' which will
>>    have this freq (we know, because there is no other freq requests);
>>    Issues:
>> 3.1. Schedutil is not going to check all cpus in the cluster to take
>>    max freq, which is then passed into the cpufreq driver and FIE
>> 3.2. FIE would have to (or maybe we would drop it) have a logic similar
>>    to what schedutil does (max freq search and set, then filter next
>>    freq requests from other cpus in the next period e.g. 10ms)
>> 3.3. Schedutil is going to invoke freq change for each cpu independently
>>    and the current code just calls arch_set_freq_scale() - adding just
>>    'dependent_cpus' won't help
> 
> I don't believe these are issues. As we need changes for EAS and IPA, we'd
> need changes for FIE. We don't need more than the cpumask that shows
> frequency domains as we already have the aggregation method that
> schedutil uses to propagate the max frequency in a domain across CPUs.

Schedutil is going to work in !policy_is_shared() mode, which makes
sugov_update_single() the 'main' function. We won't have the schedutil
goodness which handles the related_cpus use case.

Then, in the software FIE, would you just change the call from:
	arch_set_freq_scale(policy->related_cpus,...)
to:
	arch_set_freq_scale(policy->dependent_cpus,...)
?

This code would be called from any CPU (without filtering) and it
would loop through the cpumask updating freq_scale, which is wrong IMO.
You need some 'logic', which is not currently there.

Leaving 'related_cpus' would also be wrong (because the real CPU
frequency is different, so we would account task utilization wrongly).
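
A small user-space sketch of the concern, under stated assumptions: the
arrays and the helpers set_scale_naive() and set_scale_aggregated() are
hypothetical, the cpumask is a plain bitmask, and the hardware is
assumed to run the dependent CPUs at the maximum of their requests
(which is exactly the kind of aggregation rule under discussion, not
something the current code knows). The naive variant mimics swapping
the mask in the arch_set_freq_scale() call; the second variant shows
the extra 'logic' that would be needed.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS			4
#define SCHED_CAPACITY_SHIFT	10

/* Stand-ins for per-CPU state; hypothetical, not the kernel's variables. */
static unsigned long freq_scale[NR_CPUS];
static unsigned long requested_khz[NR_CPUS];

/* Fixed-point ratio cur/max, where 1024 means running at max frequency. */
static unsigned long scale_of(unsigned long cur_khz, unsigned long max_khz)
{
	assert(max_khz > 0);
	return (cur_khz << SCHED_CAPACITY_SHIFT) / max_khz;
}

/* Naive: the last CPU to request overwrites the scale of the whole mask. */
static void set_scale_naive(unsigned int cpumask, unsigned long cur_khz,
			    unsigned long max_khz)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpumask & (1u << cpu))
			freq_scale[cpu] = scale_of(cur_khz, max_khz);
}

/*
 * With aggregation: remember each CPU's request and publish the maximum
 * across the dependent CPUs (assumed hardware behaviour).
 */
static void set_scale_aggregated(int req_cpu, unsigned int cpumask,
				 unsigned long req_khz, unsigned long max_khz)
{
	unsigned long max_req = 0;

	requested_khz[req_cpu] = req_khz;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if ((cpumask & (1u << cpu)) && requested_khz[cpu] > max_req)
			max_req = requested_khz[cpu];

	set_scale_naive(cpumask, max_req, max_khz);
}
```

If CPU0 asks for the maximum frequency and CPU1 then asks for a quarter
of it, the naive path leaves both CPUs with a quarter scale even though
the domain still runs at maximum, while the aggregated path keeps the
full scale for both.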

> 
> This would be the default method if cycle counters are not present. It
> might not reflect the frequency the cores actually get from HW, but for
> that cycle counters should be used.

IMHO configurations with per-CPU freq requests, where there are
'dependent' CPUs and no HW counters to use for task utilization
accounting, should be blocked. Then we don't need 'dependent_cpus' in
the software FIE, which is one less item on your requirements list for
the new cpumask.

> 
>> 3.4 What would be the real frequency of these cpus and what would be
>>    set to FIE
>> 3.5 FIE is going to filter too soon requests from other dependent cpus?
>>
>> IMHO the FIE needs more bits than just a new cpumask.
>> Maybe we should consider to move FIE arch_set_freq_scale() call into the
>> cpufreq driver, which will know better how to aggregate/filter requests
>> and then call FIE update?
> 
> I'm quite strongly against this :). As described before, this is not a
> feature that a single driver needs, and even if it was, the aggregation
> method for FIE is not a driver policy.

The software version of FIE has issues in this case, and schedutil or
EAS won't help (different code path).

Regards,
Lukasz
