All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lukasz Luba <lukasz.luba@arm.com>
To: Quentin Perret <qperret@google.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Vincent Donnefort <vincent.donnefort@arm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Linux PM <linux-pm@vger.kernel.org>,
	Ionela Voinescu <ionela.voinescu@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Matthias Kaehlcke <mka@chromium.org>
Subject: Re: [PATCH v4 0/9] Inefficient OPPs
Date: Tue, 10 Aug 2021 15:18:07 +0100	[thread overview]
Message-ID: <cf9d78fe-cff0-1992-2c15-7053e4431296@arm.com> (raw)
In-Reply-To: <YRKINFhDmYqvgxsN@google.com>



On 8/10/21 3:07 PM, Quentin Perret wrote:
> On Tuesday 10 Aug 2021 at 13:28:21 (+0100), Lukasz Luba wrote:
>>
>>
>> On 8/10/21 7:13 AM, Viresh Kumar wrote:
>>> On 04-08-21, 18:21, Rafael J. Wysocki wrote:
>>>> The cpufreq changes are mostly fine by me now, but I would like to
>>>> hear from Viresh regarding this.
>>>
>>> I have few doubts/concerns here.
>>>
>>> Just to iterate it again, the idea here is to choose a higher
>>> frequency, which will work in an efficient manner based on energy
>>> consumption. So this _only_ works for the case where the caller asked
>>> for CPUFREQ_RELATION_L.
>>>
>>> - The new flag being added here, CPUFREQ_RELATION_E, doesn't feel
>>>     complete in this sense to me. It should rather be called as
>>>     CPUFREQ_RELATION_LE instead as it is _always_ used with relation L.
>>>
>>> - IMO this has made the caller site a bit confusing now, i.e.  why we
>>>     send CPUFREQ_RELATION_E sometimes and CPUFREQ_RELATION_H at other
>>>     times. Why shouldn't the _H freq be efficient as well ? I understand
>>>     that this was done to not do the efficient stuff in case of
>>>     userspace/powersave/performance governors.
>>>
>>>     What about reusing following for finding all such cases ?
>>>
>>>           policy->governor->flags & CPUFREQ_GOV_DYNAMIC_SWITCHING
>>>
>>>     The driver can set a flag to tell if it wants efficient frequencies
>>>     or not, and at runtime we apply efficient algorithm only if the
>>>     current governor does DVFS, which will leave out
>>>     userspace/performance/powersave.
>>>
>>>
>>> Now the other thing I didn't like since the beginning, I still don't
>>> like it :)
>>>
>>> A cpufreq table is provided by the driver, which can have some
>>> inefficient frequencies. The inefficient frequencies can be caught by
>>> many parts of the kernel, the current way (in this series) is via EM.
>>> But this can totally be anything else as well, like a devfreq driver.
>>
>> Currently devfreq drivers and governors don't support 'inefficient'
>> OPPs idea. They are not even 'utilization' driven yet. I'm not sure
>> if that would make sense for their workloads. For now, they are far
>> behind the CPUFreq/scheduler development in this field.
>> It needs more research and experiments, to even estimate if this brings
>> benefits. So, I would just skip devfreq use case...
>>
>>>
>>> The way we have tied this whole thing with EM, via
>>> cpufreq_read_inefficiencies_from_em(), is what I don't like. We have
>>> closely bound the whole thing with one of the ways this can be done
>>> and we shouldn't be polluting the cpufreq core with this IMHO. In a
>>> sane world, the cpufreq core should just provide an API, like
>>> cpufreq_set_freq_invariant() and it can be called by any part of
>>> the kernel.
>>>
>>> I understand that the current problem is where do we make that call
>>> from and I suggested dev_pm_opp_of_register_em() for the same earlier.
>>> But that doesn't work as the policy isn't properly initialized by that
>>> point.
>>
>> True, the policy is not initialized yet when cpufreq_driver::init()
>> callback is called, which the current place for scmi-cpufreq.
>>
>> What about the other callback cpufreq_driver::ready() ?
>> This might solve the two issues I mentioned below.
>>
>>>
>>> Now that I see the current implementation, I think we can make it work
>>> in a different way.
>>>
>>> - Copy what's done for thermal-cooling in cpufreq core, i.e.
>>>     CPUFREQ_IS_COOLING_DEV flag, for the EM core as well. So the cpufreq
>>>     drivers can set a flag, CPUFREQ_REGISTER_EM, and the cpufreq core
>>>     can call dev_pm_opp_of_register_em() on their behalf. This call will
>>>     be made from cpufreq_online(), just before/after
>>>     cpufreq_thermal_control_enabled() stuff. By this point the policy is
>>>     properly initialized and is also updated in
>>>     per_cpu(cpufreq_cpu_data).
>>>
>>> - Now the cpufreq core can provide an API like
>>>     cpufreq_set_freq_invariant(int cpu, unsigned long freq), which can
>>>     be called from EM core's implementation of
>>>     em_dev_register_perf_domain().
>>>
>>
>> I have to point out that there are two issues (which we
>> might solve):
>> 1. Advanced setup, due to per-CPU performance requests,
>> which are not limited to real DVFS domain.
>> The scmi-cpufreq (and possibly some other soon) uses more
>> tricky EM setup. As you might recall, the construction of temporary
>> cpumask is a bit complex, since we want per-CPU policy, but the
>> cpumask pass to EM has not a single bit but is 'spanned' with a few
>> CPUs.
>>
>> 2. The scmi-cpufreq (and IIRC MTK cpufreq driver soon) requires
>> custom struct em_data_callback, since the power data is coming from FW.
> 
> +1, we really need this to work.
> 
>> If there is a need for complex EM registration, maybe we could
>> do it in the cpufreq_driver::ready(). The policy would be ready
>> during that call, so it might fly.
> 
> I remember trying this, but ran into issues as well FWIW. I would need
> to check if this is still relevant, but at the time this was introduced
> we needed to register the EM _before_ the policy notifier is called with
> 'CPUFREQ_CREATE_POLICY', because this will trigger a sched domain
> rebuild in the arch_topology driver, which allows the scheduler to pick
> up the newly introduced EM data.
> 

Let me go through that code and experiment if there is an issue.

  reply	other threads:[~2021-08-10 14:24 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-08 10:08 [PATCH v4 0/9] Inefficient OPPs Vincent Donnefort
2021-07-08 10:08 ` [PATCH v4 1/9] PM / EM: Fix inefficient states detection Vincent Donnefort
2021-07-08 10:08 ` [PATCH v4 2/9] PM / EM: Mark inefficient states Vincent Donnefort
2021-07-22  7:25   ` Lukasz Luba
2021-07-08 10:09 ` [PATCH v4 3/9] PM / EM: Extend em_perf_domain with a flag field Vincent Donnefort
2021-07-22  7:27   ` Lukasz Luba
2021-07-08 10:09 ` [PATCH v4 4/9] PM / EM: Allow skipping inefficient states Vincent Donnefort
2021-07-22 13:09   ` Lukasz Luba
2021-07-08 10:09 ` [PATCH v4 5/9] cpufreq: Add an interface to mark inefficient frequencies Vincent Donnefort
2021-07-08 10:09 ` [PATCH v4 6/9] cpufreq: Add a new freq-table relation CPUFREQ_RELATION_E Vincent Donnefort
2021-07-22  8:17   ` Lukasz Luba
2021-07-08 10:09 ` [PATCH v4 7/9] cpufreq: CPUFREQ_RELATION_E in schedutil ondemand and conservative Vincent Donnefort
2021-07-08 10:09 ` [PATCH v4 8/9] cpufreq: Add driver flag CPUFREQ_READ_ENERGY_MODEL Vincent Donnefort
2021-07-08 10:09 ` [PATCH v4 9/9] cpufreq: dt: Add CPUFREQ_READ_ENERGY_MODEL Vincent Donnefort
2021-08-04 16:21 ` [PATCH v4 0/9] Inefficient OPPs Rafael J. Wysocki
2021-08-10  6:13   ` Viresh Kumar
2021-08-10  7:39     ` Viresh Kumar
2021-08-10 12:28     ` Lukasz Luba
2021-08-10 14:07       ` Quentin Perret
2021-08-10 14:18         ` Lukasz Luba [this message]
2021-08-10 15:12           ` Lukasz Luba
2021-08-10 15:47             ` Quentin Perret
2021-08-11  5:03               ` Viresh Kumar
2021-08-11 11:38                 ` Lukasz Luba
2021-08-16 14:35                   ` Rafael J. Wysocki
2021-08-17  7:09                     ` Viresh Kumar
2021-08-11  4:01       ` Viresh Kumar
2021-08-16 14:19     ` Rafael J. Wysocki
2021-08-17  7:06       ` Viresh Kumar
2021-08-17  9:03         ` Lukasz Luba
2021-08-23 17:06           ` Vincent Donnefort

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cf9d78fe-cff0-1992-2c15-7053e4431296@arm.com \
    --to=lukasz.luba@arm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=ionela.voinescu@arm.com \
    --cc=linux-pm@vger.kernel.org \
    --cc=mka@chromium.org \
    --cc=peterz@infradead.org \
    --cc=qperret@google.com \
    --cc=rafael@kernel.org \
    --cc=rjw@rjwysocki.net \
    --cc=vincent.donnefort@arm.com \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.