Re: [RFT][PATCH 0/3] cpufreq / PM: QoS: Introduce frequency QoS and use it in cpufreq

From: Leonard Crestez <leonard.crestez@nxp.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: "Viresh Kumar" <viresh.kumar@linaro.org>,
	"Linux ACPI" <linux-acpi@vger.kernel.org>,
	"Linux PM" <linux-pm@vger.kernel.org>,
	"Sudeep Holla" <sudeep.holla@arm.com>,
	"Dmitry Osipenko" <digetx@gmail.com>,
	"Matthias Kaehlcke" <mka@chromium.org>,
	"Kyungmin Park" <kyungmin.park@samsung.com>,
	"Chanwoo Choi" <cw00.choi@samsung.com>,
	"Artur Świgoń" <a.swigon@samsung.com>,
	"Georgi Djakov" <georgi.djakov@linaro.org>,
	dl-linux-imx <linux-imx@nxp.com>,
	"Saravana Kannan" <saravanak@google.com>,
	"MyungJoo Ham" <myungjoo.ham@samsung.com>
Subject: Re: [RFT][PATCH 0/3] cpufreq / PM: QoS: Introduce frequency QoS and use it in cpufreq
Date: Thu, 24 Oct 2019 17:47:14 +0000	[thread overview]
Message-ID: <VI1PR04MB7023C43A2E9B60A26B6DD0CCEE6A0@VI1PR04MB7023.eurprd04.prod.outlook.com> (raw)
In-Reply-To: CAJZ5v0iY4QScdQJW4xzJEMcfxkB2QDjBkR2oo3zBRL7x70PmnQ@mail.gmail.com

On 24.10.2019 16:42, Rafael J. Wysocki wrote:
> On Wed, Oct 23, 2019 at 3:33 PM Leonard Crestez <leonard.crestez@nxp.com> wrote:
>>
>> On 2019-10-23 11:54 AM, Rafael J. Wysocki wrote:
>>> On Wed, Oct 23, 2019 at 4:20 AM Leonard Crestez <leonard.crestez@nxp.com> wrote:
>>>> On 2019-10-23 1:48 AM, Rafael J. Wysocki wrote:
> 
> [cut]
> 
>>>>> But combining the lists of requests for all the CPUs in a policy
>>>>> defeats the idea of automatic aggregation of requests which really is
>>>>> what PM QoS is about.
>>>>
>>>> My primary interest is the "dev" part of dev_pm_qos: making pm_qos
>>>> requests tied to a specific device.
>>>
>>> The list of requests needs to be associated with the user of the
>>> effective constraint.  If that is the device, it is all good.
>>
>> The phrase "user of the effective constraint" is somewhat unclear.
> 
> Fair enough, so let me elaborate.
> 
> The effective constraint (ie. the one resulting from taking all of the
> requests in the relevant QoS list into account) affects the selection
> of an OPP, so it is natural to associate the QoS list producing it
> with a list of OPPs to select.  In the cpufreq case, the policy holds
> the list of OPPs and so it also should hold the corresponding QoS
> lists (for the min and max frequency limits).  It "uses" the effective
> constraints produced by those QoS lists by preventing the OPPs out of
> the between the min and max values  from being selected.
> 
> Essentially, the policy represents a power (clock/voltage) domain with
> multiple components (it doesn't matter what they are at this level of
> abstraction). While there can be multiple sources of QoS requests
> associated with each component, all of these requests ultimately need
> to be passed to the domain for aggregation, because that's where the
> frequency selection decisions are made and so that's where the
> effective constraint value needs to be known. Now, the natural way to
> allow requests from multiple sources to be passed for aggregation is
> to provide a QoS list that they can be added to. That really is what
> PM QoS is for.
> 
>> I'm using the target device as dev for dev_pm_qos, not the requestor.
>> This is consistent with how it was used for cpufreq: thermal called a
>> dev_pm_qos_add_request on with dev = cpu_dev not a thermal sensor or
>> anything else.
> 
> Not really, but close. :-)
> 
> Without my series (that is 5.4-rc4, say), the cpu_cooling driver adds
> its constraint to the device PM QoS of cpufreq_cdev which is a special
> device created by that driver.  That would be fine, except that the
> cpufreq core doesn't use that QoS.  It uses the device PM QoS of the
> policy->cpu device instead.  That is, that's where it adds its
> notifiers (see cpufreq_policy_alloc()), that's where user space
> requests are added (see cpufreq_online()), and (most important) that's
> where the effective constraint value is read from (see
> cpufreq_set_policy()).  That turns out to be problematic (in addition
> to the cpu_cooling driver's QoS requests going nowhere), because
> confusion ensues if the current policy->cpu goes away.

That behavior in cpu_cooling seems like a bug.

>> However looking at other dev_pm_qos users there are instances of a
>> driver calling dev_pm_qos_add_request on it's own device but this is not
>> a strict requirement, correct?
> 
> No, it isn't.
> 
>>>>> There have to be two lists of requests per policy, one for the max and
>>>>> one for the min frequency >
>>>>>> If cpufreq needs a group of CPUs to run at the same frequency then it
>>>>>> should deal with this by doing dev_pm_qos_read_frequency on each CPU
>>>>>> device and picking a frequency that attempts to satisfy all constraints.
>>>>>
>>>>> No, that would be combining the requests by hand.
>>>>
>>>> It's just a loop though.
>>>
>>> Yes, it is, and needs to be run on every change of an effective
>>> constraint for any CPU even if the total effective constraint doesn't
>>> change.  And, of course, the per-policy user space limits would need
>>> to be combined with that by hand.
>>>
>>> Not particularly straightforward if you asked me.
>>
>> Well, this cpu-to-policy aggregation could also use a pm_qos_constraint
>> object instead of looping.
> 
> Yes, it could, but then somebody would need to add those
> "intermediate" requests to a proper policy-level QoS and it would need
> an extra notifier invocation to update each of them on a "component"
> QoS change.
> 
> This is an interesting idea in case we ever need to improve the
> scalability of the QoS lists, but I'd rather use the simpler approach
> for now.

The advantage I see is reducing the exposure of cpufreq internals

>>>>> Well, the cpufreq sysfs is per-policy and not per-CPU and we really
>>>>> need a per-policy min and max frequency in cpufreq, for governors etc.
>>>>
>>>> Aggregation could be performed at two levels:
>>>>
>>>> 1) Per cpu device (by dev_pm_qos)
>>>> 2) Per policy (inside cpufreq)
>>>>
>>>> The per-cpu dev_pm_qos notifier would just update a per-policy
>>>> pm_qos_constraints object. The second step could even be done strictly
>>>> inside the cpufreq core using existing pm_qos, no need to invent new
>>>> frameworks.
>>>>
>>>> Maybe dev_pm_qos is not a very good fit for cpufreq because of these
>>>> "cpu device versus cpufreq_policy" issues but it makes a ton of sense
>>>> for devfreq. Can you maybe hold PATCH 3 from this series pending further
>>>> discussion?
>>>
>>> It can be reverted at any time if need be and in 5.4 that would be dead code.
>>
>> I guess I can post v10 of my "devfreq pm qos" which starts by reverting
>> "PATCH 3" of this series?
> 
> You may do that, but I would consider adding a struct freq_constraints
> pointer directly to struct dev_pm_info and using the new frequency QoS
> helpers to manage it.
> 
> Arguably, there is no need to bundle that with the rest of device PM
> QoS and doing the above would help to avoid some code duplication too.

Adding to struct dev_pm_info would increase sizeof(struct device) while 
dev_pm_qos only allocates memory when constraints are added. My 
expectation is that very few devices would even have min_freq and 
max_freq constraints.

Maybe struct dev_pm_qos could host a "struct freq_constraints freq" 
instead of two separate "struct pm_qos_constraints min/max_frequency"? 
This way there would be two users of freq_constraints: cpufreq_policy 
(which is not a device) and dev_pm_qos.

In the future freq_constraints might be extended to implement some logic 
for conflicts between min_freq and max_freq requests.

--
Regards,
Leonard