From: Lukasz Luba <lukasz.luba@arm.com>
To: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: rafael@kernel.org, srinivas.pandruvada@linux.intel.com,
linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
rui.zhang@intel.com, "Rafael J. Wysocki" <rjw@rjwysocki.net>,
Palmer Dabbelt <palmerdabbelt@google.com>,
Anup Patel <anup.patel@wdc.com>,
Atish Patra <atish.patra@wdc.com>, Marc Zyngier <maz@kernel.org>,
Andrew Jones <drjones@redhat.com>,
Michael Kelley <mikelley@microsoft.com>,
Mike Leach <mike.leach@linaro.org>,
Kajol Jain <kjain@linux.ibm.com>,
Daniel Jordan <daniel.m.jordan@oracle.com>,
Steven Price <steven.price@arm.com>
Subject: Re: [PATCH 4/4] powercap/drivers/dtpm: Add CPU energy model based support
Date: Wed, 4 Nov 2020 10:57:20 +0000 [thread overview]
Message-ID: <58408860-1add-c7a3-4c7e-eafb0b6ae16c@arm.com> (raw)
In-Reply-To: <ebdd4728-9105-49ef-e2a6-a17c2a502f34@linaro.org>
On 11/4/20 10:47 AM, Daniel Lezcano wrote:
>
> Hi Lukasz,
>
>
> On 23/10/2020 15:27, Lukasz Luba wrote:
>> Hi Daniel,
>>
>>
>> On 10/6/20 1:20 PM, Daniel Lezcano wrote:
>>> With the powercap dtpm controller, we are able to plug devices with
>>> power limitation features in the tree.
>>>
>>> The following patch introduces the CPU power limitation based on the
>>> energy model and the performance states.
>>>
>>> The power limitation is done at the performance domain level. If some
>>> CPUs are unplugged, the corresponding power will be substracted from
>>> the performance domain total power.
>>>
>>> It is up to the platform to initialize the dtpm tree and add the CPU.
>>>
>>> Here is an example to create a simple tree with one root node called
>>> "pkg" and the cpu's performance domains.
>
> [ ... ]
>
>>> +static int set_pd_power_limit(struct powercap_zone *pcz, int cid,
>>> + u64 power_limit)
>>> +{
>>> + struct dtpm *dtpm = to_dtpm(pcz);
>>> + struct dtpm_cpu *dtpm_cpu = dtpm->private;
>>> + struct em_perf_domain *pd;
>>> + unsigned long freq;
>>> + int i, nr_cpus;
>>> +
>>> + spin_lock(&dtpm->lock);
>>> +
>>> + power_limit = clamp_val(power_limit, dtpm->power_min,
>>> dtpm->power_max);
>>> +
>>> + pd = em_cpu_get(dtpm_cpu->cpu);
>>> +
>>> + nr_cpus = cpumask_weight(to_cpumask(pd->cpus));
>>> +
>>> + for (i = 0; i < pd->nr_perf_states; i++) {
>>> +
>>> + u64 power = pd->table[i].power * MICROWATT_PER_MILLIWATT;
>>> +
>>> + if ((power * nr_cpus) > power_limit)
>>
>> We have one node in that DTPM hierarchy tree, which represents all CPUs
>> which are in 'related_cpus' mask. I saw below that we just remove the
>> node in hotplug.
>
> The last CPU hotplugged will remove the node.
>
>> I have put a comment below asking if we could change the registration,
>> which will affect power calculation.
>>
>>
>>> + break;
>>> + }
>>> +
>>> + freq = pd->table[i - 1].frequency;
>>> +
>>> + freq_qos_update_request(&dtpm_cpu->qos_req, freq);
>>> +
>>> + dtpm->power_limit = power_limit;
>>> +
>>> + spin_unlock(&dtpm->lock);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static int get_pd_power_limit(struct powercap_zone *pcz, int cid, u64
>>> *data)
>>> +{
>>> + struct dtpm *dtpm = to_dtpm(pcz);
>>> +
>>> + spin_lock(&dtpm->lock);
>>> + *data = dtpm->power_max;
>>> + spin_unlock(&dtpm->lock);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static int get_pd_power_uw(struct powercap_zone *pcz, u64 *power_uw)
>>> +{
>>> + struct dtpm *dtpm = to_dtpm(pcz);
>>> + struct dtpm_cpu *dtpm_cpu = dtpm->private;
>>> + struct em_perf_domain *pd;
>>> + unsigned long freq;
>>> + int i, nr_cpus;
>>> +
>>> + freq = cpufreq_quick_get(dtpm_cpu->cpu);
>>> + pd = em_cpu_get(dtpm_cpu->cpu);
>>> + nr_cpus = cpumask_weight(to_cpumask(pd->cpus));
>>> +
>>> + for (i = 0; i < pd->nr_perf_states; i++) {
>>> +
>>> + if (pd->table[i].frequency < freq)
>>> + continue;
>>> +
>>> + *power_uw = pd->table[i].power *
>>> + MICROWATT_PER_MILLIWATT * nr_cpus;
>>
>> Same here, we have 'nr_cpus'.
>>
>>> +
>>> + return 0;
>>> + }
>>> +
>>> + return -EINVAL;
>>> +}
>>> +
>>> +static int cpu_release_zone(struct powercap_zone *pcz)
>>> +{
>>> + struct dtpm *dtpm = to_dtpm(pcz);
>>> + struct dtpm_cpu *dtpm_cpu = dtpm->private;
>>> +
>>> + freq_qos_remove_request(&dtpm_cpu->qos_req);
>>> +
>>> + return dtpm_release_zone(pcz);
>>> +}
>>> +
>>> +static struct powercap_zone_constraint_ops pd_constraint_ops = {
>>> + .set_power_limit_uw = set_pd_power_limit,
>>> + .get_power_limit_uw = get_pd_power_limit,
>>> +};
>>> +
>>> +static struct powercap_zone_ops pd_zone_ops = {
>>> + .get_power_uw = get_pd_power_uw,
>>> + .release = cpu_release_zone,
>>> +};
>>> +
>>> +static int cpuhp_dtpm_cpu_offline(unsigned int cpu)
>>> +{
>>> + struct cpufreq_policy *policy;
>>> + struct em_perf_domain *pd;
>>> + struct dtpm *dtpm;
>>> +
>>> + policy = cpufreq_cpu_get(cpu);
>>> +
>>> + if (!policy)
>>> + return 0;
>>> +
>>> + pd = em_cpu_get(cpu);
>>> + if (!pd)
>>> + return -EINVAL;
>>> +
>>> + dtpm = per_cpu(dtpm_per_cpu, cpu);
>>> +
>>> + power_sub(dtpm, pd);
>>> +
>>> + if (cpumask_weight(policy->cpus) != 1)
>>> + return 0;
>>> +
>>> + for_each_cpu(cpu, policy->related_cpus)
>>> + per_cpu(dtpm_per_cpu, cpu) = NULL;
>>
>> Hotplugging one CPU would affect others. I would keep them
>> all but marked somehow that CPU is offline.
>
> No, the last one will remove the node. This is checked in the test above
> (policy->cpus) != 1 ...
>
>>> +
>>> + dtpm_unregister(dtpm);
>>
>> Could we keep the node in the hierarchy on CPU hotplug?
>
> [ ... ]
>
>>> diff --git a/include/linux/dtpm.h b/include/linux/dtpm.h
>>> index 6696bdcfdb87..b62215a13baa 100644
>>> --- a/include/linux/dtpm.h
>>> +++ b/include/linux/dtpm.h
>>> @@ -70,4 +70,7 @@ int dtpm_register_parent(const char *name, struct
>>> dtpm *dtpm,
>>> int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm
>>> *parent,
>>> struct powercap_zone_ops *ops, int nr_constraints,
>>> struct powercap_zone_constraint_ops *const_ops);
>>> +
>>> +int dtpm_register_cpu(struct dtpm *parent);
>>> +
>>> #endif
>>>
>>
>> I have a few comments for this DTPM CPU.
>>
>> 1. Maybe we can register these CPUs differently. First register
>> the parent node as a separate dtpm based on 'policy->related_cpus. Then
>> register new children nodes, one for each CPU. When the CPU is up, mark
>> it as 'active'.
>>
>> 2. We don't remove the node when the CPU is hotplugged, but we mark it
>> '!active' Or 'offline'. The power calculation could be done in upper
>> node, which takes into account that flag for children.
>>
>> 3. We would only remove the node when it's module is unloaded (e.g. GPU)
>>
>> That would make the tree more stable and also more detailed.
>> We would also account the power properly when one CPU went offline, but
>> the other are still there.
>>
>> What do you think?
>
> The paradigm of the DTPM is the intermediate nodes (have children), are
> aggregating the power of their children and do not represent the real
> devices. The leaves are the real devices which are power manageable.
OK, I see, it makes sense. Thanks for the explanation.
>
> In our case, the CPU DTPM is based on the performance state which is a
> group of CPUs, hence it is a leaf of the tree.
>
> I think you misunderstood the power is recomputed when the CPU is
> switched on/off and the node is removed when the last CPU is hotplugged.
Yes, you are right. I misunderstood the hotplug and then power calc.
>
> eg. 1000mW max per CPU, a performance domain with 4 CPUs.
>
> With all CPUs on, max power is 4000mW
> With 3 CPUs on, and 1 CPU off, max power is 3000mW
>
> etc...
>
> With 4 CPUs off, the node is removed.
>
> If the hardware evolves with a performance domain per CPU, we will end
> up with a leaf per CPU and a "cluster" on top of them.
>
>
Let me go again through the patches and then I will add my reviewed by.
I will also run LTP hotplug or LISA hotplug torture on this tree,
just to check it's fine.
Regards,
Lukasz
next prev parent reply other threads:[~2020-11-04 10:57 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-10-06 12:20 [PATCH 0/4] powercap/dtpm: Add the DTPM framework Daniel Lezcano
2020-10-06 12:20 ` [PATCH 1/4] units: Add Watt units Daniel Lezcano
2020-11-10 10:02 ` Lukasz Luba
2020-10-06 12:20 ` [PATCH 2/4] Documentation/powercap/dtpm: Add documentation for dtpm Daniel Lezcano
2020-10-13 22:01 ` Ram Chandrasekar
2020-10-06 12:20 ` [PATCH 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management Daniel Lezcano
2020-10-06 16:42 ` kernel test robot
2020-10-06 18:05 ` kernel test robot
2020-10-23 10:29 ` Lukasz Luba
2020-11-03 18:42 ` Daniel Lezcano
2020-11-10 9:59 ` Lukasz Luba
2020-11-10 11:05 ` Lukasz Luba
2020-11-10 14:59 ` Daniel Lezcano
2020-11-10 15:04 ` Lukasz Luba
2020-11-10 12:55 ` Daniel Lezcano
2020-10-06 12:20 ` [PATCH 4/4] powercap/drivers/dtpm: Add CPU energy model based support Daniel Lezcano
2020-10-23 13:27 ` Lukasz Luba
2020-11-04 10:47 ` Daniel Lezcano
2020-11-04 10:57 ` Lukasz Luba [this message]
2020-11-04 11:15 ` Daniel Lezcano
2020-11-10 12:50 ` Lukasz Luba
2020-10-07 10:43 ` [PATCH 0/4] powercap/dtpm: Add the DTPM framework Hans de Goede
2020-10-12 10:30 ` Daniel Lezcano
2020-10-12 11:46 ` Hans de Goede
2020-10-12 16:02 ` Daniel Lezcano
2020-10-13 12:47 ` Hans de Goede
2020-10-12 16:37 ` Rafael J. Wysocki
2020-10-13 13:04 ` Hans de Goede
2020-10-14 13:33 ` Rafael J. Wysocki
2020-10-14 14:06 ` Hans de Goede
2020-10-14 15:42 ` Rafael J. Wysocki
2020-10-16 11:10 ` [RFC] Documentation: Add documentation for new performance_profile sysfs class (Also Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework) Hans de Goede
2020-10-16 14:26 ` Elia Devito
[not found] ` <HK2PR0302MB2449214B28438ADC1790D468BD030@HK2PR0302MB2449.apcprd03.prod.outlook.com>
2020-10-16 14:43 ` Fw: [External] " Mark Pearson
2020-10-16 15:16 ` Elia Devito
2020-10-16 14:51 ` Rafael J. Wysocki
2020-10-18 9:41 ` Hans de Goede
2020-10-18 12:31 ` Rafael J. Wysocki
2020-10-19 18:43 ` Hans de Goede
[not found] ` <HK2PR0302MB24494037019FBC7720976735BD1E0@HK2PR0302MB2449.apcprd03.prod.outlook.com>
2020-10-19 18:49 ` Fw: [External] " Mark Pearson
2020-10-25 10:13 ` Hans de Goede
2020-10-20 12:34 ` Rafael J. Wysocki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=58408860-1add-c7a3-4c7e-eafb0b6ae16c@arm.com \
--to=lukasz.luba@arm.com \
--cc=anup.patel@wdc.com \
--cc=atish.patra@wdc.com \
--cc=daniel.lezcano@linaro.org \
--cc=daniel.m.jordan@oracle.com \
--cc=drjones@redhat.com \
--cc=kjain@linux.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=maz@kernel.org \
--cc=mike.leach@linaro.org \
--cc=mikelley@microsoft.com \
--cc=palmerdabbelt@google.com \
--cc=rafael@kernel.org \
--cc=rjw@rjwysocki.net \
--cc=rui.zhang@intel.com \
--cc=srinivas.pandruvada@linux.intel.com \
--cc=steven.price@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).