linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Lukasz Luba <lukasz.luba@arm.com>
To: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: rafael@kernel.org, srinivas.pandruvada@linux.intel.com,
	linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
	rui.zhang@intel.com, "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Palmer Dabbelt <palmerdabbelt@google.com>,
	Anup Patel <anup.patel@wdc.com>,
	Atish Patra <atish.patra@wdc.com>, Marc Zyngier <maz@kernel.org>,
	Andrew Jones <drjones@redhat.com>,
	Michael Kelley <mikelley@microsoft.com>,
	Mike Leach <mike.leach@linaro.org>,
	Kajol Jain <kjain@linux.ibm.com>,
	Daniel Jordan <daniel.m.jordan@oracle.com>,
	Steven Price <steven.price@arm.com>
Subject: Re: [PATCH 4/4] powercap/drivers/dtpm: Add CPU energy model based support
Date: Wed, 4 Nov 2020 10:57:20 +0000	[thread overview]
Message-ID: <58408860-1add-c7a3-4c7e-eafb0b6ae16c@arm.com> (raw)
In-Reply-To: <ebdd4728-9105-49ef-e2a6-a17c2a502f34@linaro.org>



On 11/4/20 10:47 AM, Daniel Lezcano wrote:
> 
> Hi Lukasz,
> 
> 
> On 23/10/2020 15:27, Lukasz Luba wrote:
>> Hi Daniel,
>>
>>
>> On 10/6/20 1:20 PM, Daniel Lezcano wrote:
>>> With the powercap dtpm controller, we are able to plug devices with
>>> power limitation features in the tree.
>>>
>>> The following patch introduces the CPU power limitation based on the
>>> energy model and the performance states.
>>>
>>> The power limitation is done at the performance domain level. If some
>>> CPUs are unplugged, the corresponding power will be substracted from
>>> the performance domain total power.
>>>
>>> It is up to the platform to initialize the dtpm tree and add the CPU.
>>>
>>> Here is an example to create a simple tree with one root node called
>>> "pkg" and the cpu's performance domains.
> 
> [ ... ]
> 
>>> +static int set_pd_power_limit(struct powercap_zone *pcz, int cid,
>>> +                  u64 power_limit)
>>> +{
>>> +    struct dtpm *dtpm = to_dtpm(pcz);
>>> +    struct dtpm_cpu *dtpm_cpu = dtpm->private;
>>> +    struct em_perf_domain *pd;
>>> +    unsigned long freq;
>>> +    int i, nr_cpus;
>>> +
>>> +    spin_lock(&dtpm->lock);
>>> +
>>> +    power_limit = clamp_val(power_limit, dtpm->power_min,
>>> dtpm->power_max);
>>> +
>>> +    pd = em_cpu_get(dtpm_cpu->cpu);
>>> +
>>> +    nr_cpus = cpumask_weight(to_cpumask(pd->cpus));
>>> +
>>> +    for (i = 0; i < pd->nr_perf_states; i++) {
>>> +
>>> +        u64 power = pd->table[i].power * MICROWATT_PER_MILLIWATT;
>>> +
>>> +        if ((power * nr_cpus) > power_limit)
>>
>> We have one node in that DTPM hierarchy tree, which represents all CPUs
>> which are in 'related_cpus' mask. I saw below that we just remove the
>> node in hotplug.
> 
> The last CPU hotplugged will remove the node.
> 
>> I have put a comment below asking if we could change the registration,
>> which will affect power calculation.
>>
>>
>>> +            break;
>>> +    }
>>> +
>>> +    freq = pd->table[i - 1].frequency;
>>> +
>>> +    freq_qos_update_request(&dtpm_cpu->qos_req, freq);
>>> +
>>> +    dtpm->power_limit = power_limit;
>>> +
>>> +    spin_unlock(&dtpm->lock);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int get_pd_power_limit(struct powercap_zone *pcz, int cid, u64
>>> *data)
>>> +{
>>> +    struct dtpm *dtpm = to_dtpm(pcz);
>>> +
>>> +    spin_lock(&dtpm->lock);
>>> +    *data = dtpm->power_max;
>>> +    spin_unlock(&dtpm->lock);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int get_pd_power_uw(struct powercap_zone *pcz, u64 *power_uw)
>>> +{
>>> +    struct dtpm *dtpm = to_dtpm(pcz);
>>> +    struct dtpm_cpu *dtpm_cpu = dtpm->private;
>>> +    struct em_perf_domain *pd;
>>> +    unsigned long freq;
>>> +    int i, nr_cpus;
>>> +
>>> +    freq = cpufreq_quick_get(dtpm_cpu->cpu);
>>> +    pd = em_cpu_get(dtpm_cpu->cpu);
>>> +    nr_cpus = cpumask_weight(to_cpumask(pd->cpus));
>>> +
>>> +    for (i = 0; i < pd->nr_perf_states; i++) {
>>> +
>>> +        if (pd->table[i].frequency < freq)
>>> +            continue;
>>> +
>>> +        *power_uw = pd->table[i].power *
>>> +            MICROWATT_PER_MILLIWATT * nr_cpus;
>>
>> Same here, we have 'nr_cpus'.
>>
>>> +
>>> +        return 0;
>>> +    }
>>> +
>>> +    return -EINVAL;
>>> +}
>>> +
>>> +static int cpu_release_zone(struct powercap_zone *pcz)
>>> +{
>>> +    struct dtpm *dtpm = to_dtpm(pcz);
>>> +    struct dtpm_cpu *dtpm_cpu = dtpm->private;
>>> +
>>> +    freq_qos_remove_request(&dtpm_cpu->qos_req);
>>> +
>>> +    return dtpm_release_zone(pcz);
>>> +}
>>> +
>>> +static struct powercap_zone_constraint_ops pd_constraint_ops = {
>>> +    .set_power_limit_uw = set_pd_power_limit,
>>> +    .get_power_limit_uw = get_pd_power_limit,
>>> +};
>>> +
>>> +static struct powercap_zone_ops pd_zone_ops = {
>>> +    .get_power_uw = get_pd_power_uw,
>>> +    .release = cpu_release_zone,
>>> +};
>>> +
>>> +static int cpuhp_dtpm_cpu_offline(unsigned int cpu)
>>> +{
>>> +    struct cpufreq_policy *policy;
>>> +    struct em_perf_domain *pd;
>>> +    struct dtpm *dtpm;
>>> +
>>> +    policy = cpufreq_cpu_get(cpu);
>>> +
>>> +    if (!policy)
>>> +        return 0;
>>> +
>>> +    pd = em_cpu_get(cpu);
>>> +    if (!pd)
>>> +        return -EINVAL;
>>> +
>>> +    dtpm = per_cpu(dtpm_per_cpu, cpu);
>>> +
>>> +    power_sub(dtpm, pd);
>>> +
>>> +    if (cpumask_weight(policy->cpus) != 1)
>>> +        return 0;
>>> +
>>> +    for_each_cpu(cpu, policy->related_cpus)
>>> +        per_cpu(dtpm_per_cpu, cpu) = NULL;
>>
>> Hotplugging one CPU would affect others. I would keep them
>> all but marked somehow that CPU is offline.
> 
> No, the last one will remove the node. This is checked in the test above
> (policy->cpus) != 1 ...
> 
>>> +
>>> +    dtpm_unregister(dtpm);
>>
>> Could we keep the node in the hierarchy on CPU hotplug?
> 
> [ ... ]
> 
>>> diff --git a/include/linux/dtpm.h b/include/linux/dtpm.h
>>> index 6696bdcfdb87..b62215a13baa 100644
>>> --- a/include/linux/dtpm.h
>>> +++ b/include/linux/dtpm.h
>>> @@ -70,4 +70,7 @@ int dtpm_register_parent(const char *name, struct
>>> dtpm *dtpm,
>>>    int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm
>>> *parent,
>>>              struct powercap_zone_ops *ops, int nr_constraints,
>>>              struct powercap_zone_constraint_ops *const_ops);
>>> +
>>> +int dtpm_register_cpu(struct dtpm *parent);
>>> +
>>>    #endif
>>>
>>
>> I have a few comments for this DTPM CPU.
>>
>> 1. Maybe we can register these CPUs differently. First register
>> the parent node as a separate dtpm based on 'policy->related_cpus. Then
>> register new children nodes, one for each CPU. When the CPU is up, mark
>> it as 'active'.
>>
>> 2. We don't remove the node when the CPU is hotplugged, but we mark it
>> '!active' Or 'offline'. The power calculation could be done in upper
>> node, which takes into account that flag for children.
>>
>> 3. We would only remove the node when it's module is unloaded (e.g. GPU)
>>
>> That would make the tree more stable and also more detailed.
>> We would also account the power properly when one CPU went offline, but
>> the other are still there.
>>
>> What do you think?
> 
> The paradigm of the DTPM is the intermediate nodes (have children), are
> aggregating the power of their children and do not represent the real
> devices. The leaves are the real devices which are power manageable.

OK, I see, it makes sense. Thanks for the explanation.

> 
> In our case, the CPU DTPM is based on the performance state which is a
> group of CPUs, hence it is a leaf of the tree.
> 
> I think you misunderstood the power is recomputed when the CPU is
> switched on/off and the node is removed when the last CPU is hotplugged.

Yes, you are right. I misunderstood the hotplug and then power calc.

> 
> eg. 1000mW max per CPU, a performance domain with 4 CPUs.
> 
> With all CPUs on, max power is 4000mW
> With 3 CPUs on, and 1 CPU off, max power is 3000mW
> 
> etc...
> 
> With 4 CPUs off, the node is removed.
> 
> If the hardware evolves with a performance domain per CPU, we will end
> up with a leaf per CPU and a "cluster" on top of them.
> 
> 

Let me go again through the patches and then I will add my reviewed by.

I will also run LTP hotplug or LISA hotplug torture on this tree,
just to check it's fine.

Regards,
Lukasz

  reply	other threads:[~2020-11-04 10:57 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-06 12:20 [PATCH 0/4] powercap/dtpm: Add the DTPM framework Daniel Lezcano
2020-10-06 12:20 ` [PATCH 1/4] units: Add Watt units Daniel Lezcano
2020-11-10 10:02   ` Lukasz Luba
2020-10-06 12:20 ` [PATCH 2/4] Documentation/powercap/dtpm: Add documentation for dtpm Daniel Lezcano
2020-10-13 22:01   ` Ram Chandrasekar
2020-10-06 12:20 ` [PATCH 3/4] powercap/drivers/dtpm: Add API for dynamic thermal power management Daniel Lezcano
2020-10-06 16:42   ` kernel test robot
2020-10-06 18:05   ` kernel test robot
2020-10-23 10:29   ` Lukasz Luba
2020-11-03 18:42     ` Daniel Lezcano
2020-11-10  9:59   ` Lukasz Luba
2020-11-10 11:05     ` Lukasz Luba
2020-11-10 14:59       ` Daniel Lezcano
2020-11-10 15:04         ` Lukasz Luba
2020-11-10 12:55     ` Daniel Lezcano
2020-10-06 12:20 ` [PATCH 4/4] powercap/drivers/dtpm: Add CPU energy model based support Daniel Lezcano
2020-10-23 13:27   ` Lukasz Luba
2020-11-04 10:47     ` Daniel Lezcano
2020-11-04 10:57       ` Lukasz Luba [this message]
2020-11-04 11:15         ` Daniel Lezcano
2020-11-10 12:50   ` Lukasz Luba
2020-10-07 10:43 ` [PATCH 0/4] powercap/dtpm: Add the DTPM framework Hans de Goede
2020-10-12 10:30   ` Daniel Lezcano
2020-10-12 11:46     ` Hans de Goede
2020-10-12 16:02       ` Daniel Lezcano
2020-10-13 12:47         ` Hans de Goede
2020-10-12 16:37       ` Rafael J. Wysocki
2020-10-13 13:04         ` Hans de Goede
2020-10-14 13:33           ` Rafael J. Wysocki
2020-10-14 14:06             ` Hans de Goede
2020-10-14 15:42               ` Rafael J. Wysocki
2020-10-16 11:10                 ` [RFC] Documentation: Add documentation for new performance_profile sysfs class (Also Re: [PATCH 0/4] powercap/dtpm: Add the DTPM framework) Hans de Goede
2020-10-16 14:26                   ` Elia Devito
     [not found]                     ` <HK2PR0302MB2449214B28438ADC1790D468BD030@HK2PR0302MB2449.apcprd03.prod.outlook.com>
2020-10-16 14:43                       ` Fw: [External] " Mark Pearson
2020-10-16 15:16                         ` Elia Devito
2020-10-16 14:51                   ` Rafael J. Wysocki
2020-10-18  9:41                     ` Hans de Goede
2020-10-18 12:31                       ` Rafael J. Wysocki
2020-10-19 18:43                         ` Hans de Goede
     [not found]                           ` <HK2PR0302MB24494037019FBC7720976735BD1E0@HK2PR0302MB2449.apcprd03.prod.outlook.com>
2020-10-19 18:49                             ` Fw: [External] " Mark Pearson
2020-10-25 10:13                               ` Hans de Goede
2020-10-20 12:34                           ` Rafael J. Wysocki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=58408860-1add-c7a3-4c7e-eafb0b6ae16c@arm.com \
    --to=lukasz.luba@arm.com \
    --cc=anup.patel@wdc.com \
    --cc=atish.patra@wdc.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=daniel.m.jordan@oracle.com \
    --cc=drjones@redhat.com \
    --cc=kjain@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=mike.leach@linaro.org \
    --cc=mikelley@microsoft.com \
    --cc=palmerdabbelt@google.com \
    --cc=rafael@kernel.org \
    --cc=rjw@rjwysocki.net \
    --cc=rui.zhang@intel.com \
    --cc=srinivas.pandruvada@linux.intel.com \
    --cc=steven.price@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).