* [PATCH 0/2] Ignore Energy Model with abstract scale in IPA and DTPM @ 2022-02-07 7:30 Lukasz Luba 2022-02-07 7:30 ` [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling Lukasz Luba ` (2 more replies) 0 siblings, 3 replies; 40+ messages in thread From: Lukasz Luba @ 2022-02-07 7:30 UTC (permalink / raw) To: linux-kernel, linux-pm Cc: amit.kachhap, daniel.lezcano, viresh.kumar, rafael, amitk, rui.zhang, dietmar.eggemann, lukasz.luba, Pierre.Gondois Hi all, The Energy Model supports abstract scale power values. This might cause issues for some mechanisms like thermal governor IPA or DTPM, which expect that all devices provide sane power values. This patch set prevents from registering such devices for IPA and DTPM. Regards, Lukasz Lukasz Luba (2): thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling powercap: DTPM: Check Energy Model type for power values scale drivers/powercap/dtpm_cpu.c | 2 +- drivers/thermal/cpufreq_cooling.c | 2 +- drivers/thermal/devfreq_cooling.c | 16 +++++++++++++--- 3 files changed, 15 insertions(+), 5 deletions(-) -- 2.17.1 ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-07 7:30 [PATCH 0/2] Ignore Energy Model with abstract scale in IPA and DTPM Lukasz Luba @ 2022-02-07 7:30 ` Lukasz Luba 2022-02-08 0:50 ` Matthias Kaehlcke 2022-02-17 18:18 ` Lukasz Luba 2022-02-07 7:30 ` [PATCH 2/2] powercap: DTPM: Check Energy Model type for power values scale Lukasz Luba 2022-02-07 10:41 ` [PATCH 0/2] Ignore Energy Model with abstract scale in IPA and DTPM Daniel Lezcano 2 siblings, 2 replies; 40+ messages in thread From: Lukasz Luba @ 2022-02-07 7:30 UTC (permalink / raw) To: linux-kernel, linux-pm Cc: amit.kachhap, daniel.lezcano, viresh.kumar, rafael, amitk, rui.zhang, dietmar.eggemann, lukasz.luba, Pierre.Gondois The Energy Model supports power values either in Watts or in some abstract scale. When the 2nd option is in use, the thermal governor IPA should not be allowed to operate, since the relation between cooling devices is not properly defined. Thus, it might be possible that big GPU has lower power values in abstract scale than a Little CPU. To mitigate a misbehaviour of the thermal control algorithm, simply not register a cooling device capable of working with IPA. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> --- drivers/thermal/cpufreq_cooling.c | 2 +- drivers/thermal/devfreq_cooling.c | 16 +++++++++++++--- 2 files changed, 14 insertions(+), 4 deletions(-) diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c index 43b1ae8a7789..f831ed40b333 100644 --- a/drivers/thermal/cpufreq_cooling.c +++ b/drivers/thermal/cpufreq_cooling.c @@ -328,7 +328,7 @@ static inline bool em_is_sane(struct cpufreq_cooling_device *cpufreq_cdev, struct cpufreq_policy *policy; unsigned int nr_levels; - if (!em) + if (!em || !(em->flags & EM_PERF_DOMAIN_MILLIWATTS)) return false; policy = cpufreq_cdev->policy; diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c index 4310cb342a9f..7e8bd1368cab 100644 --- a/drivers/thermal/devfreq_cooling.c +++ b/drivers/thermal/devfreq_cooling.c @@ -336,6 +336,14 @@ static int devfreq_cooling_gen_tables(struct devfreq_cooling_device *dfc, return 0; } +static inline bool em_is_sane(struct em_perf_domain *em) +{ + if (!em || !(em->flags & EM_PERF_DOMAIN_MILLIWATTS)) + return false; + else + return true; +} + /** * of_devfreq_cooling_register_power() - Register devfreq cooling device, * with OF and power information. @@ -358,6 +366,7 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df, struct thermal_cooling_device *cdev; struct device *dev = df->dev.parent; struct devfreq_cooling_device *dfc; + struct em_perf_domain *em; char *name; int err, num_opps; @@ -367,8 +376,9 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df, dfc->devfreq = df; - dfc->em_pd = em_pd_get(dev); - if (dfc->em_pd) { + em = em_pd_get(dev); + if (em_is_sane(em)) { + dfc->em_pd = em; devfreq_cooling_ops.get_requested_power = devfreq_cooling_get_requested_power; devfreq_cooling_ops.state2power = devfreq_cooling_state2power; @@ -379,7 +389,7 @@ of_devfreq_cooling_register_power(struct device_node *np, struct devfreq *df, num_opps = em_pd_nr_perf_states(dfc->em_pd); } else { /* Backward compatibility for drivers which do not use IPA */ - dev_dbg(dev, "missing EM for cooling device\n"); + dev_dbg(dev, "missing proper EM for cooling device\n"); num_opps = dev_pm_opp_get_opp_count(dev); -- 2.17.1 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-07 7:30 ` [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling Lukasz Luba @ 2022-02-08 0:50 ` Matthias Kaehlcke 2022-02-08 9:32 ` Lukasz Luba 2022-02-17 18:18 ` Lukasz Luba 1 sibling, 1 reply; 40+ messages in thread From: Matthias Kaehlcke @ 2022-02-08 0:50 UTC (permalink / raw) To: Lukasz Luba Cc: linux-kernel, linux-pm, amit.kachhap, daniel.lezcano, viresh.kumar, rafael, amitk, rui.zhang, dietmar.eggemann, Pierre.Gondois, Douglas Anderson, Stephen Boyd, Rajendra Nayak, Bjorn Andersson On Mon, Feb 07, 2022 at 07:30:35AM +0000, Lukasz Luba wrote: > The Energy Model supports power values either in Watts or in some abstract > scale. When the 2nd option is in use, the thermal governor IPA should not > be allowed to operate, since the relation between cooling devices is not > properly defined. Thus, it might be possible that big GPU has lower power > values in abstract scale than a Little CPU. To mitigate a misbehaviour > of the thermal control algorithm, simply not register a cooling device > capable of working with IPA. Ugh, this would break thermal throttling for existing devices that are currently supported in the upstream kernel. Wasn't the conclusion that it is the responsability of the device tree owners to ensure that cooling devices with different scales aren't used in the same thermal zone? That's also what's currently specified in the power allocator documentation: Another important thing is the consistent scale of the power values provided by the cooling devices. All of the cooling devices in a single thermal zone should have power values reported either in milli-Watts or scaled to the same 'abstract scale'. Which was actually added by yourself: commit 5a64f775691647c242aa40d34f3512e7b179a921 Author: Lukasz Luba <lukasz.luba@arm.com> Date: Tue Nov 3 09:05:58 2020 +0000 PM: EM: Clarify abstract scale usage for power values in Energy Model The Energy Model (EM) can store power values in milli-Watts or in abstract scale. This might cause issues in the subsystems which use the EM for estimating the device power, such as: - mixing of different scales in a subsystem which uses multiple (cooling) devices (e.g. thermal Intelligent Power Allocation (IPA)) - assuming that energy [milli-Joules] can be derived from the EM power values which might not be possible since the power scale doesn't have to be in milli-Watts To avoid misconfiguration add the requisite documentation to the EM and related subsystems: EAS and IPA. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> It's ugly to have the abstract scales in the first place, but that's unfortunately what we currently have for at least some cooling devices. IMO it would be preferable to stick to catching incompliant configurations in reviews, rather than breaking thermal throttling for existing devices with configurations that comply with the current documentation. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-08 0:50 ` Matthias Kaehlcke @ 2022-02-08 9:32 ` Lukasz Luba 2022-02-08 17:25 ` Matthias Kaehlcke 2022-02-16 17:21 ` Doug Anderson 0 siblings, 2 replies; 40+ messages in thread From: Lukasz Luba @ 2022-02-08 9:32 UTC (permalink / raw) To: Matthias Kaehlcke Cc: linux-kernel, linux-pm, amit.kachhap, daniel.lezcano, viresh.kumar, rafael, amitk, rui.zhang, dietmar.eggemann, Pierre.Gondois, Douglas Anderson, Stephen Boyd, Rajendra Nayak, Bjorn Andersson On 2/8/22 12:50 AM, Matthias Kaehlcke wrote: > On Mon, Feb 07, 2022 at 07:30:35AM +0000, Lukasz Luba wrote: >> The Energy Model supports power values either in Watts or in some abstract >> scale. When the 2nd option is in use, the thermal governor IPA should not >> be allowed to operate, since the relation between cooling devices is not >> properly defined. Thus, it might be possible that big GPU has lower power >> values in abstract scale than a Little CPU. To mitigate a misbehaviour >> of the thermal control algorithm, simply not register a cooling device >> capable of working with IPA. > > Ugh, this would break thermal throttling for existing devices that are > currently supported in the upstream kernel. Could you point me to those devices? I cannot find them in the mainline DT. There are no GPU devices which register Energy Model (EM) in upstream, neither using DT (which would be power in mW) nor explicitly providing EM get_power() callback. The EM is needed to have IPA. Please clarify which existing devices are going to be broken with this change. > > Wasn't the conclusion that it is the responsability of the device tree > owners to ensure that cooling devices with different scales aren't used > in the same thermal zone? It's based on assumption that someone has DT and control. There was also implicit assumption that IPA would work properly on such platform - but it won't. 1. You cannot have 2 thermal zones: one with CPUs and other with GPU only and both working with two instances of IPA. 2. The abstract power scale doesn't guaranty anything about power values and IPA was simply designed with milli-Watts in mind. So even working on CPUs only using bogoWatts, is not what we could guaranty in IPA. > > That's also what's currently specified in the power allocator > documentation: > > Another important thing is the consistent scale of the power values > provided by the cooling devices. All of the cooling devices in a single > thermal zone should have power values reported either in milli-Watts > or scaled to the same 'abstract scale'. This can change. We have removed the userspace governor from kernel recently. The trend is to implement thermal policy in FW. Dealing with some intermediate configurations are causing complicated design, support of the algorithm logic is also more complex. > > Which was actually added by yourself: > > commit 5a64f775691647c242aa40d34f3512e7b179a921 > Author: Lukasz Luba <lukasz.luba@arm.com> > Date: Tue Nov 3 09:05:58 2020 +0000 > > PM: EM: Clarify abstract scale usage for power values in Energy Model > > The Energy Model (EM) can store power values in milli-Watts or in abstract > scale. This might cause issues in the subsystems which use the EM for > estimating the device power, such as: > > - mixing of different scales in a subsystem which uses multiple > (cooling) devices (e.g. thermal Intelligent Power Allocation (IPA)) > > - assuming that energy [milli-Joules] can be derived from the EM power > values which might not be possible since the power scale doesn't have > to be in milli-Watts > > To avoid misconfiguration add the requisite documentation to the EM and > related subsystems: EAS and IPA. > > Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > It's ugly to have the abstract scales in the first place, but that's > unfortunately what we currently have for at least some cooling devices. A few questions: Do you use 'we' as Chrome engineers? Could you point me to those devices please? Are they new or some old platforms which need just maintenance? How IPA works for you in such real platform configuration? If it would be possible could you share some plots of temperature, frequency and CPUs, GPU utilization? Do you maybe know how the real power was scaled for them? It would help me understand and judge. > > IMO it would be preferable to stick to catching incompliant configurations > in reviews, rather than breaking thermal throttling for existing devices > with configurations that comply with the current documentation. > Without access to the source code of those devices, it's hard for me to see if they are broken. Regards, Lukasz ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-08 9:32 ` Lukasz Luba @ 2022-02-08 17:25 ` Matthias Kaehlcke 2022-02-09 11:16 ` Lukasz Luba 2022-02-16 17:21 ` Doug Anderson 1 sibling, 1 reply; 40+ messages in thread From: Matthias Kaehlcke @ 2022-02-08 17:25 UTC (permalink / raw) To: Lukasz Luba Cc: linux-kernel, linux-pm, amit.kachhap, daniel.lezcano, viresh.kumar, rafael, amitk, rui.zhang, dietmar.eggemann, Pierre.Gondois, Douglas Anderson, Stephen Boyd, Rajendra Nayak, Bjorn Andersson On Tue, Feb 08, 2022 at 09:32:28AM +0000, Lukasz Luba wrote: > > > On 2/8/22 12:50 AM, Matthias Kaehlcke wrote: > > On Mon, Feb 07, 2022 at 07:30:35AM +0000, Lukasz Luba wrote: > > > The Energy Model supports power values either in Watts or in some abstract > > > scale. When the 2nd option is in use, the thermal governor IPA should not > > > be allowed to operate, since the relation between cooling devices is not > > > properly defined. Thus, it might be possible that big GPU has lower power > > > values in abstract scale than a Little CPU. To mitigate a misbehaviour > > > of the thermal control algorithm, simply not register a cooling device > > > capable of working with IPA. > > > > Ugh, this would break thermal throttling for existing devices that are > > currently supported in the upstream kernel. > > Could you point me to those devices? I cannot find them in the mainline > DT. There are no GPU devices which register Energy Model (EM) in > upstream, neither using DT (which would be power in mW) nor explicitly > providing EM get_power() callback. The EM is needed to have IPA. > > Please clarify which existing devices are going to be broken with this > change. I was thinking about arch/arm64/boot/dts/qcom/sc7180-trogdor-*, and potentially other SC7180 boards that use IPA for the CPU thermal zones. Initially SC7180 used an abstract scale for the CPU energy model, however I realized your change doesn't actually impact SC7180 CPUs for two reasons: 1. The energy model of the CPUs is registered through cpufreq_register_em_with_opp dev_pm_opp_of_register_em em_dev_register_perf_domain em_dev_register_perf_domain() is called with 'milliwatts = true', regardless of the potentially abstract scale, so IPA would not be disabled with your change. 2. There is a patch from Doug that adjusted the dynamic power coefficients of the CPUs to be closer to reality: commit 82ea7d411d43f60dce878252558e926f957109f0 Author: Douglas Anderson <dianders@chromium.org> Date: Thu Sep 2 14:51:37 2021 -0700 arm64: dts: qcom: sc7180: Base dynamic CPU power coefficients in reality > > Wasn't the conclusion that it is the responsability of the device tree > > owners to ensure that cooling devices with different scales aren't used > > in the same thermal zone? > > It's based on assumption that someone has DT and control. There was also > implicit assumption that IPA would work properly on such platform - but > it won't. > > 1. You cannot have 2 thermal zones: one with CPUs and other with GPU > only and both working with two instances of IPA. It's not clear to me why such a configuration wouldn't work. Is it also a problem to have multiple CPU thermal zones (one for each core) that use multiple instances of IPA? SC7180 has separate thermal zones for each core (or thermal sensor), Chrome OS uses IPA for CPU thermal throttling. > 2. The abstract power scale doesn't guaranty anything about power values > and IPA was simply designed with milli-Watts in mind. So even working > on CPUs only using bogoWatts, is not what we could guaranty in IPA. That's bad news for SoCs that are configured with bogoWatt values, from past discussions I had the impression that this is unfortunately not uncommon. > It's ugly to have the abstract scales in the first place, but that's > unfortunately what we currently have for at least some cooling devices. > A few questions: > > Do you use 'we' as Chrome engineers? I was referring to the kernel, in particular QCOM SC7180. > Could you point me to those devices please? arch/arm64/boot/dts/qcom/sc7180-trogdor-* Though as per above they shouldn't be impacted by your change, since the CPUs always pretend to use milli-Watts. [skipped some questions/answers since sc7180 isn't actually impacted by the change] Thanks Matthias ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-08 17:25 ` Matthias Kaehlcke @ 2022-02-09 11:16 ` Lukasz Luba 2022-02-09 22:17 ` Matthias Kaehlcke 0 siblings, 1 reply; 40+ messages in thread From: Lukasz Luba @ 2022-02-09 11:16 UTC (permalink / raw) To: Matthias Kaehlcke Cc: linux-kernel, linux-pm, amit.kachhap, daniel.lezcano, viresh.kumar, rafael, amitk, rui.zhang, dietmar.eggemann, Pierre.Gondois, Douglas Anderson, Stephen Boyd, Rajendra Nayak, Bjorn Andersson On 2/8/22 5:25 PM, Matthias Kaehlcke wrote: > On Tue, Feb 08, 2022 at 09:32:28AM +0000, Lukasz Luba wrote: >> >> >> On 2/8/22 12:50 AM, Matthias Kaehlcke wrote: >>> On Mon, Feb 07, 2022 at 07:30:35AM +0000, Lukasz Luba wrote: >>>> The Energy Model supports power values either in Watts or in some abstract >>>> scale. When the 2nd option is in use, the thermal governor IPA should not >>>> be allowed to operate, since the relation between cooling devices is not >>>> properly defined. Thus, it might be possible that big GPU has lower power >>>> values in abstract scale than a Little CPU. To mitigate a misbehaviour >>>> of the thermal control algorithm, simply not register a cooling device >>>> capable of working with IPA. >>> >>> Ugh, this would break thermal throttling for existing devices that are >>> currently supported in the upstream kernel. >> >> Could you point me to those devices? I cannot find them in the mainline >> DT. There are no GPU devices which register Energy Model (EM) in >> upstream, neither using DT (which would be power in mW) nor explicitly >> providing EM get_power() callback. The EM is needed to have IPA. >> >> Please clarify which existing devices are going to be broken with this >> change. > > I was thinking about arch/arm64/boot/dts/qcom/sc7180-trogdor-*, and > potentially other SC7180 boards that use IPA for the CPU thermal > zones. > > Initially SC7180 used an abstract scale for the CPU energy model, > however I realized your change doesn't actually impact SC7180 CPUs > for two reasons: > > 1. The energy model of the CPUs is registered through > > cpufreq_register_em_with_opp > dev_pm_opp_of_register_em > em_dev_register_perf_domain > > em_dev_register_perf_domain() is called with 'milliwatts = true', > regardless of the potentially abstract scale, so IPA would not be > disabled with your change. That good registration. > > 2. There is a patch from Doug that adjusted the dynamic power > coefficients of the CPUs to be closer to reality: > > commit 82ea7d411d43f60dce878252558e926f957109f0 > Author: Douglas Anderson <dianders@chromium.org> > Date: Thu Sep 2 14:51:37 2021 -0700 > > arm64: dts: qcom: sc7180: Base dynamic CPU power coefficients in reality Thanks for the link to the commit. I'll have a look. It's good that it doesn't break this board. > >>> Wasn't the conclusion that it is the responsability of the device tree >>> owners to ensure that cooling devices with different scales aren't used >>> in the same thermal zone? >> >> It's based on assumption that someone has DT and control. There was also >> implicit assumption that IPA would work properly on such platform - but >> it won't. >> >> 1. You cannot have 2 thermal zones: one with CPUs and other with GPU >> only and both working with two instances of IPA. > > It's not clear to me why such a configuration wouldn't work. Is it also a > problem to have multiple CPU thermal zones (one for each core) that use > multiple instances of IPA? SC7180 has separate thermal zones for each core > (or thermal sensor), Chrome OS uses IPA for CPU thermal throttling. Unfortunately, the control mechanism is not working best in such setup. Main idea of IPA is to find 'best' OPP for a cooling device, e.g. one in the middle of a freq table. Worst scenario is when we change our decision between lowest and highest OPP, in short periods. It's burning too much power at highest freq, then we throttle too much, then we unthrottle because temp is dropping by some 'good enough' value. This situation can happen due to a few reasons: 1. Power values from EM are not reflecting reality, e.g. scaled too much 2. We don't have proper information about CPU load and frequencies used 3. PID coefficients are not properly set 4. board has small physical thermal sink potential but silicon if 'hot' 5. the workload is too dynamic 6. there is another power hungry device (GPU, DSP, ISP) which is outside of our control loop, e.g. is controlled in other thermal zone and has impact on our temp sensor value Proper setup and tuning of IPA could bring quite good benefit, because it could pick the 'best at that moment' OPPs for the devices, instead of throttling too hard and then unthrottling. Unfortunately, it's tricky and needs time (experiments, understanding the system). We have been trying to easy this entry barrier for developers. Very good results brings the optimization of the PID coefficients. The automated mechanism would figure out best values for the given board based on the tests run. It's under review now, there are also shared results [1][2]. It would solve a some of the issues with complex tuning. I was going to give it a try for my old experiments with the bogoWatts devices, but I don't expect that this is a silver bullet. Regarding the 'two thermal zones with IPA issues' I'm prity sure it won't help. > >> 2. The abstract power scale doesn't guaranty anything about power values >> and IPA was simply designed with milli-Watts in mind. So even working >> on CPUs only using bogoWatts, is not what we could guaranty in IPA. > > That's bad news for SoCs that are configured with bogoWatt values, from > past discussions I had the impression that this is unfortunately not > uncommon. > >> It's ugly to have the abstract scales in the first place, but that's >> unfortunately what we currently have for at least some cooling devices. > >> A few questions: >> >> Do you use 'we' as Chrome engineers? > > I was referring to the kernel, in particular QCOM SC7180. Thanks for sharing the name. > >> Could you point me to those devices please? > > arch/arm64/boot/dts/qcom/sc7180-trogdor-* > > Though as per above they shouldn't be impacted by your change, since the > CPUs always pretend to use milli-Watts. > > [skipped some questions/answers since sc7180 isn't actually impacted by > the change] Thank you Matthias. I will investigate your setup to get better understanding. Regards, Lukasz > > Thanks > > Matthias > [1] https://nbviewer.org/gist/chemis01/0e86ad81508860659a57338dae8274f9 [2] https://lore.kernel.org/all/20211217184907.2103677-1-chetan.mistry@arm.com/ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-09 11:16 ` Lukasz Luba @ 2022-02-09 22:17 ` Matthias Kaehlcke 2022-02-16 15:35 ` Lukasz Luba 0 siblings, 1 reply; 40+ messages in thread From: Matthias Kaehlcke @ 2022-02-09 22:17 UTC (permalink / raw) To: Lukasz Luba Cc: linux-kernel, linux-pm, amit.kachhap, daniel.lezcano, viresh.kumar, rafael, amitk, rui.zhang, dietmar.eggemann, Pierre.Gondois, Douglas Anderson, Stephen Boyd, Rajendra Nayak, Bjorn Andersson On Wed, Feb 09, 2022 at 11:16:36AM +0000, Lukasz Luba wrote: > > > On 2/8/22 5:25 PM, Matthias Kaehlcke wrote: > > On Tue, Feb 08, 2022 at 09:32:28AM +0000, Lukasz Luba wrote: > > > > > > > > > On 2/8/22 12:50 AM, Matthias Kaehlcke wrote: > > > > On Mon, Feb 07, 2022 at 07:30:35AM +0000, Lukasz Luba wrote: > > > > > The Energy Model supports power values either in Watts or in some abstract > > > > > scale. When the 2nd option is in use, the thermal governor IPA should not > > > > > be allowed to operate, since the relation between cooling devices is not > > > > > properly defined. Thus, it might be possible that big GPU has lower power > > > > > values in abstract scale than a Little CPU. To mitigate a misbehaviour > > > > > of the thermal control algorithm, simply not register a cooling device > > > > > capable of working with IPA. > > > > > > > > Ugh, this would break thermal throttling for existing devices that are > > > > currently supported in the upstream kernel. > > > > > > Could you point me to those devices? I cannot find them in the mainline > > > DT. There are no GPU devices which register Energy Model (EM) in > > > upstream, neither using DT (which would be power in mW) nor explicitly > > > providing EM get_power() callback. The EM is needed to have IPA. > > > > > > Please clarify which existing devices are going to be broken with this > > > change. > > > > I was thinking about arch/arm64/boot/dts/qcom/sc7180-trogdor-*, and > > potentially other SC7180 boards that use IPA for the CPU thermal > > zones. > > > > Initially SC7180 used an abstract scale for the CPU energy model, > > however I realized your change doesn't actually impact SC7180 CPUs > > for two reasons: > > > > 1. The energy model of the CPUs is registered through > > > > cpufreq_register_em_with_opp > > dev_pm_opp_of_register_em > > em_dev_register_perf_domain > > > > em_dev_register_perf_domain() is called with 'milliwatts = true', > > regardless of the potentially abstract scale, so IPA would not be > > disabled with your change. > > That good registration. > > > > > 2. There is a patch from Doug that adjusted the dynamic power > > coefficients of the CPUs to be closer to reality: > > > > commit 82ea7d411d43f60dce878252558e926f957109f0 > > Author: Douglas Anderson <dianders@chromium.org> > > Date: Thu Sep 2 14:51:37 2021 -0700 > > > > arm64: dts: qcom: sc7180: Base dynamic CPU power coefficients in reality > > Thanks for the link to the commit. I'll have a look. It's good that it > doesn't break this board. > > > > > > > Wasn't the conclusion that it is the responsability of the device tree > > > > owners to ensure that cooling devices with different scales aren't used > > > > in the same thermal zone? > > > > > > It's based on assumption that someone has DT and control. There was also > > > implicit assumption that IPA would work properly on such platform - but > > > it won't. > > > > > > 1. You cannot have 2 thermal zones: one with CPUs and other with GPU > > > only and both working with two instances of IPA. > > > > It's not clear to me why such a configuration wouldn't work. Is it also a > > problem to have multiple CPU thermal zones (one for each core) that use > > multiple instances of IPA? SC7180 has separate thermal zones for each core > > (or thermal sensor), Chrome OS uses IPA for CPU thermal throttling. > > Unfortunately, the control mechanism is not working best in such setup. > Main idea of IPA is to find 'best' OPP for a cooling device, e.g. > one in the middle of a freq table. Worst scenario is when we change > our decision between lowest and highest OPP, in short periods. > It's burning too much power at highest freq, then we throttle too much, > then we unthrottle because temp is dropping by some 'good enough' value. > This situation can happen due to a few reasons: > 1. Power values from EM are not reflecting reality, e.g. scaled too much > 2. We don't have proper information about CPU load and frequencies used > 3. PID coefficients are not properly set > 4. board has small physical thermal sink potential but silicon if 'hot' > 5. the workload is too dynamic > 6. there is another power hungry device (GPU, DSP, ISP) which is outside > of our control loop, e.g. is controlled in other thermal zone and has > impact on our temp sensor value Thanks for the details! > Proper setup and tuning of IPA could bring quite good benefit, because > it could pick the 'best at that moment' OPPs for the devices, instead > of throttling too hard and then unthrottling. Unfortunately, it's > tricky and needs time (experiments, understanding the system). On some sc7180 ChromeOS boards we observed a behavior as you describe, we mitigated it by tuning the device 'sustainable-power' device tree property and the trip point temperatures. > We have been trying to easy this entry barrier for developers. Very good > results brings the optimization of the PID coefficients. The automated > mechanism would figure out best values for the given board based on the > tests run. It's under review now, there are also shared results [1][2]. > It would solve a some of the issues with complex tuning. Good to know, thanks for the pointer! > I was going to give it a try for my old experiments with the bogoWatts > devices, but I don't expect that this is a silver bullet. Regarding > the 'two thermal zones with IPA issues' I'm prity sure it won't help. > > > > > > 2. The abstract power scale doesn't guaranty anything about power values > > > and IPA was simply designed with milli-Watts in mind. So even working > > > on CPUs only using bogoWatts, is not what we could guaranty in IPA. > > > > That's bad news for SoCs that are configured with bogoWatt values, from > > past discussions I had the impression that this is unfortunately not > > uncommon. > > > > > It's ugly to have the abstract scales in the first place, but that's > > > unfortunately what we currently have for at least some cooling devices. > > > > > A few questions: > > > > > > Do you use 'we' as Chrome engineers? > > > > I was referring to the kernel, in particular QCOM SC7180. > > Thanks for sharing the name. > > > > > > Could you point me to those devices please? > > > > arch/arm64/boot/dts/qcom/sc7180-trogdor-* > > > > Though as per above they shouldn't be impacted by your change, since the > > CPUs always pretend to use milli-Watts. > > > > [skipped some questions/answers since sc7180 isn't actually impacted by > > the change] > > Thank you Matthias. I will investigate your setup to get better > understanding. Thanks! ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-09 22:17 ` Matthias Kaehlcke @ 2022-02-16 15:35 ` Lukasz Luba 2022-02-16 17:33 ` Doug Anderson 0 siblings, 1 reply; 40+ messages in thread From: Lukasz Luba @ 2022-02-16 15:35 UTC (permalink / raw) To: Matthias Kaehlcke Cc: linux-kernel, linux-pm, amit.kachhap, daniel.lezcano, viresh.kumar, rafael, amitk, rui.zhang, dietmar.eggemann, Pierre.Gondois, Douglas Anderson, Stephen Boyd, Rajendra Nayak, Bjorn Andersson Hi Matthias, On 2/9/22 10:17 PM, Matthias Kaehlcke wrote: > On Wed, Feb 09, 2022 at 11:16:36AM +0000, Lukasz Luba wrote: >> >> >> On 2/8/22 5:25 PM, Matthias Kaehlcke wrote: >>> On Tue, Feb 08, 2022 at 09:32:28AM +0000, Lukasz Luba wrote: >>>> >>>> [snip] >>>> Could you point me to those devices please? >>> >>> arch/arm64/boot/dts/qcom/sc7180-trogdor-* >>> >>> Though as per above they shouldn't be impacted by your change, since the >>> CPUs always pretend to use milli-Watts. >>> >>> [skipped some questions/answers since sc7180 isn't actually impacted by >>> the change] >> >> Thank you Matthias. I will investigate your setup to get better >> understanding. > > Thanks! > I've checked those DT files and related code. As you already said, this patch is safe for them. So we can apply it IMO. -------------Off-topic------------------ Not in $subject comments: AFAICS based on two files which define thermal zones: sc7180-trogdor-homestar.dtsi sc7180-trogdor-coachz.dtsi only the 'big' cores are used as cooling devices in the 'skin_temp_thermal' - the CPU6 and CPU7. I assume you don't want to model at all the power usage from the Little cluster (which is quite big: 6 CPUs), do you? I can see that the Little CPUs have small dyn-power-coeff ~30% of the big and lower max freq, but still might be worth to add them to IPA. You might give them more 'weight', to make sure they receive more power during power split. You also don't have GPU cooling device in that thermal zone. Based on my experience if your GPU is a power hungry one, e.g. 2-4Watts, you might get better results when you model this 'hot' device (which impacts your temp sensor reported value). Regards, Lukasz ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-16 15:35 ` Lukasz Luba @ 2022-02-16 17:33 ` Doug Anderson 2022-02-16 22:13 ` Matthias Kaehlcke 2022-02-17 10:10 ` Daniel Lezcano 0 siblings, 2 replies; 40+ messages in thread From: Doug Anderson @ 2022-02-16 17:33 UTC (permalink / raw) To: Lukasz Luba Cc: Matthias Kaehlcke, LKML, Linux PM, amit daniel kachhap, Daniel Lezcano, Viresh Kumar, Rafael J. Wysocki, Amit Kucheria, Zhang Rui, Dietmar Eggemann, Pierre.Gondois, Stephen Boyd, Rajendra Nayak, Bjorn Andersson Hi, On Wed, Feb 16, 2022 at 7:35 AM Lukasz Luba <lukasz.luba@arm.com> wrote: > > Hi Matthias, > > On 2/9/22 10:17 PM, Matthias Kaehlcke wrote: > > On Wed, Feb 09, 2022 at 11:16:36AM +0000, Lukasz Luba wrote: > >> > >> > >> On 2/8/22 5:25 PM, Matthias Kaehlcke wrote: > >>> On Tue, Feb 08, 2022 at 09:32:28AM +0000, Lukasz Luba wrote: > >>>> > >>>> > > [snip] > > >>>> Could you point me to those devices please? > >>> > >>> arch/arm64/boot/dts/qcom/sc7180-trogdor-* > >>> > >>> Though as per above they shouldn't be impacted by your change, since the > >>> CPUs always pretend to use milli-Watts. > >>> > >>> [skipped some questions/answers since sc7180 isn't actually impacted by > >>> the change] > >> > >> Thank you Matthias. I will investigate your setup to get better > >> understanding. > > > > Thanks! > > > > I've checked those DT files and related code. > As you already said, this patch is safe for them. > So we can apply it IMO. > > > -------------Off-topic------------------ > Not in $subject comments: > > AFAICS based on two files which define thermal zones: > sc7180-trogdor-homestar.dtsi > sc7180-trogdor-coachz.dtsi > > only the 'big' cores are used as cooling devices in the > 'skin_temp_thermal' - the CPU6 and CPU7. > > I assume you don't want to model at all the power usage > from the Little cluster (which is quite big: 6 CPUs), do you? > I can see that the Little CPUs have small dyn-power-coeff > ~30% of the big and lower max freq, but still might be worth > to add them to IPA. You might give them more 'weight', to > make sure they receive more power during power split. > > You also don't have GPU cooling device in that thermal zone. > Based on my experience if your GPU is a power hungry one, > e.g. 2-4Watts, you might get better results when you model > this 'hot' device (which impacts your temp sensor reported value). I think the two boards you point at (homestar and coachz) are just the two that override the default defined in the SoC dtsi file. If you look in sc7180.dtsi you'll see 'gpuss1-thermal' which has a cooling map. You can also see the cooling maps for the littles. I guess we don't have a `dynamic-power-coefficient` for the GPU, though? Seems like we should, but I haven't dug through all the code here... -Doug ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-16 17:33 ` Doug Anderson @ 2022-02-16 22:13 ` Matthias Kaehlcke 2022-02-16 22:43 ` Lukasz Luba 2022-02-17 9:59 ` Daniel Lezcano 2022-02-17 10:10 ` Daniel Lezcano 1 sibling, 2 replies; 40+ messages in thread From: Matthias Kaehlcke @ 2022-02-16 22:13 UTC (permalink / raw) To: Doug Anderson Cc: Lukasz Luba, LKML, Linux PM, amit daniel kachhap, Daniel Lezcano, Viresh Kumar, Rafael J. Wysocki, Amit Kucheria, Zhang Rui, Dietmar Eggemann, Pierre.Gondois, Stephen Boyd, Rajendra Nayak, Bjorn Andersson On Wed, Feb 16, 2022 at 09:33:50AM -0800, Doug Anderson wrote: > Hi, > > On Wed, Feb 16, 2022 at 7:35 AM Lukasz Luba <lukasz.luba@arm.com> wrote: > > > > Hi Matthias, > > > > On 2/9/22 10:17 PM, Matthias Kaehlcke wrote: > > > On Wed, Feb 09, 2022 at 11:16:36AM +0000, Lukasz Luba wrote: > > >> > > >> > > >> On 2/8/22 5:25 PM, Matthias Kaehlcke wrote: > > >>> On Tue, Feb 08, 2022 at 09:32:28AM +0000, Lukasz Luba wrote: > > >>>> > > >>>> > > > > [snip] > > > > >>>> Could you point me to those devices please? > > >>> > > >>> arch/arm64/boot/dts/qcom/sc7180-trogdor-* > > >>> > > >>> Though as per above they shouldn't be impacted by your change, since the > > >>> CPUs always pretend to use milli-Watts. > > >>> > > >>> [skipped some questions/answers since sc7180 isn't actually impacted by > > >>> the change] > > >> > > >> Thank you Matthias. I will investigate your setup to get better > > >> understanding. > > > > > > Thanks! > > > > > > > I've checked those DT files and related code. > > As you already said, this patch is safe for them. > > So we can apply it IMO. > > > > > > -------------Off-topic------------------ > > Not in $subject comments: > > > > AFAICS based on two files which define thermal zones: > > sc7180-trogdor-homestar.dtsi > > sc7180-trogdor-coachz.dtsi > > > > only the 'big' cores are used as cooling devices in the > > 'skin_temp_thermal' - the CPU6 and CPU7. > > > > I assume you don't want to model at all the power usage > > from the Little cluster (which is quite big: 6 CPUs), do you? > > I can see that the Little CPUs have small dyn-power-coeff > > ~30% of the big and lower max freq, but still might be worth > > to add them to IPA. You might give them more 'weight', to > > make sure they receive more power during power split. In experiments we saw that including the little cores as cooling devices for 'skin_temp_thermal' didn't have a significant impact on thermals, so we left them out. > > You also don't have GPU cooling device in that thermal zone. > > Based on my experience if your GPU is a power hungry one, > > e.g. 2-4Watts, you might get better results when you model > > this 'hot' device (which impacts your temp sensor reported value). > > I think the two boards you point at (homestar and coachz) are just the > two that override the default defined in the SoC dtsi file. If you > look in sc7180.dtsi you'll see 'gpuss1-thermal' which has a cooling > map. You can also see the cooling maps for the littles. Yep, plus thermal zones with cooling maps for the big cores. > I guess we don't have a `dynamic-power-coefficient` for the GPU, > though? Seems like we should, but I haven't dug through all the code > here... To my knowledge the SC7x80 GPU doesn't register an energy model, which is one of the reasons the GPU wasn't included as cooling device for 'skin_temp_thermal'. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-16 22:13 ` Matthias Kaehlcke @ 2022-02-16 22:43 ` Lukasz Luba 2022-02-17 0:26 ` Matthias Kaehlcke 2022-02-17 10:15 ` Daniel Lezcano 2022-02-17 9:59 ` Daniel Lezcano 1 sibling, 2 replies; 40+ messages in thread From: Lukasz Luba @ 2022-02-16 22:43 UTC (permalink / raw) To: Matthias Kaehlcke, Doug Anderson Cc: LKML, Linux PM, amit daniel kachhap, Daniel Lezcano, Viresh Kumar, Rafael J. Wysocki, Amit Kucheria, Zhang Rui, Dietmar Eggemann, Pierre.Gondois, Stephen Boyd, Rajendra Nayak, Bjorn Andersson On 2/16/22 10:13 PM, Matthias Kaehlcke wrote: > On Wed, Feb 16, 2022 at 09:33:50AM -0800, Doug Anderson wrote: >> Hi, >> >> On Wed, Feb 16, 2022 at 7:35 AM Lukasz Luba <lukasz.luba@arm.com> wrote: >>> >>> Hi Matthias, >>> >>> On 2/9/22 10:17 PM, Matthias Kaehlcke wrote: >>>> On Wed, Feb 09, 2022 at 11:16:36AM +0000, Lukasz Luba wrote: >>>>> >>>>> >>>>> On 2/8/22 5:25 PM, Matthias Kaehlcke wrote: >>>>>> On Tue, Feb 08, 2022 at 09:32:28AM +0000, Lukasz Luba wrote: >>>>>>> >>>>>>> >>> >>> [snip] >>> >>>>>>> Could you point me to those devices please? >>>>>> >>>>>> arch/arm64/boot/dts/qcom/sc7180-trogdor-* >>>>>> >>>>>> Though as per above they shouldn't be impacted by your change, since the >>>>>> CPUs always pretend to use milli-Watts. >>>>>> >>>>>> [skipped some questions/answers since sc7180 isn't actually impacted by >>>>>> the change] >>>>> >>>>> Thank you Matthias. I will investigate your setup to get better >>>>> understanding. >>>> >>>> Thanks! >>>> >>> >>> I've checked those DT files and related code. >>> As you already said, this patch is safe for them. >>> So we can apply it IMO. >>> >>> >>> -------------Off-topic------------------ >>> Not in $subject comments: >>> >>> AFAICS based on two files which define thermal zones: >>> sc7180-trogdor-homestar.dtsi >>> sc7180-trogdor-coachz.dtsi >>> >>> only the 'big' cores are used as cooling devices in the >>> 'skin_temp_thermal' - the CPU6 and CPU7. >>> >>> I assume you don't want to model at all the power usage >>> from the Little cluster (which is quite big: 6 CPUs), do you? >>> I can see that the Little CPUs have small dyn-power-coeff >>> ~30% of the big and lower max freq, but still might be worth >>> to add them to IPA. You might give them more 'weight', to >>> make sure they receive more power during power split. > > In experiments we saw that including the little cores as cooling > devices for 'skin_temp_thermal' didn't have a significant impact on > thermals, so we left them out. > >>> You also don't have GPU cooling device in that thermal zone. >>> Based on my experience if your GPU is a power hungry one, >>> e.g. 2-4Watts, you might get better results when you model >>> this 'hot' device (which impacts your temp sensor reported value). >> >> I think the two boards you point at (homestar and coachz) are just the >> two that override the default defined in the SoC dtsi file. If you >> look in sc7180.dtsi you'll see 'gpuss1-thermal' which has a cooling >> map. You can also see the cooling maps for the littles. > > Yep, plus thermal zones with cooling maps for the big cores. > >> I guess we don't have a `dynamic-power-coefficient` for the GPU, >> though? Seems like we should, but I haven't dug through all the code >> here... > > To my knowledge the SC7x80 GPU doesn't register an energy model, which is > one of the reasons the GPU wasn't included as cooling device for > 'skin_temp_thermal'. > You can give it a try by editing the DT and adding in the GPU node the 'dynamic-power-coefficient' + probably small modification in the driver code. If the GPU driver registers the cooling device in the new way, you would also get EM registered thanks to the devfreq cooling new code (commit: 84e0d87c9944eb36ae6037a). You can check an example from Panfrost GPU driver [1]. I can see some upstream MSM GPU driver, but I don't know if that is your GPU driver. It registers the 'old' way the devfreq cooling [2] but it would be easy to change to use the new function. The GPU driver would use the same dev_pm_opp_of_register_em() as your CPUs do, so EM would be in 'milli-Watts' (so should be fine). [1] https://elixir.bootlin.com/linux/v5.16/source/drivers/gpu/drm/panfrost/panfrost_devfreq.c#L153 [2] https://elixir.bootlin.com/linux/v5.16/source/drivers/gpu/drm/msm/msm_gpu_devfreq.c#L129 ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-16 22:43 ` Lukasz Luba @ 2022-02-17 0:26 ` Matthias Kaehlcke 2022-02-17 10:15 ` Daniel Lezcano 1 sibling, 0 replies; 40+ messages in thread From: Matthias Kaehlcke @ 2022-02-17 0:26 UTC (permalink / raw) To: Lukasz Luba Cc: Doug Anderson, LKML, Linux PM, amit daniel kachhap, Daniel Lezcano, Viresh Kumar, Rafael J. Wysocki, Amit Kucheria, Zhang Rui, Dietmar Eggemann, Pierre.Gondois, Stephen Boyd, Rajendra Nayak, Bjorn Andersson On Wed, Feb 16, 2022 at 10:43:34PM +0000, Lukasz Luba wrote: > > > On 2/16/22 10:13 PM, Matthias Kaehlcke wrote: > > On Wed, Feb 16, 2022 at 09:33:50AM -0800, Doug Anderson wrote: > > > Hi, > > > > > > On Wed, Feb 16, 2022 at 7:35 AM Lukasz Luba <lukasz.luba@arm.com> wrote: > > > > > > > > Hi Matthias, > > > > > > > > On 2/9/22 10:17 PM, Matthias Kaehlcke wrote: > > > > > On Wed, Feb 09, 2022 at 11:16:36AM +0000, Lukasz Luba wrote: > > > > > > > > > > > > > > > > > > On 2/8/22 5:25 PM, Matthias Kaehlcke wrote: > > > > > > > On Tue, Feb 08, 2022 at 09:32:28AM +0000, Lukasz Luba wrote: > > > > > > > > > > > > > > > > > > > > > > > > [snip] > > > > > > > > > > > > Could you point me to those devices please? > > > > > > > > > > > > > > arch/arm64/boot/dts/qcom/sc7180-trogdor-* > > > > > > > > > > > > > > Though as per above they shouldn't be impacted by your change, since the > > > > > > > CPUs always pretend to use milli-Watts. > > > > > > > > > > > > > > [skipped some questions/answers since sc7180 isn't actually impacted by > > > > > > > the change] > > > > > > > > > > > > Thank you Matthias. I will investigate your setup to get better > > > > > > understanding. > > > > > > > > > > Thanks! > > > > > > > > > > > > > I've checked those DT files and related code. > > > > As you already said, this patch is safe for them. > > > > So we can apply it IMO. > > > > > > > > > > > > -------------Off-topic------------------ > > > > Not in $subject comments: > > > > > > > > AFAICS based on two files which define thermal zones: > > > > sc7180-trogdor-homestar.dtsi > > > > sc7180-trogdor-coachz.dtsi > > > > > > > > only the 'big' cores are used as cooling devices in the > > > > 'skin_temp_thermal' - the CPU6 and CPU7. > > > > > > > > I assume you don't want to model at all the power usage > > > > from the Little cluster (which is quite big: 6 CPUs), do you? > > > > I can see that the Little CPUs have small dyn-power-coeff > > > > ~30% of the big and lower max freq, but still might be worth > > > > to add them to IPA. You might give them more 'weight', to > > > > make sure they receive more power during power split. > > > > In experiments we saw that including the little cores as cooling > > devices for 'skin_temp_thermal' didn't have a significant impact on > > thermals, so we left them out. > > > > > > You also don't have GPU cooling device in that thermal zone. > > > > Based on my experience if your GPU is a power hungry one, > > > > e.g. 2-4Watts, you might get better results when you model > > > > this 'hot' device (which impacts your temp sensor reported value). > > > > > > I think the two boards you point at (homestar and coachz) are just the > > > two that override the default defined in the SoC dtsi file. If you > > > look in sc7180.dtsi you'll see 'gpuss1-thermal' which has a cooling > > > map. You can also see the cooling maps for the littles. > > > > Yep, plus thermal zones with cooling maps for the big cores. > > > > > I guess we don't have a `dynamic-power-coefficient` for the GPU, > > > though? Seems like we should, but I haven't dug through all the code > > > here... > > > > To my knowledge the SC7x80 GPU doesn't register an energy model, which is > > one of the reasons the GPU wasn't included as cooling device for > > 'skin_temp_thermal'. > > > > You can give it a try by editing the DT and adding in the > GPU node the 'dynamic-power-coefficient' + probably > small modification in the driver code. > > If the GPU driver registers the cooling device in the new way, you > would also get EM registered thanks to the devfreq cooling new code > (commit: 84e0d87c9944eb36ae6037a). > > You can check an example from Panfrost GPU driver [1]. Ah, I missed that, thanks for the pointer! > I can see some upstream MSM GPU driver, but I don't know if that is > your GPU driver. It registers the 'old' way the devfreq cooling [2] > but it would be easy to change to use the new function. > The GPU driver would use the same dev_pm_opp_of_register_em() as > your CPUs do, so EM would be in 'milli-Watts' (so should be fine). Yep, that's whay we are using. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-16 22:43 ` Lukasz Luba 2022-02-17 0:26 ` Matthias Kaehlcke @ 2022-02-17 10:15 ` Daniel Lezcano 1 sibling, 0 replies; 40+ messages in thread From: Daniel Lezcano @ 2022-02-17 10:15 UTC (permalink / raw) To: Lukasz Luba, Matthias Kaehlcke, Doug Anderson Cc: LKML, Linux PM, amit daniel kachhap, Viresh Kumar, Rafael J. Wysocki, Amit Kucheria, Zhang Rui, Dietmar Eggemann, Pierre.Gondois, Stephen Boyd, Rajendra Nayak, Bjorn Andersson On 16/02/2022 23:43, Lukasz Luba wrote: > > > On 2/16/22 10:13 PM, Matthias Kaehlcke wrote: >> On Wed, Feb 16, 2022 at 09:33:50AM -0800, Doug Anderson wrote: >>> Hi, >>> >>> On Wed, Feb 16, 2022 at 7:35 AM Lukasz Luba <lukasz.luba@arm.com> wrote: >>>> >>>> Hi Matthias, >>>> >>>> On 2/9/22 10:17 PM, Matthias Kaehlcke wrote: >>>>> On Wed, Feb 09, 2022 at 11:16:36AM +0000, Lukasz Luba wrote: >>>>>> >>>>>> >>>>>> On 2/8/22 5:25 PM, Matthias Kaehlcke wrote: >>>>>>> On Tue, Feb 08, 2022 at 09:32:28AM +0000, Lukasz Luba wrote: >>>>>>>> >>>>>>>> >>>> >>>> [snip] >>>> >>>>>>>> Could you point me to those devices please? >>>>>>> >>>>>>> arch/arm64/boot/dts/qcom/sc7180-trogdor-* >>>>>>> >>>>>>> Though as per above they shouldn't be impacted by your change, >>>>>>> since the >>>>>>> CPUs always pretend to use milli-Watts. >>>>>>> >>>>>>> [skipped some questions/answers since sc7180 isn't actually >>>>>>> impacted by >>>>>>> the change] >>>>>> >>>>>> Thank you Matthias. I will investigate your setup to get better >>>>>> understanding. >>>>> >>>>> Thanks! >>>>> >>>> >>>> I've checked those DT files and related code. >>>> As you already said, this patch is safe for them. >>>> So we can apply it IMO. >>>> >>>> >>>> -------------Off-topic------------------ >>>> Not in $subject comments: >>>> >>>> AFAICS based on two files which define thermal zones: >>>> sc7180-trogdor-homestar.dtsi >>>> sc7180-trogdor-coachz.dtsi >>>> >>>> only the 'big' cores are used as cooling devices in the >>>> 'skin_temp_thermal' - the CPU6 and CPU7. >>>> >>>> I assume you don't want to model at all the power usage >>>> from the Little cluster (which is quite big: 6 CPUs), do you? >>>> I can see that the Little CPUs have small dyn-power-coeff >>>> ~30% of the big and lower max freq, but still might be worth >>>> to add them to IPA. You might give them more 'weight', to >>>> make sure they receive more power during power split. >> >> In experiments we saw that including the little cores as cooling >> devices for 'skin_temp_thermal' didn't have a significant impact on >> thermals, so we left them out. >> >>>> You also don't have GPU cooling device in that thermal zone. >>>> Based on my experience if your GPU is a power hungry one, >>>> e.g. 2-4Watts, you might get better results when you model >>>> this 'hot' device (which impacts your temp sensor reported value). >>> >>> I think the two boards you point at (homestar and coachz) are just the >>> two that override the default defined in the SoC dtsi file. If you >>> look in sc7180.dtsi you'll see 'gpuss1-thermal' which has a cooling >>> map. You can also see the cooling maps for the littles. >> >> Yep, plus thermal zones with cooling maps for the big cores. >> >>> I guess we don't have a `dynamic-power-coefficient` for the GPU, >>> though? Seems like we should, but I haven't dug through all the code >>> here... >> >> To my knowledge the SC7x80 GPU doesn't register an energy model, which is >> one of the reasons the GPU wasn't included as cooling device for >> 'skin_temp_thermal'. >> > > You can give it a try by editing the DT and adding in the > GPU node the 'dynamic-power-coefficient' + probably > small modification in the driver code. That should not work for sc7180 as the DT does not specify the voltage values. The QCom backend should retrieve the voltage values for each OPPs from a register or a hardcoded table coming from the SoC documentation. However, I can confirm dynamic-power-coefficient will add an energy model on GPU (or any devfreq device) if the voltage is specified in the OPP description. Checked with panfrost driver and memory controller on rk3399. > If the GPU driver registers the cooling device in the new way, you > would also get EM registered thanks to the devfreq cooling new code > (commit: 84e0d87c9944eb36ae6037a). > > You can check an example from Panfrost GPU driver [1]. > > I can see some upstream MSM GPU driver, but I don't know if that is > your GPU driver. It registers the 'old' way the devfreq cooling [2] > but it would be easy to change to use the new function. > The GPU driver would use the same dev_pm_opp_of_register_em() as > your CPUs do, so EM would be in 'milli-Watts' (so should be fine). > > > [1] > https://elixir.bootlin.com/linux/v5.16/source/drivers/gpu/drm/panfrost/panfrost_devfreq.c#L153 > > [2] > https://elixir.bootlin.com/linux/v5.16/source/drivers/gpu/drm/msm/msm_gpu_devfreq.c#L129 > -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-16 22:13 ` Matthias Kaehlcke 2022-02-16 22:43 ` Lukasz Luba @ 2022-02-17 9:59 ` Daniel Lezcano 1 sibling, 0 replies; 40+ messages in thread From: Daniel Lezcano @ 2022-02-17 9:59 UTC (permalink / raw) To: Matthias Kaehlcke, Doug Anderson Cc: Lukasz Luba, LKML, Linux PM, amit daniel kachhap, Viresh Kumar, Rafael J. Wysocki, Amit Kucheria, Zhang Rui, Dietmar Eggemann, Pierre.Gondois, Stephen Boyd, Rajendra Nayak, Bjorn Andersson On 16/02/2022 23:13, Matthias Kaehlcke wrote: > On Wed, Feb 16, 2022 at 09:33:50AM -0800, Doug Anderson wrote: >> Hi, >> >> On Wed, Feb 16, 2022 at 7:35 AM Lukasz Luba <lukasz.luba@arm.com> wrote: >>> >>> Hi Matthias, >>> >>> On 2/9/22 10:17 PM, Matthias Kaehlcke wrote: >>>> On Wed, Feb 09, 2022 at 11:16:36AM +0000, Lukasz Luba wrote: >>>>> >>>>> >>>>> On 2/8/22 5:25 PM, Matthias Kaehlcke wrote: >>>>>> On Tue, Feb 08, 2022 at 09:32:28AM +0000, Lukasz Luba wrote: >>>>>>> >>>>>>> >>> >>> [snip] >>> >>>>>>> Could you point me to those devices please? >>>>>> >>>>>> arch/arm64/boot/dts/qcom/sc7180-trogdor-* >>>>>> >>>>>> Though as per above they shouldn't be impacted by your change, since the >>>>>> CPUs always pretend to use milli-Watts. >>>>>> >>>>>> [skipped some questions/answers since sc7180 isn't actually impacted by >>>>>> the change] >>>>> >>>>> Thank you Matthias. I will investigate your setup to get better >>>>> understanding. >>>> >>>> Thanks! >>>> >>> >>> I've checked those DT files and related code. >>> As you already said, this patch is safe for them. >>> So we can apply it IMO. >>> >>> >>> -------------Off-topic------------------ >>> Not in $subject comments: >>> >>> AFAICS based on two files which define thermal zones: >>> sc7180-trogdor-homestar.dtsi >>> sc7180-trogdor-coachz.dtsi >>> >>> only the 'big' cores are used as cooling devices in the >>> 'skin_temp_thermal' - the CPU6 and CPU7. >>> >>> I assume you don't want to model at all the power usage >>> from the Little cluster (which is quite big: 6 CPUs), do you? >>> I can see that the Little CPUs have small dyn-power-coeff >>> ~30% of the big and lower max freq, but still might be worth >>> to add them to IPA. You might give them more 'weight', to >>> make sure they receive more power during power split. > > In experiments we saw that including the little cores as cooling > devices for 'skin_temp_thermal' didn't have a significant impact on > thermals, so we left them out. I agree, that was also my conclusion after doing some measurements. Basically, the little cores are always cold and are victims of the big cores heat dissipation. They of course contribute a bit to the heat but capping their performance does not change the temperature trend of the whole. That is less true with silver-gold were it is the same micro-arch but different frequencies. >>> You also don't have GPU cooling device in that thermal zone. >>> Based on my experience if your GPU is a power hungry one, >>> e.g. 2-4Watts, you might get better results when you model >>> this 'hot' device (which impacts your temp sensor reported value). >> >> I think the two boards you point at (homestar and coachz) are just the >> two that override the default defined in the SoC dtsi file. If you >> look in sc7180.dtsi you'll see 'gpuss1-thermal' which has a cooling >> map. You can also see the cooling maps for the littles. > > Yep, plus thermal zones with cooling maps for the big cores. > >> I guess we don't have a `dynamic-power-coefficient` for the GPU, >> though? Seems like we should, but I haven't dug through all the code >> here... > > To my knowledge the SC7x80 GPU doesn't register an energy model, which is > one of the reasons the GPU wasn't included as cooling device for > 'skin_temp_thermal'. -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-16 17:33 ` Doug Anderson 2022-02-16 22:13 ` Matthias Kaehlcke @ 2022-02-17 10:10 ` Daniel Lezcano 2022-02-17 10:47 ` Lukasz Luba 1 sibling, 1 reply; 40+ messages in thread From: Daniel Lezcano @ 2022-02-17 10:10 UTC (permalink / raw) To: Doug Anderson, Lukasz Luba Cc: Matthias Kaehlcke, LKML, Linux PM, amit daniel kachhap, Viresh Kumar, Rafael J. Wysocki, Amit Kucheria, Zhang Rui, Dietmar Eggemann, Pierre.Gondois, Stephen Boyd, Rajendra Nayak, Bjorn Andersson On 16/02/2022 18:33, Doug Anderson wrote: > Hi, > > On Wed, Feb 16, 2022 at 7:35 AM Lukasz Luba <lukasz.luba@arm.com> wrote: >> >> Hi Matthias, >> >> On 2/9/22 10:17 PM, Matthias Kaehlcke wrote: >>> On Wed, Feb 09, 2022 at 11:16:36AM +0000, Lukasz Luba wrote: >>>> >>>> >>>> On 2/8/22 5:25 PM, Matthias Kaehlcke wrote: >>>>> On Tue, Feb 08, 2022 at 09:32:28AM +0000, Lukasz Luba wrote: >>>>>> >>>>>> >> >> [snip] >> >>>>>> Could you point me to those devices please? >>>>> >>>>> arch/arm64/boot/dts/qcom/sc7180-trogdor-* >>>>> >>>>> Though as per above they shouldn't be impacted by your change, since the >>>>> CPUs always pretend to use milli-Watts. >>>>> >>>>> [skipped some questions/answers since sc7180 isn't actually impacted by >>>>> the change] >>>> >>>> Thank you Matthias. I will investigate your setup to get better >>>> understanding. >>> >>> Thanks! >>> >> >> I've checked those DT files and related code. >> As you already said, this patch is safe for them. >> So we can apply it IMO. >> >> >> -------------Off-topic------------------ >> Not in $subject comments: >> >> AFAICS based on two files which define thermal zones: >> sc7180-trogdor-homestar.dtsi >> sc7180-trogdor-coachz.dtsi >> >> only the 'big' cores are used as cooling devices in the >> 'skin_temp_thermal' - the CPU6 and CPU7. >> >> I assume you don't want to model at all the power usage >> from the Little cluster (which is quite big: 6 CPUs), do you? >> I can see that the Little CPUs have small dyn-power-coeff >> ~30% of the big and lower max freq, but still might be worth >> to add them to IPA. You might give them more 'weight', to >> make sure they receive more power during power split. >> >> You also don't have GPU cooling device in that thermal zone. >> Based on my experience if your GPU is a power hungry one, >> e.g. 2-4Watts, you might get better results when you model >> this 'hot' device (which impacts your temp sensor reported value). > > I think the two boards you point at (homestar and coachz) are just the > two that override the default defined in the SoC dtsi file. If you > look in sc7180.dtsi you'll see 'gpuss1-thermal' which has a cooling > map. You can also see the cooling maps for the littles. > > I guess we don't have a `dynamic-power-coefficient` for the GPU, > though? Seems like we should, but I haven't dug through all the code > here... The dynamic-power-coefficient is available for OPPs which includes CPUfreq and devfreq. As the GPU is managed by devfreq, setting the dynamic-power-coefficient makes the energy model available for it. However, the OPPs must define the frequency and the voltage. That is the case for most platforms except on QCom platform. That may not be specified as it uses a frequency index and the hardware does the voltage change in our back. The QCom cpufreq backend get the voltage table from a register (or whatever) and completes the voltage values for the OPPs, thus adding the information which is missing in the device tree. The energy model can then initializes itself and allows the usage of the Energy Aware Scheduler. However this piece of code is missing for the GPU part. -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-17 10:10 ` Daniel Lezcano @ 2022-02-17 10:47 ` Lukasz Luba 2022-02-17 11:28 ` Daniel Lezcano 2022-02-17 16:37 ` Doug Anderson 0 siblings, 2 replies; 40+ messages in thread From: Lukasz Luba @ 2022-02-17 10:47 UTC (permalink / raw) To: Daniel Lezcano, Doug Anderson, Matthias Kaehlcke Cc: LKML, Linux PM, amit daniel kachhap, Viresh Kumar, Rafael J. Wysocki, Amit Kucheria, Zhang Rui, Dietmar Eggemann, Pierre.Gondois, Stephen Boyd, Rajendra Nayak, Bjorn Andersson Hi Daniel, On 2/17/22 10:10 AM, Daniel Lezcano wrote: > On 16/02/2022 18:33, Doug Anderson wrote: >> Hi, >> >> On Wed, Feb 16, 2022 at 7:35 AM Lukasz Luba <lukasz.luba@arm.com> wrote: >>> >>> Hi Matthias, >>> >>> On 2/9/22 10:17 PM, Matthias Kaehlcke wrote: >>>> On Wed, Feb 09, 2022 at 11:16:36AM +0000, Lukasz Luba wrote: >>>>> >>>>> >>>>> On 2/8/22 5:25 PM, Matthias Kaehlcke wrote: >>>>>> On Tue, Feb 08, 2022 at 09:32:28AM +0000, Lukasz Luba wrote: >>>>>>> >>>>>>> >>> >>> [snip] >>> >>>>>>> Could you point me to those devices please? >>>>>> >>>>>> arch/arm64/boot/dts/qcom/sc7180-trogdor-* >>>>>> >>>>>> Though as per above they shouldn't be impacted by your change, >>>>>> since the >>>>>> CPUs always pretend to use milli-Watts. >>>>>> >>>>>> [skipped some questions/answers since sc7180 isn't actually >>>>>> impacted by >>>>>> the change] >>>>> >>>>> Thank you Matthias. I will investigate your setup to get better >>>>> understanding. >>>> >>>> Thanks! >>>> >>> >>> I've checked those DT files and related code. >>> As you already said, this patch is safe for them. >>> So we can apply it IMO. >>> >>> >>> -------------Off-topic------------------ >>> Not in $subject comments: >>> >>> AFAICS based on two files which define thermal zones: >>> sc7180-trogdor-homestar.dtsi >>> sc7180-trogdor-coachz.dtsi >>> >>> only the 'big' cores are used as cooling devices in the >>> 'skin_temp_thermal' - the CPU6 and CPU7. >>> >>> I assume you don't want to model at all the power usage >>> from the Little cluster (which is quite big: 6 CPUs), do you? >>> I can see that the Little CPUs have small dyn-power-coeff >>> ~30% of the big and lower max freq, but still might be worth >>> to add them to IPA. You might give them more 'weight', to >>> make sure they receive more power during power split. >>> >>> You also don't have GPU cooling device in that thermal zone. >>> Based on my experience if your GPU is a power hungry one, >>> e.g. 2-4Watts, you might get better results when you model >>> this 'hot' device (which impacts your temp sensor reported value). >> >> I think the two boards you point at (homestar and coachz) are just the >> two that override the default defined in the SoC dtsi file. If you >> look in sc7180.dtsi you'll see 'gpuss1-thermal' which has a cooling >> map. You can also see the cooling maps for the littles. >> >> I guess we don't have a `dynamic-power-coefficient` for the GPU, >> though? Seems like we should, but I haven't dug through all the code >> here... > > The dynamic-power-coefficient is available for OPPs which includes > CPUfreq and devfreq. As the GPU is managed by devfreq, setting the > dynamic-power-coefficient makes the energy model available for it. > > However, the OPPs must define the frequency and the voltage. That is the > case for most platforms except on QCom platform. > > That may not be specified as it uses a frequency index and the hardware > does the voltage change in our back. The QCom cpufreq backend get the > voltage table from a register (or whatever) and completes the voltage > values for the OPPs, thus adding the information which is missing in the > device tree. The energy model can then initializes itself and allows the > usage of the Energy Aware Scheduler. > > However this piece of code is missing for the GPU part. > Thank you for joining the discussion. I don't know about that Qcom GPU voltage information is missing. If the voltage is not available (only the frequencies), there is another way. There is an 'advanced' EM which uses registration function: em_dev_register_perf_domain(). It uses a local driver callback to get power for each found frequency. It has benefit because there is no restriction to 'fit' into the math formula, instead just avg power values can be feed into EM. It's called 'advanced' EM [1]. Now we hit (again) the DT & EM issue (it's an old one, IIRC Morten was proposing from ~2014 this upstream, but EAS wasn't merged back then): where to store these power-freq values, which are then used by the callback. We have the 'dynamic-power-coefficient' in DT, but it has limitations. It would be good to have this simple array attached to the GPU/CPU node. IMHO it meet the requirement of DT, it describes the HW (it would have HZ and Watts values). Doug, Matthias could you have a look at that function and its usage, please [1]? If you guys would support me in this, I would start, with an RFC proposal, a discussion on LKML. [1] https://elixir.bootlin.com/linux/v5.17-rc4/source/Documentation/power/energy-model.rst#L87 ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-17 10:47 ` Lukasz Luba @ 2022-02-17 11:28 ` Daniel Lezcano 2022-02-17 12:11 ` Lukasz Luba 2022-02-17 16:37 ` Doug Anderson 1 sibling, 1 reply; 40+ messages in thread From: Daniel Lezcano @ 2022-02-17 11:28 UTC (permalink / raw) To: Lukasz Luba, Doug Anderson, Matthias Kaehlcke Cc: LKML, Linux PM, amit daniel kachhap, Viresh Kumar, Rafael J. Wysocki, Amit Kucheria, Zhang Rui, Dietmar Eggemann, Pierre.Gondois, Stephen Boyd, Rajendra Nayak, Bjorn Andersson On 17/02/2022 11:47, Lukasz Luba wrote: > Hi Daniel, > > On 2/17/22 10:10 AM, Daniel Lezcano wrote: >> On 16/02/2022 18:33, Doug Anderson wrote: >>> Hi, >>> >>> On Wed, Feb 16, 2022 at 7:35 AM Lukasz Luba <lukasz.luba@arm.com> wrote: >>>> >>>> Hi Matthias, >>>> >>>> On 2/9/22 10:17 PM, Matthias Kaehlcke wrote: >>>>> On Wed, Feb 09, 2022 at 11:16:36AM +0000, Lukasz Luba wrote: >>>>>> >>>>>> >>>>>> On 2/8/22 5:25 PM, Matthias Kaehlcke wrote: >>>>>>> On Tue, Feb 08, 2022 at 09:32:28AM +0000, Lukasz Luba wrote: >>>>>>>> >>>>>>>> >>>> >>>> [snip] >>>> >>>>>>>> Could you point me to those devices please? >>>>>>> >>>>>>> arch/arm64/boot/dts/qcom/sc7180-trogdor-* >>>>>>> >>>>>>> Though as per above they shouldn't be impacted by your change, >>>>>>> since the >>>>>>> CPUs always pretend to use milli-Watts. >>>>>>> >>>>>>> [skipped some questions/answers since sc7180 isn't actually >>>>>>> impacted by >>>>>>> the change] >>>>>> >>>>>> Thank you Matthias. I will investigate your setup to get better >>>>>> understanding. >>>>> >>>>> Thanks! >>>>> >>>> >>>> I've checked those DT files and related code. >>>> As you already said, this patch is safe for them. >>>> So we can apply it IMO. >>>> >>>> >>>> -------------Off-topic------------------ >>>> Not in $subject comments: >>>> >>>> AFAICS based on two files which define thermal zones: >>>> sc7180-trogdor-homestar.dtsi >>>> sc7180-trogdor-coachz.dtsi >>>> >>>> only the 'big' cores are used as cooling devices in the >>>> 'skin_temp_thermal' - the CPU6 and CPU7. >>>> >>>> I assume you don't want to model at all the power usage >>>> from the Little cluster (which is quite big: 6 CPUs), do you? >>>> I can see that the Little CPUs have small dyn-power-coeff >>>> ~30% of the big and lower max freq, but still might be worth >>>> to add them to IPA. You might give them more 'weight', to >>>> make sure they receive more power during power split. >>>> >>>> You also don't have GPU cooling device in that thermal zone. >>>> Based on my experience if your GPU is a power hungry one, >>>> e.g. 2-4Watts, you might get better results when you model >>>> this 'hot' device (which impacts your temp sensor reported value). >>> >>> I think the two boards you point at (homestar and coachz) are just the >>> two that override the default defined in the SoC dtsi file. If you >>> look in sc7180.dtsi you'll see 'gpuss1-thermal' which has a cooling >>> map. You can also see the cooling maps for the littles. >>> >>> I guess we don't have a `dynamic-power-coefficient` for the GPU, >>> though? Seems like we should, but I haven't dug through all the code >>> here... >> >> The dynamic-power-coefficient is available for OPPs which includes >> CPUfreq and devfreq. As the GPU is managed by devfreq, setting the >> dynamic-power-coefficient makes the energy model available for it. >> >> However, the OPPs must define the frequency and the voltage. That is >> the case for most platforms except on QCom platform. >> >> That may not be specified as it uses a frequency index and the >> hardware does the voltage change in our back. The QCom cpufreq backend >> get the voltage table from a register (or whatever) and completes the >> voltage values for the OPPs, thus adding the information which is >> missing in the device tree. The energy model can then initializes >> itself and allows the usage of the Energy Aware Scheduler. >> >> However this piece of code is missing for the GPU part. >> > > Thank you for joining the discussion. I don't know about that Qcom > GPU voltage information is missing. > > If the voltage is not available (only the frequencies), there is > another way. There is an 'advanced' EM which uses registration function: > em_dev_register_perf_domain(). It uses a local driver callback to get > power for each found frequency. It has benefit because there is no > restriction to 'fit' into the math formula, instead just avg power > values can be feed into EM. It's called 'advanced' EM [1]. > > Now we hit (again) the DT & EM issue (it's an old one, IIRC Morten > was proposing from ~2014 this upstream, but EAS wasn't merged back > then): > where to store these power-freq values, which are then used by the > callback. Why not make it more generic and replace the frequency by a performance index, so it can be used by any kind of perf limiter? > We have the 'dynamic-power-coefficient' in DT, but > it has limitations. It would be good to have this simple array > attached to the GPU/CPU node. IMHO it meet the requirement of DT, > it describes the HW (it would have HZ and Watts values). > > Doug, Matthias could you have a look at that function and its > usage, please [1]? > If you guys would support me in this, I would start, with an RFC > proposal, a discussion on LKML. > > > [1] > https://elixir.bootlin.com/linux/v5.17-rc4/source/Documentation/power/energy-model.rst#L87 > -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-17 11:28 ` Daniel Lezcano @ 2022-02-17 12:11 ` Lukasz Luba 2022-02-17 12:33 ` Daniel Lezcano 2022-02-17 16:46 ` Matthias Kaehlcke 0 siblings, 2 replies; 40+ messages in thread From: Lukasz Luba @ 2022-02-17 12:11 UTC (permalink / raw) To: Daniel Lezcano Cc: LKML, Linux PM, amit daniel kachhap, Viresh Kumar, Rafael J. Wysocki, Amit Kucheria, Zhang Rui, Dietmar Eggemann, Pierre.Gondois, Stephen Boyd, Rajendra Nayak, Bjorn Andersson, Doug Anderson, Matthias Kaehlcke On 2/17/22 11:28 AM, Daniel Lezcano wrote: > On 17/02/2022 11:47, Lukasz Luba wrote: >> Hi Daniel, >> >> On 2/17/22 10:10 AM, Daniel Lezcano wrote: >>> On 16/02/2022 18:33, Doug Anderson wrote: >>>> Hi, >>>> >>>> On Wed, Feb 16, 2022 at 7:35 AM Lukasz Luba <lukasz.luba@arm.com> >>>> wrote: >>>>> >>>>> Hi Matthias, >>>>> >>>>> On 2/9/22 10:17 PM, Matthias Kaehlcke wrote: >>>>>> On Wed, Feb 09, 2022 at 11:16:36AM +0000, Lukasz Luba wrote: >>>>>>> >>>>>>> >>>>>>> On 2/8/22 5:25 PM, Matthias Kaehlcke wrote: >>>>>>>> On Tue, Feb 08, 2022 at 09:32:28AM +0000, Lukasz Luba wrote: >>>>>>>>> >>>>>>>>> >>>>> >>>>> [snip] >>>>> >>>>>>>>> Could you point me to those devices please? >>>>>>>> >>>>>>>> arch/arm64/boot/dts/qcom/sc7180-trogdor-* >>>>>>>> >>>>>>>> Though as per above they shouldn't be impacted by your change, >>>>>>>> since the >>>>>>>> CPUs always pretend to use milli-Watts. >>>>>>>> >>>>>>>> [skipped some questions/answers since sc7180 isn't actually >>>>>>>> impacted by >>>>>>>> the change] >>>>>>> >>>>>>> Thank you Matthias. I will investigate your setup to get better >>>>>>> understanding. >>>>>> >>>>>> Thanks! >>>>>> >>>>> >>>>> I've checked those DT files and related code. >>>>> As you already said, this patch is safe for them. >>>>> So we can apply it IMO. >>>>> >>>>> >>>>> -------------Off-topic------------------ >>>>> Not in $subject comments: >>>>> >>>>> AFAICS based on two files which define thermal zones: >>>>> sc7180-trogdor-homestar.dtsi >>>>> sc7180-trogdor-coachz.dtsi >>>>> >>>>> only the 'big' cores are used as cooling devices in the >>>>> 'skin_temp_thermal' - the CPU6 and CPU7. >>>>> >>>>> I assume you don't want to model at all the power usage >>>>> from the Little cluster (which is quite big: 6 CPUs), do you? >>>>> I can see that the Little CPUs have small dyn-power-coeff >>>>> ~30% of the big and lower max freq, but still might be worth >>>>> to add them to IPA. You might give them more 'weight', to >>>>> make sure they receive more power during power split. >>>>> >>>>> You also don't have GPU cooling device in that thermal zone. >>>>> Based on my experience if your GPU is a power hungry one, >>>>> e.g. 2-4Watts, you might get better results when you model >>>>> this 'hot' device (which impacts your temp sensor reported value). >>>> >>>> I think the two boards you point at (homestar and coachz) are just the >>>> two that override the default defined in the SoC dtsi file. If you >>>> look in sc7180.dtsi you'll see 'gpuss1-thermal' which has a cooling >>>> map. You can also see the cooling maps for the littles. >>>> >>>> I guess we don't have a `dynamic-power-coefficient` for the GPU, >>>> though? Seems like we should, but I haven't dug through all the code >>>> here... >>> >>> The dynamic-power-coefficient is available for OPPs which includes >>> CPUfreq and devfreq. As the GPU is managed by devfreq, setting the >>> dynamic-power-coefficient makes the energy model available for it. >>> >>> However, the OPPs must define the frequency and the voltage. That is >>> the case for most platforms except on QCom platform. >>> >>> That may not be specified as it uses a frequency index and the >>> hardware does the voltage change in our back. The QCom cpufreq >>> backend get the voltage table from a register (or whatever) and >>> completes the voltage values for the OPPs, thus adding the >>> information which is missing in the device tree. The energy model can >>> then initializes itself and allows the usage of the Energy Aware >>> Scheduler. >>> >>> However this piece of code is missing for the GPU part. >>> >> >> Thank you for joining the discussion. I don't know about that Qcom >> GPU voltage information is missing. >> >> If the voltage is not available (only the frequencies), there is >> another way. There is an 'advanced' EM which uses registration function: >> em_dev_register_perf_domain(). It uses a local driver callback to get >> power for each found frequency. It has benefit because there is no >> restriction to 'fit' into the math formula, instead just avg power >> values can be feed into EM. It's called 'advanced' EM [1]. >> >> Now we hit (again) the DT & EM issue (it's an old one, IIRC Morten >> was proposing from ~2014 this upstream, but EAS wasn't merged back >> then): >> where to store these power-freq values, which are then used by the >> callback. > > Why not make it more generic and replace the frequency by a performance > index, so it can be used by any kind of perf limiter? For that DT array, yes, it can be an index, so effectively it could be a simple 1d array. something like: msm_gpu_energy_model: msm-gpu-energy-model { compatible = "energy-model" /* Values are sorted micro-Watts which correspond to each OPP or performance state. The total amount of them must match number of OPPs. */ power-microwatt = <100000>, <230000>, <380000>, <600000>; }; then in gpu node instead of having 'dynamic-power-coefficient', which is useless because voltage is missing, we would have 'energy-model', like: energy-model = <&msm_gpu_energy_model>; If you agree to continue this topic. I will send an RFC so we could further discuss this idea. This $subject doesn't fit well. Thank you again for your feedback Daniel! ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-17 12:11 ` Lukasz Luba @ 2022-02-17 12:33 ` Daniel Lezcano 2022-02-17 12:37 ` Lukasz Luba 2022-02-17 16:46 ` Matthias Kaehlcke 1 sibling, 1 reply; 40+ messages in thread From: Daniel Lezcano @ 2022-02-17 12:33 UTC (permalink / raw) To: Lukasz Luba Cc: LKML, Linux PM, amit daniel kachhap, Viresh Kumar, Rafael J. Wysocki, Amit Kucheria, Zhang Rui, Dietmar Eggemann, Pierre.Gondois, Stephen Boyd, Rajendra Nayak, Bjorn Andersson, Doug Anderson, Matthias Kaehlcke, Vincent Guittot On 17/02/2022 13:11, Lukasz Luba wrote: [ ... ] >> Why not make it more generic and replace the frequency by a >> performance index, so it can be used by any kind of perf limiter? > > For that DT array, yes, it can be an index, so effectively it could be > a simple 1d array. > > something like: > > msm_gpu_energy_model: msm-gpu-energy-model { > compatible = "energy-model" > /* Values are sorted micro-Watts which correspond to each OPP > or performance state. The total amount of them must match > number of OPPs. */ > power-microwatt = <100000>, > <230000>, > <380000>, > <600000>; > }; > > then in gpu node instead of having 'dynamic-power-coefficient', > which is useless because voltage is missing, we would have > 'energy-model', like: > > energy-model = <&msm_gpu_energy_model>; > > > If you agree to continue this topic. I will send an RFC so we could > further discuss this idea. This $subject doesn't fit well. Yes, definitively I agree to continue on this topic. -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-17 12:33 ` Daniel Lezcano @ 2022-02-17 12:37 ` Lukasz Luba 0 siblings, 0 replies; 40+ messages in thread From: Lukasz Luba @ 2022-02-17 12:37 UTC (permalink / raw) To: Daniel Lezcano Cc: LKML, Linux PM, amit daniel kachhap, Viresh Kumar, Rafael J. Wysocki, Amit Kucheria, Zhang Rui, Dietmar Eggemann, Pierre.Gondois, Stephen Boyd, Rajendra Nayak, Bjorn Andersson, Doug Anderson, Matthias Kaehlcke, Vincent Guittot On 2/17/22 12:33 PM, Daniel Lezcano wrote: > On 17/02/2022 13:11, Lukasz Luba wrote: > > [ ... ] > >>> Why not make it more generic and replace the frequency by a >>> performance index, so it can be used by any kind of perf limiter? >> >> For that DT array, yes, it can be an index, so effectively it could be >> a simple 1d array. >> >> something like: >> >> msm_gpu_energy_model: msm-gpu-energy-model { >> compatible = "energy-model" >> /* Values are sorted micro-Watts which correspond to each OPP >> or performance state. The total amount of them must match >> number of OPPs. */ >> power-microwatt = <100000>, >> <230000>, >> <380000>, >> <600000>; >> }; >> >> then in gpu node instead of having 'dynamic-power-coefficient', >> which is useless because voltage is missing, we would have >> 'energy-model', like: >> >> energy-model = <&msm_gpu_energy_model>; >> >> >> If you agree to continue this topic. I will send an RFC so we could >> further discuss this idea. This $subject doesn't fit well. > > Yes, definitively I agree to continue on this topic. > > Great! I'm going to craft something... ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-17 12:11 ` Lukasz Luba 2022-02-17 12:33 ` Daniel Lezcano @ 2022-02-17 16:46 ` Matthias Kaehlcke 1 sibling, 0 replies; 40+ messages in thread From: Matthias Kaehlcke @ 2022-02-17 16:46 UTC (permalink / raw) To: Lukasz Luba Cc: Daniel Lezcano, LKML, Linux PM, amit daniel kachhap, Viresh Kumar, Rafael J. Wysocki, Amit Kucheria, Zhang Rui, Dietmar Eggemann, Pierre.Gondois, Stephen Boyd, Rajendra Nayak, Bjorn Andersson, Doug Anderson On Thu, Feb 17, 2022 at 12:11:27PM +0000, Lukasz Luba wrote: > > > On 2/17/22 11:28 AM, Daniel Lezcano wrote: > > On 17/02/2022 11:47, Lukasz Luba wrote: > > > Hi Daniel, > > > > > > On 2/17/22 10:10 AM, Daniel Lezcano wrote: > > > > On 16/02/2022 18:33, Doug Anderson wrote: > > > > > Hi, > > > > > > > > > > On Wed, Feb 16, 2022 at 7:35 AM Lukasz Luba > > > > > <lukasz.luba@arm.com> wrote: > > > > > > > > > > > > Hi Matthias, > > > > > > > > > > > > On 2/9/22 10:17 PM, Matthias Kaehlcke wrote: > > > > > > > On Wed, Feb 09, 2022 at 11:16:36AM +0000, Lukasz Luba wrote: > > > > > > > > > > > > > > > > > > > > > > > > On 2/8/22 5:25 PM, Matthias Kaehlcke wrote: > > > > > > > > > On Tue, Feb 08, 2022 at 09:32:28AM +0000, Lukasz Luba wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > [snip] > > > > > > > > > > > > > > > > Could you point me to those devices please? > > > > > > > > > > > > > > > > > > arch/arm64/boot/dts/qcom/sc7180-trogdor-* > > > > > > > > > > > > > > > > > > Though as per above they shouldn't be > > > > > > > > > impacted by your change, since the > > > > > > > > > CPUs always pretend to use milli-Watts. > > > > > > > > > > > > > > > > > > [skipped some questions/answers since sc7180 > > > > > > > > > isn't actually impacted by > > > > > > > > > the change] > > > > > > > > > > > > > > > > Thank you Matthias. I will investigate your setup to get better > > > > > > > > understanding. > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > > > > I've checked those DT files and related code. > > > > > > As you already said, this patch is safe for them. > > > > > > So we can apply it IMO. > > > > > > > > > > > > > > > > > > -------------Off-topic------------------ > > > > > > Not in $subject comments: > > > > > > > > > > > > AFAICS based on two files which define thermal zones: > > > > > > sc7180-trogdor-homestar.dtsi > > > > > > sc7180-trogdor-coachz.dtsi > > > > > > > > > > > > only the 'big' cores are used as cooling devices in the > > > > > > 'skin_temp_thermal' - the CPU6 and CPU7. > > > > > > > > > > > > I assume you don't want to model at all the power usage > > > > > > from the Little cluster (which is quite big: 6 CPUs), do you? > > > > > > I can see that the Little CPUs have small dyn-power-coeff > > > > > > ~30% of the big and lower max freq, but still might be worth > > > > > > to add them to IPA. You might give them more 'weight', to > > > > > > make sure they receive more power during power split. > > > > > > > > > > > > You also don't have GPU cooling device in that thermal zone. > > > > > > Based on my experience if your GPU is a power hungry one, > > > > > > e.g. 2-4Watts, you might get better results when you model > > > > > > this 'hot' device (which impacts your temp sensor reported value). > > > > > > > > > > I think the two boards you point at (homestar and coachz) are just the > > > > > two that override the default defined in the SoC dtsi file. If you > > > > > look in sc7180.dtsi you'll see 'gpuss1-thermal' which has a cooling > > > > > map. You can also see the cooling maps for the littles. > > > > > > > > > > I guess we don't have a `dynamic-power-coefficient` for the GPU, > > > > > though? Seems like we should, but I haven't dug through all the code > > > > > here... > > > > > > > > The dynamic-power-coefficient is available for OPPs which > > > > includes CPUfreq and devfreq. As the GPU is managed by devfreq, > > > > setting the dynamic-power-coefficient makes the energy model > > > > available for it. > > > > > > > > However, the OPPs must define the frequency and the voltage. > > > > That is the case for most platforms except on QCom platform. > > > > > > > > That may not be specified as it uses a frequency index and the > > > > hardware does the voltage change in our back. The QCom cpufreq > > > > backend get the voltage table from a register (or whatever) and > > > > completes the voltage values for the OPPs, thus adding the > > > > information which is missing in the device tree. The energy > > > > model can then initializes itself and allows the usage of the > > > > Energy Aware Scheduler. > > > > > > > > However this piece of code is missing for the GPU part. > > > > > > > > > > Thank you for joining the discussion. I don't know about that Qcom > > > GPU voltage information is missing. > > > > > > If the voltage is not available (only the frequencies), there is > > > another way. There is an 'advanced' EM which uses registration function: > > > em_dev_register_perf_domain(). It uses a local driver callback to get > > > power for each found frequency. It has benefit because there is no > > > restriction to 'fit' into the math formula, instead just avg power > > > values can be feed into EM. It's called 'advanced' EM [1]. > > > > > > Now we hit (again) the DT & EM issue (it's an old one, IIRC Morten > > > was proposing from ~2014 this upstream, but EAS wasn't merged back > > > then): > > > where to store these power-freq values, which are then used by the > > > callback. > > > > Why not make it more generic and replace the frequency by a performance > > index, so it can be used by any kind of perf limiter? > > For that DT array, yes, it can be an index, so effectively it could be > a simple 1d array. > > something like: > > msm_gpu_energy_model: msm-gpu-energy-model { > compatible = "energy-model" > /* Values are sorted micro-Watts which correspond to each OPP > or performance state. The total amount of them must match > number of OPPs. */ > power-microwatt = <100000>, > <230000>, > <380000>, > <600000>; > }; IIUC for the QCOM GPU the voltages/power consumption per OPP aren't fixed but can vary between different SoCs of the same model. If the ranges aren't too wide it might still be suitable to have a table with the average power consumption. Another question is whether QCOM would be willing to provide information about the GPU power consumption. For the SC7180 CPUs they only provided bogoWatt numbers. Once boards with a given SoC/GPU are available to the public someone could come up with such a table based on measurements, similar to what Doug did for the SC7180 CPUs (commit 82ea7d411d43f). > then in gpu node instead of having 'dynamic-power-coefficient', > which is useless because voltage is missing, we would have > 'energy-model', like: > > energy-model = <&msm_gpu_energy_model>; > > > If you agree to continue this topic. I will send an RFC so we could > further discuss this idea. This $subject doesn't fit well. > > Thank you again for your feedback Daniel! Thanks Lukasz and Daniel for looking into this! m. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-17 10:47 ` Lukasz Luba 2022-02-17 11:28 ` Daniel Lezcano @ 2022-02-17 16:37 ` Doug Anderson 2022-02-17 17:14 ` Matthias Kaehlcke 2022-02-17 18:27 ` Daniel Lezcano 1 sibling, 2 replies; 40+ messages in thread From: Doug Anderson @ 2022-02-17 16:37 UTC (permalink / raw) To: Lukasz Luba Cc: Daniel Lezcano, Matthias Kaehlcke, LKML, Linux PM, amit daniel kachhap, Viresh Kumar, Rafael J. Wysocki, Amit Kucheria, Zhang Rui, Dietmar Eggemann, Pierre.Gondois, Stephen Boyd, Rajendra Nayak, Bjorn Andersson, jorcrous, Rob Clark Hi, On Thu, Feb 17, 2022 at 2:47 AM Lukasz Luba <lukasz.luba@arm.com> wrote: > > Hi Daniel, > > On 2/17/22 10:10 AM, Daniel Lezcano wrote: > > On 16/02/2022 18:33, Doug Anderson wrote: > >> Hi, > >> > >> On Wed, Feb 16, 2022 at 7:35 AM Lukasz Luba <lukasz.luba@arm.com> wrote: > >>> > >>> Hi Matthias, > >>> > >>> On 2/9/22 10:17 PM, Matthias Kaehlcke wrote: > >>>> On Wed, Feb 09, 2022 at 11:16:36AM +0000, Lukasz Luba wrote: > >>>>> > >>>>> > >>>>> On 2/8/22 5:25 PM, Matthias Kaehlcke wrote: > >>>>>> On Tue, Feb 08, 2022 at 09:32:28AM +0000, Lukasz Luba wrote: > >>>>>>> > >>>>>>> > >>> > >>> [snip] > >>> > >>>>>>> Could you point me to those devices please? > >>>>>> > >>>>>> arch/arm64/boot/dts/qcom/sc7180-trogdor-* > >>>>>> > >>>>>> Though as per above they shouldn't be impacted by your change, > >>>>>> since the > >>>>>> CPUs always pretend to use milli-Watts. > >>>>>> > >>>>>> [skipped some questions/answers since sc7180 isn't actually > >>>>>> impacted by > >>>>>> the change] > >>>>> > >>>>> Thank you Matthias. I will investigate your setup to get better > >>>>> understanding. > >>>> > >>>> Thanks! > >>>> > >>> > >>> I've checked those DT files and related code. > >>> As you already said, this patch is safe for them. > >>> So we can apply it IMO. > >>> > >>> > >>> -------------Off-topic------------------ > >>> Not in $subject comments: > >>> > >>> AFAICS based on two files which define thermal zones: > >>> sc7180-trogdor-homestar.dtsi > >>> sc7180-trogdor-coachz.dtsi > >>> > >>> only the 'big' cores are used as cooling devices in the > >>> 'skin_temp_thermal' - the CPU6 and CPU7. > >>> > >>> I assume you don't want to model at all the power usage > >>> from the Little cluster (which is quite big: 6 CPUs), do you? > >>> I can see that the Little CPUs have small dyn-power-coeff > >>> ~30% of the big and lower max freq, but still might be worth > >>> to add them to IPA. You might give them more 'weight', to > >>> make sure they receive more power during power split. > >>> > >>> You also don't have GPU cooling device in that thermal zone. > >>> Based on my experience if your GPU is a power hungry one, > >>> e.g. 2-4Watts, you might get better results when you model > >>> this 'hot' device (which impacts your temp sensor reported value). > >> > >> I think the two boards you point at (homestar and coachz) are just the > >> two that override the default defined in the SoC dtsi file. If you > >> look in sc7180.dtsi you'll see 'gpuss1-thermal' which has a cooling > >> map. You can also see the cooling maps for the littles. > >> > >> I guess we don't have a `dynamic-power-coefficient` for the GPU, > >> though? Seems like we should, but I haven't dug through all the code > >> here... > > > > The dynamic-power-coefficient is available for OPPs which includes > > CPUfreq and devfreq. As the GPU is managed by devfreq, setting the > > dynamic-power-coefficient makes the energy model available for it. > > > > However, the OPPs must define the frequency and the voltage. That is the > > case for most platforms except on QCom platform. > > > > That may not be specified as it uses a frequency index and the hardware > > does the voltage change in our back. The QCom cpufreq backend get the > > voltage table from a register (or whatever) and completes the voltage > > values for the OPPs, thus adding the information which is missing in the > > device tree. The energy model can then initializes itself and allows the > > usage of the Energy Aware Scheduler. > > > > However this piece of code is missing for the GPU part. > > > > Thank you for joining the discussion. I don't know about that Qcom > GPU voltage information is missing. > > If the voltage is not available (only the frequencies), there is > another way. There is an 'advanced' EM which uses registration function: > em_dev_register_perf_domain(). It uses a local driver callback to get > power for each found frequency. It has benefit because there is no > restriction to 'fit' into the math formula, instead just avg power > values can be feed into EM. It's called 'advanced' EM [1]. It seems like there _should_ be a way to get the voltage out for GPU operating points, like is done with cpufreq in qcom_cpufreq_hw_read_lut(), but it might need someone with Qualcomm documentation to help with it. Maybe Rajendra would be able to help? Adding Jordon and Rob to this conversation in case they're aware of anything. As you said, we could just list a power for each frequency, though. I'm actually not sure which one would be more accurate across a range of devices with different "corners": specifying a dynamic power coefficient used for all "corners" and then using the actual voltage and doing the math, or specifying a power number for each frequency and ignoring the actual voltage used. In any case we're trying to get ballpark numbers and not every device will be exactly the same, so probably it doesn't matter that much. > Now we hit (again) the DT & EM issue (it's an old one, IIRC Morten > was proposing from ~2014 this upstream, but EAS wasn't merged back > then): > where to store these power-freq values, which are then used by the > callback. We have the 'dynamic-power-coefficient' in DT, but > it has limitations. It would be good to have this simple array > attached to the GPU/CPU node. IMHO it meet the requirement of DT, > it describes the HW (it would have HZ and Watts values). > > Doug, Matthias could you have a look at that function and its > usage, please [1]? > If you guys would support me in this, I would start, with an RFC > proposal, a discussion on LKML. > > [1] > https://elixir.bootlin.com/linux/v5.17-rc4/source/Documentation/power/energy-model.rst#L87 Matthias: I think you've spent more time on the thermal stuff than me so I'll assume you'll follow-up here. If not then please yell! Ideally, though, someone from Qualcomm would jump in an own this. Basically it allows more intelligently throttling the GPU and CPU together in tandem instead of treating them separately IIUC, right? -Doug ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-17 16:37 ` Doug Anderson @ 2022-02-17 17:14 ` Matthias Kaehlcke 2022-02-17 18:05 ` Lukasz Luba 2022-02-17 18:27 ` Daniel Lezcano 1 sibling, 1 reply; 40+ messages in thread From: Matthias Kaehlcke @ 2022-02-17 17:14 UTC (permalink / raw) To: Doug Anderson Cc: Lukasz Luba, Daniel Lezcano, LKML, Linux PM, amit daniel kachhap, Viresh Kumar, Rafael J. Wysocki, Amit Kucheria, Zhang Rui, Dietmar Eggemann, Pierre.Gondois, Stephen Boyd, Rajendra Nayak, Bjorn Andersson, jorcrous, Rob Clark On Thu, Feb 17, 2022 at 08:37:39AM -0800, Doug Anderson wrote: > Hi, > > On Thu, Feb 17, 2022 at 2:47 AM Lukasz Luba <lukasz.luba@arm.com> wrote: > > > > Hi Daniel, > > > > On 2/17/22 10:10 AM, Daniel Lezcano wrote: > > > On 16/02/2022 18:33, Doug Anderson wrote: > > >> Hi, > > >> > > >> On Wed, Feb 16, 2022 at 7:35 AM Lukasz Luba <lukasz.luba@arm.com> wrote: > > >>> > > >>> Hi Matthias, > > >>> > > >>> On 2/9/22 10:17 PM, Matthias Kaehlcke wrote: > > >>>> On Wed, Feb 09, 2022 at 11:16:36AM +0000, Lukasz Luba wrote: > > >>>>> > > >>>>> > > >>>>> On 2/8/22 5:25 PM, Matthias Kaehlcke wrote: > > >>>>>> On Tue, Feb 08, 2022 at 09:32:28AM +0000, Lukasz Luba wrote: > > >>>>>>> > > >>>>>>> > > >>> > > >>> [snip] > > >>> > > >>>>>>> Could you point me to those devices please? > > >>>>>> > > >>>>>> arch/arm64/boot/dts/qcom/sc7180-trogdor-* > > >>>>>> > > >>>>>> Though as per above they shouldn't be impacted by your change, > > >>>>>> since the > > >>>>>> CPUs always pretend to use milli-Watts. > > >>>>>> > > >>>>>> [skipped some questions/answers since sc7180 isn't actually > > >>>>>> impacted by > > >>>>>> the change] > > >>>>> > > >>>>> Thank you Matthias. I will investigate your setup to get better > > >>>>> understanding. > > >>>> > > >>>> Thanks! > > >>>> > > >>> > > >>> I've checked those DT files and related code. > > >>> As you already said, this patch is safe for them. > > >>> So we can apply it IMO. > > >>> > > >>> > > >>> -------------Off-topic------------------ > > >>> Not in $subject comments: > > >>> > > >>> AFAICS based on two files which define thermal zones: > > >>> sc7180-trogdor-homestar.dtsi > > >>> sc7180-trogdor-coachz.dtsi > > >>> > > >>> only the 'big' cores are used as cooling devices in the > > >>> 'skin_temp_thermal' - the CPU6 and CPU7. > > >>> > > >>> I assume you don't want to model at all the power usage > > >>> from the Little cluster (which is quite big: 6 CPUs), do you? > > >>> I can see that the Little CPUs have small dyn-power-coeff > > >>> ~30% of the big and lower max freq, but still might be worth > > >>> to add them to IPA. You might give them more 'weight', to > > >>> make sure they receive more power during power split. > > >>> > > >>> You also don't have GPU cooling device in that thermal zone. > > >>> Based on my experience if your GPU is a power hungry one, > > >>> e.g. 2-4Watts, you might get better results when you model > > >>> this 'hot' device (which impacts your temp sensor reported value). > > >> > > >> I think the two boards you point at (homestar and coachz) are just the > > >> two that override the default defined in the SoC dtsi file. If you > > >> look in sc7180.dtsi you'll see 'gpuss1-thermal' which has a cooling > > >> map. You can also see the cooling maps for the littles. > > >> > > >> I guess we don't have a `dynamic-power-coefficient` for the GPU, > > >> though? Seems like we should, but I haven't dug through all the code > > >> here... > > > > > > The dynamic-power-coefficient is available for OPPs which includes > > > CPUfreq and devfreq. As the GPU is managed by devfreq, setting the > > > dynamic-power-coefficient makes the energy model available for it. > > > > > > However, the OPPs must define the frequency and the voltage. That is the > > > case for most platforms except on QCom platform. > > > > > > That may not be specified as it uses a frequency index and the hardware > > > does the voltage change in our back. The QCom cpufreq backend get the > > > voltage table from a register (or whatever) and completes the voltage > > > values for the OPPs, thus adding the information which is missing in the > > > device tree. The energy model can then initializes itself and allows the > > > usage of the Energy Aware Scheduler. > > > > > > However this piece of code is missing for the GPU part. > > > > > > > Thank you for joining the discussion. I don't know about that Qcom > > GPU voltage information is missing. > > > > If the voltage is not available (only the frequencies), there is > > another way. There is an 'advanced' EM which uses registration function: > > em_dev_register_perf_domain(). It uses a local driver callback to get > > power for each found frequency. It has benefit because there is no > > restriction to 'fit' into the math formula, instead just avg power > > values can be feed into EM. It's called 'advanced' EM [1]. > > It seems like there _should_ be a way to get the voltage out for GPU > operating points, like is done with cpufreq in > qcom_cpufreq_hw_read_lut(), but it might need someone with Qualcomm > documentation to help with it. Maybe Rajendra would be able to help? > Adding Jordon and Rob to this conversation in case they're aware of > anything. > > As you said, we could just list a power for each frequency, though. > > I'm actually not sure which one would be more accurate across a range > of devices with different "corners": specifying a dynamic power > coefficient used for all "corners" and then using the actual voltage > and doing the math, or specifying a power number for each frequency > and ignoring the actual voltage used. In any case we're trying to get > ballpark numbers and not every device will be exactly the same, so > probably it doesn't matter that much. > > > > Now we hit (again) the DT & EM issue (it's an old one, IIRC Morten > > was proposing from ~2014 this upstream, but EAS wasn't merged back > > then): > > where to store these power-freq values, which are then used by the > > callback. We have the 'dynamic-power-coefficient' in DT, but > > it has limitations. It would be good to have this simple array > > attached to the GPU/CPU node. IMHO it meet the requirement of DT, > > it describes the HW (it would have HZ and Watts values). > > > > Doug, Matthias could you have a look at that function and its > > usage, please [1]? > > If you guys would support me in this, I would start, with an RFC > > proposal, a discussion on LKML. > > > > [1] > > https://elixir.bootlin.com/linux/v5.17-rc4/source/Documentation/power/energy-model.rst#L87 > > Matthias: I think you've spent more time on the thermal stuff than me > so I'll assume you'll follow-up here. If not then please yell! > > Ideally, though, someone from Qualcomm would jump in an own this. > Basically it allows more intelligently throttling the GPU and CPU > together in tandem instead of treating them separately IIUC, right? Yes, I think for the em_dev_register_perf_domain() route support from Qualcomm would be needed since "Drivers must provide a callback function returning <frequency, power> tuples for each performance state. ". ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-17 17:14 ` Matthias Kaehlcke @ 2022-02-17 18:05 ` Lukasz Luba 0 siblings, 0 replies; 40+ messages in thread From: Lukasz Luba @ 2022-02-17 18:05 UTC (permalink / raw) To: Matthias Kaehlcke, Doug Anderson Cc: Daniel Lezcano, LKML, Linux PM, amit daniel kachhap, Viresh Kumar, Rafael J. Wysocki, Amit Kucheria, Zhang Rui, Dietmar Eggemann, Pierre.Gondois, Stephen Boyd, Rajendra Nayak, Bjorn Andersson, jorcrous, Rob Clark On 2/17/22 5:14 PM, Matthias Kaehlcke wrote: > On Thu, Feb 17, 2022 at 08:37:39AM -0800, Doug Anderson wrote: >> Hi, >> >> On Thu, Feb 17, 2022 at 2:47 AM Lukasz Luba <lukasz.luba@arm.com> wrote: >>> >>> Hi Daniel, >>> >>> On 2/17/22 10:10 AM, Daniel Lezcano wrote: >>>> On 16/02/2022 18:33, Doug Anderson wrote: >>>>> Hi, >>>>> >>>>> On Wed, Feb 16, 2022 at 7:35 AM Lukasz Luba <lukasz.luba@arm.com> wrote: >>>>>> >>>>>> Hi Matthias, >>>>>> >>>>>> On 2/9/22 10:17 PM, Matthias Kaehlcke wrote: >>>>>>> On Wed, Feb 09, 2022 at 11:16:36AM +0000, Lukasz Luba wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 2/8/22 5:25 PM, Matthias Kaehlcke wrote: >>>>>>>>> On Tue, Feb 08, 2022 at 09:32:28AM +0000, Lukasz Luba wrote: >>>>>>>>>> >>>>>>>>>> >>>>>> >>>>>> [snip] >>>>>> >>>>>>>>>> Could you point me to those devices please? >>>>>>>>> >>>>>>>>> arch/arm64/boot/dts/qcom/sc7180-trogdor-* >>>>>>>>> >>>>>>>>> Though as per above they shouldn't be impacted by your change, >>>>>>>>> since the >>>>>>>>> CPUs always pretend to use milli-Watts. >>>>>>>>> >>>>>>>>> [skipped some questions/answers since sc7180 isn't actually >>>>>>>>> impacted by >>>>>>>>> the change] >>>>>>>> >>>>>>>> Thank you Matthias. I will investigate your setup to get better >>>>>>>> understanding. >>>>>>> >>>>>>> Thanks! >>>>>>> >>>>>> >>>>>> I've checked those DT files and related code. >>>>>> As you already said, this patch is safe for them. >>>>>> So we can apply it IMO. >>>>>> >>>>>> >>>>>> -------------Off-topic------------------ >>>>>> Not in $subject comments: >>>>>> >>>>>> AFAICS based on two files which define thermal zones: >>>>>> sc7180-trogdor-homestar.dtsi >>>>>> sc7180-trogdor-coachz.dtsi >>>>>> >>>>>> only the 'big' cores are used as cooling devices in the >>>>>> 'skin_temp_thermal' - the CPU6 and CPU7. >>>>>> >>>>>> I assume you don't want to model at all the power usage >>>>>> from the Little cluster (which is quite big: 6 CPUs), do you? >>>>>> I can see that the Little CPUs have small dyn-power-coeff >>>>>> ~30% of the big and lower max freq, but still might be worth >>>>>> to add them to IPA. You might give them more 'weight', to >>>>>> make sure they receive more power during power split. >>>>>> >>>>>> You also don't have GPU cooling device in that thermal zone. >>>>>> Based on my experience if your GPU is a power hungry one, >>>>>> e.g. 2-4Watts, you might get better results when you model >>>>>> this 'hot' device (which impacts your temp sensor reported value). >>>>> >>>>> I think the two boards you point at (homestar and coachz) are just the >>>>> two that override the default defined in the SoC dtsi file. If you >>>>> look in sc7180.dtsi you'll see 'gpuss1-thermal' which has a cooling >>>>> map. You can also see the cooling maps for the littles. >>>>> >>>>> I guess we don't have a `dynamic-power-coefficient` for the GPU, >>>>> though? Seems like we should, but I haven't dug through all the code >>>>> here... >>>> >>>> The dynamic-power-coefficient is available for OPPs which includes >>>> CPUfreq and devfreq. As the GPU is managed by devfreq, setting the >>>> dynamic-power-coefficient makes the energy model available for it. >>>> >>>> However, the OPPs must define the frequency and the voltage. That is the >>>> case for most platforms except on QCom platform. >>>> >>>> That may not be specified as it uses a frequency index and the hardware >>>> does the voltage change in our back. The QCom cpufreq backend get the >>>> voltage table from a register (or whatever) and completes the voltage >>>> values for the OPPs, thus adding the information which is missing in the >>>> device tree. The energy model can then initializes itself and allows the >>>> usage of the Energy Aware Scheduler. >>>> >>>> However this piece of code is missing for the GPU part. >>>> >>> >>> Thank you for joining the discussion. I don't know about that Qcom >>> GPU voltage information is missing. >>> >>> If the voltage is not available (only the frequencies), there is >>> another way. There is an 'advanced' EM which uses registration function: >>> em_dev_register_perf_domain(). It uses a local driver callback to get >>> power for each found frequency. It has benefit because there is no >>> restriction to 'fit' into the math formula, instead just avg power >>> values can be feed into EM. It's called 'advanced' EM [1]. >> >> It seems like there _should_ be a way to get the voltage out for GPU >> operating points, like is done with cpufreq in >> qcom_cpufreq_hw_read_lut(), but it might need someone with Qualcomm >> documentation to help with it. Maybe Rajendra would be able to help? >> Adding Jordon and Rob to this conversation in case they're aware of >> anything. >> >> As you said, we could just list a power for each frequency, though. >> >> I'm actually not sure which one would be more accurate across a range >> of devices with different "corners": specifying a dynamic power >> coefficient used for all "corners" and then using the actual voltage >> and doing the math, or specifying a power number for each frequency >> and ignoring the actual voltage used. In any case we're trying to get >> ballpark numbers and not every device will be exactly the same, so >> probably it doesn't matter that much. >> >> >>> Now we hit (again) the DT & EM issue (it's an old one, IIRC Morten >>> was proposing from ~2014 this upstream, but EAS wasn't merged back >>> then): >>> where to store these power-freq values, which are then used by the >>> callback. We have the 'dynamic-power-coefficient' in DT, but >>> it has limitations. It would be good to have this simple array >>> attached to the GPU/CPU node. IMHO it meet the requirement of DT, >>> it describes the HW (it would have HZ and Watts values). >>> >>> Doug, Matthias could you have a look at that function and its >>> usage, please [1]? >>> If you guys would support me in this, I would start, with an RFC >>> proposal, a discussion on LKML. >>> >>> [1] >>> https://elixir.bootlin.com/linux/v5.17-rc4/source/Documentation/power/energy-model.rst#L87 >> >> Matthias: I think you've spent more time on the thermal stuff than me >> so I'll assume you'll follow-up here. If not then please yell! >> >> Ideally, though, someone from Qualcomm would jump in an own this. >> Basically it allows more intelligently throttling the GPU and CPU >> together in tandem instead of treating them separately IIUC, right? > > Yes, I think for the em_dev_register_perf_domain() route support from > Qualcomm would be needed since "Drivers must provide a callback > function returning <frequency, power> tuples for each performance > state. ". > Not necessarily. It might be done 'generically' by fwk. There are other benefits of this 'energy-model' entry in the DT. I'll list them in the cover letter. Let me send an RFC, so we could discuss there. Thanks guys! Regards, Lukasz ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-17 16:37 ` Doug Anderson 2022-02-17 17:14 ` Matthias Kaehlcke @ 2022-02-17 18:27 ` Daniel Lezcano 1 sibling, 0 replies; 40+ messages in thread From: Daniel Lezcano @ 2022-02-17 18:27 UTC (permalink / raw) To: Doug Anderson, Lukasz Luba Cc: Matthias Kaehlcke, LKML, Linux PM, amit daniel kachhap, Viresh Kumar, Rafael J. Wysocki, Amit Kucheria, Zhang Rui, Dietmar Eggemann, Pierre.Gondois, Stephen Boyd, Rajendra Nayak, Bjorn Andersson, jorcrous, Rob Clark, Manaf Meethalavalappu Pallikunhi Adding Manaf (QCom) in the loop On 17/02/2022 17:37, Doug Anderson wrote: > Hi, > > On Thu, Feb 17, 2022 at 2:47 AM Lukasz Luba <lukasz.luba@arm.com> wrote: >> >> Hi Daniel, >> >> On 2/17/22 10:10 AM, Daniel Lezcano wrote: >>> On 16/02/2022 18:33, Doug Anderson wrote: >>>> Hi, >>>> >>>> On Wed, Feb 16, 2022 at 7:35 AM Lukasz Luba <lukasz.luba@arm.com> wrote: >>>>> >>>>> Hi Matthias, >>>>> >>>>> On 2/9/22 10:17 PM, Matthias Kaehlcke wrote: >>>>>> On Wed, Feb 09, 2022 at 11:16:36AM +0000, Lukasz Luba wrote: >>>>>>> >>>>>>> >>>>>>> On 2/8/22 5:25 PM, Matthias Kaehlcke wrote: >>>>>>>> On Tue, Feb 08, 2022 at 09:32:28AM +0000, Lukasz Luba wrote: >>>>>>>>> >>>>>>>>> >>>>> >>>>> [snip] >>>>> >>>>>>>>> Could you point me to those devices please? >>>>>>>> >>>>>>>> arch/arm64/boot/dts/qcom/sc7180-trogdor-* >>>>>>>> >>>>>>>> Though as per above they shouldn't be impacted by your change, >>>>>>>> since the >>>>>>>> CPUs always pretend to use milli-Watts. >>>>>>>> >>>>>>>> [skipped some questions/answers since sc7180 isn't actually >>>>>>>> impacted by >>>>>>>> the change] >>>>>>> >>>>>>> Thank you Matthias. I will investigate your setup to get better >>>>>>> understanding. >>>>>> >>>>>> Thanks! >>>>>> >>>>> >>>>> I've checked those DT files and related code. >>>>> As you already said, this patch is safe for them. >>>>> So we can apply it IMO. >>>>> >>>>> >>>>> -------------Off-topic------------------ >>>>> Not in $subject comments: >>>>> >>>>> AFAICS based on two files which define thermal zones: >>>>> sc7180-trogdor-homestar.dtsi >>>>> sc7180-trogdor-coachz.dtsi >>>>> >>>>> only the 'big' cores are used as cooling devices in the >>>>> 'skin_temp_thermal' - the CPU6 and CPU7. >>>>> >>>>> I assume you don't want to model at all the power usage >>>>> from the Little cluster (which is quite big: 6 CPUs), do you? >>>>> I can see that the Little CPUs have small dyn-power-coeff >>>>> ~30% of the big and lower max freq, but still might be worth >>>>> to add them to IPA. You might give them more 'weight', to >>>>> make sure they receive more power during power split. >>>>> >>>>> You also don't have GPU cooling device in that thermal zone. >>>>> Based on my experience if your GPU is a power hungry one, >>>>> e.g. 2-4Watts, you might get better results when you model >>>>> this 'hot' device (which impacts your temp sensor reported value). >>>> >>>> I think the two boards you point at (homestar and coachz) are just the >>>> two that override the default defined in the SoC dtsi file. If you >>>> look in sc7180.dtsi you'll see 'gpuss1-thermal' which has a cooling >>>> map. You can also see the cooling maps for the littles. >>>> >>>> I guess we don't have a `dynamic-power-coefficient` for the GPU, >>>> though? Seems like we should, but I haven't dug through all the code >>>> here... >>> >>> The dynamic-power-coefficient is available for OPPs which includes >>> CPUfreq and devfreq. As the GPU is managed by devfreq, setting the >>> dynamic-power-coefficient makes the energy model available for it. >>> >>> However, the OPPs must define the frequency and the voltage. That is the >>> case for most platforms except on QCom platform. >>> >>> That may not be specified as it uses a frequency index and the hardware >>> does the voltage change in our back. The QCom cpufreq backend get the >>> voltage table from a register (or whatever) and completes the voltage >>> values for the OPPs, thus adding the information which is missing in the >>> device tree. The energy model can then initializes itself and allows the >>> usage of the Energy Aware Scheduler. >>> >>> However this piece of code is missing for the GPU part. >>> >> >> Thank you for joining the discussion. I don't know about that Qcom >> GPU voltage information is missing. >> >> If the voltage is not available (only the frequencies), there is >> another way. There is an 'advanced' EM which uses registration function: >> em_dev_register_perf_domain(). It uses a local driver callback to get >> power for each found frequency. It has benefit because there is no >> restriction to 'fit' into the math formula, instead just avg power >> values can be feed into EM. It's called 'advanced' EM [1]. > > It seems like there _should_ be a way to get the voltage out for GPU > operating points, like is done with cpufreq in > qcom_cpufreq_hw_read_lut(), but it might need someone with Qualcomm > documentation to help with it. Maybe Rajendra would be able to help? > Adding Jordon and Rob to this conversation in case they're aware of > anything. > > As you said, we could just list a power for each frequency, though. > > I'm actually not sure which one would be more accurate across a range > of devices with different "corners": specifying a dynamic power > coefficient used for all "corners" and then using the actual voltage > and doing the math, or specifying a power number for each frequency > and ignoring the actual voltage used. In any case we're trying to get > ballpark numbers and not every device will be exactly the same, so > probably it doesn't matter that much. > > >> Now we hit (again) the DT & EM issue (it's an old one, IIRC Morten >> was proposing from ~2014 this upstream, but EAS wasn't merged back >> then): >> where to store these power-freq values, which are then used by the >> callback. We have the 'dynamic-power-coefficient' in DT, but >> it has limitations. It would be good to have this simple array >> attached to the GPU/CPU node. IMHO it meet the requirement of DT, >> it describes the HW (it would have HZ and Watts values). >> >> Doug, Matthias could you have a look at that function and its >> usage, please [1]? >> If you guys would support me in this, I would start, with an RFC >> proposal, a discussion on LKML. >> >> [1] >> https://elixir.bootlin.com/linux/v5.17-rc4/source/Documentation/power/energy-model.rst#L87 > > Matthias: I think you've spent more time on the thermal stuff than me > so I'll assume you'll follow-up here. If not then please yell! > > Ideally, though, someone from Qualcomm would jump in an own this. > Basically it allows more intelligently throttling the GPU and CPU > together in tandem instead of treating them separately IIUC, right? > > -Doug -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-08 9:32 ` Lukasz Luba 2022-02-08 17:25 ` Matthias Kaehlcke @ 2022-02-16 17:21 ` Doug Anderson 2022-02-16 23:28 ` Lukasz Luba 1 sibling, 1 reply; 40+ messages in thread From: Doug Anderson @ 2022-02-16 17:21 UTC (permalink / raw) To: Lukasz Luba Cc: Matthias Kaehlcke, LKML, Linux PM, amit daniel kachhap, Daniel Lezcano, Viresh Kumar, Rafael J. Wysocki, Amit Kucheria, Zhang Rui, Dietmar Eggemann, Pierre.Gondois, Stephen Boyd, Rajendra Nayak, Bjorn Andersson Hi, On Tue, Feb 8, 2022 at 1:32 AM Lukasz Luba <lukasz.luba@arm.com> wrote: > > > Another important thing is the consistent scale of the power values > > provided by the cooling devices. All of the cooling devices in a single > > thermal zone should have power values reported either in milli-Watts > > or scaled to the same 'abstract scale'. > > This can change. We have removed the userspace governor from kernel > recently. The trend is to implement thermal policy in FW. Dealing with > some intermediate configurations are causing complicated design, support > of the algorithm logic is also more complex. One thing that didn't get addressed is the whole "The trend is to implement thermal policy in FW". I'm not sure I can get on board with that trend. IMO "moving to FW" isn't a super great trend. FW is harder to update than kernel and trying to keep it in sync with the kernel isn't wonderful. Unless something _has_ to be in FW I personally prefer it to be in the kernel. ...although now that I re-read this, I'm not sure which firmware you might be talking about. Is this the AP firmware, or some companion chip / coprocessor? Even so, I'd still rather see things done in the kernel when possible... -Doug ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-16 17:21 ` Doug Anderson @ 2022-02-16 23:28 ` Lukasz Luba 2022-02-17 16:50 ` Doug Anderson 0 siblings, 1 reply; 40+ messages in thread From: Lukasz Luba @ 2022-02-16 23:28 UTC (permalink / raw) To: Doug Anderson Cc: Matthias Kaehlcke, LKML, Linux PM, amit daniel kachhap, Daniel Lezcano, Viresh Kumar, Rafael J. Wysocki, Amit Kucheria, Zhang Rui, Dietmar Eggemann, Pierre.Gondois, Stephen Boyd, Rajendra Nayak, Bjorn Andersson On 2/16/22 5:21 PM, Doug Anderson wrote: > Hi, > > On Tue, Feb 8, 2022 at 1:32 AM Lukasz Luba <lukasz.luba@arm.com> wrote: >> >>> Another important thing is the consistent scale of the power values >>> provided by the cooling devices. All of the cooling devices in a single >>> thermal zone should have power values reported either in milli-Watts >>> or scaled to the same 'abstract scale'. >> >> This can change. We have removed the userspace governor from kernel >> recently. The trend is to implement thermal policy in FW. Dealing with >> some intermediate configurations are causing complicated design, support >> of the algorithm logic is also more complex. > > One thing that didn't get addressed is the whole "The trend is to > implement thermal policy in FW". I'm not sure I can get on board with > that trend. IMO "moving to FW" isn't a super great trend. FW is harder > to update than kernel and trying to keep it in sync with the kernel > isn't wonderful. Unless something _has_ to be in FW I personally > prefer it to be in the kernel. There are pros and cons for both approaches (as always). Although, there are some use cases, where the kernel is not able to react that fast, e.g. sudden power usage changes, which can cause that the power rail is not able to sustain within required conditions. When we are talking about tough requirements for those power & thermal policies, the mechanism must be fast, precised and reliable. Here you can find Arm reference FW implementation and an IPA clone in there (I have been reviewing this) [1][2]. As you can see there is a new FW feature set: "MPMM, Traffic-cop and Thermal management". Apart from Arm implementation, there are already known thermal monitoring mechanisms in HW/FW. Like in the new Qcom SoCs which are using this driver code [3]. The driver receives an interrupt about throttling conditions and just populates the thermal pressure. > > ...although now that I re-read this, I'm not sure which firmware you > might be talking about. Is this the AP firmware, or some companion > chip / coprocessor? Even so, I'd still rather see things done in the > kernel when possible... It's a FW run on a dedicated microprocessor. In Arm SoCs it's usually some Cortex-M. We communicated with it from the kernel via SCMI drivers (using shared memory and mailboxes). We recommend to use the SCMI protocol to send e.g. 'performance request' to the FW via 'fast channel' instead of having an implementation of PMIC and clock, and do the voltage & freq change in the kernel (using drivers & locking). That implementation allows to avoid costly locking and allows to go via that SCMI cpufreq driver [4] and SCMI perf layer [5] the task scheduler. We don't need a dedicated 'sugov' kthread in a Deadline policy to do that work and preempt the currently running task. IMHO the FW approach opens new opportunities. Regards, Lukasz [1] https://github.com/ARM-software/SCP-firmware/pull/588 [2] https://github.com/ARM-software/SCP-firmware/pull/588/commits/59c62ead5eb66353ae805c367bfa86192e28c410 [3] https://elixir.bootlin.com/linux/v5.17-rc4/source/drivers/cpufreq/qcom-cpufreq-hw.c#L287 [4] https://elixir.bootlin.com/linux/latest/source/drivers/cpufreq/scmi-cpufreq.c#L65 [5] https://elixir.bootlin.com/linux/v5.17-rc4/source/drivers/firmware/arm_scmi/perf.c#L465 ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-16 23:28 ` Lukasz Luba @ 2022-02-17 16:50 ` Doug Anderson 2022-02-17 17:58 ` Lukasz Luba 0 siblings, 1 reply; 40+ messages in thread From: Doug Anderson @ 2022-02-17 16:50 UTC (permalink / raw) To: Lukasz Luba Cc: Matthias Kaehlcke, LKML, Linux PM, amit daniel kachhap, Daniel Lezcano, Viresh Kumar, Rafael J. Wysocki, Amit Kucheria, Zhang Rui, Dietmar Eggemann, Pierre.Gondois, Stephen Boyd, Rajendra Nayak, Bjorn Andersson Hi, On Wed, Feb 16, 2022 at 3:28 PM Lukasz Luba <lukasz.luba@arm.com> wrote: > > On 2/16/22 5:21 PM, Doug Anderson wrote: > > Hi, > > > > On Tue, Feb 8, 2022 at 1:32 AM Lukasz Luba <lukasz.luba@arm.com> wrote: > >> > >>> Another important thing is the consistent scale of the power values > >>> provided by the cooling devices. All of the cooling devices in a single > >>> thermal zone should have power values reported either in milli-Watts > >>> or scaled to the same 'abstract scale'. > >> > >> This can change. We have removed the userspace governor from kernel > >> recently. The trend is to implement thermal policy in FW. Dealing with > >> some intermediate configurations are causing complicated design, support > >> of the algorithm logic is also more complex. > > > > One thing that didn't get addressed is the whole "The trend is to > > implement thermal policy in FW". I'm not sure I can get on board with > > that trend. IMO "moving to FW" isn't a super great trend. FW is harder > > to update than kernel and trying to keep it in sync with the kernel > > isn't wonderful. Unless something _has_ to be in FW I personally > > prefer it to be in the kernel. > > There are pros and cons for both approaches (as always). > > Although, there are some use cases, where the kernel is not able to > react that fast, e.g. sudden power usage changes, which can cause > that the power rail is not able to sustain within required conditions. > When we are talking about tough requirements for those power & thermal > policies, the mechanism must be fast, precised and reliable. > > Here you can find Arm reference FW implementation and an IPA clone > in there (I have been reviewing this) [1][2]. > > As you can see there is a new FW feature set: > "MPMM, Traffic-cop and Thermal management". > > Apart from Arm implementation, there are already known thermal > monitoring mechanisms in HW/FW. Like in the new Qcom SoCs which > are using this driver code [3]. The driver receives an interrupt > about throttling conditions and just populates the thermal pressure. Yeah, this has come up in another context recently too. Right on on the Qcom SoCs I'm working with (sc7180 on Chromebooks) we've essentially disabled all the HW/FW throttling (LMH), preferring to let Linux manage things. We chose to do it this way with the assumption that Linux would be able to make better decisions than the firmware and it was easier to understand / update than an opaque vendor-provided blob. LMH is still there with super high limits in case Linux goofs up (we don't want to damage the CPU) but it's not the primary means of throttling. As you said, Linux reacts a bit slower, though I've heard that might be fixed soon-ish? So far on sc7180 Chromebooks it hasn't been a problem because we have more total thermal mass and the CPUs in sc7180 don't actually generate that much heat compared to other CPUs. We also have thermal interrupts enabled, which helps. That being said, improvements are certainly welcome! > > ...although now that I re-read this, I'm not sure which firmware you > > might be talking about. Is this the AP firmware, or some companion > > chip / coprocessor? Even so, I'd still rather see things done in the > > kernel when possible... > > It's a FW run on a dedicated microprocessor. In Arm SoCs it's usually > some Cortex-M. We communicated with it from the kernel via SCMI drivers > (using shared memory and mailboxes). We recommend to use the SCMI > protocol to send e.g. 'performance request' to the FW via 'fast > channel' instead of having an implementation of PMIC and clock, and do > the voltage & freq change in the kernel (using drivers & locking). That > implementation allows to avoid costly locking and allows to go via > that SCMI cpufreq driver [4] and SCMI perf layer [5] the task scheduler. > We don't need a dedicated 'sugov' kthread in a Deadline policy to > do that work and preempt the currently running task. > > IMHO the FW approach opens new opportunities. > > Regards, > Lukasz > > [1] https://github.com/ARM-software/SCP-firmware/pull/588 > [2] > https://github.com/ARM-software/SCP-firmware/pull/588/commits/59c62ead5eb66353ae805c367bfa86192e28c410 > [3] > https://elixir.bootlin.com/linux/v5.17-rc4/source/drivers/cpufreq/qcom-cpufreq-hw.c#L287 > [4] > https://elixir.bootlin.com/linux/latest/source/drivers/cpufreq/scmi-cpufreq.c#L65 > [5] > https://elixir.bootlin.com/linux/v5.17-rc4/source/drivers/firmware/arm_scmi/perf.c#L465 ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-17 16:50 ` Doug Anderson @ 2022-02-17 17:58 ` Lukasz Luba 0 siblings, 0 replies; 40+ messages in thread From: Lukasz Luba @ 2022-02-17 17:58 UTC (permalink / raw) To: Doug Anderson Cc: Matthias Kaehlcke, LKML, Linux PM, amit daniel kachhap, Daniel Lezcano, Viresh Kumar, Rafael J. Wysocki, Amit Kucheria, Zhang Rui, Dietmar Eggemann, Pierre.Gondois, Stephen Boyd, Rajendra Nayak, Bjorn Andersson On 2/17/22 4:50 PM, Doug Anderson wrote: > Hi, > > On Wed, Feb 16, 2022 at 3:28 PM Lukasz Luba <lukasz.luba@arm.com> wrote: >> >> On 2/16/22 5:21 PM, Doug Anderson wrote: >>> Hi, >>> >>> On Tue, Feb 8, 2022 at 1:32 AM Lukasz Luba <lukasz.luba@arm.com> wrote: >>>> >>>>> Another important thing is the consistent scale of the power values >>>>> provided by the cooling devices. All of the cooling devices in a single >>>>> thermal zone should have power values reported either in milli-Watts >>>>> or scaled to the same 'abstract scale'. >>>> >>>> This can change. We have removed the userspace governor from kernel >>>> recently. The trend is to implement thermal policy in FW. Dealing with >>>> some intermediate configurations are causing complicated design, support >>>> of the algorithm logic is also more complex. >>> >>> One thing that didn't get addressed is the whole "The trend is to >>> implement thermal policy in FW". I'm not sure I can get on board with >>> that trend. IMO "moving to FW" isn't a super great trend. FW is harder >>> to update than kernel and trying to keep it in sync with the kernel >>> isn't wonderful. Unless something _has_ to be in FW I personally >>> prefer it to be in the kernel. >> >> There are pros and cons for both approaches (as always). >> >> Although, there are some use cases, where the kernel is not able to >> react that fast, e.g. sudden power usage changes, which can cause >> that the power rail is not able to sustain within required conditions. >> When we are talking about tough requirements for those power & thermal >> policies, the mechanism must be fast, precised and reliable. >> >> Here you can find Arm reference FW implementation and an IPA clone >> in there (I have been reviewing this) [1][2]. >> >> As you can see there is a new FW feature set: >> "MPMM, Traffic-cop and Thermal management". >> >> Apart from Arm implementation, there are already known thermal >> monitoring mechanisms in HW/FW. Like in the new Qcom SoCs which >> are using this driver code [3]. The driver receives an interrupt >> about throttling conditions and just populates the thermal pressure. > > Yeah, this has come up in another context recently too. Right on on > the Qcom SoCs I'm working with (sc7180 on Chromebooks) we've > essentially disabled all the HW/FW throttling (LMH), preferring to let > Linux manage things. We chose to do it this way with the assumption > that Linux would be able to make better decisions than the firmware > and it was easier to understand / update than an opaque > vendor-provided blob. LMH is still there with super high limits in > case Linux goofs up (we don't want to damage the CPU) but it's not the > primary means of throttling. > > As you said, Linux reacts a bit slower, though I've heard that might > be fixed soon-ish? So far on sc7180 Chromebooks it hasn't been a > problem because we have more total thermal mass and the CPUs in sc7180 > don't actually generate that much heat compared to other CPUs. We also > have thermal interrupts enabled, which helps. That being said, > improvements are certainly welcome! > Thanks Doug for sharing this with me. I'll keep this in mind, your requirements, configuration and usage. I've learned recently that some SoCs start throttling very early during boot, even before the cpumask is available. That was causing kernel to blow up (deference of a cpumask NULL pointer in that LMH [1]). [1] https://lore.kernel.org/linux-pm/20220128032554.155132-2-bjorn.andersson@linaro.org/ ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-07 7:30 ` [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling Lukasz Luba 2022-02-08 0:50 ` Matthias Kaehlcke @ 2022-02-17 18:18 ` Lukasz Luba 2022-02-22 17:05 ` Lukasz Luba 1 sibling, 1 reply; 40+ messages in thread From: Lukasz Luba @ 2022-02-17 18:18 UTC (permalink / raw) To: daniel.lezcano Cc: amit.kachhap, viresh.kumar, rafael, amitk, rui.zhang, dietmar.eggemann, Pierre.Gondois, Matthias Kaehlcke, Doug Anderson, linux-pm, linux-kernel Hi Daniel, On 2/7/22 7:30 AM, Lukasz Luba wrote: > The Energy Model supports power values either in Watts or in some abstract > scale. When the 2nd option is in use, the thermal governor IPA should not > be allowed to operate, since the relation between cooling devices is not > properly defined. Thus, it might be possible that big GPU has lower power > values in abstract scale than a Little CPU. To mitigate a misbehaviour > of the thermal control algorithm, simply not register a cooling device > capable of working with IPA. > > Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> > --- > drivers/thermal/cpufreq_cooling.c | 2 +- > drivers/thermal/devfreq_cooling.c | 16 +++++++++++++--- > 2 files changed, 14 insertions(+), 4 deletions(-) The discussion in below this patch went slightly off-topic but it was valuable. It clarified also there are no broken platforms with this change. Could you take the patch into the thermal tree, please? Regards, Lukasz ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-17 18:18 ` Lukasz Luba @ 2022-02-22 17:05 ` Lukasz Luba 2022-02-22 18:12 ` Daniel Lezcano 0 siblings, 1 reply; 40+ messages in thread From: Lukasz Luba @ 2022-02-22 17:05 UTC (permalink / raw) To: daniel.lezcano Cc: amit.kachhap, viresh.kumar, rafael, amitk, rui.zhang, dietmar.eggemann, Pierre.Gondois, Matthias Kaehlcke, Doug Anderson, linux-pm, linux-kernel Hi Daniel, gentle ping On 2/17/22 18:18, Lukasz Luba wrote: > Hi Daniel, > > > On 2/7/22 7:30 AM, Lukasz Luba wrote: >> The Energy Model supports power values either in Watts or in some >> abstract >> scale. When the 2nd option is in use, the thermal governor IPA should not >> be allowed to operate, since the relation between cooling devices is not >> properly defined. Thus, it might be possible that big GPU has lower power >> values in abstract scale than a Little CPU. To mitigate a misbehaviour >> of the thermal control algorithm, simply not register a cooling device >> capable of working with IPA. >> >> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> >> --- >> drivers/thermal/cpufreq_cooling.c | 2 +- >> drivers/thermal/devfreq_cooling.c | 16 +++++++++++++--- >> 2 files changed, 14 insertions(+), 4 deletions(-) > > The discussion in below this patch went slightly off-topic but it was > valuable. It clarified also there are no broken platforms with this > change. > > Could you take the patch into the thermal tree, please? > > Regards, > Lukasz ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-22 17:05 ` Lukasz Luba @ 2022-02-22 18:12 ` Daniel Lezcano 2022-02-22 18:31 ` Lukasz Luba 0 siblings, 1 reply; 40+ messages in thread From: Daniel Lezcano @ 2022-02-22 18:12 UTC (permalink / raw) To: Lukasz Luba Cc: amit.kachhap, viresh.kumar, rafael, amitk, rui.zhang, dietmar.eggemann, Pierre.Gondois, Matthias Kaehlcke, Doug Anderson, linux-pm, linux-kernel Hi Lukasz, I don't think it makes sense to remove the support of the energy model if the units are abstracts. IIUC, regarding your previous answer, we don't really know what will do the SoC vendor with these numbers and likely they will provide consistent abstract values which won't prevent a correct behavior. What would be the benefit of giving inconsistent abstract values which will be unusable except of giving a broken energy model? Your proposed changes would be acceptable if the energy model has a broken flag IMO On 22/02/2022 18:05, Lukasz Luba wrote: > Hi Daniel, > > gentle ping > > On 2/17/22 18:18, Lukasz Luba wrote: >> Hi Daniel, >> >> >> On 2/7/22 7:30 AM, Lukasz Luba wrote: >>> The Energy Model supports power values either in Watts or in some >>> abstract >>> scale. When the 2nd option is in use, the thermal governor IPA should >>> not >>> be allowed to operate, since the relation between cooling devices is not >>> properly defined. Thus, it might be possible that big GPU has lower >>> power >>> values in abstract scale than a Little CPU. To mitigate a misbehaviour >>> of the thermal control algorithm, simply not register a cooling device >>> capable of working with IPA. >>> >>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> >>> --- >>> drivers/thermal/cpufreq_cooling.c | 2 +- >>> drivers/thermal/devfreq_cooling.c | 16 +++++++++++++--- >>> 2 files changed, 14 insertions(+), 4 deletions(-) >> >> The discussion in below this patch went slightly off-topic but it was >> valuable. It clarified also there are no broken platforms with this >> change. >> >> Could you take the patch into the thermal tree, please? >> >> Regards, >> Lukasz -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-22 18:12 ` Daniel Lezcano @ 2022-02-22 18:31 ` Lukasz Luba 2022-02-22 22:10 ` Daniel Lezcano 0 siblings, 1 reply; 40+ messages in thread From: Lukasz Luba @ 2022-02-22 18:31 UTC (permalink / raw) To: Daniel Lezcano Cc: amit.kachhap, viresh.kumar, rafael, amitk, rui.zhang, dietmar.eggemann, Pierre.Gondois, Matthias Kaehlcke, Doug Anderson, linux-pm, linux-kernel On 2/22/22 18:12, Daniel Lezcano wrote: > > Hi Lukasz, > > I don't think it makes sense to remove the support of the energy model > if the units are abstracts. > > IIUC, regarding your previous answer, we don't really know what will do > the SoC vendor with these numbers and likely they will provide > consistent abstract values which won't prevent a correct behavior. > > What would be the benefit of giving inconsistent abstract values which > will be unusable except of giving a broken energy model? The power values in the EM which has abstract scale, would make sense to EAS, but not for IPA or DTPM. Those platforms which want to enable EAS, but don't need IPA, would register such '<a_good_name_here>' EM. > > Your proposed changes would be acceptable if the energy model has a > broken flag IMO That is doable. I can add that flag, so we can call it 'artificial' EM (when this new flag is set). Let me craft the RFC patch with this new flag proposal then. Do you agree? Can I also add you as 'Suggested-by'? Thank you for coming back to me with the comments. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-22 18:31 ` Lukasz Luba @ 2022-02-22 22:10 ` Daniel Lezcano 2022-02-23 9:10 ` Lukasz Luba 0 siblings, 1 reply; 40+ messages in thread From: Daniel Lezcano @ 2022-02-22 22:10 UTC (permalink / raw) To: Lukasz Luba Cc: amit.kachhap, viresh.kumar, rafael, amitk, rui.zhang, dietmar.eggemann, Pierre.Gondois, Matthias Kaehlcke, Doug Anderson, linux-pm, linux-kernel Hi Lukasz, On 22/02/2022 19:31, Lukasz Luba wrote: > > > On 2/22/22 18:12, Daniel Lezcano wrote: >> >> Hi Lukasz, >> >> I don't think it makes sense to remove the support of the energy model >> if the units are abstracts. >> >> IIUC, regarding your previous answer, we don't really know what will >> do the SoC vendor with these numbers and likely they will provide >> consistent abstract values which won't prevent a correct behavior. >> >> What would be the benefit of giving inconsistent abstract values which >> will be unusable except of giving a broken energy model? > > The power values in the EM which has abstract scale, would make sense to > EAS, but not for IPA or DTPM. Those platforms which want to enable EAS, > but don't need IPA, would register such '<a_good_name_here>' EM. Sorry, but I don't understand why DTPM can not deal with abstract values? >> Your proposed changes would be acceptable if the energy model has a >> broken flag IMO > > That is doable. I can add that flag, so we can call it 'artificial' EM > (when this new flag is set). It is too soon IMO, I would like to see the numbers first so we can take an enlighten decision. Right now, it is unclear what the numbers will be. > Let me craft the RFC patch with this new flag proposal then. > Do you agree? Can I also add you as 'Suggested-by'? > > Thank you for coming back to me with the comments. -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling 2022-02-22 22:10 ` Daniel Lezcano @ 2022-02-23 9:10 ` Lukasz Luba 0 siblings, 0 replies; 40+ messages in thread From: Lukasz Luba @ 2022-02-23 9:10 UTC (permalink / raw) To: Daniel Lezcano Cc: amit.kachhap, viresh.kumar, rafael, amitk, rui.zhang, dietmar.eggemann, Pierre.Gondois, Matthias Kaehlcke, Doug Anderson, linux-pm, linux-kernel On 2/22/22 22:10, Daniel Lezcano wrote: > > Hi Lukasz, > > On 22/02/2022 19:31, Lukasz Luba wrote: >> >> >> On 2/22/22 18:12, Daniel Lezcano wrote: >>> >>> Hi Lukasz, >>> >>> I don't think it makes sense to remove the support of the energy >>> model if the units are abstracts. >>> >>> IIUC, regarding your previous answer, we don't really know what will >>> do the SoC vendor with these numbers and likely they will provide >>> consistent abstract values which won't prevent a correct behavior. >>> >>> What would be the benefit of giving inconsistent abstract values >>> which will be unusable except of giving a broken energy model? >> >> The power values in the EM which has abstract scale, would make sense >> to EAS, but not for IPA or DTPM. Those platforms which want to enable >> EAS, >> but don't need IPA, would register such '<a_good_name_here>' EM. > > Sorry, but I don't understand why DTPM can not deal with abstract values? They will be totally meaningless/bogus. > > >>> Your proposed changes would be acceptable if the energy model has a >>> broken flag IMO >> >> That is doable. I can add that flag, so we can call it 'artificial' EM >> (when this new flag is set). > > It is too soon IMO, I would like to see the numbers first so we can take > an enlighten decision. Right now, it is unclear what the numbers will be. We are going to add new support from our roadmap for platforms which don't have this power information but are going to use EAS. I'm going to send some patches soon which create that support. Pierre is going to send the platform code. I want to make sure that this platform won't register power actors for IPA. Other thermal governors will work, since they don't use EM for making a decision. ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH 2/2] powercap: DTPM: Check Energy Model type for power values scale 2022-02-07 7:30 [PATCH 0/2] Ignore Energy Model with abstract scale in IPA and DTPM Lukasz Luba 2022-02-07 7:30 ` [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling Lukasz Luba @ 2022-02-07 7:30 ` Lukasz Luba 2022-02-07 10:41 ` [PATCH 0/2] Ignore Energy Model with abstract scale in IPA and DTPM Daniel Lezcano 2 siblings, 0 replies; 40+ messages in thread From: Lukasz Luba @ 2022-02-07 7:30 UTC (permalink / raw) To: linux-kernel, linux-pm Cc: amit.kachhap, daniel.lezcano, viresh.kumar, rafael, amitk, rui.zhang, dietmar.eggemann, lukasz.luba, Pierre.Gondois The Energy Model power values might be in an abstract scale. In such case it's safe to bail out during the registration, since the PowerCap framework supports only micro-Watts. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> --- drivers/powercap/dtpm_cpu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/powercap/dtpm_cpu.c b/drivers/powercap/dtpm_cpu.c index b740866b228d..e6bcde081de4 100644 --- a/drivers/powercap/dtpm_cpu.c +++ b/drivers/powercap/dtpm_cpu.c @@ -188,7 +188,7 @@ static int cpuhp_dtpm_cpu_online(unsigned int cpu) return 0; pd = em_cpu_get(cpu); - if (!pd) + if (!pd || !(pd->flags & EM_PERF_DOMAIN_MILLIWATTS)) return -EINVAL; dtpm_cpu = per_cpu(dtpm_per_cpu, cpu); -- 2.17.1 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH 0/2] Ignore Energy Model with abstract scale in IPA and DTPM 2022-02-07 7:30 [PATCH 0/2] Ignore Energy Model with abstract scale in IPA and DTPM Lukasz Luba 2022-02-07 7:30 ` [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling Lukasz Luba 2022-02-07 7:30 ` [PATCH 2/2] powercap: DTPM: Check Energy Model type for power values scale Lukasz Luba @ 2022-02-07 10:41 ` Daniel Lezcano 2022-02-07 11:44 ` Lukasz Luba 2 siblings, 1 reply; 40+ messages in thread From: Daniel Lezcano @ 2022-02-07 10:41 UTC (permalink / raw) To: Lukasz Luba, linux-kernel, linux-pm Cc: amit.kachhap, viresh.kumar, rafael, amitk, rui.zhang, dietmar.eggemann, Pierre.Gondois On 07/02/2022 08:30, Lukasz Luba wrote: > Hi all, > > The Energy Model supports abstract scale power values. This might cause > issues for some mechanisms like thermal governor IPA or DTPM, which > expect that all devices provide sane power values. This patch set prevents > from registering such devices for IPA and DTPM. Does it mean for example big and little have both 0-100 ? -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 0/2] Ignore Energy Model with abstract scale in IPA and DTPM 2022-02-07 10:41 ` [PATCH 0/2] Ignore Energy Model with abstract scale in IPA and DTPM Daniel Lezcano @ 2022-02-07 11:44 ` Lukasz Luba 2022-02-08 7:39 ` Daniel Lezcano 0 siblings, 1 reply; 40+ messages in thread From: Lukasz Luba @ 2022-02-07 11:44 UTC (permalink / raw) To: Daniel Lezcano Cc: amit.kachhap, viresh.kumar, rafael, amitk, rui.zhang, dietmar.eggemann, Pierre.Gondois, linux-kernel, linux-pm On 2/7/22 10:41 AM, Daniel Lezcano wrote: > On 07/02/2022 08:30, Lukasz Luba wrote: >> Hi all, >> >> The Energy Model supports abstract scale power values. This might cause >> issues for some mechanisms like thermal governor IPA or DTPM, which >> expect that all devices provide sane power values. This patch set >> prevents >> from registering such devices for IPA and DTPM. > > > Does it mean for example big and little have both 0-100 ? > > Unfortunately, these can be any numbers. I hope at least the CPUs Big and Little power have sense: Little power is not higher than Big power. The purpose of EM is to enable EAS, so this power relation between Big and Little should have sense. Someone who is not willing to or cannot expose real power values, still wants the EAS to operate (my assumption and hope). The SCMI FW can provide abstract power values. It's in the SCMI spec. Thus, creating these abstract scale power values for big.LITTLE the right way should result in properly working EAS. I can also have hope for GPU vs. Big power, but it is a weaker hope. The second is more tricky to distinguish even if you have a domain knowledge, but not the real measurements with you. The GPU power values is also a 'sensitive' knowledge to share. Open source guys can do that (after measurements), but some vendor's engineers probably can't. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 0/2] Ignore Energy Model with abstract scale in IPA and DTPM 2022-02-07 11:44 ` Lukasz Luba @ 2022-02-08 7:39 ` Daniel Lezcano 2022-02-08 9:47 ` Lukasz Luba 0 siblings, 1 reply; 40+ messages in thread From: Daniel Lezcano @ 2022-02-08 7:39 UTC (permalink / raw) To: Lukasz Luba Cc: amit.kachhap, viresh.kumar, rafael, amitk, rui.zhang, dietmar.eggemann, Pierre.Gondois, linux-kernel, linux-pm On 07/02/2022 12:44, Lukasz Luba wrote: > > > On 2/7/22 10:41 AM, Daniel Lezcano wrote: >> On 07/02/2022 08:30, Lukasz Luba wrote: >>> Hi all, >>> >>> The Energy Model supports abstract scale power values. This might cause >>> issues for some mechanisms like thermal governor IPA or DTPM, which >>> expect that all devices provide sane power values. This patch set >>> prevents >>> from registering such devices for IPA and DTPM. >> >> >> Does it mean for example big and little have both 0-100 ? >> >> > > Unfortunately, these can be any numbers. I hope at least the CPUs > Big and Little power have sense: Little power is not higher > than Big power. The purpose of EM is to enable EAS, so this power > relation between Big and Little should have sense. Someone > who is not willing to or cannot expose real power values, still > wants the EAS to operate (my assumption and hope). The SCMI FW can > provide abstract power values. It's in the SCMI spec. Thus, > creating these abstract scale power values for big.LITTLE the right > way should result in properly working EAS. > > I can also have hope for GPU vs. Big power, but it is a weaker hope. > The second is more tricky to distinguish even if you have a domain > knowledge, but not the real measurements with you. The GPU power > values is also a 'sensitive' knowledge to share. Open source guys can do > that (after measurements), but some vendor's engineers probably can't. So basically, we don't know, right ? At this point the different subsystems (cpufreq cooling device and dtpm) disabled by these patches can deal with abstract scale values, like they do today with the very approximate power numbers we have defined in the DT. Let's wait and see how the different SoC vendors implement the SCMI spec. -- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-blog/> Blog ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH 0/2] Ignore Energy Model with abstract scale in IPA and DTPM 2022-02-08 7:39 ` Daniel Lezcano @ 2022-02-08 9:47 ` Lukasz Luba 0 siblings, 0 replies; 40+ messages in thread From: Lukasz Luba @ 2022-02-08 9:47 UTC (permalink / raw) To: Daniel Lezcano Cc: amit.kachhap, viresh.kumar, rafael, amitk, rui.zhang, dietmar.eggemann, Pierre.Gondois, linux-kernel, linux-pm On 2/8/22 7:39 AM, Daniel Lezcano wrote: > On 07/02/2022 12:44, Lukasz Luba wrote: >> >> >> On 2/7/22 10:41 AM, Daniel Lezcano wrote: >>> On 07/02/2022 08:30, Lukasz Luba wrote: >>>> Hi all, >>>> >>>> The Energy Model supports abstract scale power values. This might cause >>>> issues for some mechanisms like thermal governor IPA or DTPM, which >>>> expect that all devices provide sane power values. This patch set >>>> prevents >>>> from registering such devices for IPA and DTPM. >>> >>> >>> Does it mean for example big and little have both 0-100 ? >>> >>> >> >> Unfortunately, these can be any numbers. I hope at least the CPUs >> Big and Little power have sense: Little power is not higher >> than Big power. The purpose of EM is to enable EAS, so this power >> relation between Big and Little should have sense. Someone >> who is not willing to or cannot expose real power values, still >> wants the EAS to operate (my assumption and hope). The SCMI FW can >> provide abstract power values. It's in the SCMI spec. Thus, >> creating these abstract scale power values for big.LITTLE the right >> way should result in properly working EAS. >> >> I can also have hope for GPU vs. Big power, but it is a weaker hope. >> The second is more tricky to distinguish even if you have a domain >> knowledge, but not the real measurements with you. The GPU power >> values is also a 'sensitive' knowledge to share. Open source guys can do >> that (after measurements), but some vendor's engineers probably can't. > > So basically, we don't know, right ? Well, we know - power is not in milli-Watts. The PowerCap and DTPM have this sysfs contract of providing micro-Watts. Don't we break this if we know for sure that the power values in EM are not in milli-Watts? > > At this point the different subsystems (cpufreq cooling device and dtpm) > disabled by these patches can deal with abstract scale values, like they > do today with the very approximate power numbers we have defined in the DT. > > Let's wait and see how the different SoC vendors implement the SCMI spec. > I'm not sure if that would be visible to us. The current path to feed abstract power values from SCMI cpufreq already exists and is based on protocol response. The DT is not involved in this configuration. ^ permalink raw reply [flat|nested] 40+ messages in thread
end of thread, other threads:[~2022-02-23 9:10 UTC | newest] Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-02-07 7:30 [PATCH 0/2] Ignore Energy Model with abstract scale in IPA and DTPM Lukasz Luba 2022-02-07 7:30 ` [PATCH 1/2] thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling Lukasz Luba 2022-02-08 0:50 ` Matthias Kaehlcke 2022-02-08 9:32 ` Lukasz Luba 2022-02-08 17:25 ` Matthias Kaehlcke 2022-02-09 11:16 ` Lukasz Luba 2022-02-09 22:17 ` Matthias Kaehlcke 2022-02-16 15:35 ` Lukasz Luba 2022-02-16 17:33 ` Doug Anderson 2022-02-16 22:13 ` Matthias Kaehlcke 2022-02-16 22:43 ` Lukasz Luba 2022-02-17 0:26 ` Matthias Kaehlcke 2022-02-17 10:15 ` Daniel Lezcano 2022-02-17 9:59 ` Daniel Lezcano 2022-02-17 10:10 ` Daniel Lezcano 2022-02-17 10:47 ` Lukasz Luba 2022-02-17 11:28 ` Daniel Lezcano 2022-02-17 12:11 ` Lukasz Luba 2022-02-17 12:33 ` Daniel Lezcano 2022-02-17 12:37 ` Lukasz Luba 2022-02-17 16:46 ` Matthias Kaehlcke 2022-02-17 16:37 ` Doug Anderson 2022-02-17 17:14 ` Matthias Kaehlcke 2022-02-17 18:05 ` Lukasz Luba 2022-02-17 18:27 ` Daniel Lezcano 2022-02-16 17:21 ` Doug Anderson 2022-02-16 23:28 ` Lukasz Luba 2022-02-17 16:50 ` Doug Anderson 2022-02-17 17:58 ` Lukasz Luba 2022-02-17 18:18 ` Lukasz Luba 2022-02-22 17:05 ` Lukasz Luba 2022-02-22 18:12 ` Daniel Lezcano 2022-02-22 18:31 ` Lukasz Luba 2022-02-22 22:10 ` Daniel Lezcano 2022-02-23 9:10 ` Lukasz Luba 2022-02-07 7:30 ` [PATCH 2/2] powercap: DTPM: Check Energy Model type for power values scale Lukasz Luba 2022-02-07 10:41 ` [PATCH 0/2] Ignore Energy Model with abstract scale in IPA and DTPM Daniel Lezcano 2022-02-07 11:44 ` Lukasz Luba 2022-02-08 7:39 ` Daniel Lezcano 2022-02-08 9:47 ` Lukasz Luba
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).