* [PATCH v3] devfreq_cooling: let the driver supply the real power every time we need it
From: Lukasz Luba @ 2017-03-14 13:06 UTC
To: linux-pm; +Cc: chris.diamand, lukasz.luba, javi.merino, rui.zhang, edubezval

Hi,

This patchset introduces a new interface for devfreq cooling in the
thermal framework. The previous version of the patch can be seen here [1].
I have simplified the implementation and introduced a resource
utilization scaling factor.

The current implementation in the thermal devfreq cooling subsystem uses
a pre-calculated power table for each device to decide which running
state is allowed. When the driver registers itself with the thermal
devfreq cooling subsystem, the framework creates the power table. The
power table is then used by the thermal subsystem to keep the device
within the thermal envelope.

In the previous implementation the device's pre-calculated power table
was scaled by the current 'utilization' ('busy_time' and 'total_time'
taken from devfreq 'last_status').

The new interface is aimed at devices which know the actual power they
consume more precisely (thanks to power counters). When some
parts/features of the device are not used, the power value might be
lower while the frequency and utilization stay the same.

The proposed implementation makes it possible to register a driver with
the thermal devfreq cooling subsystem and use the driver's code to
calculate the power at runtime. The device driver can still use the
pre-calculated power table when these new functions are not provided
(the new extension can co-exist with the old implementation).

The first patch contains some refactoring for getting the voltage, the
second implements the new feature, and the third changes the trace
function.

The patchset is based on v4.11-rc2.

Changes
v3:
- use new OPP interface (no need to lock rcu)
v2 [2]:
- removed 'flags' and power2state function,
- split into a few patches,
- simplified the logic of the new interface,
- added resource utilization scaling factor,
v1 [1]:
- basic implementation

Best Regards,
Lukasz Luba

[1] https://marc.info/?l=linux-pm&m=147395070729989&w=2
[2] http://marc.info/?l=linux-pm&m=148587920122854&w=2

Lukasz Luba (3):
  thermal: devfreq_cooling: refactor code and add get_voltage function
  thermal: devfreq_cooling: add new interface for direct power read
  trace: thermal: add another parameter *power to the tracing function

 drivers/thermal/devfreq_cooling.c | 117 ++++++++++++++++++++++++++++----------
 include/linux/devfreq_cooling.h   |  17 ++++++
 include/trace/events/thermal.h    |  11 ++--
 3 files changed, 110 insertions(+), 35 deletions(-)

-- 
2.9.2

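The difference between the two estimates described in the cover letter can be sketched in plain userspace C. This is only a model under stated assumptions, not the kernel code: `struct status_model`, `table_power_scaled()`, `requested_power()` and the callback `example_counter_power()` are hypothetical names, the static-power term of the table path is left out, and the guard against a zero `total_time` is an addition of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the busy/total fields of devfreq 'last_status'. */
struct status_model {
	unsigned long busy_time;
	unsigned long total_time;
};

/* Hypothetical driver callback type: reports measured power in mW. */
typedef int (*real_power_fn)(uint32_t *power_mw);

/* Table-based estimate: table power scaled by busy_time / total_time. */
static uint32_t table_power_scaled(uint32_t table_power_mw,
				   const struct status_model *st)
{
	assert(st->busy_time <= st->total_time);

	/* Guard added for the sketch: avoid dividing by zero. */
	if (st->total_time == 0)
		return 0;

	return (uint32_t)(((uint64_t)table_power_mw * st->busy_time) /
			  st->total_time);
}

/* Use the driver's measurement when a callback exists, else the table. */
static int requested_power(uint32_t table_power_mw,
			   const struct status_model *st,
			   real_power_fn get_real, uint32_t *power_mw)
{
	if (get_real)
		return get_real(power_mw);

	*power_mw = table_power_scaled(table_power_mw, st);
	return 0;
}

/* Made-up example data: 50% utilization, counters report 300 mW. */
static const struct status_model example_status = { 50, 100 };
static uint32_t example_out;

static int example_counter_power(uint32_t *power_mw)
{
	*power_mw = 300;
	return 0;
}
```

With busy_time = 50 and total_time = 100, a 1000 mW table entry is always scaled to 500 mW, whatever parts of the device are active. The callback path instead returns whatever the driver measured (300 mW in this made-up example).
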
* [PATCH v3 1/3] thermal: devfreq_cooling: refactor code and add get_voltage function
From: Lukasz Luba @ 2017-03-14 13:06 UTC
To: linux-pm; +Cc: chris.diamand, lukasz.luba, javi.merino, rui.zhang, edubezval

Move the code which gets the voltage for a given frequency.
This code will be reused in a few places.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/thermal/devfreq_cooling.c | 37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
index 7743a78..bc4c78d 100644
--- a/drivers/thermal/devfreq_cooling.c
+++ b/drivers/thermal/devfreq_cooling.c
@@ -164,6 +164,28 @@ freq_get_state(struct devfreq_cooling_device *dfc, unsigned long freq)
 	return THERMAL_CSTATE_INVALID;
 }
 
+static unsigned long get_voltage(struct devfreq *df, unsigned long freq)
+{
+	struct device *dev = df->dev.parent;
+	unsigned long voltage;
+	struct dev_pm_opp *opp;
+
+	opp = dev_pm_opp_find_freq_exact(dev, freq, true);
+	if (IS_ERR(opp) && (PTR_ERR(opp) == -ERANGE))
+		opp = dev_pm_opp_find_freq_exact(dev, freq, false);
+
+	voltage = dev_pm_opp_get_voltage(opp) / 1000; /* mV */
+	dev_pm_opp_put(opp);
+
+	if (voltage == 0) {
+		dev_warn_ratelimited(dev,
+				     "Failed to get voltage for frequency %lu: %ld\n",
+				     freq, IS_ERR(opp) ? PTR_ERR(opp) : 0);
+	}
+
+	return voltage;
+}
+
 /**
  * get_static_power() - calculate the static power
  * @dfc:	Pointer to devfreq cooling device
@@ -178,26 +200,15 @@ static unsigned long
 get_static_power(struct devfreq_cooling_device *dfc, unsigned long freq)
 {
 	struct devfreq *df = dfc->devfreq;
-	struct device *dev = df->dev.parent;
 	unsigned long voltage;
-	struct dev_pm_opp *opp;
 
 	if (!dfc->power_ops->get_static_power)
 		return 0;
 
-	opp = dev_pm_opp_find_freq_exact(dev, freq, true);
-	if (IS_ERR(opp) && (PTR_ERR(opp) == -ERANGE))
-		opp = dev_pm_opp_find_freq_exact(dev, freq, false);
-
-	voltage = dev_pm_opp_get_voltage(opp) / 1000; /* mV */
-	dev_pm_opp_put(opp);
+	voltage = get_voltage(df, freq);
 
-	if (voltage == 0) {
-		dev_warn_ratelimited(dev,
-				     "Failed to get voltage for frequency %lu: %ld\n",
-				     freq, IS_ERR(opp) ? PTR_ERR(opp) : 0);
+	if (voltage == 0)
 		return 0;
-	}
 
 	return dfc->power_ops->get_static_power(df, voltage);
 }
-- 
2.9.2

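The lookup order used by get_voltage() (an available OPP first, then an unavailable one, with 0 meaning failure) can be illustrated with a small userspace model. This is a sketch only: `struct opp_model`, `find_exact()`, `voltage_mv_for_freq()` and the table `example_opps` are hypothetical and stand in for the OPP library calls, which are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one OPP entry: frequency, voltage, availability. */
struct opp_model {
	unsigned long freq_hz;
	unsigned long microvolt;
	bool available;
};

/* Find an entry with exactly this frequency and availability flag. */
static const struct opp_model *find_exact(const struct opp_model *table,
					  size_t n, unsigned long freq_hz,
					  bool available)
{
	size_t i;

	assert(table != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (table[i].freq_hz == freq_hz &&
		    table[i].available == available)
			return &table[i];

	return NULL;
}

/* Return the voltage in mV for a frequency, or 0 when no entry matches. */
static unsigned long voltage_mv_for_freq(const struct opp_model *table,
					 size_t n, unsigned long freq_hz)
{
	const struct opp_model *opp = find_exact(table, n, freq_hz, true);

	/* Fall back to an entry that exists but is currently unavailable. */
	if (!opp)
		opp = find_exact(table, n, freq_hz, false);
	if (!opp)
		return 0;

	return opp->microvolt / 1000; /* uV -> mV */
}

/* Made-up example table. */
static const struct opp_model example_opps[] = {
	{ 100000000UL,  900000UL, true  },
	{ 200000000UL, 1000000UL, false },
	{ 300000000UL, 1100000UL, true  },
};
```

In this model a frequency that only exists as an unavailable entry (200 MHz) still yields a voltage, and a frequency missing from the table yields 0, which the callers in the patch treat as "no voltage".
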
* [PATCH v3 2/3] thermal: devfreq_cooling: add new interface for direct power read
From: Lukasz Luba @ 2017-03-14 13:06 UTC
To: linux-pm; +Cc: chris.diamand, lukasz.luba, javi.merino, rui.zhang, edubezval

This patch introduces a new interface for device drivers connected to
devfreq_cooling in the thermal framework: get_real_power().

Some devices have more sophisticated methods (like power counters)
to approximate the actual power that they use.
In the previous implementation we had a pre-calculated power
table which was then scaled by 'utilization'
('busy_time' and 'total_time' taken from devfreq 'last_status').

With this new interface the driver can provide more precise data
regarding actual power to the thermal governor every time the power
budget is calculated. We then use this value and calculate the real
resource utilization scaling factor.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/thermal/devfreq_cooling.c | 78 ++++++++++++++++++++++++++++++---------
 include/linux/devfreq_cooling.h   | 17 +++++++++
 2 files changed, 78 insertions(+), 17 deletions(-)

diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
index bc4c78d..4411ab8 100644
--- a/drivers/thermal/devfreq_cooling.c
+++ b/drivers/thermal/devfreq_cooling.c
@@ -28,6 +28,8 @@
 
 #include <trace/events/thermal.h>
 
+#define SCALE_ERROR_MITIGATION 100
+
 static DEFINE_IDA(devfreq_ida);
 
 /**
@@ -45,6 +47,12 @@ static DEFINE_IDA(devfreq_ida);
  * @freq_table_size:	Size of the @freq_table and @power_table
  * @power_ops:	Pointer to devfreq_cooling_power, used to generate the
  *		@power_table.
+ * @res_util:	Resource utilization scaling factor for the power.
+ *		It is multiplied by 100 to minimize the error. It is used
+ *		for estimation of the power budget instead of using
+ *		'utilization' (which is	'busy_time / 'total_time').
+ *		The 'res_util' range is from 100 to (power_table[state] * 100)
+ *		for the corresponding 'state'.
  */
 struct devfreq_cooling_device {
 	int id;
@@ -55,6 +63,7 @@ struct devfreq_cooling_device {
 	u32 *freq_table;
 	size_t freq_table_size;
 	struct devfreq_cooling_power *power_ops;
+	u32 res_util;
 };
 
 /**
@@ -253,27 +262,55 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
 	struct devfreq_dev_status *status = &df->last_status;
 	unsigned long state;
 	unsigned long freq = status->current_frequency;
-	u32 dyn_power, static_power;
+	unsigned long voltage;
+	u32 dyn_power = 0;
+	u32 static_power = 0;
+	int res;
 
 	/* Get dynamic power for state */
 	state = freq_get_state(dfc, freq);
-	if (state == THERMAL_CSTATE_INVALID)
-		return -EAGAIN;
+	if (state == THERMAL_CSTATE_INVALID) {
+		res = -EAGAIN;
+		goto fail;
+	}
 
-	dyn_power = dfc->power_table[state];
+	if (dfc->power_ops->get_real_power) {
+		voltage = get_voltage(df, freq);
+		if (voltage == 0) {
+			res = -EINVAL;
+			goto fail;
+		}
 
-	/* Scale dynamic power for utilization */
-	dyn_power = (dyn_power * status->busy_time) / status->total_time;
+		res = dfc->power_ops->get_real_power(df, power, freq, voltage);
+		if (!res) {
+			dfc->res_util = dfc->power_table[state];
+			dfc->res_util *= SCALE_ERROR_MITIGATION;
 
-	/* Get static power */
-	static_power = get_static_power(dfc, freq);
+			if (*power > 1)
+				dfc->res_util /= *power;
+		} else {
+			goto fail;
+		}
+	} else {
+		dyn_power = dfc->power_table[state];
+
+		/* Scale dynamic power for utilization */
+		dyn_power *= status->busy_time;
+		dyn_power /= status->total_time;
+		/* Get static power */
+		static_power = get_static_power(dfc, freq);
+
+		*power = dyn_power + static_power;
+	}
 
 	trace_thermal_power_devfreq_get_power(cdev, status, freq, dyn_power,
 					      static_power);
 
-	*power = dyn_power + static_power;
-
 	return 0;
+fail:
+	/* It is safe to set max in this case */
+	dfc->res_util = SCALE_ERROR_MITIGATION;
+	return res;
 }
 
 static int devfreq_cooling_state2power(struct thermal_cooling_device *cdev,
@@ -306,23 +343,30 @@ static int devfreq_cooling_power2state(struct thermal_cooling_device *cdev,
 	unsigned long busy_time;
 	s32 dyn_power;
 	u32 static_power;
+	s32 est_power;
 	int i;
 
-	static_power = get_static_power(dfc, freq);
+	if (dfc->power_ops->get_real_power) {
+		/* Scale for resource utilization */
+		est_power = power * dfc->res_util;
+		est_power /= SCALE_ERROR_MITIGATION;
+	} else {
+		static_power = get_static_power(dfc, freq);
 
-	dyn_power = power - static_power;
-	dyn_power = dyn_power > 0 ? dyn_power : 0;
+		dyn_power = power - static_power;
+		dyn_power = dyn_power > 0 ? dyn_power : 0;
 
-	/* Scale dynamic power for utilization */
-	busy_time = status->busy_time ?: 1;
-	dyn_power = (dyn_power * status->total_time) / busy_time;
+		/* Scale dynamic power for utilization */
+		busy_time = status->busy_time ?: 1;
+		est_power = (dyn_power * status->total_time) / busy_time;
+	}
 
 	/*
 	 * Find the first cooling state that is within the power
 	 * budget for dynamic power.
 	 */
 	for (i = 0; i < dfc->freq_table_size - 1; i++)
-		if (dyn_power >= dfc->power_table[i])
+		if (est_power >= dfc->power_table[i])
 			break;
 
 	*state = i;
diff --git a/include/linux/devfreq_cooling.h b/include/linux/devfreq_cooling.h
index c35d0c0..2e07744 100644
--- a/include/linux/devfreq_cooling.h
+++ b/include/linux/devfreq_cooling.h
@@ -34,6 +34,21 @@
  *			If get_dynamic_power() is NULL, then the
  *			dynamic power is calculated as
  *			@dyn_power_coeff * frequency * voltage^2
+ * @get_real_power:	When this it is set, the framework uses it to ask the
+ *			device driver for the actual power.
+ *			Some devices have more sophisticated methods
+ *			(like power counters) to approximate the actual power
+ *			that they use.
+ *			This function provides more accurate data to the
+ *			thermal governor. When the driver does not provide
+ *			such function, framework just uses pre-calculated
+ *			table and scale the power by 'utilization'
+ *			(based on 'busy_time' and 'total_time' taken from
+ *			devfreq 'last_status').
+ *			The value returned by this function must be lower
+ *			or equal than the maximum power value
+ *			for the current	state
+ *			(which can be found in power_table[state]).
  */
 struct devfreq_cooling_power {
 	unsigned long (*get_static_power)(struct devfreq *devfreq,
@@ -41,6 +56,8 @@ struct devfreq_cooling_power {
 	unsigned long (*get_dynamic_power)(struct devfreq *devfreq,
 					   unsigned long freq,
 					   unsigned long voltage);
+	int (*get_real_power)(struct devfreq *df, u32 *power,
+			      unsigned long freq, unsigned long voltage);
 	unsigned long dyn_power_coeff;
 };
 
-- 
2.9.2

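The arithmetic behind 'res_util' in the patch above can be followed with a short userspace sketch. It mirrors the two steps in the diff (compute the factor from the table power and the reported power, then scale a granted budget by it), but the function names `compute_res_util()` and `scale_budget()` are hypothetical, the 64-bit intermediate is an addition of the sketch, and the assertion simply encodes the documented requirement that the reported power does not exceed the table power.

```c
#include <assert.h>
#include <stdint.h>

#define SCALE_ERROR_MITIGATION 100

/*
 * Factor relating the table power for a state to the power the driver
 * actually reported, kept multiplied by 100 to limit rounding error.
 */
static uint32_t compute_res_util(uint32_t table_power_mw,
				 uint32_t real_power_mw)
{
	uint32_t res_util = table_power_mw * SCALE_ERROR_MITIGATION;

	/* As in the patch: divide only when the reported power is > 1. */
	if (real_power_mw > 1)
		res_util /= real_power_mw;

	return res_util;
}

/*
 * Translate a granted budget (in "real" mW) into table units so that it
 * can be compared against power_table[] entries.
 */
static int64_t scale_budget(uint32_t granted_power_mw, uint32_t res_util)
{
	/* Documented range: the factor should not drop below 100. */
	assert(res_util >= SCALE_ERROR_MITIGATION);

	return ((int64_t)granted_power_mw * res_util) / SCALE_ERROR_MITIGATION;
}
```

For example, if the table says 2000 mW for the current state and the driver reports 500 mW, the factor is 2000 * 100 / 500 = 400. A granted budget of 600 mW then corresponds to 600 * 400 / 100 = 2400 mW in table units, so in this example any state whose table entry is at most 2400 mW fits the budget. When the reported power equals the table power, the factor is 100 and the budget is left unchanged.
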
* [PATCH v3 3/3] trace: thermal: add another parameter *power to the tracing function
From: Lukasz Luba @ 2017-03-14 13:06 UTC
To: linux-pm
Cc: chris.diamand, lukasz.luba, javi.merino, rui.zhang, edubezval, Steven Rostedt, Ingo Molnar

This patch adds another parameter to the trace function:
trace_thermal_power_devfreq_get_power().

In case when we call directly driver's code for the real power,
we do not have static/dynamic_power values. Instead we get total
power in the '*power' value. The 'static_power' and
'dynamic_power' are set to 0.

Therefore, we have to trace that '*power' value in this scenario.

CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ingo Molnar <mingo@redhat.com>
CC: Zhang Rui <rui.zhang@intel.com>
CC: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/thermal/devfreq_cooling.c |  2 +-
 include/trace/events/thermal.h    | 11 +++++++----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
index 4411ab8..ca3ebe3 100644
--- a/drivers/thermal/devfreq_cooling.c
+++ b/drivers/thermal/devfreq_cooling.c
@@ -304,7 +304,7 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
 	}
 
 	trace_thermal_power_devfreq_get_power(cdev, status, freq, dyn_power,
-					      static_power);
+					      static_power, *power);
 
 	return 0;
 fail:
diff --git a/include/trace/events/thermal.h b/include/trace/events/thermal.h
index 2b4a8ff..6cde5b3 100644
--- a/include/trace/events/thermal.h
+++ b/include/trace/events/thermal.h
@@ -151,9 +151,9 @@ TRACE_EVENT(thermal_power_cpu_limit,
 TRACE_EVENT(thermal_power_devfreq_get_power,
 	TP_PROTO(struct thermal_cooling_device *cdev,
 		 struct devfreq_dev_status *status, unsigned long freq,
-		u32 dynamic_power, u32 static_power),
+		u32 dynamic_power, u32 static_power, u32 power),
 
-	TP_ARGS(cdev, status, freq, dynamic_power, static_power),
+	TP_ARGS(cdev, status, freq, dynamic_power, static_power, power),
 
 	TP_STRUCT__entry(
 		__string(type,         cdev->type    )
@@ -161,6 +161,7 @@ TRACE_EVENT(thermal_power_devfreq_get_power,
 		__field(u32,           load          )
 		__field(u32,           dynamic_power )
 		__field(u32,           static_power  )
+		__field(u32,           power)
 	),
 
 	TP_fast_assign(
@@ -169,11 +170,13 @@ TRACE_EVENT(thermal_power_devfreq_get_power,
 		__entry->load = (100 * status->busy_time) / status->total_time;
 		__entry->dynamic_power = dynamic_power;
 		__entry->static_power = static_power;
+		__entry->power = power;
 	),
 
-	TP_printk("type=%s freq=%lu load=%u dynamic_power=%u static_power=%u",
+	TP_printk("type=%s freq=%lu load=%u dynamic_power=%u static_power=%u power=%u",
 		__get_str(type), __entry->freq,
-		__entry->load, __entry->dynamic_power, __entry->static_power)
+		__entry->load, __entry->dynamic_power, __entry->static_power,
+		__entry->power)
 );
 
 TRACE_EVENT(thermal_power_devfreq_limit,
-- 
2.9.2

* Re: [PATCH v3 3/3] trace: thermal: add another parameter *power to the tracing function
From: Steven Rostedt @ 2017-03-14 13:44 UTC
To: Lukasz Luba
Cc: linux-pm, chris.diamand, javi.merino, rui.zhang, edubezval, Ingo Molnar

On Tue, 14 Mar 2017 13:06:16 +0000
Lukasz Luba <lukasz.luba@arm.com> wrote:

> This patch adds another parameter to the trace function:
> trace_thermal_power_devfreq_get_power().
>
> In case when we call directly driver's code for the real power,
> we do not have static/dynamic_power values. Instead we get total
> power in the '*power' value. The 'static_power' and
> 'dynamic_power' are set to 0.
>
> Therefore, we have to trace that '*power' value in this scenario.
>
> CC: Steven Rostedt <rostedt@goodmis.org>
> CC: Ingo Molnar <mingo@redhat.com>
> CC: Zhang Rui <rui.zhang@intel.com>
> CC: Eduardo Valentin <edubezval@gmail.com>
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> ---
>  drivers/thermal/devfreq_cooling.c |  2 +-
>  include/trace/events/thermal.h    | 11 +++++++----
>  2 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
> index 4411ab8..ca3ebe3 100644
> --- a/drivers/thermal/devfreq_cooling.c
> +++ b/drivers/thermal/devfreq_cooling.c
> @@ -304,7 +304,7 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
>  	}
>
>  	trace_thermal_power_devfreq_get_power(cdev, status, freq, dyn_power,
> -					      static_power);
> +					      static_power, *power);

I'm curious. Can you disassemble the above, and take a look at how it
handles that dereference? It may make sense to pass in the pointer and
do the dereference in the TP_fast_assign():

	__entry->power = *power;

I like to make the call site as light as possible and do as much work
in the tracepoint code as is feasible. This helps keep tracepoints
lightweight when tracing is off.

-- Steve

>
>  	return 0;

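The idea in the reply above (hand the pointer to the tracepoint and dereference it only inside the tracing code) can be modelled in userspace. This is a toy under loud assumptions: a plain boolean `trace_enabled` stands in for the kernel's tracepoint enable mechanism, and `trace_power_by_pointer()`, `deref_count`, `last_traced_power` and `example_power` are all hypothetical names. It says nothing about what the compiler actually generates for the real tracepoint, which is exactly what the disassembly question is about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for "is this tracepoint currently enabled?". */
static bool trace_enabled;

/* Bookkeeping so the effect of the model can be observed. */
static unsigned int deref_count;
static uint32_t last_traced_power;

/*
 * The caller only passes a pointer. The load from *power happens inside
 * the enabled branch, so a disabled trace costs one flag check here.
 */
static inline void trace_power_by_pointer(const uint32_t *power)
{
	if (!trace_enabled)
		return;

	assert(power != NULL);
	last_traced_power = *power;	/* models __entry->power = *power */
	deref_count++;
}

/* Made-up value to trace. */
static uint32_t example_power = 1234;
```

In this model, calling `trace_power_by_pointer(&example_power)` with `trace_enabled` false leaves `deref_count` at 0. After setting `trace_enabled` to true, the same call records 1234.
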
* [PATCH v3 0/3] devfreq_cooling: let the driver supply the
From: Lukasz Luba @ 2017-05-04 11:34 UTC
To: linux-pm; +Cc: rui.zhang, edubezval, javi.merino, chris.diamand, lukasz.luba

Hi,

This patchset introduces a new interface for devfreq cooling in the
thermal framework. The first version of the patch can be seen here [1],
the second here [2].
I have simplified the implementation and introduced a resource
utilization scaling factor.

The current implementation in the thermal devfreq cooling subsystem uses
a pre-calculated power table for each device to decide which running
state is allowed. When the driver registers itself with the thermal
devfreq cooling subsystem, the framework creates the power table. The
power table is then used by the thermal subsystem to keep the device
within the thermal envelope.

In the previous implementation the device's pre-calculated power table
was scaled by the current 'utilization' ('busy_time' and 'total_time'
taken from devfreq 'last_status').

The new interface is aimed at devices which know the actual power they
consume more precisely (thanks to power counters). When some
parts/features of the device are not used, the power value might be
lower while the frequency and utilization stay the same.

The proposed implementation makes it possible to register a driver with
the thermal devfreq cooling subsystem and use the driver's code to
calculate the power at runtime. The device driver can still use the
pre-calculated power table when these new functions are not provided
(the new extension can co-exist with the old implementation).

The first patch contains some refactoring for getting the voltage, the
second implements the new feature, and the third changes the trace
function.

The patchset is based on v4.11-rc8.

Changes
v3:
- refactor OPP code to fit into new dev_pm_opp API
v2:
- removed 'flags' and power2state function,
- split into a few patches,
- simplified the logic of the new interface,
- added resource utilization scaling factor,

Regards,
Lukasz Luba

[1] https://marc.info/?l=linux-pm&m=147395070729989&w=2
[2] http://marc.info/?l=linux-pm&m=148587920122854&w=2

Lukasz Luba (3):
  thermal: devfreq_cooling: refactor code and add get_voltage function
  thermal: devfreq_cooling: add new interface for direct power read
  trace: thermal: add another parameter 'power' to the tracing function

 drivers/thermal/devfreq_cooling.c | 152 ++++++++++++++++++++++++++++----------
 include/linux/devfreq_cooling.h   |  19 +++++
 include/trace/events/thermal.h    |  11 ++-
 3 files changed, 137 insertions(+), 45 deletions(-)

-- 
1.9.1

* [PATCH v3 2/3] thermal: devfreq_cooling: add new interface for direct power read
From: Lukasz Luba @ 2017-05-04 11:34 UTC
To: linux-pm; +Cc: rui.zhang, edubezval, javi.merino, chris.diamand, lukasz.luba

This patch introduces a new interface for device drivers connected to
devfreq_cooling in the thermal framework: get_real_power().

Some devices have more sophisticated methods (like power counters)
to approximate the actual power that they use.
In the previous implementation we had a pre-calculated power
table which was then scaled by 'utilization'
('busy_time' and 'total_time' taken from devfreq 'last_status').

With this new interface the driver can provide more precise data
regarding actual power to the thermal governor every time the power
budget is calculated. We then use this value and calculate the real
resource utilization scaling factor.

Reviewed-by: Chris Diamand <chris.diamand@arm.com>
Acked-by: Javi Merino <javi.merino@kernel.org>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/thermal/devfreq_cooling.c | 105 +++++++++++++++++++++++++++++---------
 include/linux/devfreq_cooling.h   |  19 +++++++
 2 files changed, 101 insertions(+), 23 deletions(-)

diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
index af9d328..26c3157 100644
--- a/drivers/thermal/devfreq_cooling.c
+++ b/drivers/thermal/devfreq_cooling.c
@@ -28,6 +28,8 @@
 
 #include <trace/events/thermal.h>
 
+#define SCALE_ERROR_MITIGATION 100
+
 static DEFINE_IDA(devfreq_ida);
 
 /**
@@ -45,6 +47,12 @@
  * @freq_table_size:	Size of the @freq_table and @power_table
  * @power_ops:	Pointer to devfreq_cooling_power, used to generate the
  *		@power_table.
+ * @res_util:	Resource utilization scaling factor for the power.
+ *		It is multiplied by 100 to minimize the error. It is used
+ *		for estimation of the power budget instead of using
+ *		'utilization' (which is	'busy_time / 'total_time').
+ *		The 'res_util' range is from 100 to (power_table[state] * 100)
+ *		for the corresponding 'state'.
  */
 struct devfreq_cooling_device {
 	int id;
@@ -55,6 +63,8 @@ struct devfreq_cooling_device {
 	u32 *freq_table;
 	size_t freq_table_size;
 	struct devfreq_cooling_power *power_ops;
+	u32 res_util;
+	int capped_state;
 };
 
 /**
@@ -250,6 +260,16 @@ static unsigned long get_voltage(struct devfreq *df, unsigned long freq)
 	return power;
 }
 
+
+static inline unsigned long get_total_power(struct devfreq_cooling_device *dfc,
+					    unsigned long freq,
+					    unsigned long voltage)
+{
+	return get_static_power(dfc, freq) + get_dynamic_power(dfc, freq,
+							       voltage);
+}
+
+
 static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cdev,
 					       struct thermal_zone_device *tz,
 					       u32 *power)
@@ -259,27 +279,55 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
 	struct devfreq_dev_status *status = &df->last_status;
 	unsigned long state;
 	unsigned long freq = status->current_frequency;
-	u32 dyn_power, static_power;
+	unsigned long voltage;
+	u32 dyn_power = 0;
+	u32 static_power = 0;
+	int res;
 
-	/* Get dynamic power for state */
 	state = freq_get_state(dfc, freq);
-	if (state == THERMAL_CSTATE_INVALID)
-		return -EAGAIN;
+	if (state == THERMAL_CSTATE_INVALID) {
+		res = -EAGAIN;
+		goto fail;
+	}
 
-	dyn_power = dfc->power_table[state];
+	if (dfc->power_ops->get_real_power) {
+		voltage = get_voltage(df, freq);
+		if (voltage == 0) {
+			res = -EINVAL;
+			goto fail;
+		}
 
-	/* Scale dynamic power for utilization */
-	dyn_power = (dyn_power * status->busy_time) / status->total_time;
+		res = dfc->power_ops->get_real_power(df, power, freq, voltage);
+		if (!res) {
+			state = dfc->capped_state;
+			dfc->res_util = dfc->power_table[state];
+			dfc->res_util *= SCALE_ERROR_MITIGATION;
 
-	/* Get static power */
-	static_power = get_static_power(dfc, freq);
+			if (*power > 1)
+				dfc->res_util /= *power;
+		} else {
+			goto fail;
+		}
+	} else {
+		dyn_power = dfc->power_table[state];
+
+		/* Scale dynamic power for utilization */
+		dyn_power *= status->busy_time;
+		dyn_power /= status->total_time;
+		/* Get static power */
+		static_power = get_static_power(dfc, freq);
+
+		*power = dyn_power + static_power;
+	}
 
 	trace_thermal_power_devfreq_get_power(cdev, status, freq, dyn_power,
 					      static_power);
 
-	*power = dyn_power + static_power;
-
 	return 0;
+fail:
+	/* It is safe to set max in this case */
+	dfc->res_util = SCALE_ERROR_MITIGATION;
+	return res;
 }
 
 static int devfreq_cooling_state2power(struct thermal_cooling_device *cdev,
@@ -312,26 +360,34 @@ static int devfreq_cooling_power2state(struct thermal_cooling_device *cdev,
 	unsigned long busy_time;
 	s32 dyn_power;
 	u32 static_power;
+	s32 est_power;
 	int i;
 
-	static_power = get_static_power(dfc, freq);
+	if (dfc->power_ops->get_real_power) {
+		/* Scale for resource utilization */
+		est_power = power * dfc->res_util;
+		est_power /= SCALE_ERROR_MITIGATION;
+	} else {
+		static_power = get_static_power(dfc, freq);
 
-	dyn_power = power - static_power;
-	dyn_power = dyn_power > 0 ? dyn_power : 0;
+		dyn_power = power - static_power;
+		dyn_power = dyn_power > 0 ? dyn_power : 0;
 
-	/* Scale dynamic power for utilization */
-	busy_time = status->busy_time ?: 1;
-	dyn_power = (dyn_power * status->total_time) / busy_time;
+		/* Scale dynamic power for utilization */
+		busy_time = status->busy_time ?: 1;
+		est_power = (dyn_power * status->total_time) / busy_time;
+	}
 
 	/*
 	 * Find the first cooling state that is within the power
 	 * budget for dynamic power.
 	 */
 	for (i = 0; i < dfc->freq_table_size - 1; i++)
-		if (dyn_power >= dfc->power_table[i])
+		if (est_power >= dfc->power_table[i])
 			break;
 
 	*state = i;
+	dfc->capped_state = i;
 	trace_thermal_power_devfreq_limit(cdev, freq, *state, power);
 	return 0;
 }
@@ -387,7 +443,7 @@ static int devfreq_cooling_gen_tables(struct devfreq_cooling_device *dfc)
 	}
 
 	for (i = 0, freq = ULONG_MAX; i < num_opps; i++, freq--) {
-		unsigned long power_dyn, voltage;
+		unsigned long power, voltage;
 		struct dev_pm_opp *opp;
 
 		opp = dev_pm_opp_find_freq_floor(dev, &freq);
@@ -400,12 +456,15 @@ static int devfreq_cooling_gen_tables(struct devfreq_cooling_device *dfc)
 		dev_pm_opp_put(opp);
 
 		if (dfc->power_ops) {
-			power_dyn = get_dynamic_power(dfc, freq, voltage);
+			if (dfc->power_ops->get_real_power)
+				power = get_total_power(dfc, freq, voltage);
+			else
+				power = get_dynamic_power(dfc, freq, voltage);
 
-			dev_dbg(dev, "Dynamic power table: %lu MHz @ %lu mV: %lu = %lu mW\n",
-				freq / 1000000, voltage, power_dyn, power_dyn);
+			dev_dbg(dev, "Power table: %lu MHz @ %lu mV: %lu = %lu mW\n",
+				freq / 1000000, voltage, power, power);
 
-			power_table[i] = power_dyn;
+			power_table[i] = power;
 		}
 
 		freq_table[i] = freq;
diff --git a/include/linux/devfreq_cooling.h b/include/linux/devfreq_cooling.h
index c35d0c0..4635f95 100644
--- a/include/linux/devfreq_cooling.h
+++ b/include/linux/devfreq_cooling.h
@@ -34,6 +34,23 @@
  *			If get_dynamic_power() is NULL, then the
  *			dynamic power is calculated as
  *			@dyn_power_coeff * frequency * voltage^2
+ * @get_real_power:	When this is set, the framework uses it to ask the
+ *			device driver for the actual power.
+ *			Some devices have more sophisticated methods
+ *			(like power counters) to approximate the actual power
+ *			that they use.
+ *			This function provides more accurate data to the
+ *			thermal governor. When the driver does not provide
+ *			such function, framework just uses pre-calculated
+ *			table and scale the power by 'utilization'
+ *			(based on 'busy_time' and 'total_time' taken from
+ *			devfreq 'last_status').
+ *			The value returned by this function must be lower
+ *			or equal than the maximum power value
+ *			for the current	state
+ *			(which can be found in power_table[state]).
+ *			When this interface is used, the power_table holds
+ *			max total (static + dynamic) power value for each OPP.
  */
 struct devfreq_cooling_power {
 	unsigned long (*get_static_power)(struct devfreq *devfreq,
@@ -41,6 +58,8 @@ struct devfreq_cooling_power {
 	unsigned long (*get_dynamic_power)(struct devfreq *devfreq,
 					   unsigned long freq,
 					   unsigned long voltage);
+	int (*get_real_power)(struct devfreq *df, u32 *power,
+			      unsigned long freq, unsigned long voltage);
 	unsigned long dyn_power_coeff;
 };
 
-- 
1.9.1

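The state search in power2state and the 'capped_state' bookkeeping added in this version can be sketched in userspace as follows. The sketch assumes, as the loop in the patch implies, that the power table is ordered from the highest-power state (index 0) down to the lowest. `struct cooling_model`, `pick_state()` and the example table are hypothetical names and data, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the fields the search needs. */
struct cooling_model {
	const uint32_t *power_table;	/* mW, highest-power state first */
	size_t nr_states;
	size_t capped_state;		/* last state chosen by the search */
};

/*
 * Walk the table from the highest-power state and stop at the first
 * entry that fits into the estimated budget. If nothing fits, the last
 * (lowest-power) state is returned, as in the patch's loop.
 */
static size_t pick_state(struct cooling_model *m, int64_t est_power_mw)
{
	size_t i;

	assert(m->nr_states > 0);

	for (i = 0; i + 1 < m->nr_states; i++)
		if (est_power_mw >= (int64_t)m->power_table[i])
			break;

	/* Remember the choice; the next res_util is computed against it. */
	m->capped_state = i;
	return i;
}

/* Made-up example: four states with total (static + dynamic) power. */
static const uint32_t example_power_table[] = { 2000, 1500, 1000, 500 };
static struct cooling_model example_model = {
	example_power_table, 4, 0
};
```

With this table an estimated budget of 1200 mW selects state 2 (1000 mW), a budget of 5000 mW selects state 0, and a budget of 100 mW, which fits no entry, falls through to the last state, 3.
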
Thread overview: 6+ messages
2017-03-14 13:06 [PATCH v3] devfreq_cooling: let the driver supply the real power every time we need it Lukasz Luba
2017-03-14 13:06 ` [PATCH v3 1/3] thermal: devfreq_cooling: refactor code and add get_voltage function Lukasz Luba
2017-03-14 13:06 ` [PATCH v3 2/3] thermal: devfreq_cooling: add new interface for direct power read Lukasz Luba
2017-03-14 13:06 ` [PATCH v3 3/3] trace: thermal: add another parameter *power to the tracing function Lukasz Luba
2017-03-14 13:44   ` Steven Rostedt
2017-05-04 11:34 [PATCH v3 0/3] devfreq_cooling: let the driver supply the Lukasz Luba
2017-05-04 11:34 ` [PATCH v3 2/3] thermal: devfreq_cooling: add new interface for direct power read Lukasz Luba