* [PATCH v3 1/3] thermal: devfreq_cooling: refactor code and add get_voltage function
2017-03-14 13:06 [PATCH v3] devfreq_cooling: let the driver supply the real power every time we need it Lukasz Luba
@ 2017-03-14 13:06 ` Lukasz Luba
2017-03-14 13:06 ` [PATCH v3 2/3] thermal: devfreq_cooling: add new interface for direct power read Lukasz Luba
2017-03-14 13:06 ` [PATCH v3 3/3] trace: thermal: add another parameter *power to the tracing function Lukasz Luba
From: Lukasz Luba @ 2017-03-14 13:06 UTC
To: linux-pm; +Cc: chris.diamand, lukasz.luba, javi.merino, rui.zhang, edubezval
Move the code that gets the voltage for a given frequency into a
separate function, get_voltage(). This code will be reused in a few places.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
drivers/thermal/devfreq_cooling.c | 37 ++++++++++++++++++++++++-------------
1 file changed, 24 insertions(+), 13 deletions(-)
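To illustrate the lookup that get_voltage() factors out, here is a minimal userspace sketch. It does not use the real OPP library: `struct model_opp`, `model_find_freq_exact()` and `model_get_voltage()` are hypothetical stand-ins that only mimic the behaviour visible in the diff (try enabled OPPs first, fall back to disabled ones when the first lookup fails with -ERANGE, convert microvolts to millivolts, and report 0 on failure).

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for one OPP entry: frequency in Hz, voltage in uV. */
struct model_opp {
	unsigned long freq;
	unsigned long u_volt;
	bool available;
};

/*
 * Models an exact-match lookup: returns the index of the entry whose
 * frequency equals @freq and whose availability equals @available,
 * or -ERANGE when no such entry exists.
 */
static long model_find_freq_exact(const struct model_opp *tbl, size_t n,
				  unsigned long freq, bool available)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].freq == freq && tbl[i].available == available)
			return (long)i;
	return -ERANGE;
}

/* Same flow as get_voltage(): enabled OPPs first, then disabled ones. */
static unsigned long model_get_voltage(const struct model_opp *tbl, size_t n,
				       unsigned long freq)
{
	long idx = model_find_freq_exact(tbl, n, freq, true);

	if (idx == -ERANGE)
		idx = model_find_freq_exact(tbl, n, freq, false);

	if (idx < 0) {
		fprintf(stderr, "Failed to get voltage for frequency %lu: %ld\n",
			freq, idx);
		return 0;	/* 0 means "no voltage" to the callers */
	}

	return tbl[idx].u_volt / 1000;	/* uV -> mV */
}
```

Returning 0 keeps the convention that callers such as get_static_power() treat a zero voltage as failure. The sketch returns early when neither lookup succeeds, so a failed lookup result is never used to read a voltage.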
diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
index 7743a78..bc4c78d 100644
--- a/drivers/thermal/devfreq_cooling.c
+++ b/drivers/thermal/devfreq_cooling.c
@@ -164,6 +164,28 @@ freq_get_state(struct devfreq_cooling_device *dfc, unsigned long freq)
return THERMAL_CSTATE_INVALID;
}
+static unsigned long get_voltage(struct devfreq *df, unsigned long freq)
+{
+ struct device *dev = df->dev.parent;
+ unsigned long voltage;
+ struct dev_pm_opp *opp;
+
+ opp = dev_pm_opp_find_freq_exact(dev, freq, true);
+ if (IS_ERR(opp) && (PTR_ERR(opp) == -ERANGE))
+ opp = dev_pm_opp_find_freq_exact(dev, freq, false);
+
+ voltage = dev_pm_opp_get_voltage(opp) / 1000; /* mV */
+ dev_pm_opp_put(opp);
+
+ if (voltage == 0) {
+ dev_warn_ratelimited(dev,
+ "Failed to get voltage for frequency %lu: %ld\n",
+ freq, IS_ERR(opp) ? PTR_ERR(opp) : 0);
+ }
+
+ return voltage;
+}
+
/**
* get_static_power() - calculate the static power
* @dfc: Pointer to devfreq cooling device
@@ -178,26 +200,15 @@ static unsigned long
get_static_power(struct devfreq_cooling_device *dfc, unsigned long freq)
{
struct devfreq *df = dfc->devfreq;
- struct device *dev = df->dev.parent;
unsigned long voltage;
- struct dev_pm_opp *opp;
if (!dfc->power_ops->get_static_power)
return 0;
- opp = dev_pm_opp_find_freq_exact(dev, freq, true);
- if (IS_ERR(opp) && (PTR_ERR(opp) == -ERANGE))
- opp = dev_pm_opp_find_freq_exact(dev, freq, false);
-
- voltage = dev_pm_opp_get_voltage(opp) / 1000; /* mV */
- dev_pm_opp_put(opp);
+ voltage = get_voltage(df, freq);
- if (voltage == 0) {
- dev_warn_ratelimited(dev,
- "Failed to get voltage for frequency %lu: %ld\n",
- freq, IS_ERR(opp) ? PTR_ERR(opp) : 0);
+ if (voltage == 0)
return 0;
- }
return dfc->power_ops->get_static_power(df, voltage);
}
--
2.9.2
* [PATCH v3 2/3] thermal: devfreq_cooling: add new interface for direct power read
2017-03-14 13:06 [PATCH v3] devfreq_cooling: let the driver supply the real power every time we need it Lukasz Luba
2017-03-14 13:06 ` [PATCH v3 1/3] thermal: devfreq_cooling: refactor code and add get_voltage function Lukasz Luba
@ 2017-03-14 13:06 ` Lukasz Luba
2017-03-14 13:06 ` [PATCH v3 3/3] trace: thermal: add another parameter *power to the tracing function Lukasz Luba
From: Lukasz Luba @ 2017-03-14 13:06 UTC
To: linux-pm; +Cc: chris.diamand, lukasz.luba, javi.merino, rui.zhang, edubezval
This patch introduces a new interface for device drivers connected to
devfreq_cooling in the thermal framework: get_real_power().
Some devices have more sophisticated methods (like power counters)
to approximate the actual power that they use.
In the previous implementation we had a pre-calculated power
table which was then scaled by 'utilization'
('busy_time' and 'total_time' taken from devfreq 'last_status').
With this new interface the driver can provide more precise data
about the actual power to the thermal governor every time the power
budget is calculated. We then use this value to calculate the real
resource utilization scaling factor.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
drivers/thermal/devfreq_cooling.c | 78 ++++++++++++++++++++++++++++++---------
include/linux/devfreq_cooling.h | 17 +++++++++
2 files changed, 78 insertions(+), 17 deletions(-)
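The arithmetic behind res_util can be checked with a small userspace model. `model_res_util()` and `model_power2state()` are hypothetical helpers that copy the get_real_power branches of devfreq_cooling_get_requested_power() and devfreq_cooling_power2state(); the power values are made up, and the model uses a 64-bit intermediate for est_power (an assumption for overflow safety; the patch uses s32).

```c
#include <stddef.h>
#include <stdint.h>

#define SCALE_ERROR_MITIGATION 100

/* Mirrors the get_real_power branch of devfreq_cooling_get_requested_power(). */
static uint32_t model_res_util(uint32_t table_power, uint32_t real_power)
{
	uint32_t res_util = table_power * SCALE_ERROR_MITIGATION;

	/* As in the patch, very small readings leave the factor unscaled. */
	if (real_power > 1)
		res_util /= real_power;
	return res_util;
}

/*
 * Mirrors the get_real_power branch of devfreq_cooling_power2state().
 * power_table[0] is the highest-power state; @n must be at least 1.
 */
static unsigned long model_power2state(const uint32_t *power_table, size_t n,
				       uint32_t budget, uint32_t res_util)
{
	/* Scale the budget up by the measured utilization factor. */
	int64_t est_power = (int64_t)budget * res_util / SCALE_ERROR_MITIGATION;
	size_t i;

	/* First (highest-power) state that fits, else the last state. */
	for (i = 0; i < n - 1; i++)
		if (est_power >= power_table[i])
			break;
	return i;
}
```

For example, with a hypothetical table {2000, 1200, 600} mW, a device in state 0 that reports 1000 mW of real power gets res_util = 2000 * 100 / 1000 = 200. A 700 mW budget is then treated as 1400 mW, which selects state 1 instead of state 2.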
diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
index bc4c78d..4411ab8 100644
--- a/drivers/thermal/devfreq_cooling.c
+++ b/drivers/thermal/devfreq_cooling.c
@@ -28,6 +28,8 @@
#include <trace/events/thermal.h>
+#define SCALE_ERROR_MITIGATION 100
+
static DEFINE_IDA(devfreq_ida);
/**
@@ -45,6 +47,12 @@ static DEFINE_IDA(devfreq_ida);
* @freq_table_size: Size of the @freq_table and @power_table
* @power_ops: Pointer to devfreq_cooling_power, used to generate the
* @power_table.
+ * @res_util: Resource utilization scaling factor for the power.
+ * It is multiplied by 100 to minimize the error. It is used
+ * for estimation of the power budget instead of using
+ * 'utilization' (which is 'busy_time' / 'total_time').
+ * The 'res_util' range is from 100 to (power_table[state] * 100)
+ * for the corresponding 'state'.
*/
struct devfreq_cooling_device {
int id;
@@ -55,6 +63,7 @@ struct devfreq_cooling_device {
u32 *freq_table;
size_t freq_table_size;
struct devfreq_cooling_power *power_ops;
+ u32 res_util;
};
/**
@@ -253,27 +262,55 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
struct devfreq_dev_status *status = &df->last_status;
unsigned long state;
unsigned long freq = status->current_frequency;
- u32 dyn_power, static_power;
+ unsigned long voltage;
+ u32 dyn_power = 0;
+ u32 static_power = 0;
+ int res;
/* Get dynamic power for state */
state = freq_get_state(dfc, freq);
- if (state == THERMAL_CSTATE_INVALID)
- return -EAGAIN;
+ if (state == THERMAL_CSTATE_INVALID) {
+ res = -EAGAIN;
+ goto fail;
+ }
- dyn_power = dfc->power_table[state];
+ if (dfc->power_ops->get_real_power) {
+ voltage = get_voltage(df, freq);
+ if (voltage == 0) {
+ res = -EINVAL;
+ goto fail;
+ }
- /* Scale dynamic power for utilization */
- dyn_power = (dyn_power * status->busy_time) / status->total_time;
+ res = dfc->power_ops->get_real_power(df, power, freq, voltage);
+ if (!res) {
+ dfc->res_util = dfc->power_table[state];
+ dfc->res_util *= SCALE_ERROR_MITIGATION;
- /* Get static power */
- static_power = get_static_power(dfc, freq);
+ if (*power > 1)
+ dfc->res_util /= *power;
+ } else {
+ goto fail;
+ }
+ } else {
+ dyn_power = dfc->power_table[state];
+
+ /* Scale dynamic power for utilization */
+ dyn_power *= status->busy_time;
+ dyn_power /= status->total_time;
+ /* Get static power */
+ static_power = get_static_power(dfc, freq);
+
+ *power = dyn_power + static_power;
+ }
trace_thermal_power_devfreq_get_power(cdev, status, freq, dyn_power,
static_power);
- *power = dyn_power + static_power;
-
return 0;
+fail:
+ /* It is safe to set max in this case */
+ dfc->res_util = SCALE_ERROR_MITIGATION;
+ return res;
}
static int devfreq_cooling_state2power(struct thermal_cooling_device *cdev,
@@ -306,23 +343,30 @@ static int devfreq_cooling_power2state(struct thermal_cooling_device *cdev,
unsigned long busy_time;
s32 dyn_power;
u32 static_power;
+ s32 est_power;
int i;
- static_power = get_static_power(dfc, freq);
+ if (dfc->power_ops->get_real_power) {
+ /* Scale for resource utilization */
+ est_power = power * dfc->res_util;
+ est_power /= SCALE_ERROR_MITIGATION;
+ } else {
+ static_power = get_static_power(dfc, freq);
- dyn_power = power - static_power;
- dyn_power = dyn_power > 0 ? dyn_power : 0;
+ dyn_power = power - static_power;
+ dyn_power = dyn_power > 0 ? dyn_power : 0;
- /* Scale dynamic power for utilization */
- busy_time = status->busy_time ?: 1;
- dyn_power = (dyn_power * status->total_time) / busy_time;
+ /* Scale dynamic power for utilization */
+ busy_time = status->busy_time ?: 1;
+ est_power = (dyn_power * status->total_time) / busy_time;
+ }
/*
* Find the first cooling state that is within the power
* budget for dynamic power.
*/
for (i = 0; i < dfc->freq_table_size - 1; i++)
- if (dyn_power >= dfc->power_table[i])
+ if (est_power >= dfc->power_table[i])
break;
*state = i;
diff --git a/include/linux/devfreq_cooling.h b/include/linux/devfreq_cooling.h
index c35d0c0..2e07744 100644
--- a/include/linux/devfreq_cooling.h
+++ b/include/linux/devfreq_cooling.h
@@ -34,6 +34,21 @@
* If get_dynamic_power() is NULL, then the
* dynamic power is calculated as
* @dyn_power_coeff * frequency * voltage^2
+ * @get_real_power: When this is set, the framework uses it to ask the
+ * device driver for the actual power.
+ * Some devices have more sophisticated methods
+ * (like power counters) to approximate the actual power
+ * that they use.
+ * This function provides more accurate data to the
+ * thermal governor. When the driver does not provide
+ * such a function, the framework just uses the pre-calculated
+ * table and scales the power by 'utilization'
+ * (based on 'busy_time' and 'total_time' taken from
+ * devfreq 'last_status').
+ * The value returned by this function must be less than
+ * or equal to the maximum power value
+ * for the current state
+ * (which can be found in power_table[state]).
*/
struct devfreq_cooling_power {
unsigned long (*get_static_power)(struct devfreq *devfreq,
@@ -41,6 +56,8 @@ struct devfreq_cooling_power {
unsigned long (*get_dynamic_power)(struct devfreq *devfreq,
unsigned long freq,
unsigned long voltage);
+ int (*get_real_power)(struct devfreq *df, u32 *power,
+ unsigned long freq, unsigned long voltage);
unsigned long dyn_power_coeff;
};
--
2.9.2
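From the driver side, a get_real_power() callback might look like the sketch below. It is a hypothetical userspace model: `struct my_gpu_counters`, its energy counter and `my_gpu_get_real_power()` are invented for illustration, and `struct devfreq` is only forward-declared. It converts an energy delta in microjoules over an interval in milliseconds into milliwatts and clamps the result to the current state's table power, as the header comment requires.

```c
#include <errno.h>
#include <stdint.h>

typedef uint32_t u32;

struct devfreq;	/* opaque here; a real driver would use its private data */

/* Hypothetical driver-private counter state. */
struct my_gpu_counters {
	uint64_t energy_uj;		/* cumulative energy, microjoules */
	uint64_t time_ms;		/* timestamp of that sample, ms */
	uint64_t prev_energy_uj;	/* values at the previous call */
	uint64_t prev_time_ms;
	u32 max_state_power_mw;		/* power_table[state] for the current state */
};

static struct my_gpu_counters counters;

static int my_gpu_get_real_power(struct devfreq *df, u32 *power,
				 unsigned long freq, unsigned long voltage)
{
	uint64_t de = counters.energy_uj - counters.prev_energy_uj;
	uint64_t dt = counters.time_ms - counters.prev_time_ms;
	uint64_t mw;

	(void)df;
	(void)freq;
	(void)voltage;

	if (dt == 0)
		return -EAGAIN;	/* no new sample yet */

	mw = de / dt;	/* uJ per ms == mW */
	if (mw > counters.max_state_power_mw)
		mw = counters.max_state_power_mw;	/* respect the documented limit */

	*power = (u32)mw;
	counters.prev_energy_uj = counters.energy_uj;
	counters.prev_time_ms = counters.time_ms;
	return 0;
}
```

A non-zero return value makes devfreq_cooling_get_requested_power() take its fail path, which resets res_util to SCALE_ERROR_MITIGATION.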
* [PATCH v3 3/3] trace: thermal: add another parameter *power to the tracing function
2017-03-14 13:06 [PATCH v3] devfreq_cooling: let the driver supply the real power every time we need it Lukasz Luba
2017-03-14 13:06 ` [PATCH v3 1/3] thermal: devfreq_cooling: refactor code and add get_voltage function Lukasz Luba
2017-03-14 13:06 ` [PATCH v3 2/3] thermal: devfreq_cooling: add new interface for direct power read Lukasz Luba
@ 2017-03-14 13:06 ` Lukasz Luba
2017-03-14 13:44 ` Steven Rostedt
From: Lukasz Luba @ 2017-03-14 13:06 UTC
To: linux-pm
Cc: chris.diamand, lukasz.luba, javi.merino, rui.zhang, edubezval,
Steven Rostedt, Ingo Molnar
This patch adds another parameter to the trace function:
trace_thermal_power_devfreq_get_power().
When we call the driver's code directly for the real power,
we do not have separate static/dynamic power values. Instead we get the
total power in the '*power' value, and 'static_power' and
'dynamic_power' are set to 0.
Therefore, we have to trace the '*power' value in this scenario.
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ingo Molnar <mingo@redhat.com>
CC: Zhang Rui <rui.zhang@intel.com>
CC: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
drivers/thermal/devfreq_cooling.c | 2 +-
include/trace/events/thermal.h | 11 +++++++----
2 files changed, 8 insertions(+), 5 deletions(-)
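The effect of the extra field on the trace output can be sketched in userspace. `struct devfreq_get_power_entry`, `fill_entry()` and `format_entry()` are hypothetical names that mirror TP_STRUCT__entry, TP_fast_assign() and the new TP_printk() format; in the get_real_power() path the dynamic and static parts are 0 and only `power` carries the total.

```c
#include <stdio.h>

/* Hypothetical userspace copy of the fields recorded by the tracepoint. */
struct devfreq_get_power_entry {
	const char *type;
	unsigned long freq;
	unsigned int load;
	unsigned int dynamic_power;
	unsigned int static_power;
	unsigned int power;	/* the new field */
};

/* Mirrors TP_fast_assign(), including the load computation. */
static void fill_entry(struct devfreq_get_power_entry *e, const char *type,
		       unsigned long freq, unsigned long busy_time,
		       unsigned long total_time, unsigned int dyn,
		       unsigned int stat, unsigned int power)
{
	e->type = type;
	e->freq = freq;
	e->load = (unsigned int)((100 * busy_time) / total_time);
	e->dynamic_power = dyn;
	e->static_power = stat;
	e->power = power;
}

/* Mirrors the new TP_printk() format string. */
static int format_entry(char *buf, size_t len,
			const struct devfreq_get_power_entry *e)
{
	return snprintf(buf, len,
			"type=%s freq=%lu load=%u dynamic_power=%u static_power=%u power=%u",
			e->type, e->freq, e->load, e->dynamic_power,
			e->static_power, e->power);
}
```

With busy_time = 30, total_time = 100 and a reported 850 mW, the line reads `type=gpu freq=600000000 load=30 dynamic_power=0 static_power=0 power=850`.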
diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
index 4411ab8..ca3ebe3 100644
--- a/drivers/thermal/devfreq_cooling.c
+++ b/drivers/thermal/devfreq_cooling.c
@@ -304,7 +304,7 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
}
trace_thermal_power_devfreq_get_power(cdev, status, freq, dyn_power,
- static_power);
+ static_power, *power);
return 0;
fail:
diff --git a/include/trace/events/thermal.h b/include/trace/events/thermal.h
index 2b4a8ff..6cde5b3 100644
--- a/include/trace/events/thermal.h
+++ b/include/trace/events/thermal.h
@@ -151,9 +151,9 @@ TRACE_EVENT(thermal_power_cpu_limit,
TRACE_EVENT(thermal_power_devfreq_get_power,
TP_PROTO(struct thermal_cooling_device *cdev,
struct devfreq_dev_status *status, unsigned long freq,
- u32 dynamic_power, u32 static_power),
+ u32 dynamic_power, u32 static_power, u32 power),
- TP_ARGS(cdev, status, freq, dynamic_power, static_power),
+ TP_ARGS(cdev, status, freq, dynamic_power, static_power, power),
TP_STRUCT__entry(
__string(type, cdev->type )
@@ -161,6 +161,7 @@ TRACE_EVENT(thermal_power_devfreq_get_power,
__field(u32, load )
__field(u32, dynamic_power )
__field(u32, static_power )
+ __field(u32, power)
),
TP_fast_assign(
@@ -169,11 +170,13 @@ TRACE_EVENT(thermal_power_devfreq_get_power,
__entry->load = (100 * status->busy_time) / status->total_time;
__entry->dynamic_power = dynamic_power;
__entry->static_power = static_power;
+ __entry->power = power;
),
- TP_printk("type=%s freq=%lu load=%u dynamic_power=%u static_power=%u",
+ TP_printk("type=%s freq=%lu load=%u dynamic_power=%u static_power=%u power=%u",
__get_str(type), __entry->freq,
- __entry->load, __entry->dynamic_power, __entry->static_power)
+ __entry->load, __entry->dynamic_power, __entry->static_power,
+ __entry->power)
);
TRACE_EVENT(thermal_power_devfreq_limit,
--
2.9.2
* Re: [PATCH v3 3/3] trace: thermal: add another parameter *power to the tracing function
2017-03-14 13:06 ` [PATCH v3 3/3] trace: thermal: add another parameter *power to the tracing function Lukasz Luba
@ 2017-03-14 13:44 ` Steven Rostedt
From: Steven Rostedt @ 2017-03-14 13:44 UTC
To: Lukasz Luba
Cc: linux-pm, chris.diamand, javi.merino, rui.zhang, edubezval, Ingo Molnar
On Tue, 14 Mar 2017 13:06:16 +0000
Lukasz Luba <lukasz.luba@arm.com> wrote:
> This patch adds another parameter to the trace function:
> trace_thermal_power_devfreq_get_power().
>
> In case when we call directly driver's code for the real power,
> we do not have static/dynamic_power values. Instead we get total
> power in the '*power' value. The 'static_power' and
> 'dynamic_power' are set to 0.
>
> Therefore, we have to trace that '*power' value in this scenario.
>
> CC: Steven Rostedt <rostedt@goodmis.org>
> CC: Ingo Molnar <mingo@redhat.com>
> CC: Zhang Rui <rui.zhang@intel.com>
> CC: Eduardo Valentin <edubezval@gmail.com>
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> ---
> drivers/thermal/devfreq_cooling.c | 2 +-
> include/trace/events/thermal.h | 11 +++++++----
> 2 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
> index 4411ab8..ca3ebe3 100644
> --- a/drivers/thermal/devfreq_cooling.c
> +++ b/drivers/thermal/devfreq_cooling.c
> @@ -304,7 +304,7 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
> }
>
> trace_thermal_power_devfreq_get_power(cdev, status, freq, dyn_power,
> - static_power);
> + static_power, *power);
I'm curious. Can you disassemble the above, and take a look at how it
handles that dereference?
It may make sense to pass in the pointer and do the dereference in the
TP_fast_assign():
__entry->power = *power;
I like to make the call site as light as possible and do as much work
in the tracepoint code as is feasible. This helps keep tracepoints
lightweight when tracing is off.
-- Steve
>
> return 0;
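Steve's suggestion can be modelled in userspace to show where the load happens. `trace_enabled`, `trace_power_by_value()` and `trace_power_by_pointer()` are hypothetical; the real tracepoint is gated by a static key rather than a plain bool, and whether the compiler already sinks the `*power` load behind that gate is exactly what the disassembly would show.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the tracepoint's static key. */
static bool trace_enabled;

/* Hypothetical stand-in for the ring-buffer slot written by TP_fast_assign(). */
static uint32_t last_traced_power;
static unsigned int events_recorded;

/*
 * By-value variant (as in the patch): as written in the source, the caller
 * evaluates *power as an argument before the enabled check runs.
 */
static void trace_power_by_value(uint32_t power)
{
	if (!trace_enabled)
		return;
	last_traced_power = power;
	events_recorded++;
}

/*
 * By-pointer variant (as suggested): the call site only passes an address,
 * and the dereference happens after the enabled check, like
 * "__entry->power = *power;" in TP_fast_assign().
 */
static void trace_power_by_pointer(const uint32_t *power)
{
	if (!trace_enabled)
		return;
	last_traced_power = *power;
	events_recorded++;
}
```

With tracing off, the pointer variant never touches the pointed-to value, so the call site reduces to passing an address.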