* [Patch v3 1/7] sched/pelt.c: Add support to track thermal pressure
2019-10-14 0:58 [Patch v3 0/7] Introduce Thermal Pressure Thara Gopinath
@ 2019-10-14 0:58 ` Thara Gopinath
2019-10-14 13:55 ` Vincent Guittot
2019-10-14 0:58 ` [Patch v3 2/7] sched: Add infrastructure to store and update instantaneous " Thara Gopinath
` (5 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Thara Gopinath @ 2019-10-14 0:58 UTC (permalink / raw)
To: mingo, peterz, ionela.voinescu, vincent.guittot, rui.zhang, edubezval
Cc: linux-kernel, amit.kachhap, javi.merino, daniel.lezcano
Extrapolating on the exisitng framework to track rt/dl utilization using
pelt signals, add a similar mechanism to track thermal pressue. The
difference here from rt/dl utilization tracking is that, instead of
tracking time spent by a cpu running a rt/dl task through util_avg,
the average thermal pressure is tracked through load_avg.
In order to track average thermal pressure, a new sched_avg variable
avg_thermal is introduced. Function update_thermal_avg can be called
to do the periodic bookeeping (accumulate, decay and average)
of the thermal pressure.
Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
---
kernel/sched/pelt.c | 13 +++++++++++++
kernel/sched/pelt.h | 7 +++++++
kernel/sched/sched.h | 1 +
3 files changed, 21 insertions(+)
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index a96db50..f06aae3 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -353,6 +353,19 @@ int update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
return 0;
}
+int update_thermal_avg(u64 now, struct rq *rq, u64 capacity)
+{
+ if (___update_load_sum(now, &rq->avg_thermal,
+ capacity,
+ capacity,
+ capacity)) {
+ ___update_load_avg(&rq->avg_thermal, 1, 1);
+ return 1;
+ }
+
+ return 0;
+}
+
#ifdef CONFIG_HAVE_SCHED_AVG_IRQ
/*
* irq:
diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h
index afff644..01c5436 100644
--- a/kernel/sched/pelt.h
+++ b/kernel/sched/pelt.h
@@ -6,6 +6,7 @@ int __update_load_avg_se(u64 now, struct cfs_rq *cfs_rq, struct sched_entity *se
int __update_load_avg_cfs_rq(u64 now, struct cfs_rq *cfs_rq);
int update_rt_rq_load_avg(u64 now, struct rq *rq, int running);
int update_dl_rq_load_avg(u64 now, struct rq *rq, int running);
+int update_thermal_avg(u64 now, struct rq *rq, u64 capacity);
#ifdef CONFIG_HAVE_SCHED_AVG_IRQ
int update_irq_load_avg(struct rq *rq, u64 running);
@@ -175,6 +176,12 @@ update_rq_clock_pelt(struct rq *rq, s64 delta) { }
static inline void
update_idle_rq_clock_pelt(struct rq *rq) { }
+static inline int
+update_thermal_avg(u64 now, struct rq *rq, u64 capacity)
+{
+ return 0;
+}
+
#endif
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0db2c1b..d5d82c8 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -944,6 +944,7 @@ struct rq {
#ifdef CONFIG_HAVE_SCHED_AVG_IRQ
struct sched_avg avg_irq;
#endif
+ struct sched_avg avg_thermal;
u64 idle_stamp;
u64 avg_idle;
--
2.1.4
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [Patch v3 1/7] sched/pelt.c: Add support to track thermal pressure
2019-10-14 0:58 ` [Patch v3 1/7] sched/pelt.c: Add support to track thermal pressure Thara Gopinath
@ 2019-10-14 13:55 ` Vincent Guittot
2019-10-16 21:15 ` Thara Gopinath
0 siblings, 1 reply; 19+ messages in thread
From: Vincent Guittot @ 2019-10-14 13:55 UTC (permalink / raw)
To: Thara Gopinath
Cc: Ingo Molnar, Peter Zijlstra, Ionela Voinescu, Zhang Rui,
Eduardo Valentin, linux-kernel, Amit Kachhap, Javi Merino,
Daniel Lezcano
Hi Thara,
On Mon, 14 Oct 2019 at 02:58, Thara Gopinath <thara.gopinath@linaro.org> wrote:
>
> Extrapolating on the exisitng framework to track rt/dl utilization using
s/exisitng/existing/
> pelt signals, add a similar mechanism to track thermal pressue. The
s/pessure/pressure/
> difference here from rt/dl utilization tracking is that, instead of
> tracking time spent by a cpu running a rt/dl task through util_avg,
> the average thermal pressure is tracked through load_avg.
It would be good to mention why you use load_avg field instead of
util_avg field: because the signal is weighted with the capped
capacity and is not binary
And also explained a bit what capacity refer to
> In order to track average thermal pressure, a new sched_avg variable
> avg_thermal is introduced. Function update_thermal_avg can be called
> to do the periodic bookeeping (accumulate, decay and average)
> of the thermal pressure.
>
> Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
> ---
> kernel/sched/pelt.c | 13 +++++++++++++
> kernel/sched/pelt.h | 7 +++++++
> kernel/sched/sched.h | 1 +
> 3 files changed, 21 insertions(+)
>
> diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
> index a96db50..f06aae3 100644
> --- a/kernel/sched/pelt.c
> +++ b/kernel/sched/pelt.c
> @@ -353,6 +353,19 @@ int update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
> return 0;
> }
>
> +int update_thermal_avg(u64 now, struct rq *rq, u64 capacity)
All the other functions are named :
update_cfs_rq/rt_rq/dl_rq/irq_load_avg
might be good to keep similar name with update_thermal_load_avg
> +{
> + if (___update_load_sum(now, &rq->avg_thermal,
> + capacity,
> + capacity,
> + capacity)) {
> + ___update_load_avg(&rq->avg_thermal, 1, 1);
> + return 1;
> + }
> +
> + return 0;
> +}
> +
> #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
> /*
> * irq:
> diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h
> index afff644..01c5436 100644
> --- a/kernel/sched/pelt.h
> +++ b/kernel/sched/pelt.h
> @@ -6,6 +6,7 @@ int __update_load_avg_se(u64 now, struct cfs_rq *cfs_rq, struct sched_entity *se
> int __update_load_avg_cfs_rq(u64 now, struct cfs_rq *cfs_rq);
> int update_rt_rq_load_avg(u64 now, struct rq *rq, int running);
> int update_dl_rq_load_avg(u64 now, struct rq *rq, int running);
> +int update_thermal_avg(u64 now, struct rq *rq, u64 capacity);
>
> #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
> int update_irq_load_avg(struct rq *rq, u64 running);
> @@ -175,6 +176,12 @@ update_rq_clock_pelt(struct rq *rq, s64 delta) { }
> static inline void
> update_idle_rq_clock_pelt(struct rq *rq) { }
>
> +static inline int
can you keep some function ordering as above and move
update_thermal_avg() just after update_dl_rq_load_avg
> +update_thermal_avg(u64 now, struct rq *rq, u64 capacity)
> +{
> + return 0;
> +}
> +
> #endif
>
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 0db2c1b..d5d82c8 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -944,6 +944,7 @@ struct rq {
> #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
> struct sched_avg avg_irq;
> #endif
> + struct sched_avg avg_thermal;
> u64 idle_stamp;
> u64 avg_idle;
>
> --
> 2.1.4
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch v3 1/7] sched/pelt.c: Add support to track thermal pressure
2019-10-14 13:55 ` Vincent Guittot
@ 2019-10-16 21:15 ` Thara Gopinath
0 siblings, 0 replies; 19+ messages in thread
From: Thara Gopinath @ 2019-10-16 21:15 UTC (permalink / raw)
To: Vincent Guittot
Cc: Ingo Molnar, Peter Zijlstra, Ionela Voinescu, Zhang Rui,
Eduardo Valentin, linux-kernel, Amit Kachhap, Javi Merino,
Daniel Lezcano
Hi Vincent,
Thanks for the review.
On 10/14/2019 09:55 AM, Vincent Guittot wrote:
> Hi Thara,
>
> On Mon, 14 Oct 2019 at 02:58, Thara Gopinath <thara.gopinath@linaro.org> wrote:
>>
>> Extrapolating on the exisitng framework to track rt/dl utilization using
>
> s/exisitng/existing/
>
>> pelt signals, add a similar mechanism to track thermal pressue. The
>
> s/pessure/pressure/
Will fix all the typo.
>
>> difference here from rt/dl utilization tracking is that, instead of
>> tracking time spent by a cpu running a rt/dl task through util_avg,
>> the average thermal pressure is tracked through load_avg.
>
> It would be good to mention why you use load_avg field instead of
> util_avg field: because the signal is weighted with the capped
> capacity and is not binary
> And also explained a bit what capacity refer to
>
>> In order to track average thermal pressure, a new sched_avg variable
>> avg_thermal is introduced. Function update_thermal_avg can be called
>> to do the periodic bookeeping (accumulate, decay and average)
>> of the thermal pressure.
>>
>> Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
>> ---
>> kernel/sched/pelt.c | 13 +++++++++++++
>> kernel/sched/pelt.h | 7 +++++++
>> kernel/sched/sched.h | 1 +
>> 3 files changed, 21 insertions(+)
>>
>> diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
>> index a96db50..f06aae3 100644
>> --- a/kernel/sched/pelt.c
>> +++ b/kernel/sched/pelt.c
>> @@ -353,6 +353,19 @@ int update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
>> return 0;
>> }
>>
>> +int update_thermal_avg(u64 now, struct rq *rq, u64 capacity)
>
> All the other functions are named :
> update_cfs_rq/rt_rq/dl_rq/irq_load_avg
>
> might be good to keep similar name with update_thermal_load_avg
Sure. Will rename the function.
>
>> +{
>> + if (___update_load_sum(now, &rq->avg_thermal,
>> + capacity,
>> + capacity,
>> + capacity)) {
>> + ___update_load_avg(&rq->avg_thermal, 1, 1);
>> + return 1;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
>> /*
>> * irq:
>> diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h
>> index afff644..01c5436 100644
>> --- a/kernel/sched/pelt.h
>> +++ b/kernel/sched/pelt.h
>> @@ -6,6 +6,7 @@ int __update_load_avg_se(u64 now, struct cfs_rq *cfs_rq, struct sched_entity *se
>> int __update_load_avg_cfs_rq(u64 now, struct cfs_rq *cfs_rq);
>> int update_rt_rq_load_avg(u64 now, struct rq *rq, int running);
>> int update_dl_rq_load_avg(u64 now, struct rq *rq, int running);
>> +int update_thermal_avg(u64 now, struct rq *rq, u64 capacity);
>>
>> #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
>> int update_irq_load_avg(struct rq *rq, u64 running);
>> @@ -175,6 +176,12 @@ update_rq_clock_pelt(struct rq *rq, s64 delta) { }
>> static inline void
>> update_idle_rq_clock_pelt(struct rq *rq) { }
>>
>> +static inline int
>
> can you keep some function ordering as above and move
> update_thermal_avg() just after update_dl_rq_load_avg
will do.
Warm Regards
Thara
^ permalink raw reply [flat|nested] 19+ messages in thread
* [Patch v3 2/7] sched: Add infrastructure to store and update instantaneous thermal pressure
2019-10-14 0:58 [Patch v3 0/7] Introduce Thermal Pressure Thara Gopinath
2019-10-14 0:58 ` [Patch v3 1/7] sched/pelt.c: Add support to track thermal pressure Thara Gopinath
@ 2019-10-14 0:58 ` Thara Gopinath
2019-10-14 15:50 ` Vincent Guittot
2019-10-14 0:58 ` [Patch v3 3/7] sched: Initialize per cpu thermal pressure structure Thara Gopinath
` (4 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Thara Gopinath @ 2019-10-14 0:58 UTC (permalink / raw)
To: mingo, peterz, ionela.voinescu, vincent.guittot, rui.zhang, edubezval
Cc: linux-kernel, amit.kachhap, javi.merino, daniel.lezcano
Add thermal.c and thermal.h files that provides interface
APIs to initialize, update/average, track, accumulate and decay
thermal pressure per cpu basis. A per cpu structure max_capacity_info is
introduced to keep track of instantaneous per cpu thermal pressure.
Thermal pressure is the delta between max_capacity and cap_capacity.
API update_periodic_maxcap is called for periodic accumulate and decay
of the thermal pressure. It is to to be called from a periodic tick
function. This API calculates the delta between max_capacity and
cap_capacity and passes on the delta to update_thermal_avg to do the
necessary accumulate, decay and average. API update_maxcap_capacity is for
the system to update the thermal pressure by updating cap_capacity.
Considering, update_periodic_maxcap reads cap_capacity and
update_maxcap_capacity writes into cap_capacity, one can argue for
some sort of locking mechanism to avoid a stale value.
But considering update_periodic_maxcap can be called from a system
critical path like scheduler tick function, a locking mechanism is not
ideal. This means that it is possible the value used to
calculate average thermal pressure for a cpu can be stale for upto 1
tick period.
Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
---
include/linux/sched.h | 14 +++++++++++
kernel/sched/Makefile | 2 +-
kernel/sched/thermal.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++
kernel/sched/thermal.h | 13 ++++++++++
4 files changed, 94 insertions(+), 1 deletion(-)
create mode 100644 kernel/sched/thermal.c
create mode 100644 kernel/sched/thermal.h
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2c2e56b..875ce2b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1983,6 +1983,20 @@ static inline void rseq_syscall(struct pt_regs *regs)
#endif
+#ifdef CONFIG_SMP
+void update_maxcap_capacity(int cpu, u64 capacity);
+
+void populate_max_capacity_info(void);
+#else
+static inline void update_maxcap_capacity(int cpu, u64 capacity)
+{
+}
+
+static inline void populate_max_capacity_info(void)
+{
+}
+#endif
+
const struct sched_avg *sched_trace_cfs_rq_avg(struct cfs_rq *cfs_rq);
char *sched_trace_cfs_rq_path(struct cfs_rq *cfs_rq, char *str, int len);
int sched_trace_cfs_rq_cpu(struct cfs_rq *cfs_rq);
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 21fb5a5..4d3b820 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -20,7 +20,7 @@ obj-y += core.o loadavg.o clock.o cputime.o
obj-y += idle.o fair.o rt.o deadline.o
obj-y += wait.o wait_bit.o swait.o completion.o
-obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o
+obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o thermal.o
obj-$(CONFIG_SCHED_AUTOGROUP) += autogroup.o
obj-$(CONFIG_SCHEDSTATS) += stats.o
obj-$(CONFIG_SCHED_DEBUG) += debug.o
diff --git a/kernel/sched/thermal.c b/kernel/sched/thermal.c
new file mode 100644
index 0000000..5f0b2d4
--- /dev/null
+++ b/kernel/sched/thermal.c
@@ -0,0 +1,66 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Sceduler Thermal Interactions
+ *
+ * Copyright (C) 2018 Linaro, Inc., Thara Gopinath <thara.gopinath@linaro.org>
+ */
+
+#include <linux/sched.h>
+#include "sched.h"
+#include "pelt.h"
+#include "thermal.h"
+
+struct max_capacity_info {
+ unsigned long max_capacity;
+ unsigned long cap_capacity;
+};
+
+static DEFINE_PER_CPU(struct max_capacity_info, max_cap);
+
+void update_maxcap_capacity(int cpu, u64 capacity)
+{
+ struct max_capacity_info *__max_cap;
+ unsigned long __capacity;
+
+ __max_cap = (&per_cpu(max_cap, cpu));
+ if (!__max_cap) {
+ pr_err("no max_capacity_info structure for cpu %d\n", cpu);
+ return;
+ }
+
+ /* Normalize the capacity */
+ __capacity = (capacity * arch_scale_cpu_capacity(cpu)) >>
+ SCHED_CAPACITY_SHIFT;
+ pr_debug("updating cpu%d capped capacity from %lu to %lu\n", cpu, __max_cap->cap_capacity, __capacity);
+
+ __max_cap->cap_capacity = __capacity;
+}
+
+void populate_max_capacity_info(void)
+{
+ struct max_capacity_info *__max_cap;
+ u64 capacity;
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ __max_cap = (&per_cpu(max_cap, cpu));
+ if (!__max_cap)
+ continue;
+ capacity = arch_scale_cpu_capacity(cpu);
+ __max_cap->max_capacity = capacity;
+ __max_cap->cap_capacity = capacity;
+ pr_debug("cpu %d max capacity set to %ld\n", cpu, __max_cap->max_capacity);
+ }
+}
+
+void update_periodic_maxcap(struct rq *rq)
+{
+ struct max_capacity_info *__max_cap = (&per_cpu(max_cap, cpu_of(rq)));
+ unsigned long delta;
+
+ if (!__max_cap)
+ return;
+
+ delta = __max_cap->max_capacity - __max_cap->cap_capacity;
+ update_thermal_avg(rq_clock_task(rq), rq, delta);
+}
diff --git a/kernel/sched/thermal.h b/kernel/sched/thermal.h
new file mode 100644
index 0000000..5657cb4
--- /dev/null
+++ b/kernel/sched/thermal.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Scheduler thermal interaction internal methods.
+ */
+
+#ifdef CONFIG_SMP
+void update_periodic_maxcap(struct rq *rq);
+
+#else
+static inline void update_periodic_maxcap(struct rq *rq)
+{
+}
+#endif
--
2.1.4
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [Patch v3 2/7] sched: Add infrastructure to store and update instantaneous thermal pressure
2019-10-14 0:58 ` [Patch v3 2/7] sched: Add infrastructure to store and update instantaneous " Thara Gopinath
@ 2019-10-14 15:50 ` Vincent Guittot
2019-10-16 21:22 ` Thara Gopinath
0 siblings, 1 reply; 19+ messages in thread
From: Vincent Guittot @ 2019-10-14 15:50 UTC (permalink / raw)
To: Thara Gopinath
Cc: Ingo Molnar, Peter Zijlstra, Ionela Voinescu, Zhang Rui,
Eduardo Valentin, linux-kernel, Amit Kachhap, Javi Merino,
Daniel Lezcano
Hi Thara,
On Mon, 14 Oct 2019 at 02:58, Thara Gopinath <thara.gopinath@linaro.org> wrote:
>
> Add thermal.c and thermal.h files that provides interface
> APIs to initialize, update/average, track, accumulate and decay
> thermal pressure per cpu basis. A per cpu structure max_capacity_info is
> introduced to keep track of instantaneous per cpu thermal pressure.
> Thermal pressure is the delta between max_capacity and cap_capacity.
> API update_periodic_maxcap is called for periodic accumulate and decay
> of the thermal pressure. It is to to be called from a periodic tick
> function. This API calculates the delta between max_capacity and
> cap_capacity and passes on the delta to update_thermal_avg to do the
> necessary accumulate, decay and average. API update_maxcap_capacity is for
> the system to update the thermal pressure by updating cap_capacity.
> Considering, update_periodic_maxcap reads cap_capacity and
> update_maxcap_capacity writes into cap_capacity, one can argue for
> some sort of locking mechanism to avoid a stale value.
> But considering update_periodic_maxcap can be called from a system
> critical path like scheduler tick function, a locking mechanism is not
> ideal. This means that it is possible the value used to
> calculate average thermal pressure for a cpu can be stale for upto 1
> tick period.
>
> Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
> ---
> include/linux/sched.h | 14 +++++++++++
> kernel/sched/Makefile | 2 +-
> kernel/sched/thermal.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++
> kernel/sched/thermal.h | 13 ++++++++++
> 4 files changed, 94 insertions(+), 1 deletion(-)
> create mode 100644 kernel/sched/thermal.c
> create mode 100644 kernel/sched/thermal.h
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56b..875ce2b 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1983,6 +1983,20 @@ static inline void rseq_syscall(struct pt_regs *regs)
>
> #endif
>
> +#ifdef CONFIG_SMP
> +void update_maxcap_capacity(int cpu, u64 capacity);
> +
> +void populate_max_capacity_info(void);
> +#else
> +static inline void update_maxcap_capacity(int cpu, u64 capacity)
> +{
> +}
> +
> +static inline void populate_max_capacity_info(void)
> +{
> +}
> +#endif
> +
> const struct sched_avg *sched_trace_cfs_rq_avg(struct cfs_rq *cfs_rq);
> char *sched_trace_cfs_rq_path(struct cfs_rq *cfs_rq, char *str, int len);
> int sched_trace_cfs_rq_cpu(struct cfs_rq *cfs_rq);
> diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
> index 21fb5a5..4d3b820 100644
> --- a/kernel/sched/Makefile
> +++ b/kernel/sched/Makefile
> @@ -20,7 +20,7 @@ obj-y += core.o loadavg.o clock.o cputime.o
> obj-y += idle.o fair.o rt.o deadline.o
> obj-y += wait.o wait_bit.o swait.o completion.o
>
> -obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o
> +obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o thermal.o
> obj-$(CONFIG_SCHED_AUTOGROUP) += autogroup.o
> obj-$(CONFIG_SCHEDSTATS) += stats.o
> obj-$(CONFIG_SCHED_DEBUG) += debug.o
> diff --git a/kernel/sched/thermal.c b/kernel/sched/thermal.c
> new file mode 100644
> index 0000000..5f0b2d4
> --- /dev/null
> +++ b/kernel/sched/thermal.c
> @@ -0,0 +1,66 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Sceduler Thermal Interactions
> + *
> + * Copyright (C) 2018 Linaro, Inc., Thara Gopinath <thara.gopinath@linaro.org>
> + */
> +
> +#include <linux/sched.h>
> +#include "sched.h"
> +#include "pelt.h"
> +#include "thermal.h"
> +
> +struct max_capacity_info {
> + unsigned long max_capacity;
> + unsigned long cap_capacity;
> +};
> +
> +static DEFINE_PER_CPU(struct max_capacity_info, max_cap);
> +
> +void update_maxcap_capacity(int cpu, u64 capacity)
> +{
> + struct max_capacity_info *__max_cap;
> + unsigned long __capacity;
> +
> + __max_cap = (&per_cpu(max_cap, cpu));
> + if (!__max_cap) {
> + pr_err("no max_capacity_info structure for cpu %d\n", cpu);
> + return;
> + }
> +
> + /* Normalize the capacity */
> + __capacity = (capacity * arch_scale_cpu_capacity(cpu)) >>
> + SCHED_CAPACITY_SHIFT;
> + pr_debug("updating cpu%d capped capacity from %lu to %lu\n", cpu, __max_cap->cap_capacity, __capacity);
> +
> + __max_cap->cap_capacity = __capacity;
> +}
> +
> +void populate_max_capacity_info(void)
> +{
> + struct max_capacity_info *__max_cap;
> + u64 capacity;
> + int cpu;
> +
> + for_each_possible_cpu(cpu) {
> + __max_cap = (&per_cpu(max_cap, cpu));
> + if (!__max_cap)
> + continue;
> + capacity = arch_scale_cpu_capacity(cpu);
> + __max_cap->max_capacity = capacity;
> + __max_cap->cap_capacity = capacity;
> + pr_debug("cpu %d max capacity set to %ld\n", cpu, __max_cap->max_capacity);
> + }
> +}
everything above seems to be there for the cpu cooling device and
should be included in it instead. The scheduler only need the capacity
capping
The cpu cooling device should just set the delta capacity in the
per-cpu variable (see my comment below)
> +
> +void update_periodic_maxcap(struct rq *rq)
> +{
> + struct max_capacity_info *__max_cap = (&per_cpu(max_cap, cpu_of(rq)));
> + unsigned long delta;
> +
> + if (!__max_cap)
> + return;
> +
> + delta = __max_cap->max_capacity - __max_cap->cap_capacity;
Why don't you just save the delta in the per_cpu variable instead of
the struct max_capacity_info ? You have to compute the delta every
tick whereas we can expect it to not change that much compared to the
number of tick
> + update_thermal_avg(rq_clock_task(rq), rq, delta);
> +}
> diff --git a/kernel/sched/thermal.h b/kernel/sched/thermal.h
> new file mode 100644
> index 0000000..5657cb4
> --- /dev/null
> +++ b/kernel/sched/thermal.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Scheduler thermal interaction internal methods.
> + */
> +
> +#ifdef CONFIG_SMP
> +void update_periodic_maxcap(struct rq *rq);
> +
> +#else
> +static inline void update_periodic_maxcap(struct rq *rq)
> +{
> +}
> +#endif
> --
> 2.1.4
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch v3 2/7] sched: Add infrastructure to store and update instantaneous thermal pressure
2019-10-14 15:50 ` Vincent Guittot
@ 2019-10-16 21:22 ` Thara Gopinath
2019-10-17 8:44 ` Vincent Guittot
0 siblings, 1 reply; 19+ messages in thread
From: Thara Gopinath @ 2019-10-16 21:22 UTC (permalink / raw)
To: Vincent Guittot
Cc: Ingo Molnar, Peter Zijlstra, Ionela Voinescu, Zhang Rui,
Eduardo Valentin, linux-kernel, Amit Kachhap, Javi Merino,
Daniel Lezcano
Hi Vincent,
Thanks for the review
On 10/14/2019 11:50 AM, Vincent Guittot wrote:
> Hi Thara,
>
> On Mon, 14 Oct 2019 at 02:58, Thara Gopinath <thara.gopinath@linaro.org> wrote:
>>
>> Add thermal.c and thermal.h files that provides interface
>> APIs to initialize, update/average, track, accumulate and decay
>> thermal pressure per cpu basis. A per cpu structure max_capacity_info is
>> introduced to keep track of instantaneous per cpu thermal pressure.
>> Thermal pressure is the delta between max_capacity and cap_capacity.
>> API update_periodic_maxcap is called for periodic accumulate and decay
>> of the thermal pressure. It is to to be called from a periodic tick
>> function. This API calculates the delta between max_capacity and
>> cap_capacity and passes on the delta to update_thermal_avg to do the
>> necessary accumulate, decay and average. API update_maxcap_capacity is for
>> the system to update the thermal pressure by updating cap_capacity.
>> Considering, update_periodic_maxcap reads cap_capacity and
>> update_maxcap_capacity writes into cap_capacity, one can argue for
>> some sort of locking mechanism to avoid a stale value.
>> But considering update_periodic_maxcap can be called from a system
>> critical path like scheduler tick function, a locking mechanism is not
>> ideal. This means that it is possible the value used to
>> calculate average thermal pressure for a cpu can be stale for upto 1
>> tick period.
>>
>> Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
>> ---
>> include/linux/sched.h | 14 +++++++++++
>> kernel/sched/Makefile | 2 +-
>> kernel/sched/thermal.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++
>> kernel/sched/thermal.h | 13 ++++++++++
>> 4 files changed, 94 insertions(+), 1 deletion(-)
>> create mode 100644 kernel/sched/thermal.c
>> create mode 100644 kernel/sched/thermal.h
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 2c2e56b..875ce2b 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1983,6 +1983,20 @@ static inline void rseq_syscall(struct pt_regs *regs)
>>
>> #endif
>>
>> +#ifdef CONFIG_SMP
>> +void update_maxcap_capacity(int cpu, u64 capacity);
>> +
>> +void populate_max_capacity_info(void);
>> +#else
>> +static inline void update_maxcap_capacity(int cpu, u64 capacity)
>> +{
>> +}
>> +
>> +static inline void populate_max_capacity_info(void)
>> +{
>> +}
>> +#endif
>> +
>> const struct sched_avg *sched_trace_cfs_rq_avg(struct cfs_rq *cfs_rq);
>> char *sched_trace_cfs_rq_path(struct cfs_rq *cfs_rq, char *str, int len);
>> int sched_trace_cfs_rq_cpu(struct cfs_rq *cfs_rq);
>> diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
>> index 21fb5a5..4d3b820 100644
>> --- a/kernel/sched/Makefile
>> +++ b/kernel/sched/Makefile
>> @@ -20,7 +20,7 @@ obj-y += core.o loadavg.o clock.o cputime.o
>> obj-y += idle.o fair.o rt.o deadline.o
>> obj-y += wait.o wait_bit.o swait.o completion.o
>>
>> -obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o
>> +obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o thermal.o
>> obj-$(CONFIG_SCHED_AUTOGROUP) += autogroup.o
>> obj-$(CONFIG_SCHEDSTATS) += stats.o
>> obj-$(CONFIG_SCHED_DEBUG) += debug.o
>> diff --git a/kernel/sched/thermal.c b/kernel/sched/thermal.c
>> new file mode 100644
>> index 0000000..5f0b2d4
>> --- /dev/null
>> +++ b/kernel/sched/thermal.c
>> @@ -0,0 +1,66 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Sceduler Thermal Interactions
>> + *
>> + * Copyright (C) 2018 Linaro, Inc., Thara Gopinath <thara.gopinath@linaro.org>
>> + */
>> +
>> +#include <linux/sched.h>
>> +#include "sched.h"
>> +#include "pelt.h"
>> +#include "thermal.h"
>> +
>> +struct max_capacity_info {
>> + unsigned long max_capacity;
>> + unsigned long cap_capacity;
>> +};
>> +
>> +static DEFINE_PER_CPU(struct max_capacity_info, max_cap);
>> +
>> +void update_maxcap_capacity(int cpu, u64 capacity)
>> +{
>> + struct max_capacity_info *__max_cap;
>> + unsigned long __capacity;
>> +
>> + __max_cap = (&per_cpu(max_cap, cpu));
>> + if (!__max_cap) {
>> + pr_err("no max_capacity_info structure for cpu %d\n", cpu);
>> + return;
>> + }
>> +
>> + /* Normalize the capacity */
>> + __capacity = (capacity * arch_scale_cpu_capacity(cpu)) >>
>> + SCHED_CAPACITY_SHIFT;
>> + pr_debug("updating cpu%d capped capacity from %lu to %lu\n", cpu, __max_cap->cap_capacity, __capacity);
>> +
>> + __max_cap->cap_capacity = __capacity;
>> +}
>> +
>> +void populate_max_capacity_info(void)
>> +{
>> + struct max_capacity_info *__max_cap;
>> + u64 capacity;
>> + int cpu;
>> +
>> + for_each_possible_cpu(cpu) {
>> + __max_cap = (&per_cpu(max_cap, cpu));
>> + if (!__max_cap)
>> + continue;
>> + capacity = arch_scale_cpu_capacity(cpu);
>> + __max_cap->max_capacity = capacity;
>> + __max_cap->cap_capacity = capacity;
>> + pr_debug("cpu %d max capacity set to %ld\n", cpu, __max_cap->max_capacity);
>> + }
>> +}
>
> everything above seems to be there for the cpu cooling device and
> should be included in it instead. The scheduler only need the capacity
> capping
> The cpu cooling device should just set the delta capacity in the
> per-cpu variable (see my comment below)
It can be a firmware updating the thermal pressure instead of cpu
cooling device. Or may be some other entity .So instead of replicating
this code, isnt't it better to reside in one place ?
>
>> +
>> +void update_periodic_maxcap(struct rq *rq)
>> +{
>> + struct max_capacity_info *__max_cap = (&per_cpu(max_cap, cpu_of(rq)));
>> + unsigned long delta;
>> +
>> + if (!__max_cap)
>> + return;
>> +
>> + delta = __max_cap->max_capacity - __max_cap->cap_capacity;
>
> Why don't you just save the delta in the per_cpu variable instead of
> the struct max_capacity_info ? You have to compute the delta every
> tick whereas we can expect it to not change that much compared to the
> number of tick.
Again I think thermal pressure can be applied from other entities like
firmware as well. But I agree with your point that calculating delta
every tick is not a good idea. How about I move it to
update_maxcap_capacity so that delta is calculate every time an update
comes from cpu cooling device or anybody else ?
Warm Regards
Thara
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch v3 2/7] sched: Add infrastructure to store and update instantaneous thermal pressure
2019-10-16 21:22 ` Thara Gopinath
@ 2019-10-17 8:44 ` Vincent Guittot
2019-10-17 16:40 ` Thara Gopinath
0 siblings, 1 reply; 19+ messages in thread
From: Vincent Guittot @ 2019-10-17 8:44 UTC (permalink / raw)
To: Thara Gopinath
Cc: Ingo Molnar, Peter Zijlstra, Ionela Voinescu, Zhang Rui,
Eduardo Valentin, linux-kernel, Amit Kachhap, Javi Merino,
Daniel Lezcano
Hi Thara,
On Wed, 16 Oct 2019 at 23:22, Thara Gopinath <thara.gopinath@linaro.org> wrote:
>
> Hi Vincent,
>
> Thanks for the review
> On 10/14/2019 11:50 AM, Vincent Guittot wrote:
> > Hi Thara,
> >
> > On Mon, 14 Oct 2019 at 02:58, Thara Gopinath <thara.gopinath@linaro.org> wrote:
> >>
> >> Add thermal.c and thermal.h files that provides interface
> >> APIs to initialize, update/average, track, accumulate and decay
> >> thermal pressure per cpu basis. A per cpu structure max_capacity_info is
> >> introduced to keep track of instantaneous per cpu thermal pressure.
> >> Thermal pressure is the delta between max_capacity and cap_capacity.
> >> API update_periodic_maxcap is called for periodic accumulate and decay
> >> of the thermal pressure. It is to to be called from a periodic tick
> >> function. This API calculates the delta between max_capacity and
> >> cap_capacity and passes on the delta to update_thermal_avg to do the
> >> necessary accumulate, decay and average. API update_maxcap_capacity is for
> >> the system to update the thermal pressure by updating cap_capacity.
> >> Considering, update_periodic_maxcap reads cap_capacity and
> >> update_maxcap_capacity writes into cap_capacity, one can argue for
> >> some sort of locking mechanism to avoid a stale value.
> >> But considering update_periodic_maxcap can be called from a system
> >> critical path like scheduler tick function, a locking mechanism is not
> >> ideal. This means that it is possible the value used to
> >> calculate average thermal pressure for a cpu can be stale for upto 1
> >> tick period.
> >>
> >> Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
> >> ---
> >> include/linux/sched.h | 14 +++++++++++
> >> kernel/sched/Makefile | 2 +-
> >> kernel/sched/thermal.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >> kernel/sched/thermal.h | 13 ++++++++++
> >> 4 files changed, 94 insertions(+), 1 deletion(-)
> >> create mode 100644 kernel/sched/thermal.c
> >> create mode 100644 kernel/sched/thermal.h
> >>
> >> diff --git a/include/linux/sched.h b/include/linux/sched.h
> >> index 2c2e56b..875ce2b 100644
> >> --- a/include/linux/sched.h
> >> +++ b/include/linux/sched.h
> >> @@ -1983,6 +1983,20 @@ static inline void rseq_syscall(struct pt_regs *regs)
> >>
> >> #endif
> >>
> >> +#ifdef CONFIG_SMP
> >> +void update_maxcap_capacity(int cpu, u64 capacity);
> >> +
> >> +void populate_max_capacity_info(void);
> >> +#else
> >> +static inline void update_maxcap_capacity(int cpu, u64 capacity)
> >> +{
> >> +}
> >> +
> >> +static inline void populate_max_capacity_info(void)
> >> +{
> >> +}
> >> +#endif
> >> +
> >> const struct sched_avg *sched_trace_cfs_rq_avg(struct cfs_rq *cfs_rq);
> >> char *sched_trace_cfs_rq_path(struct cfs_rq *cfs_rq, char *str, int len);
> >> int sched_trace_cfs_rq_cpu(struct cfs_rq *cfs_rq);
> >> diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
> >> index 21fb5a5..4d3b820 100644
> >> --- a/kernel/sched/Makefile
> >> +++ b/kernel/sched/Makefile
> >> @@ -20,7 +20,7 @@ obj-y += core.o loadavg.o clock.o cputime.o
> >> obj-y += idle.o fair.o rt.o deadline.o
> >> obj-y += wait.o wait_bit.o swait.o completion.o
> >>
> >> -obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o
> >> +obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o thermal.o
> >> obj-$(CONFIG_SCHED_AUTOGROUP) += autogroup.o
> >> obj-$(CONFIG_SCHEDSTATS) += stats.o
> >> obj-$(CONFIG_SCHED_DEBUG) += debug.o
> >> diff --git a/kernel/sched/thermal.c b/kernel/sched/thermal.c
> >> new file mode 100644
> >> index 0000000..5f0b2d4
> >> --- /dev/null
> >> +++ b/kernel/sched/thermal.c
> >> @@ -0,0 +1,66 @@
> >> +// SPDX-License-Identifier: GPL-2.0
> >> +/*
> >> + * Sceduler Thermal Interactions
> >> + *
> >> + * Copyright (C) 2018 Linaro, Inc., Thara Gopinath <thara.gopinath@linaro.org>
> >> + */
> >> +
> >> +#include <linux/sched.h>
> >> +#include "sched.h"
> >> +#include "pelt.h"
> >> +#include "thermal.h"
> >> +
> >> +struct max_capacity_info {
> >> + unsigned long max_capacity;
> >> + unsigned long cap_capacity;
> >> +};
> >> +
> >> +static DEFINE_PER_CPU(struct max_capacity_info, max_cap);
> >> +
> >> +void update_maxcap_capacity(int cpu, u64 capacity)
> >> +{
> >> + struct max_capacity_info *__max_cap;
> >> + unsigned long __capacity;
> >> +
> >> + __max_cap = (&per_cpu(max_cap, cpu));
> >> + if (!__max_cap) {
> >> + pr_err("no max_capacity_info structure for cpu %d\n", cpu);
> >> + return;
> >> + }
> >> +
> >> + /* Normalize the capacity */
> >> + __capacity = (capacity * arch_scale_cpu_capacity(cpu)) >>
> >> + SCHED_CAPACITY_SHIFT;
> >> + pr_debug("updating cpu%d capped capacity from %lu to %lu\n", cpu, __max_cap->cap_capacity, __capacity);
> >> +
> >> + __max_cap->cap_capacity = __capacity;
> >> +}
> >> +
> >> +void populate_max_capacity_info(void)
> >> +{
> >> + struct max_capacity_info *__max_cap;
> >> + u64 capacity;
> >> + int cpu;
> >> +
> >> + for_each_possible_cpu(cpu) {
> >> + __max_cap = (&per_cpu(max_cap, cpu));
> >> + if (!__max_cap)
> >> + continue;
> >> + capacity = arch_scale_cpu_capacity(cpu);
> >> + __max_cap->max_capacity = capacity;
> >> + __max_cap->cap_capacity = capacity;
> >> + pr_debug("cpu %d max capacity set to %ld\n", cpu, __max_cap->max_capacity);
> >> + }
> >> +}
> >
> > everything above seems to be there for the cpu cooling device and
> > should be included in it instead. The scheduler only need the capacity
> > capping
> > The cpu cooling device should just set the delta capacity in the
> > per-cpu variable (see my comment below)
> It can be a firmware updating the thermal pressure instead of cpu
> cooling device. Or may be some other entity .So instead of replicating
> this code, isnt't it better to reside in one place ?
But you have now idea which kind of metrics will be provided by
firmware. It might not be a capped frequency and a max frequency but
other metrics.
Also, __max_cap->max_capacity just save in a new per cpu struct the
return of arch_scale_cpu_capacity but this doesn't give us real
benefit
> >
> >> +
> >> +void update_periodic_maxcap(struct rq *rq)
> >> +{
> >> + struct max_capacity_info *__max_cap = (&per_cpu(max_cap, cpu_of(rq)));
> >> + unsigned long delta;
> >> +
> >> + if (!__max_cap)
> >> + return;
> >> +
> >> + delta = __max_cap->max_capacity - __max_cap->cap_capacity;
> >
> > Why don't you just save the delta in the per_cpu variable instead of
> > the struct max_capacity_info ? You have to compute the delta every
> > tick whereas we can expect it to not change that much compared to the
> > number of tick.
>
> Again I think thermal pressure can be applied from other entities like
> firmware as well. But I agree with your point that calculating delta
> every tick is not a good idea. How about I move it to
> update_maxcap_capacity so that delta is calculate every time an update
> comes from cpu cooling device or anybody else ?
>
> Warm Regards
> Thara
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch v3 2/7] sched: Add infrastructure to store and update instantaneous thermal pressure
2019-10-17 8:44 ` Vincent Guittot
@ 2019-10-17 16:40 ` Thara Gopinath
2019-10-18 8:05 ` Vincent Guittot
0 siblings, 1 reply; 19+ messages in thread
From: Thara Gopinath @ 2019-10-17 16:40 UTC (permalink / raw)
To: Vincent Guittot
Cc: Ingo Molnar, Peter Zijlstra, Ionela Voinescu, Zhang Rui,
Eduardo Valentin, linux-kernel, Amit Kachhap, Javi Merino,
Daniel Lezcano
On 10/17/2019 04:44 AM, Vincent Guittot wrote:
> Hi Thara,
>
> On Wed, 16 Oct 2019 at 23:22, Thara Gopinath <thara.gopinath@linaro.org> wrote:
>>
>> Hi Vincent,
>>
>> Thanks for the review
>> On 10/14/2019 11:50 AM, Vincent Guittot wrote:
>>> Hi Thara,
>>>
>>> On Mon, 14 Oct 2019 at 02:58, Thara Gopinath <thara.gopinath@linaro.org> wrote:
>>>>
>>>> Add thermal.c and thermal.h files that provides interface
>>>> APIs to initialize, update/average, track, accumulate and decay
>>>> thermal pressure per cpu basis. A per cpu structure max_capacity_info is
>>>> introduced to keep track of instantaneous per cpu thermal pressure.
>>>> Thermal pressure is the delta between max_capacity and cap_capacity.
>>>> API update_periodic_maxcap is called for periodic accumulate and decay
>>>> of the thermal pressure. It is to to be called from a periodic tick
>>>> function. This API calculates the delta between max_capacity and
>>>> cap_capacity and passes on the delta to update_thermal_avg to do the
>>>> necessary accumulate, decay and average. API update_maxcap_capacity is for
>>>> the system to update the thermal pressure by updating cap_capacity.
>>>> Considering, update_periodic_maxcap reads cap_capacity and
>>>> update_maxcap_capacity writes into cap_capacity, one can argue for
>>>> some sort of locking mechanism to avoid a stale value.
>>>> But considering update_periodic_maxcap can be called from a system
>>>> critical path like scheduler tick function, a locking mechanism is not
>>>> ideal. This means that it is possible the value used to
>>>> calculate average thermal pressure for a cpu can be stale for upto 1
>>>> tick period.
>>>>
>>>> Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
>>>> ---
>>>> include/linux/sched.h | 14 +++++++++++
>>>> kernel/sched/Makefile | 2 +-
>>>> kernel/sched/thermal.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> kernel/sched/thermal.h | 13 ++++++++++
>>>> 4 files changed, 94 insertions(+), 1 deletion(-)
>>>> create mode 100644 kernel/sched/thermal.c
>>>> create mode 100644 kernel/sched/thermal.h
>>>>
>>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>>>> index 2c2e56b..875ce2b 100644
>>>> --- a/include/linux/sched.h
>>>> +++ b/include/linux/sched.h
>>>> @@ -1983,6 +1983,20 @@ static inline void rseq_syscall(struct pt_regs *regs)
>>>>
>>>> #endif
>>>>
>>>> +#ifdef CONFIG_SMP
>>>> +void update_maxcap_capacity(int cpu, u64 capacity);
>>>> +
>>>> +void populate_max_capacity_info(void);
>>>> +#else
>>>> +static inline void update_maxcap_capacity(int cpu, u64 capacity)
>>>> +{
>>>> +}
>>>> +
>>>> +static inline void populate_max_capacity_info(void)
>>>> +{
>>>> +}
>>>> +#endif
>>>> +
>>>> const struct sched_avg *sched_trace_cfs_rq_avg(struct cfs_rq *cfs_rq);
>>>> char *sched_trace_cfs_rq_path(struct cfs_rq *cfs_rq, char *str, int len);
>>>> int sched_trace_cfs_rq_cpu(struct cfs_rq *cfs_rq);
>>>> diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
>>>> index 21fb5a5..4d3b820 100644
>>>> --- a/kernel/sched/Makefile
>>>> +++ b/kernel/sched/Makefile
>>>> @@ -20,7 +20,7 @@ obj-y += core.o loadavg.o clock.o cputime.o
>>>> obj-y += idle.o fair.o rt.o deadline.o
>>>> obj-y += wait.o wait_bit.o swait.o completion.o
>>>>
>>>> -obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o
>>>> +obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o thermal.o
>>>> obj-$(CONFIG_SCHED_AUTOGROUP) += autogroup.o
>>>> obj-$(CONFIG_SCHEDSTATS) += stats.o
>>>> obj-$(CONFIG_SCHED_DEBUG) += debug.o
>>>> diff --git a/kernel/sched/thermal.c b/kernel/sched/thermal.c
>>>> new file mode 100644
>>>> index 0000000..5f0b2d4
>>>> --- /dev/null
>>>> +++ b/kernel/sched/thermal.c
>>>> @@ -0,0 +1,66 @@
>>>> +// SPDX-License-Identifier: GPL-2.0
>>>> +/*
>>>> + * Sceduler Thermal Interactions
>>>> + *
>>>> + * Copyright (C) 2018 Linaro, Inc., Thara Gopinath <thara.gopinath@linaro.org>
>>>> + */
>>>> +
>>>> +#include <linux/sched.h>
>>>> +#include "sched.h"
>>>> +#include "pelt.h"
>>>> +#include "thermal.h"
>>>> +
>>>> +struct max_capacity_info {
>>>> + unsigned long max_capacity;
>>>> + unsigned long cap_capacity;
>>>> +};
>>>> +
>>>> +static DEFINE_PER_CPU(struct max_capacity_info, max_cap);
>>>> +
>>>> +void update_maxcap_capacity(int cpu, u64 capacity)
>>>> +{
>>>> + struct max_capacity_info *__max_cap;
>>>> + unsigned long __capacity;
>>>> +
>>>> + __max_cap = (&per_cpu(max_cap, cpu));
>>>> + if (!__max_cap) {
>>>> + pr_err("no max_capacity_info structure for cpu %d\n", cpu);
>>>> + return;
>>>> + }
>>>> +
>>>> + /* Normalize the capacity */
>>>> + __capacity = (capacity * arch_scale_cpu_capacity(cpu)) >>
>>>> + SCHED_CAPACITY_SHIFT;
>>>> + pr_debug("updating cpu%d capped capacity from %lu to %lu\n", cpu, __max_cap->cap_capacity, __capacity);
>>>> +
>>>> + __max_cap->cap_capacity = __capacity;
>>>> +}
>>>> +
>>>> +void populate_max_capacity_info(void)
>>>> +{
>>>> + struct max_capacity_info *__max_cap;
>>>> + u64 capacity;
>>>> + int cpu;
>>>> +
>>>> + for_each_possible_cpu(cpu) {
>>>> + __max_cap = (&per_cpu(max_cap, cpu));
>>>> + if (!__max_cap)
>>>> + continue;
>>>> + capacity = arch_scale_cpu_capacity(cpu);
>>>> + __max_cap->max_capacity = capacity;
>>>> + __max_cap->cap_capacity = capacity;
>>>> + pr_debug("cpu %d max capacity set to %ld\n", cpu, __max_cap->max_capacity);
>>>> + }
>>>> +}
>>>
>>> everything above seems to be there for the cpu cooling device and
>>> should be included in it instead. The scheduler only need the capacity
>>> capping
>>> The cpu cooling device should just set the delta capacity in the
>>> per-cpu variable (see my comment below)
>> It can be a firmware updating the thermal pressure instead of cpu
>> cooling device. Or may be some other entity .So instead of replicating
>> this code, isnt't it better to reside in one place ?
>
> But you have now idea which kind of metrics will be provided by
> firmware. It might not be a capped frequency and a max frequency but
> other metrics.
hmm. Why ? It is thermal pressure. It is finally applied to the
scheduler metric cpu_capacity. It should not be anything other than
capacity reduction due to thermal events that should be provided.
> Also, __max_cap->max_capacity just save in a new per cpu struct the
> return of arch_scale_cpu_capacity but this doesn't give us real
> benefit
I agree. I will change the struct into a single variable delta and
initialize it to 0. populate_max_capacity_info need not be there. I will
have update_maxcap_capacity calculate the delta with
arch_scale_cpu_capacity and store it into a per cpu variable.
--
Warm Regards
Thara
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch v3 2/7] sched: Add infrastructure to store and update instantaneous thermal pressure
2019-10-17 16:40 ` Thara Gopinath
@ 2019-10-18 8:05 ` Vincent Guittot
0 siblings, 0 replies; 19+ messages in thread
From: Vincent Guittot @ 2019-10-18 8:05 UTC (permalink / raw)
To: Thara Gopinath
Cc: Ingo Molnar, Peter Zijlstra, Ionela Voinescu, Zhang Rui,
Eduardo Valentin, linux-kernel, Amit Kachhap, Javi Merino,
Daniel Lezcano
Hi Thara,
On Thu, 17 Oct 2019 at 18:40, Thara Gopinath <thara.gopinath@linaro.org> wrote:
>
> On 10/17/2019 04:44 AM, Vincent Guittot wrote:
> > Hi Thara,
> >
> > On Wed, 16 Oct 2019 at 23:22, Thara Gopinath <thara.gopinath@linaro.org> wrote:
> >>
> >> Hi Vincent,
> >>
> >> Thanks for the review
> >> On 10/14/2019 11:50 AM, Vincent Guittot wrote:
> >>> Hi Thara,
> >>>
> >>> On Mon, 14 Oct 2019 at 02:58, Thara Gopinath <thara.gopinath@linaro.org> wrote:
> >>>>
> >>>> Add thermal.c and thermal.h files that provides interface
> >>>> APIs to initialize, update/average, track, accumulate and decay
> >>>> thermal pressure per cpu basis. A per cpu structure max_capacity_info is
> >>>> introduced to keep track of instantaneous per cpu thermal pressure.
> >>>> Thermal pressure is the delta between max_capacity and cap_capacity.
> >>>> API update_periodic_maxcap is called for periodic accumulate and decay
> >>>> of the thermal pressure. It is to to be called from a periodic tick
> >>>> function. This API calculates the delta between max_capacity and
> >>>> cap_capacity and passes on the delta to update_thermal_avg to do the
> >>>> necessary accumulate, decay and average. API update_maxcap_capacity is for
> >>>> the system to update the thermal pressure by updating cap_capacity.
> >>>> Considering, update_periodic_maxcap reads cap_capacity and
> >>>> update_maxcap_capacity writes into cap_capacity, one can argue for
> >>>> some sort of locking mechanism to avoid a stale value.
> >>>> But considering update_periodic_maxcap can be called from a system
> >>>> critical path like scheduler tick function, a locking mechanism is not
> >>>> ideal. This means that it is possible the value used to
> >>>> calculate average thermal pressure for a cpu can be stale for upto 1
> >>>> tick period.
> >>>>
> >>>> Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
> >>>> ---
> >>>> include/linux/sched.h | 14 +++++++++++
> >>>> kernel/sched/Makefile | 2 +-
> >>>> kernel/sched/thermal.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>> kernel/sched/thermal.h | 13 ++++++++++
> >>>> 4 files changed, 94 insertions(+), 1 deletion(-)
> >>>> create mode 100644 kernel/sched/thermal.c
> >>>> create mode 100644 kernel/sched/thermal.h
> >>>>
> >>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
> >>>> index 2c2e56b..875ce2b 100644
> >>>> --- a/include/linux/sched.h
> >>>> +++ b/include/linux/sched.h
> >>>> @@ -1983,6 +1983,20 @@ static inline void rseq_syscall(struct pt_regs *regs)
> >>>>
> >>>> #endif
> >>>>
> >>>> +#ifdef CONFIG_SMP
> >>>> +void update_maxcap_capacity(int cpu, u64 capacity);
> >>>> +
> >>>> +void populate_max_capacity_info(void);
> >>>> +#else
> >>>> +static inline void update_maxcap_capacity(int cpu, u64 capacity)
> >>>> +{
> >>>> +}
> >>>> +
> >>>> +static inline void populate_max_capacity_info(void)
> >>>> +{
> >>>> +}
> >>>> +#endif
> >>>> +
> >>>> const struct sched_avg *sched_trace_cfs_rq_avg(struct cfs_rq *cfs_rq);
> >>>> char *sched_trace_cfs_rq_path(struct cfs_rq *cfs_rq, char *str, int len);
> >>>> int sched_trace_cfs_rq_cpu(struct cfs_rq *cfs_rq);
> >>>> diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
> >>>> index 21fb5a5..4d3b820 100644
> >>>> --- a/kernel/sched/Makefile
> >>>> +++ b/kernel/sched/Makefile
> >>>> @@ -20,7 +20,7 @@ obj-y += core.o loadavg.o clock.o cputime.o
> >>>> obj-y += idle.o fair.o rt.o deadline.o
> >>>> obj-y += wait.o wait_bit.o swait.o completion.o
> >>>>
> >>>> -obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o
> >>>> +obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o thermal.o
> >>>> obj-$(CONFIG_SCHED_AUTOGROUP) += autogroup.o
> >>>> obj-$(CONFIG_SCHEDSTATS) += stats.o
> >>>> obj-$(CONFIG_SCHED_DEBUG) += debug.o
> >>>> diff --git a/kernel/sched/thermal.c b/kernel/sched/thermal.c
> >>>> new file mode 100644
> >>>> index 0000000..5f0b2d4
> >>>> --- /dev/null
> >>>> +++ b/kernel/sched/thermal.c
> >>>> @@ -0,0 +1,66 @@
> >>>> +// SPDX-License-Identifier: GPL-2.0
> >>>> +/*
> >>>> + * Sceduler Thermal Interactions
> >>>> + *
> >>>> + * Copyright (C) 2018 Linaro, Inc., Thara Gopinath <thara.gopinath@linaro.org>
> >>>> + */
> >>>> +
> >>>> +#include <linux/sched.h>
> >>>> +#include "sched.h"
> >>>> +#include "pelt.h"
> >>>> +#include "thermal.h"
> >>>> +
> >>>> +struct max_capacity_info {
> >>>> + unsigned long max_capacity;
> >>>> + unsigned long cap_capacity;
> >>>> +};
> >>>> +
> >>>> +static DEFINE_PER_CPU(struct max_capacity_info, max_cap);
> >>>> +
> >>>> +void update_maxcap_capacity(int cpu, u64 capacity)
> >>>> +{
> >>>> + struct max_capacity_info *__max_cap;
> >>>> + unsigned long __capacity;
> >>>> +
> >>>> + __max_cap = (&per_cpu(max_cap, cpu));
> >>>> + if (!__max_cap) {
> >>>> + pr_err("no max_capacity_info structure for cpu %d\n", cpu);
> >>>> + return;
> >>>> + }
> >>>> +
> >>>> + /* Normalize the capacity */
> >>>> + __capacity = (capacity * arch_scale_cpu_capacity(cpu)) >>
> >>>> + SCHED_CAPACITY_SHIFT;
> >>>> + pr_debug("updating cpu%d capped capacity from %lu to %lu\n", cpu, __max_cap->cap_capacity, __capacity);
> >>>> +
> >>>> + __max_cap->cap_capacity = __capacity;
> >>>> +}
> >>>> +
> >>>> +void populate_max_capacity_info(void)
> >>>> +{
> >>>> + struct max_capacity_info *__max_cap;
> >>>> + u64 capacity;
> >>>> + int cpu;
> >>>> +
> >>>> + for_each_possible_cpu(cpu) {
> >>>> + __max_cap = (&per_cpu(max_cap, cpu));
> >>>> + if (!__max_cap)
> >>>> + continue;
> >>>> + capacity = arch_scale_cpu_capacity(cpu);
> >>>> + __max_cap->max_capacity = capacity;
> >>>> + __max_cap->cap_capacity = capacity;
> >>>> + pr_debug("cpu %d max capacity set to %ld\n", cpu, __max_cap->max_capacity);
> >>>> + }
> >>>> +}
> >>>
> >>> everything above seems to be there for the cpu cooling device and
> >>> should be included in it instead. The scheduler only need the capacity
> >>> capping
> >>> The cpu cooling device should just set the delta capacity in the
> >>> per-cpu variable (see my comment below)
> >> It can be a firmware updating the thermal pressure instead of cpu
> >> cooling device. Or may be some other entity .So instead of replicating
> >> this code, isnt't it better to reside in one place ?
> >
> > But you have now idea which kind of metrics will be provided by
> > firmware. It might not be a capped frequency and a max frequency but
> > other metrics.
> hmm. Why ? It is thermal pressure. It is finally applied to the
> scheduler metric cpu_capacity. It should not be anything other than
> capacity reduction due to thermal events that should be provided.
What I mean is that you don't know how the value will be provided and
if you will need a cap and a max value. It might easily provide
directly the delta, or a percent or even something else
> > Also, __max_cap->max_capacity just save in a new per cpu struct the
> > return of arch_scale_cpu_capacity but this doesn't give us real
> > benefit
> I agree. I will change the struct into a single variable delta and
> initialize it to 0. populate_max_capacity_info need not be there. I will
> have update_maxcap_capacity calculate the delta with
> arch_scale_cpu_capacity and store it into a per cpu variable.
Also could you rename this function. You use thermal for the pelt
function so could you align the function names ? either to use thermal
everywhere or another name but could you not mix the naming
If you want to use thermal why not using update_thermal_pressure ? at
least it's easy to understand what it does
>
>
> --
> Warm Regards
> Thara
^ permalink raw reply [flat|nested] 19+ messages in thread
* [Patch v3 3/7] sched: Initialize per cpu thermal pressure structure
2019-10-14 0:58 [Patch v3 0/7] Introduce Thermal Pressure Thara Gopinath
2019-10-14 0:58 ` [Patch v3 1/7] sched/pelt.c: Add support to track thermal pressure Thara Gopinath
2019-10-14 0:58 ` [Patch v3 2/7] sched: Add infrastructure to store and update instantaneous " Thara Gopinath
@ 2019-10-14 0:58 ` Thara Gopinath
2019-10-14 0:58 ` [Patch v3 4/7] sched/fair: Enable CFS periodic tick to update thermal pressure Thara Gopinath
` (3 subsequent siblings)
6 siblings, 0 replies; 19+ messages in thread
From: Thara Gopinath @ 2019-10-14 0:58 UTC (permalink / raw)
To: mingo, peterz, ionela.voinescu, vincent.guittot, rui.zhang, edubezval
Cc: linux-kernel, amit.kachhap, javi.merino, daniel.lezcano
Initialize per cpu max_capacity_info during scheduler init.
Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
---
drivers/base/arch_topology.c | 1 +
kernel/sched/core.c | 2 ++
2 files changed, 3 insertions(+)
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 1eb81f11..7ac9f2f 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -123,6 +123,7 @@ void topology_normalize_cpu_scale(void)
pr_debug("cpu_capacity: CPU%d cpu_capacity=%lu\n",
cpu, topology_get_cpu_scale(cpu));
}
+ populate_max_capacity_info();
}
bool __init topology_parse_cpu_capacity(struct device_node *cpu_node, int cpu)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7880f4f..744f026 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6534,6 +6534,8 @@ void __init sched_init_smp(void)
init_sched_rt_class();
init_sched_dl_class();
+ populate_max_capacity_info();
+
sched_smp_initialized = true;
}
--
2.1.4
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [Patch v3 4/7] sched/fair: Enable CFS periodic tick to update thermal pressure
2019-10-14 0:58 [Patch v3 0/7] Introduce Thermal Pressure Thara Gopinath
` (2 preceding siblings ...)
2019-10-14 0:58 ` [Patch v3 3/7] sched: Initialize per cpu thermal pressure structure Thara Gopinath
@ 2019-10-14 0:58 ` Thara Gopinath
2019-10-14 0:58 ` [Patch v3 5/7] sched/fair: update cpu_capcity to reflect " Thara Gopinath
` (2 subsequent siblings)
6 siblings, 0 replies; 19+ messages in thread
From: Thara Gopinath @ 2019-10-14 0:58 UTC (permalink / raw)
To: mingo, peterz, ionela.voinescu, vincent.guittot, rui.zhang, edubezval
Cc: linux-kernel, amit.kachhap, javi.merino, daniel.lezcano
Introduce support in CFS periodic tick to trigger the process of
computing average thermal pressure for a cpu.
Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
---
kernel/sched/fair.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 83ab35e..fe7c165 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -21,6 +21,7 @@
* Copyright (C) 2007 Red Hat, Inc., Peter Zijlstra
*/
#include "sched.h"
+#include "thermal.h"
#include <trace/events/sched.h>
@@ -7566,6 +7567,8 @@ static void update_blocked_averages(int cpu)
done = false;
update_blocked_load_status(rq, !done);
+
+ update_periodic_maxcap(rq);
rq_unlock_irqrestore(rq, &rf);
}
@@ -9925,6 +9928,8 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
update_misfit_status(curr, rq);
update_overutilized_status(task_rq(curr));
+
+ update_periodic_maxcap(rq);
}
/*
--
2.1.4
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [Patch v3 5/7] sched/fair: update cpu_capcity to reflect thermal pressure
2019-10-14 0:58 [Patch v3 0/7] Introduce Thermal Pressure Thara Gopinath
` (3 preceding siblings ...)
2019-10-14 0:58 ` [Patch v3 4/7] sched/fair: Enable CFS periodic tick to update thermal pressure Thara Gopinath
@ 2019-10-14 0:58 ` Thara Gopinath
2019-10-14 0:58 ` [Patch v3 6/7] thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping Thara Gopinath
2019-10-14 0:58 ` [Patch v3 7/7] sched: thermal: Enable tuning of decay period Thara Gopinath
6 siblings, 0 replies; 19+ messages in thread
From: Thara Gopinath @ 2019-10-14 0:58 UTC (permalink / raw)
To: mingo, peterz, ionela.voinescu, vincent.guittot, rui.zhang, edubezval
Cc: linux-kernel, amit.kachhap, javi.merino, daniel.lezcano
cpu_capacity relflects the maximum available capacity of a cpu. Thermal
pressure on a cpu means this maximum available capacity is reduced. This
patch reduces the average thermal pressure for a cpu from its maximum
available capacity so that cpu_capacity reflects the actual
available capacity.
Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
---
kernel/sched/fair.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fe7c165..cbc2208 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7719,6 +7719,7 @@ static unsigned long scale_rt_capacity(struct sched_domain *sd, int cpu)
used = READ_ONCE(rq->avg_rt.util_avg);
used += READ_ONCE(rq->avg_dl.util_avg);
+ used += READ_ONCE(rq->avg_thermal.load_avg);
if (unlikely(used >= max))
return 1;
--
2.1.4
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [Patch v3 6/7] thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping
2019-10-14 0:58 [Patch v3 0/7] Introduce Thermal Pressure Thara Gopinath
` (4 preceding siblings ...)
2019-10-14 0:58 ` [Patch v3 5/7] sched/fair: update cpu_capcity to reflect " Thara Gopinath
@ 2019-10-14 0:58 ` Thara Gopinath
2019-10-14 0:58 ` [Patch v3 7/7] sched: thermal: Enable tuning of decay period Thara Gopinath
6 siblings, 0 replies; 19+ messages in thread
From: Thara Gopinath @ 2019-10-14 0:58 UTC (permalink / raw)
To: mingo, peterz, ionela.voinescu, vincent.guittot, rui.zhang, edubezval
Cc: linux-kernel, amit.kachhap, javi.merino, daniel.lezcano
Thermal governors can request for a cpu's maximum supported frequency
to be capped in case of an overheat event. This in turn means that the
maximum capacity available for tasks to run on the particular cpu is
reduced. Delta between the original maximum capacity and capped
maximum capacity is known as thermal pressure. Enable cpufreq cooling
device to update the thermal pressure in event of a capped
maximum frequency.
Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
---
drivers/thermal/cpu_cooling.c | 31 +++++++++++++++++++++++++++++--
1 file changed, 29 insertions(+), 2 deletions(-)
diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index 391f397..9e2764a 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -218,6 +218,23 @@ static u32 cpu_power_to_freq(struct cpufreq_cooling_device *cpufreq_cdev,
}
/**
+ * update_sched_max_capacity - update scheduler about change in cpu
+ * max frequency.
+ * @policy - cpufreq policy whose max frequency is capped.
+ */
+static void update_sched_max_capacity(struct cpumask *cpus,
+ unsigned int cur_max_freq,
+ unsigned int max_freq)
+{
+ int cpu;
+ unsigned long capacity = (cur_max_freq << SCHED_CAPACITY_SHIFT) /
+ max_freq;
+
+ for_each_cpu(cpu, cpus)
+ update_maxcap_capacity(cpu, capacity);
+}
+
+/**
* get_load() - get load for a cpu since last updated
* @cpufreq_cdev: &struct cpufreq_cooling_device for this cpu
* @cpu: cpu number
@@ -320,6 +337,7 @@ static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
unsigned long state)
{
struct cpufreq_cooling_device *cpufreq_cdev = cdev->devdata;
+ int ret;
/* Request state should be less than max_level */
if (WARN_ON(state > cpufreq_cdev->max_level))
@@ -331,8 +349,17 @@ static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
cpufreq_cdev->cpufreq_state = state;
- return dev_pm_qos_update_request(&cpufreq_cdev->qos_req,
- cpufreq_cdev->freq_table[state].frequency);
+ ret = dev_pm_qos_update_request
+ (&cpufreq_cdev->qos_req,
+ cpufreq_cdev->freq_table[state].frequency);
+
+ if (ret > 0)
+ update_sched_max_capacity
+ (cpufreq_cdev->policy->cpus,
+ cpufreq_cdev->freq_table[state].frequency,
+ cpufreq_cdev->policy->cpuinfo.max_freq);
+
+ return ret;
}
/**
--
2.1.4
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [Patch v3 7/7] sched: thermal: Enable tuning of decay period
2019-10-14 0:58 [Patch v3 0/7] Introduce Thermal Pressure Thara Gopinath
` (5 preceding siblings ...)
2019-10-14 0:58 ` [Patch v3 6/7] thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping Thara Gopinath
@ 2019-10-14 0:58 ` Thara Gopinath
2019-10-15 10:14 ` Quentin Perret
6 siblings, 1 reply; 19+ messages in thread
From: Thara Gopinath @ 2019-10-14 0:58 UTC (permalink / raw)
To: mingo, peterz, ionela.voinescu, vincent.guittot, rui.zhang, edubezval
Cc: linux-kernel, amit.kachhap, javi.merino, daniel.lezcano
Thermal pressure follows pelt signas which means the
decay period for thermal pressure is the default pelt
decay period. Depending on soc charecteristics and thermal
activity, it might be beneficial to decay thermal pressure
slower, but still in-tune with the pelt signals.
One way to achieve this slow decay is to right shift the now
run time.
Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
---
include/linux/sched/sysctl.h | 1 +
kernel/sched/thermal.c | 16 +++++++++++++++-
kernel/sysctl.c | 7 +++++++
3 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index d4f6215..f4c4afd 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -41,6 +41,7 @@ extern unsigned int sysctl_numa_balancing_scan_size;
#ifdef CONFIG_SCHED_DEBUG
extern __read_mostly unsigned int sysctl_sched_migration_cost;
extern __read_mostly unsigned int sysctl_sched_nr_migrate;
+extern __read_mostly unsigned int sysctl_sched_thermal_decay_coeff;
int sched_proc_update_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *length,
diff --git a/kernel/sched/thermal.c b/kernel/sched/thermal.c
index 5f0b2d4..2a00488 100644
--- a/kernel/sched/thermal.c
+++ b/kernel/sched/thermal.c
@@ -10,6 +10,19 @@
#include "pelt.h"
#include "thermal.h"
+/**
+ * By default the decay is the default pelt decay period.
+ * The decay coefficient can change is decay period in
+ * multiples of 32.
+ * Decay coefficient Decay period(ms)
+ * 0 32
+ * 1 64
+ * 2 128
+ * 3 256
+ * 4 512
+ */
+unsigned int sysctl_sched_thermal_decay_coeff __read_mostly;
+
struct max_capacity_info {
unsigned long max_capacity;
unsigned long cap_capacity;
@@ -62,5 +75,6 @@ void update_periodic_maxcap(struct rq *rq)
return;
delta = __max_cap->max_capacity - __max_cap->cap_capacity;
- update_thermal_avg(rq_clock_task(rq), rq, delta);
+ update_thermal_avg((rq_clock_task(rq) >>
+ sysctl_sched_thermal_decay_coeff), rq, delta);
}
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 00fcea2..5056c08 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -376,6 +376,13 @@ static struct ctl_table kern_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec,
},
+ {
+ .procname = "sched_thermal_decay_coeff",
+ .data = &sysctl_sched_thermal_decay_coeff,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
#ifdef CONFIG_SCHEDSTATS
{
.procname = "sched_schedstats",
--
2.1.4
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [Patch v3 7/7] sched: thermal: Enable tuning of decay period
2019-10-14 0:58 ` [Patch v3 7/7] sched: thermal: Enable tuning of decay period Thara Gopinath
@ 2019-10-15 10:14 ` Quentin Perret
2019-10-21 21:03 ` Thara Gopinath
0 siblings, 1 reply; 19+ messages in thread
From: Quentin Perret @ 2019-10-15 10:14 UTC (permalink / raw)
To: Thara Gopinath
Cc: mingo, peterz, ionela.voinescu, vincent.guittot, rui.zhang,
edubezval, linux-kernel, amit.kachhap, javi.merino,
daniel.lezcano
Hi Thara,
On Sunday 13 Oct 2019 at 20:58:25 (-0400), Thara Gopinath wrote:
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 00fcea2..5056c08 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -376,6 +376,13 @@ static struct ctl_table kern_table[] = {
> .mode = 0644,
> .proc_handler = proc_dointvec,
> },
> + {
> + .procname = "sched_thermal_decay_coeff",
> + .data = &sysctl_sched_thermal_decay_coeff,
> + .maxlen = sizeof(unsigned int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec,
Perhaps change this for 'sched_proc_update_handler' with min and max
values ? Otherwise userspace is allowed to write nonsensical values
here. And since sysctl_sched_thermal_decay_coeff is used to shift, this
can lead to an undefined behaviour.
Also, could we take this sysctl out of SCHED_DEBUG ? I expect this to be
used/tuned on production devices where SCHED_DEBUG should theoretically
be off.
Thanks,
Quentin
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch v3 7/7] sched: thermal: Enable tuning of decay period
2019-10-15 10:14 ` Quentin Perret
@ 2019-10-21 21:03 ` Thara Gopinath
2019-10-22 9:03 ` Quentin Perret
0 siblings, 1 reply; 19+ messages in thread
From: Thara Gopinath @ 2019-10-21 21:03 UTC (permalink / raw)
To: Quentin Perret
Cc: mingo, peterz, ionela.voinescu, vincent.guittot, rui.zhang,
edubezval, linux-kernel, amit.kachhap, javi.merino,
daniel.lezcano
Hi Quentin,
Thanks for the review.
On 10/15/2019 06:14 AM, Quentin Perret wrote:
> Hi Thara,
>
> On Sunday 13 Oct 2019 at 20:58:25 (-0400), Thara Gopinath wrote:
>> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
>> index 00fcea2..5056c08 100644
>> --- a/kernel/sysctl.c
>> +++ b/kernel/sysctl.c
>> @@ -376,6 +376,13 @@ static struct ctl_table kern_table[] = {
>> .mode = 0644,
>> .proc_handler = proc_dointvec,
>> },
>> + {
>> + .procname = "sched_thermal_decay_coeff",
>> + .data = &sysctl_sched_thermal_decay_coeff,
>> + .maxlen = sizeof(unsigned int),
>> + .mode = 0644,
>> + .proc_handler = proc_dointvec,
>
> Perhaps change this for 'sched_proc_update_handler' with min and max
> values ? Otherwise userspace is allowed to write nonsensical values
> here. And since sysctl_sched_thermal_decay_coeff is used to shift, this
> can lead to an undefined behaviour.
Will do
>
> Also, could we take this sysctl out of SCHED_DEBUG ? I expect this to be
> used/tuned on production devices where SCHED_DEBUG should theoretically
> be off.
I will take it out of SCHED_DEBUG. I am wondering if this should be
a runtime control at all. Because this is a shift this changes the
accumulating window for the thermal pressure signal. A runtime change
will not guarantee a clean start of the window. May be I should make
this a config option.
>
> Thanks,
> Quentin
>
--
Warm Regards
Thara
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch v3 7/7] sched: thermal: Enable tuning of decay period
2019-10-21 21:03 ` Thara Gopinath
@ 2019-10-22 9:03 ` Quentin Perret
2019-10-22 20:36 ` Thara Gopinath
0 siblings, 1 reply; 19+ messages in thread
From: Quentin Perret @ 2019-10-22 9:03 UTC (permalink / raw)
To: Thara Gopinath
Cc: mingo, peterz, ionela.voinescu, vincent.guittot, rui.zhang,
edubezval, linux-kernel, amit.kachhap, javi.merino,
daniel.lezcano, kernel-team
Hi Thara,
On Monday 21 Oct 2019 at 17:03:56 (-0400), Thara Gopinath wrote:
> On 10/15/2019 06:14 AM, Quentin Perret wrote:
> > Hi Thara,
> >
> > On Sunday 13 Oct 2019 at 20:58:25 (-0400), Thara Gopinath wrote:
> >> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> >> index 00fcea2..5056c08 100644
> >> --- a/kernel/sysctl.c
> >> +++ b/kernel/sysctl.c
> >> @@ -376,6 +376,13 @@ static struct ctl_table kern_table[] = {
> >> .mode = 0644,
> >> .proc_handler = proc_dointvec,
> >> },
> >> + {
> >> + .procname = "sched_thermal_decay_coeff",
> >> + .data = &sysctl_sched_thermal_decay_coeff,
> >> + .maxlen = sizeof(unsigned int),
> >> + .mode = 0644,
> >> + .proc_handler = proc_dointvec,
> >
> > Perhaps change this for 'sched_proc_update_handler' with min and max
> > values ? Otherwise userspace is allowed to write nonsensical values
> > here. And since sysctl_sched_thermal_decay_coeff is used to shift, this
> > can lead to an undefined behaviour.
> Will do
> >
> > Also, could we take this sysctl out of SCHED_DEBUG ? I expect this to be
> > used/tuned on production devices where SCHED_DEBUG should theoretically
> > be off.
>
> I will take it out of SCHED_DEBUG. I am wondering if this should be
> a runtime control at all. Because this is a shift this changes the
> accumulating window for the thermal pressure signal. A runtime change
> will not guarantee a clean start of the window. May be I should make
> this a config option.
I'd personally prefer if it wan't a Kconfig option. We'd like to make
Android devices (which are going to use this) work with a Generic Kernel
Image, which means there will be a single config for everyone. But I
expect this knob to be tuned to different values depending on the SoC.
If you really don't want a sysctl, perhaps a cmdline option could work ?
Thanks,
Quentin
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Patch v3 7/7] sched: thermal: Enable tuning of decay period
2019-10-22 9:03 ` Quentin Perret
@ 2019-10-22 20:36 ` Thara Gopinath
0 siblings, 0 replies; 19+ messages in thread
From: Thara Gopinath @ 2019-10-22 20:36 UTC (permalink / raw)
To: Quentin Perret
Cc: mingo, peterz, ionela.voinescu, vincent.guittot, rui.zhang,
edubezval, linux-kernel, amit.kachhap, javi.merino,
daniel.lezcano, kernel-team
On 10/22/2019 05:03 AM, Quentin Perret wrote:
> Hi Thara,
>
> On Monday 21 Oct 2019 at 17:03:56 (-0400), Thara Gopinath wrote:
>> On 10/15/2019 06:14 AM, Quentin Perret wrote:
>>> Hi Thara,
>>>
>>> On Sunday 13 Oct 2019 at 20:58:25 (-0400), Thara Gopinath wrote:
>>>> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
>>>> index 00fcea2..5056c08 100644
>>>> --- a/kernel/sysctl.c
>>>> +++ b/kernel/sysctl.c
>>>> @@ -376,6 +376,13 @@ static struct ctl_table kern_table[] = {
>>>> .mode = 0644,
>>>> .proc_handler = proc_dointvec,
>>>> },
>>>> + {
>>>> + .procname = "sched_thermal_decay_coeff",
>>>> + .data = &sysctl_sched_thermal_decay_coeff,
>>>> + .maxlen = sizeof(unsigned int),
>>>> + .mode = 0644,
>>>> + .proc_handler = proc_dointvec,
>>>
>>> Perhaps change this for 'sched_proc_update_handler' with min and max
>>> values ? Otherwise userspace is allowed to write nonsensical values
>>> here. And since sysctl_sched_thermal_decay_coeff is used to shift, this
>>> can lead to an undefined behaviour.
>> Will do
>>>
>>> Also, could we take this sysctl out of SCHED_DEBUG ? I expect this to be
>>> used/tuned on production devices where SCHED_DEBUG should theoretically
>>> be off.
>>
>> I will take it out of SCHED_DEBUG. I am wondering if this should be
>> a runtime control at all. Because this is a shift this changes the
>> accumulating window for the thermal pressure signal. A runtime change
>> will not guarantee a clean start of the window. May be I should make
>> this a config option.
>
> I'd personally prefer if it wan't a Kconfig option. We'd like to make
> Android devices (which are going to use this) work with a Generic Kernel
> Image, which means there will be a single config for everyone. But I
> expect this knob to be tuned to different values depending on the SoC.
>
> If you really don't want a sysctl, perhaps a cmdline option could work ?
yes . I will. I have sent across v4 with these and other review comments
fixed.
>
> Thanks,
> Quentin
>
--
Warm Regards
Thara
^ permalink raw reply [flat|nested] 19+ messages in thread