* [Patch v10 0/9] Introduce Thermal Pressure
@ 2020-02-22  0:52 Thara Gopinath
  2020-02-22  0:52 ` [Patch v10 1/9] sched/pelt: Add support to track thermal pressure Thara Gopinath
                   ` (10 more replies)
  0 siblings, 11 replies; 28+ messages in thread
From: Thara Gopinath @ 2020-02-22  0:52 UTC (permalink / raw)
  To: mingo, peterz, ionela.voinescu, vincent.guittot,
	dietmar.eggemann, rui.zhang, qperret, daniel.lezcano,
	viresh.kumar, rostedt, will, catalin.marinas, sudeep.holla,
	juri.lelli, corbet
  Cc: linux-kernel, amit.kachhap, javi.merino, amit.kucheria

Thermal governors can respond to an overheat event of a cpu by
capping the cpu's maximum possible frequency. This in turn
means that the maximum available compute capacity of the
cpu is restricted. But today in the kernel, the task scheduler is
not notified of capping of maximum frequency of a cpu.
In other words, the scheduler is unaware of maximum capacity
restrictions placed on a cpu due to thermal activity.
This patch series attempts to address this issue.
The benefits identified are better task placement among available
cpus in the event of overheating, which in turn leads to better
performance numbers.

The reduction in the maximum possible capacity of a cpu due to a
thermal event can be considered as thermal pressure. Instantaneous
thermal pressure is hard to record and can sometimes be erroneous
as there can be a mismatch between the actual capping of capacity
and the scheduler recording it. Thus the solution is to have a weighted
average per cpu value for thermal pressure over time.
The weight reflects the amount of time the cpu has spent at a
capped maximum frequency. Since thermal pressure is recorded as
an average, it must be decayed periodically. The existing algorithm
in the kernel scheduler pelt framework is re-used to calculate
the weighted average. This patch series also defines a sysctl
interface to allow for a configurable decay period.
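
As a rough illustration of the decayed weighted average described above,
the following is a simplified floating-point sketch. It is not the
fixed-point PELT code in the kernel. DECAY_Y, thermal_avg_step and
thermal_avg_run are names made up for this example, and a half-life of
32 periods is assumed.

```c
/*
 * Simplified floating-point model of a geometrically decayed average.
 * Assumption: the signal halves after 32 periods. The kernel's PELT
 * implementation uses fixed-point arithmetic and is not reproduced here.
 */
#define DECAY_Y 0.97857206	/* approx. 0.5^(1/32) */

/* Hypothetical helper: fold one period's pressure sample into the average. */
static double thermal_avg_step(double avg, double pressure)
{
	return avg * DECAY_Y + pressure * (1.0 - DECAY_Y);
}

/* Hypothetical helper: apply the same sample for n consecutive periods. */
static double thermal_avg_run(double avg, double pressure, unsigned int n)
{
	while (n--)
		avg = thermal_avg_step(avg, pressure);
	return avg;
}
```

With these assumptions, an average of 512 falls to about 256 after 32
periods without capping, and a constant pressure of 200 applied from a
zero start reaches about 100 over the same span.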

Regarding testing, basic build, boot and sanity testing have been
performed on db845c platform with debian file system.
Further, dhrystone and hackbench tests have been
run with the thermal pressure algorithm. During testing, due to
constraints of step wise governor in dealing with big little systems,
trip point 0 temperature was made asymmetric between cpus in little
cluster and big cluster; the idea being that
big core will heat up and cpu cooling device will throttle the
frequency of the big cores faster, thereby limiting the maximum available
capacity and the scheduler will spread out tasks to little cores as well.

Test Results

Hackbench: 1 group, 30000 loops, 10 runs
                                               Result         SD
                                               (Secs)     (% of mean)
 No Thermal Pressure                            14.03       2.69%
 Thermal Pressure PELT Algo. Decay : 32 ms      13.29       0.56%
 Thermal Pressure PELT Algo. Decay : 64 ms      12.57       1.56%
 Thermal Pressure PELT Algo. Decay : 128 ms     12.71       1.04%
 Thermal Pressure PELT Algo. Decay : 256 ms     12.29       1.42%
 Thermal Pressure PELT Algo. Decay : 512 ms     12.42       1.15%

Dhrystone Run Time  : 20 threads, 3000 MLOOPS
                                                 Result      SD
                                                 (Secs)    (% of mean)
 No Thermal Pressure                              9.452      4.49%
 Thermal Pressure PELT Algo. Decay : 32 ms        8.793      5.30%
 Thermal Pressure PELT Algo. Decay : 64 ms        8.981      5.29%
 Thermal Pressure PELT Algo. Decay : 128 ms       8.647      6.62%
 Thermal Pressure PELT Algo. Decay : 256 ms       8.774      6.45%
 Thermal Pressure PELT Algo. Decay : 512 ms       8.603      5.41%

A Brief History

The first version of this patch-series was posted with reusing
PELT algorithm to decay thermal pressure signal. The discussions
that followed were around whether instantaneous thermal pressure
solution is better and whether a stand-alone algorithm to accumulate
and decay thermal pressure is more appropriate than re-using the
PELT framework.
Tests on Hikey960 showed the stand-alone algorithm performing slightly
better than reusing PELT algorithm and V2 was posted with the
stand-alone algorithm. Test results were shared as part of this series.
Discussions were around re-using PELT algorithm and running
further tests with more granular decay period.

For some time after this, development was impeded due to hardware
unavailability and some other unforeseen and possibly unfortunate events.
For this version, h/w was switched from hikey960 to db845c.
Also, instantaneous thermal pressure was never tested as part of this
cycle as it is clear that weighted average is a better implementation.
The non-PELT algorithm never gave any conclusive results to prove that it
is better than reusing PELT algorithm, in this round of testing.
Also reusing PELT algorithm means thermal pressure tracks the
other utilization signals in the scheduler.

v3->v4:
        - "Patch 3/7:sched: Initialize per cpu thermal pressure structure"
           is dropped as it is no longer needed following changes in
           other patches.
        - rest of the change log mentioned in specific patches.

v5->v6:
	- Added arch_ interface APIs to access and update thermal pressure.
	  Moved declaration of per cpu thermal_pressure variable and
	  infrastructure to update the variable to topology files.

v6->v7:
	- Added CONFIG_HAVE_SCHED_THERMAL_PRESSURE to stub out
	  update_thermal_load_avg in unsupported architectures as per
	  review comments from Peter, Dietmar and Quentin.
	- Renamed arch_scale_thermal_capacity to arch_cpu_thermal_pressure
	  as per review comments from Peter, Dietmar and Ionela.
	- Changed the input argument in arch_set_thermal_pressure from
	  capped capacity to delta capacity(thermal pressure) as per
	  Ionela's review comments. Hence the calculation for delta
	  capacity(thermal pressure) is moved to cpufreq_cooling.c.
	- Fixed a bunch of spelling typos.

v7->v8:
	- Fixed typo in defining update_thermal_load_avg which was
	  causing build errors (reported by kbuild test robot)

v8->v9:
	- Defined thermal_load_avg to read rq->avg_thermal.load_avg and
	  avoid cacheline miss in unsupported cases as per Peter's
	  suggestion.
	- Moved periodic triggering of thermal pressure averaging from CFS
	  tick function to generic scheduler core tick function.
	- Moved rq_clock_thermal from fair.c to sched.h to enable using
	  the function from multiple files.
	- Initialized the __shift to 0 in setup_sched_thermal_decay_shift
	  as per Quentin's suggestion
	- Added an extra patch enabling CONFIG_HAVE_SCHED_THERMAL_PRESSURE
	  as per Dietmar's request.

v9->v10:
	- Renamed arch_cpu_thermal_pressure to arch_scale_thermal_pressure
	  as per review comments from Dietmar.
	- Split "[Patch v9 3/8] arm,arm64,drivers:Add infrastructure to
	  store and update instantaneous thermal pressure" into 3 thus
	  separating out arch/arm and arch/arm64 specific code into
	  individual patches as suggested by Amit Kucheria.
	- Added description for sched_thermal_decay_shift in
	  kernel-parameters.txt following Randy's review comments.
	- Fixed typos in comments as per Amit Kucheria's review comments.

Thara Gopinath (9):
  sched/pelt: Add support to track thermal pressure
  sched/topology: Add hook to read per cpu thermal pressure.
  drivers/base/arch_topology: Add infrastructure to store and update
    instantaneous thermal pressure
  arm64/topology: Populate arch_scale_thermal_pressure for arm64 platforms
  arm/topology: Populate arch_scale_thermal_pressure for arm platforms
  sched/fair: Enable periodic update of average thermal pressure
  sched/fair: update cpu_capacity to reflect thermal pressure
  thermal/cpu-cooling: Update thermal pressure in case of a maximum
    frequency capping
  sched/fair: Enable tuning of decay period

 .../admin-guide/kernel-parameters.txt         | 16 ++++++++++
 arch/arm/include/asm/topology.h               |  3 ++
 arch/arm64/include/asm/topology.h             |  3 ++
 drivers/base/arch_topology.c                  | 11 +++++++
 drivers/thermal/cpufreq_cooling.c             | 19 ++++++++++--
 include/linux/arch_topology.h                 | 10 ++++++
 include/linux/sched/topology.h                |  8 +++++
 include/trace/events/sched.h                  |  4 +++
 init/Kconfig                                  |  4 +++
 kernel/sched/core.c                           |  3 ++
 kernel/sched/fair.c                           | 27 ++++++++++++++++
 kernel/sched/pelt.c                           | 31 +++++++++++++++++++
 kernel/sched/pelt.h                           | 31 +++++++++++++++++++
 kernel/sched/sched.h                          | 21 +++++++++++++
 14 files changed, 189 insertions(+), 2 deletions(-)

-- 
2.20.1



* [Patch v10 1/9] sched/pelt: Add support to track thermal pressure
  2020-02-22  0:52 [Patch v10 0/9] Introduce Thermal Pressure Thara Gopinath
@ 2020-02-22  0:52 ` Thara Gopinath
  2020-02-22  0:59   ` Randy Dunlap
  2020-03-06 14:42   ` [tip: sched/core] " tip-bot2 for Thara Gopinath
  2020-02-22  0:52 ` [Patch v10 2/9] sched/topology: Add hook to read per cpu " Thara Gopinath
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 28+ messages in thread
From: Thara Gopinath @ 2020-02-22  0:52 UTC (permalink / raw)
  To: mingo, peterz, ionela.voinescu, vincent.guittot,
	dietmar.eggemann, rui.zhang, qperret, daniel.lezcano,
	viresh.kumar, rostedt, will, catalin.marinas, sudeep.holla,
	juri.lelli, corbet
  Cc: linux-kernel, amit.kachhap, javi.merino, amit.kucheria

Extrapolating on the existing framework to track rt/dl utilization using
pelt signals, add a similar mechanism to track thermal pressure. The
difference here from rt/dl utilization tracking is that, instead of
tracking time spent by a cpu running a rt/dl task through util_avg, the
average thermal pressure is tracked through load_avg. This is because
thermal pressure signal is time weighted "delta" capacity unlike util_avg
which is binary. "delta capacity" here means delta between the actual
capacity of a cpu and the decreased capacity of a cpu due to a thermal event.
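
The contrast between a binary signal and a weighted one can be sketched
as below. This only illustrates what each kind of signal contributes per
period and is not the kernel code. CAPACITY_SCALE, binary_sample and
delta_capacity_sample are hypothetical names, and a full-scale value of
1024 is assumed.

```c
#define CAPACITY_SCALE 1024UL	/* assumed full-scale capacity value */

/* Binary signal: a period contributes either nothing or full scale. */
static unsigned long binary_sample(int running)
{
	return running ? CAPACITY_SCALE : 0;
}

/* Weighted signal: a period contributes the capacity lost to capping. */
static unsigned long delta_capacity_sample(unsigned long max_capacity,
					   unsigned long capped_capacity)
{
	/* Guard against underflow if the capped value is not below max. */
	if (capped_capacity >= max_capacity)
		return 0;

	return max_capacity - capped_capacity;
}
```

Feeding either kind of sample into the same decaying average yields a
time-weighted value. Only the magnitude of the per-period contribution
differs.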

In order to track average thermal pressure, a new sched_avg variable
avg_thermal is introduced. Function update_thermal_load_avg can be called
to do the periodic bookkeeping (accumulate, decay and average) of the
thermal pressure.

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
---
v6->v7:
	- Added CONFIG_HAVE_SCHED_THERMAL_PRESSURE to stub out
	  update_thermal_load_avg in unsupported architectures as per
	  review comments from Peter, Dietmar and Quentin.
	- Updated comment for update_thermal_load_avg as per review
	  comments from Peter and Dietmar.
v7->v8:
	- Fixed typo in defining update_thermal_load_avg which was
	  causing build errors (reported by kbuild test robot)
v8->v9:
	- Defined thermal_load_avg to read rq->avg_thermal.load_avg and
	  avoid cacheline miss in unsupported cases as per Peter's
	  suggestion.
v9->v10:
	- Fixed typos in comments as per Amit Kucheria's review comments.

 include/trace/events/sched.h |  4 ++++
 init/Kconfig                 |  4 ++++
 kernel/sched/pelt.c          | 31 +++++++++++++++++++++++++++++++
 kernel/sched/pelt.h          | 31 +++++++++++++++++++++++++++++++
 kernel/sched/sched.h         |  3 +++
 5 files changed, 73 insertions(+)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 420e80e56e55..a8fb667c669e 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -613,6 +613,10 @@ DECLARE_TRACE(pelt_dl_tp,
 	TP_PROTO(struct rq *rq),
 	TP_ARGS(rq));
 
+DECLARE_TRACE(pelt_thermal_tp,
+	TP_PROTO(struct rq *rq),
+	TP_ARGS(rq));
+
 DECLARE_TRACE(pelt_irq_tp,
 	TP_PROTO(struct rq *rq),
 	TP_ARGS(rq));
diff --git a/init/Kconfig b/init/Kconfig
index 2a25c769eaaa..8d56902efa70 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -464,6 +464,10 @@ config HAVE_SCHED_AVG_IRQ
 	depends on IRQ_TIME_ACCOUNTING || PARAVIRT_TIME_ACCOUNTING
 	depends on SMP
 
+config HAVE_SCHED_THERMAL_PRESSURE
+	bool "Enable periodic averaging of thermal pressure"
+	depends on SMP
+
 config BSD_PROCESS_ACCT
 	bool "BSD Process Accounting"
 	depends on MULTIUSER
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index bd006b79b360..1fdacbf6fb44 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -367,6 +367,37 @@ int update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
 	return 0;
 }
 
+#ifdef CONFIG_HAVE_SCHED_THERMAL_PRESSURE
+/*
+ * thermal:
+ *
+ *   load_sum = \Sum se->avg.load_sum but se->avg.load_sum is not tracked
+ *
+ *   util_avg and runnable_load_avg are not supported and meaningless.
+ *
+ * Unlike rt/dl utilization tracking that tracks time spent by a cpu
+ * running a rt/dl task through util_avg, the average thermal pressure is
+ * tracked through load_avg. This is because thermal pressure signal is
+ * time weighted "delta" capacity unlike util_avg which is binary.
+ * "delta capacity" =  actual capacity  -
+ *			capped capacity of a cpu due to a thermal event.
+ */
+
+int update_thermal_load_avg(u64 now, struct rq *rq, u64 capacity)
+{
+	if (___update_load_sum(now, &rq->avg_thermal,
+			       capacity,
+			       capacity,
+			       capacity)) {
+		___update_load_avg(&rq->avg_thermal, 1, 1);
+		trace_pelt_thermal_tp(rq);
+		return 1;
+	}
+
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
 /*
  * irq:
diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h
index afff644da065..916979a54782 100644
--- a/kernel/sched/pelt.h
+++ b/kernel/sched/pelt.h
@@ -7,6 +7,26 @@ int __update_load_avg_cfs_rq(u64 now, struct cfs_rq *cfs_rq);
 int update_rt_rq_load_avg(u64 now, struct rq *rq, int running);
 int update_dl_rq_load_avg(u64 now, struct rq *rq, int running);
 
+#ifdef CONFIG_HAVE_SCHED_THERMAL_PRESSURE
+int update_thermal_load_avg(u64 now, struct rq *rq, u64 capacity);
+
+static inline u64 thermal_load_avg(struct rq *rq)
+{
+	return READ_ONCE(rq->avg_thermal.load_avg);
+}
+#else
+static inline int
+update_thermal_load_avg(u64 now, struct rq *rq, u64 capacity)
+{
+	return 0;
+}
+
+static inline u64 thermal_load_avg(struct rq *rq)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
 int update_irq_load_avg(struct rq *rq, u64 running);
 #else
@@ -158,6 +178,17 @@ update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
 	return 0;
 }
 
+static inline int
+update_thermal_load_avg(u64 now, struct rq *rq, u64 capacity)
+{
+	return 0;
+}
+
+static inline u64 thermal_load_avg(struct rq *rq)
+{
+	return 0;
+}
+
 static inline int
 update_irq_load_avg(struct rq *rq, u64 running)
 {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 12bf82d86156..211411ac0efa 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -943,6 +943,9 @@ struct rq {
 	struct sched_avg	avg_dl;
 #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
 	struct sched_avg	avg_irq;
+#endif
+#ifdef CONFIG_HAVE_SCHED_THERMAL_PRESSURE
+	struct sched_avg	avg_thermal;
 #endif
 	u64			idle_stamp;
 	u64			avg_idle;
-- 
2.20.1



* [Patch v10 2/9] sched/topology: Add hook to read per cpu thermal pressure.
  2020-02-22  0:52 [Patch v10 0/9] Introduce Thermal Pressure Thara Gopinath
  2020-02-22  0:52 ` [Patch v10 1/9] sched/pelt: Add support to track thermal pressure Thara Gopinath
@ 2020-02-22  0:52 ` Thara Gopinath
  2020-03-06 14:42   ` [tip: sched/core] sched/topology: Add callback to read per CPU " tip-bot2 for Thara Gopinath
  2020-02-22  0:52 ` [Patch v10 3/9] drivers/base/arch_topology: Add infrastructure to store and update instantaneous " Thara Gopinath
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Thara Gopinath @ 2020-02-22  0:52 UTC (permalink / raw)
  To: mingo, peterz, ionela.voinescu, vincent.guittot,
	dietmar.eggemann, rui.zhang, qperret, daniel.lezcano,
	viresh.kumar, rostedt, will, catalin.marinas, sudeep.holla,
	juri.lelli, corbet
  Cc: linux-kernel, amit.kachhap, javi.merino, amit.kucheria

Introduce arch_scale_thermal_pressure to retrieve per cpu thermal
pressure.

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
---
v6->v7:
	- Renamed arch_scale_thermal_capacity to arch_cpu_thermal_pressure
	  as per review comments from Peter, Dietmar and Ionela.
v9->v10:
	- Renamed arch_cpu_thermal_pressure to arch_scale_thermal_pressure
	  as per review comments from Dietmar.

 include/linux/sched/topology.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index f341163fedc9..af9319e4cfb9 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -225,6 +225,14 @@ unsigned long arch_scale_cpu_capacity(int cpu)
 }
 #endif
 
+#ifndef arch_scale_thermal_pressure
+static __always_inline
+unsigned long arch_scale_thermal_pressure(int cpu)
+{
+	return 0;
+}
+#endif
+
 static inline int task_node(const struct task_struct *p)
 {
 	return cpu_to_node(task_cpu(p));
-- 
2.20.1



* [Patch v10 3/9] drivers/base/arch_topology: Add infrastructure to store and update instantaneous thermal pressure
  2020-02-22  0:52 [Patch v10 0/9] Introduce Thermal Pressure Thara Gopinath
  2020-02-22  0:52 ` [Patch v10 1/9] sched/pelt: Add support to track thermal pressure Thara Gopinath
  2020-02-22  0:52 ` [Patch v10 2/9] sched/topology: Add hook to read per cpu " Thara Gopinath
@ 2020-02-22  0:52 ` Thara Gopinath
  2020-03-06 14:42   ` [tip: sched/core] " tip-bot2 for Thara Gopinath
  2020-02-22  0:52 ` [Patch v10 4/9] arm64/topology: Populate arch_scale_thermal_pressure for arm64 platforms Thara Gopinath
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Thara Gopinath @ 2020-02-22  0:52 UTC (permalink / raw)
  To: mingo, peterz, ionela.voinescu, vincent.guittot,
	dietmar.eggemann, rui.zhang, qperret, daniel.lezcano,
	viresh.kumar, rostedt, will, catalin.marinas, sudeep.holla,
	juri.lelli, corbet
  Cc: linux-kernel, amit.kachhap, javi.merino, amit.kucheria

Add architecture specific APIs to update and track thermal pressure on a
per cpu basis. A per cpu variable thermal_pressure is introduced to keep
track of instantaneous per cpu thermal pressure. Thermal pressure is the
delta between maximum capacity and capped capacity due to a thermal event.

topology_get_thermal_pressure can be hooked into the scheduler specified
arch_scale_thermal_pressure to retrieve instantaneous thermal pressure of
a cpu.

arch_set_thermal_pressure can be used to update the thermal pressure.

Considering topology_get_thermal_pressure reads thermal_pressure and
arch_set_thermal_pressure writes into thermal_pressure, one can argue for
some sort of locking mechanism to avoid a stale value.  But considering
topology_get_thermal_pressure can be called from a system critical path
like scheduler tick function, a locking mechanism is not ideal. This means
that it is possible the thermal_pressure value used to calculate average
thermal pressure for a cpu can be stale for up to 1 tick period.
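
The lockless trade-off described above can be sketched in user space
with C11 relaxed atomics standing in for the kernel's per cpu accessors.
This is an analogue under stated assumptions, not the code in this
patch. NR_CPUS_SKETCH, pressure_sketch, set_pressure_sketch and
get_pressure_sketch are hypothetical names, and a plain bitmask replaces
struct cpumask.

```c
#include <stdatomic.h>

#define NR_CPUS_SKETCH 8	/* hypothetical CPU count */

/* One slot per CPU; static storage starts out as zero. */
static _Atomic unsigned long pressure_sketch[NR_CPUS_SKETCH];

/* Writer side: publish the same pressure for every CPU set in the mask. */
static void set_pressure_sketch(unsigned long cpu_mask, unsigned long value)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		if (cpu_mask & (1UL << cpu))
			atomic_store_explicit(&pressure_sketch[cpu], value,
					      memory_order_relaxed);
}

/*
 * Reader side: takes no lock, so it may observe the previous value
 * while a store is still in flight. It never sees a torn value.
 */
static unsigned long get_pressure_sketch(int cpu)
{
	return atomic_load_explicit(&pressure_sketch[cpu],
				    memory_order_relaxed);
}
```

A reader that runs once per tick picks up a new value on its next
sample at the latest, which is the bounded staleness the commit message
accepts in exchange for keeping the tick path free of locks.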

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
---
v6->v7:
	- Changed the input argument in arch_set_thermal_pressure from
	  capped capacity to delta capacity(thermal pressure) as per
	  Ionela's review comments.
v9->v10:
	- Split the patch into three thus separating out arch/arm
	  and arch/arm64 specific code into individual patches as
	  suggested by Amit Kucheria.

 drivers/base/arch_topology.c  | 11 +++++++++++
 include/linux/arch_topology.h | 10 ++++++++++
 2 files changed, 21 insertions(+)

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 6119e11a9f95..68dfa49d3b63 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -42,6 +42,17 @@ void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity)
 	per_cpu(cpu_scale, cpu) = capacity;
 }
 
+DEFINE_PER_CPU(unsigned long, thermal_pressure);
+
+void arch_set_thermal_pressure(struct cpumask *cpus,
+			       unsigned long th_pressure)
+{
+	int cpu;
+
+	for_each_cpu(cpu, cpus)
+		WRITE_ONCE(per_cpu(thermal_pressure, cpu), th_pressure);
+}
+
 static ssize_t cpu_capacity_show(struct device *dev,
 				 struct device_attribute *attr,
 				 char *buf)
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index 3015ecbb90b1..88a115e81f27 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -33,6 +33,16 @@ unsigned long topology_get_freq_scale(int cpu)
 	return per_cpu(freq_scale, cpu);
 }
 
+DECLARE_PER_CPU(unsigned long, thermal_pressure);
+
+static inline unsigned long topology_get_thermal_pressure(int cpu)
+{
+	return per_cpu(thermal_pressure, cpu);
+}
+
+void arch_set_thermal_pressure(struct cpumask *cpus,
+			       unsigned long th_pressure);
+
 struct cpu_topology {
 	int thread_id;
 	int core_id;
-- 
2.20.1



* [Patch v10 4/9] arm64/topology: Populate arch_scale_thermal_pressure for arm64 platforms
  2020-02-22  0:52 [Patch v10 0/9] Introduce Thermal Pressure Thara Gopinath
                   ` (2 preceding siblings ...)
  2020-02-22  0:52 ` [Patch v10 3/9] drivers/base/arch_topology: Add infrastructure to store and update instantaneous " Thara Gopinath
@ 2020-02-22  0:52 ` Thara Gopinath
  2020-03-06 14:42   ` [tip: sched/core] arm64/topology: Populate arch_scale_thermal_pressure() " tip-bot2 for Thara Gopinath
  2020-02-22  0:52 ` [Patch v10 5/9] arm/topology: Populate arch_scale_thermal_pressure for arm platforms Thara Gopinath
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Thara Gopinath @ 2020-02-22  0:52 UTC (permalink / raw)
  To: mingo, peterz, ionela.voinescu, vincent.guittot,
	dietmar.eggemann, rui.zhang, qperret, daniel.lezcano,
	viresh.kumar, rostedt, will, catalin.marinas, sudeep.holla,
	juri.lelli, corbet
  Cc: linux-kernel, amit.kachhap, javi.merino, amit.kucheria

Hook up topology_get_thermal_pressure to arch_scale_thermal_pressure thus
enabling scheduler to retrieve instantaneous thermal pressure of a cpu.
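
The hook-up relies on a common preprocessor pattern: generic code
supplies a default only when the architecture has not already defined a
macro of the same name. A minimal stand-alone sketch of that pattern
follows. topology_pressure_sketch and arch_pressure_sketch are
hypothetical names, and the return value of 256 is an arbitrary
placeholder.

```c
/* Architecture header (hypothetical): provide the real implementation. */
static unsigned long topology_pressure_sketch(int cpu)
{
	(void)cpu;
	return 256;	/* placeholder value for illustration */
}
#define arch_pressure_sketch topology_pressure_sketch

/* Generic header: fall back to a stub only if no override was defined. */
#ifndef arch_pressure_sketch
static unsigned long arch_pressure_sketch(int cpu)
{
	(void)cpu;
	return 0;
}
#endif
```

Because the macro is defined before the generic fallback is reached, the
stub is never compiled and callers of arch_pressure_sketch end up in the
architecture's function.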

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
---
 arch/arm64/include/asm/topology.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index a4d945db95a2..cbd70d78ef15 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -25,6 +25,9 @@ int pcibus_to_node(struct pci_bus *bus);
 /* Enable topology flag updates */
 #define arch_update_cpu_topology topology_update_cpu_topology
 
+/* Replace task scheduler's default thermal pressure retrieve API */
+#define arch_scale_thermal_pressure topology_get_thermal_pressure
+
 #include <asm-generic/topology.h>
 
 #endif /* _ASM_ARM_TOPOLOGY_H */
-- 
2.20.1



* [Patch v10 5/9] arm/topology: Populate arch_scale_thermal_pressure for arm platforms
  2020-02-22  0:52 [Patch v10 0/9] Introduce Thermal Pressure Thara Gopinath
                   ` (3 preceding siblings ...)
  2020-02-22  0:52 ` [Patch v10 4/9] arm64/topology: Populate arch_scale_thermal_pressure for arm64 platforms Thara Gopinath
@ 2020-02-22  0:52 ` Thara Gopinath
  2020-03-06 14:42   ` [tip: sched/core] arm/topology: Populate arch_scale_thermal_pressure() for ARM platforms tip-bot2 for Thara Gopinath
  2020-02-22  0:52 ` [Patch v10 6/9] sched/fair: Enable periodic update of average thermal pressure Thara Gopinath
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Thara Gopinath @ 2020-02-22  0:52 UTC (permalink / raw)
  To: mingo, peterz, ionela.voinescu, vincent.guittot,
	dietmar.eggemann, rui.zhang, qperret, daniel.lezcano,
	viresh.kumar, rostedt, will, catalin.marinas, sudeep.holla,
	juri.lelli, corbet
  Cc: linux-kernel, amit.kachhap, javi.merino, amit.kucheria

Hook up topology_get_thermal_pressure to arch_scale_thermal_pressure thus
enabling scheduler to retrieve instantaneous thermal pressure of a cpu.

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
---
 arch/arm/include/asm/topology.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h
index 8a0fae94d45e..3a50a19c7c28 100644
--- a/arch/arm/include/asm/topology.h
+++ b/arch/arm/include/asm/topology.h
@@ -16,6 +16,9 @@
 /* Enable topology flag updates */
 #define arch_update_cpu_topology topology_update_cpu_topology
 
+/* Replace task scheduler's default thermal pressure retrieve API */
+#define arch_scale_thermal_pressure topology_get_thermal_pressure
+
 #else
 
 static inline void init_cpu_topology(void) { }
-- 
2.20.1



* [Patch v10 6/9] sched/fair: Enable periodic update of average thermal pressure
  2020-02-22  0:52 [Patch v10 0/9] Introduce Thermal Pressure Thara Gopinath
                   ` (4 preceding siblings ...)
  2020-02-22  0:52 ` [Patch v10 5/9] arm/topology: Populate arch_scale_thermal_pressure for arm platforms Thara Gopinath
@ 2020-02-22  0:52 ` Thara Gopinath
  2020-02-27  9:03   ` Amit Kucheria
  2020-03-06 14:42   ` [tip: sched/core] " tip-bot2 for Thara Gopinath
  2020-02-22  0:52 ` [Patch v10 7/9] sched/fair: update cpu_capacity to reflect " Thara Gopinath
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 28+ messages in thread
From: Thara Gopinath @ 2020-02-22  0:52 UTC (permalink / raw)
  To: mingo, peterz, ionela.voinescu, vincent.guittot,
	dietmar.eggemann, rui.zhang, qperret, daniel.lezcano,
	viresh.kumar, rostedt, will, catalin.marinas, sudeep.holla,
	juri.lelli, corbet
  Cc: linux-kernel, amit.kachhap, javi.merino, amit.kucheria

Introduce support in scheduler periodic tick and other CFS bookkeeping
apis to trigger the process of computing average thermal pressure for a
cpu. Also consider avg_thermal.load_avg in others_have_blocked which
allows for decay of pelt signals.

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
---
v8->v9:
	- Moved periodic triggering of thermal pressure averaging from CFS
	  tick function to generic scheduler core tick function as per
	  Peter's review comments.

 kernel/sched/core.c | 3 +++
 kernel/sched/fair.c | 7 +++++++
 2 files changed, 10 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e94819d573be..160b5e9e8945 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3588,6 +3588,7 @@ void scheduler_tick(void)
 	struct rq *rq = cpu_rq(cpu);
 	struct task_struct *curr = rq->curr;
 	struct rq_flags rf;
+	unsigned long thermal_pressure;
 
 	arch_scale_freq_tick();
 	sched_clock_tick();
@@ -3595,6 +3596,8 @@ void scheduler_tick(void)
 	rq_lock(rq, &rf);
 
 	update_rq_clock(rq);
+	thermal_pressure = arch_scale_thermal_pressure(cpu_of(rq));
+	update_thermal_load_avg(rq_clock_task(rq), rq, thermal_pressure);
 	curr->sched_class->task_tick(rq, curr, 0);
 	calc_global_load_tick(rq);
 	psi_task_tick(rq);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f38ff5a335d3..00b21a5b71f0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7536,6 +7536,9 @@ static inline bool others_have_blocked(struct rq *rq)
 	if (READ_ONCE(rq->avg_dl.util_avg))
 		return true;
 
+	if (thermal_load_avg(rq))
+		return true;
+
 #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
 	if (READ_ONCE(rq->avg_irq.util_avg))
 		return true;
@@ -7561,6 +7564,7 @@ static bool __update_blocked_others(struct rq *rq, bool *done)
 {
 	const struct sched_class *curr_class;
 	u64 now = rq_clock_pelt(rq);
+	unsigned long thermal_pressure;
 	bool decayed;
 
 	/*
@@ -7569,8 +7573,11 @@ static bool __update_blocked_others(struct rq *rq, bool *done)
 	 */
 	curr_class = rq->curr->sched_class;
 
+	thermal_pressure = arch_scale_thermal_pressure(cpu_of(rq));
+
 	decayed = update_rt_rq_load_avg(now, rq, curr_class == &rt_sched_class) |
 		  update_dl_rq_load_avg(now, rq, curr_class == &dl_sched_class) |
+		  update_thermal_load_avg(rq_clock_task(rq), rq, thermal_pressure) |
 		  update_irq_load_avg(rq, 0);
 
 	if (others_have_blocked(rq))
-- 
2.20.1



* [Patch v10 7/9] sched/fair: update cpu_capacity to reflect thermal pressure
  2020-02-22  0:52 [Patch v10 0/9] Introduce Thermal Pressure Thara Gopinath
                   ` (5 preceding siblings ...)
  2020-02-22  0:52 ` [Patch v10 6/9] sched/fair: Enable periodic update of average thermal pressure Thara Gopinath
@ 2020-02-22  0:52 ` Thara Gopinath
  2020-03-06 14:42   ` [tip: sched/core] sched/fair: Update " tip-bot2 for Thara Gopinath
  2020-02-22  0:52 ` [Patch v10 8/9] thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping Thara Gopinath
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Thara Gopinath @ 2020-02-22  0:52 UTC (permalink / raw)
  To: mingo, peterz, ionela.voinescu, vincent.guittot,
	dietmar.eggemann, rui.zhang, qperret, daniel.lezcano,
	viresh.kumar, rostedt, will, catalin.marinas, sudeep.holla,
	juri.lelli, corbet
  Cc: linux-kernel, amit.kachhap, javi.merino, amit.kucheria

cpu_capacity initially reflects the maximum possible capacity of a cpu.
Thermal pressure on a cpu means this maximum possible capacity is
unavailable due to thermal events. This patch subtracts the average
thermal pressure for a cpu from its maximum possible capacity so that
cpu_capacity reflects the remaining maximum capacity.
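
The subtraction can be sketched as a stand-alone function. This is a
simplification: it leaves out the irq scaling that scale_rt_capacity
also performs, and remaining_capacity_sketch and its parameters are
hypothetical names for this example.

```c
/*
 * Hypothetical sketch: capacity left after subtracting rt and dl
 * utilization and the average thermal pressure from the maximum.
 */
static unsigned long remaining_capacity_sketch(unsigned long max,
					       unsigned long rt_util,
					       unsigned long dl_util,
					       unsigned long thermal_avg)
{
	unsigned long used = rt_util + dl_util + thermal_avg;

	/* Mirrors the early return in the hunk: never report zero. */
	if (used >= max)
		return 1;

	return max - used;
}
```

For a cpu with a maximum capacity of 1024, rt utilization 100, dl
utilization 50 and average thermal pressure 256, this leaves 618.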

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
---
v8->v9:
	- Use thermal_load_avg to read rq->avg_thermal.load_avg.

 kernel/sched/fair.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 00b21a5b71f0..10e867e540ab 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7800,8 +7800,15 @@ static unsigned long scale_rt_capacity(struct sched_domain *sd, int cpu)
 	if (unlikely(irq >= max))
 		return 1;
 
+	/*
+	 * avg_rt.util_avg and avg_dl.util_avg track binary signals
+	 * (running and not running) with weights 0 and 1024 respectively.
+	 * avg_thermal.load_avg tracks thermal pressure and the weighted
+	 * average uses the actual delta max capacity(load).
+	 */
 	used = READ_ONCE(rq->avg_rt.util_avg);
 	used += READ_ONCE(rq->avg_dl.util_avg);
+	used += thermal_load_avg(rq);
 
 	if (unlikely(used >= max))
 		return 1;
-- 
2.20.1



* [Patch v10 8/9] thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping
  2020-02-22  0:52 [Patch v10 0/9] Introduce Thermal Pressure Thara Gopinath
                   ` (6 preceding siblings ...)
  2020-02-22  0:52 ` [Patch v10 7/9] sched/fair: update cpu_capacity to reflect " Thara Gopinath
@ 2020-02-22  0:52 ` Thara Gopinath
  2020-03-06 14:42   ` [tip: sched/core] " tip-bot2 for Thara Gopinath
  2020-02-22  0:52 ` [Patch v10 9/9] sched/fair: Enable tuning of decay period Thara Gopinath
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Thara Gopinath @ 2020-02-22  0:52 UTC (permalink / raw)
  To: mingo, peterz, ionela.voinescu, vincent.guittot,
	dietmar.eggemann, rui.zhang, qperret, daniel.lezcano,
	viresh.kumar, rostedt, will, catalin.marinas, sudeep.holla,
	juri.lelli, corbet
  Cc: linux-kernel, amit.kachhap, javi.merino, amit.kucheria

Thermal governors can request for a cpu's maximum supported frequency to
be capped in case of an overheat event. This in turn means that the
maximum capacity available for tasks to run on the particular cpu is
reduced. Delta between the original maximum capacity and capped maximum
capacity is known as thermal pressure. Enable cpufreq cooling device to
update the thermal pressure in event of a capped maximum frequency.

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
---
v6->v7
	- Changed the input argument in arch_set_thermal_pressure from
	  capped capacity to delta capacity(thermal pressure) as per
	  Ionela's review comments. Hence the calculation for delta
	  capacity(thermal pressure) is moved to cpufreq_cooling.c.

 drivers/thermal/cpufreq_cooling.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
index fe83d7a210d4..4ae8c856c88e 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -431,6 +431,10 @@ static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
 				 unsigned long state)
 {
 	struct cpufreq_cooling_device *cpufreq_cdev = cdev->devdata;
+	struct cpumask *cpus;
+	unsigned int frequency;
+	unsigned long max_capacity, capacity;
+	int ret;
 
 	/* Request state should be less than max_level */
 	if (WARN_ON(state > cpufreq_cdev->max_level))
@@ -442,8 +446,19 @@ static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
 
 	cpufreq_cdev->cpufreq_state = state;
 
-	return freq_qos_update_request(&cpufreq_cdev->qos_req,
-				get_state_freq(cpufreq_cdev, state));
+	frequency = get_state_freq(cpufreq_cdev, state);
+
+	ret = freq_qos_update_request(&cpufreq_cdev->qos_req, frequency);
+
+	if (ret > 0) {
+		cpus = cpufreq_cdev->policy->cpus;
+		max_capacity = arch_scale_cpu_capacity(cpumask_first(cpus));
+		capacity = frequency * max_capacity;
+		capacity /= cpufreq_cdev->policy->cpuinfo.max_freq;
+		arch_set_thermal_pressure(cpus, max_capacity - capacity);
+	}
+
+	return ret;
 }
 
 /* Bind cpufreq callbacks to thermal cooling device ops */
-- 
2.20.1
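
The delta computed in cpufreq_set_cur_state() above can be tried out in
userspace with a sketch like the one below. thermal_pressure_delta() is a
hypothetical name used only for illustration. The arithmetic follows the
patch, except that the intermediate product is widened to 64 bits, which
is an assumption made here to keep the example safe on 32-bit builds.

```c
#include <assert.h>

/*
 * Hypothetical helper: scale the maximum capacity by the ratio of the
 * capped frequency to the maximum frequency, then return the capacity
 * that was lost. That lost capacity is the thermal pressure.
 */
static unsigned long thermal_pressure_delta(unsigned long max_capacity,
					    unsigned int capped_freq,
					    unsigned int max_freq)
{
	unsigned long long capacity;

	assert(max_freq != 0 && capped_freq <= max_freq);

	capacity = (unsigned long long)capped_freq * max_capacity;
	capacity /= max_freq;

	return max_capacity - (unsigned long)capacity;
}
```

For a cpu with capacity 1024 and a maximum frequency of 2000000 kHz, a cap
at 1500000 kHz leaves a capacity of 768, so the thermal pressure is 256.
With no cap in place the result is 0.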


* [Patch v10 9/9] sched/fair: Enable tuning of decay period
  2020-02-22  0:52 [Patch v10 0/9] Introduce Thermal Pressure Thara Gopinath
                   ` (7 preceding siblings ...)
  2020-02-22  0:52 ` [Patch v10 8/9] thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping Thara Gopinath
@ 2020-02-22  0:52 ` Thara Gopinath
  2020-03-06 14:42   ` [tip: sched/core] " tip-bot2 for Thara Gopinath
  2020-02-27  9:01 ` [Patch v10 0/9] Introduce Thermal Pressure Amit Kucheria
       [not found] ` <CAP=VYLqWfqOZT6ec9cKyKOsOhu7HhVn2f_eU+ca006i4CV8R-w@mail.gmail.com>
  10 siblings, 1 reply; 28+ messages in thread
From: Thara Gopinath @ 2020-02-22  0:52 UTC (permalink / raw)
  To: mingo, peterz, ionela.voinescu, vincent.guittot,
	dietmar.eggemann, rui.zhang, qperret, daniel.lezcano,
	viresh.kumar, rostedt, will, catalin.marinas, sudeep.holla,
	juri.lelli, corbet
  Cc: linux-kernel, amit.kachhap, javi.merino, amit.kucheria

Thermal pressure follows pelt signals which means the decay period for
thermal pressure is the default pelt decay period. Depending on soc
characteristics and thermal activity, it might be beneficial to decay
thermal pressure slower, but still in-tune with the pelt signals.  One way
to achieve this is to provide a command line parameter to set a decay
shift parameter to an integer between 0 and 10.

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
---
v8->v9:
	- Initialized the _shift to 0 in setup_sched_thermal_decay_shift
	  as per Quentin's suggestion.
v9->v10:
	- Added description for sched_thermal_decay_shift in
	  kernel-parameters.txt following Randy's review comments.

 .../admin-guide/kernel-parameters.txt          | 16 ++++++++++++++++
 kernel/sched/core.c                            |  2 +-
 kernel/sched/fair.c                            | 15 ++++++++++++++-
 kernel/sched/sched.h                           | 18 ++++++++++++++++++
 4 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c7f407eb22d3..7cf12c611fe0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4393,6 +4393,22 @@
 			incurs a small amount of overhead in the scheduler
 			but is useful for debugging and performance tuning.
 
+	sched_thermal_decay_shift=
+			[KNL, SMP] Set a decay shift for scheduler thermal
+			pressure signal. Thermal pressure signal follows the
+			default decay period of other scheduler pelt
+			signals(usually 32 ms but configurable). Setting
+			sched_thermal_decay_shift will left shift the decay
+			period for the thermal pressure signal by the shift
+			value.
+			i.e. with the default pelt decay period of 32 ms
+			sched_thermal_decay_shift   thermal pressure decay period
+				1			64 ms
+				2			128 ms
+			and so on.
+			Format: integer between 0 and 10
+			Default is 0.
+
 	skew_tick=	[KNL] Offset the periodic timer tick per cpu to mitigate
 			xtime_lock contention on larger systems, and/or RCU lock
 			contention on all systems with CONFIG_MAXSMP set.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 160b5e9e8945..166a3edfad5f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3597,7 +3597,7 @@ void scheduler_tick(void)
 
 	update_rq_clock(rq);
 	thermal_pressure = arch_scale_thermal_pressure(cpu_of(rq));
-	update_thermal_load_avg(rq_clock_task(rq), rq, thermal_pressure);
+	update_thermal_load_avg(rq_clock_thermal(rq), rq, thermal_pressure);
 	curr->sched_class->task_tick(rq, curr, 0);
 	calc_global_load_tick(rq);
 	psi_task_tick(rq);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 10e867e540ab..454d7735764e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -86,6 +86,19 @@ static unsigned int normalized_sysctl_sched_wakeup_granularity	= 1000000UL;
 
 const_debug unsigned int sysctl_sched_migration_cost	= 500000UL;
 
+int sched_thermal_decay_shift;
+static int __init setup_sched_thermal_decay_shift(char *str)
+{
+	int _shift = 0;
+
+	if (kstrtoint(str, 0, &_shift))
+		pr_warn("Unable to set scheduler thermal pressure decay shift parameter\n");
+
+	sched_thermal_decay_shift = clamp(_shift, 0, 10);
+	return 1;
+}
+__setup("sched_thermal_decay_shift=", setup_sched_thermal_decay_shift);
+
 #ifdef CONFIG_SMP
 /*
  * For asym packing, by default the lower numbered CPU has higher priority.
@@ -7577,7 +7590,7 @@ static bool __update_blocked_others(struct rq *rq, bool *done)
 
 	decayed = update_rt_rq_load_avg(now, rq, curr_class == &rt_sched_class) |
 		  update_dl_rq_load_avg(now, rq, curr_class == &dl_sched_class) |
-		  update_thermal_load_avg(rq_clock_task(rq), rq, thermal_pressure) |
+		  update_thermal_load_avg(rq_clock_thermal(rq), rq, thermal_pressure) |
 		  update_irq_load_avg(rq, 0);
 
 	if (others_have_blocked(rq))
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 211411ac0efa..e6312662679d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1110,6 +1110,24 @@ static inline u64 rq_clock_task(struct rq *rq)
 	return rq->clock_task;
 }
 
+/**
+ * By default the decay is the default pelt decay period.
+ * The decay shift can change the decay period in
+ * multiples of 32.
+ *  Decay shift		Decay period(ms)
+ *	0			32
+ *	1			64
+ *	2			128
+ *	3			256
+ *	4			512
+ */
+extern int sched_thermal_decay_shift;
+
+static inline u64 rq_clock_thermal(struct rq *rq)
+{
+	return rq_clock_task(rq) >> sched_thermal_decay_shift;
+}
+
 static inline void rq_clock_skip_update(struct rq *rq)
 {
 	lockdep_assert_held(&rq->lock);
-- 
2.20.1
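
The effect of the shift can be checked with a small userspace sketch.
rq_clock_thermal() feeds the pelt code a clock that advances 2^shift times
slower than rq_clock_task(), so the same decay takes 2^shift times longer
in wall-clock time. The helper names below are hypothetical, and the 32 ms
base is the default pelt decay period quoted in the documentation hunk
above.

```c
#include <assert.h>

#define PELT_DECAY_PERIOD_MS	32U	/* default period assumed here */
#define MAX_DECAY_SHIFT		10

/* Hypothetical helper: same bounds as clamp(_shift, 0, 10) in the patch. */
static int clamp_decay_shift(int shift)
{
	if (shift < 0)
		return 0;
	if (shift > MAX_DECAY_SHIFT)
		return MAX_DECAY_SHIFT;
	return shift;
}

/* Effective decay period of the thermal signal for an in-range shift. */
static unsigned int thermal_decay_period_ms(int shift)
{
	assert(shift >= 0 && shift <= MAX_DECAY_SHIFT);

	return PELT_DECAY_PERIOD_MS << shift;
}

/* Userspace stand-in for rq_clock_thermal(): slow the clock down. */
static unsigned long long clock_thermal(unsigned long long clock_task,
					int shift)
{
	return clock_task >> clamp_decay_shift(shift);
}
```

A shift of 2 gives a 128 ms period, and the maximum shift of 10 gives
32768 ms. Out-of-range values are clamped rather than rejected, as in
setup_sched_thermal_decay_shift().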


* Re: [Patch v10 1/9] sched/pelt: Add support to track thermal pressure
  2020-02-22  0:52 ` [Patch v10 1/9] sched/pelt: Add support to track thermal pressure Thara Gopinath
@ 2020-02-22  0:59   ` Randy Dunlap
  2020-02-22 18:27     ` Thara Gopinath
  2020-03-06 14:42   ` [tip: sched/core] " tip-bot2 for Thara Gopinath
  1 sibling, 1 reply; 28+ messages in thread
From: Randy Dunlap @ 2020-02-22  0:59 UTC (permalink / raw)
  To: Thara Gopinath, mingo, peterz, ionela.voinescu, vincent.guittot,
	dietmar.eggemann, rui.zhang, qperret, daniel.lezcano,
	viresh.kumar, rostedt, will, catalin.marinas, sudeep.holla,
	juri.lelli, corbet
  Cc: linux-kernel, amit.kachhap, javi.merino, amit.kucheria

On 2/21/20 4:52 PM, Thara Gopinath wrote:
> diff --git a/init/Kconfig b/init/Kconfig
> index 2a25c769eaaa..8d56902efa70 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -464,6 +464,10 @@ config HAVE_SCHED_AVG_IRQ
>  	depends on IRQ_TIME_ACCOUNTING || PARAVIRT_TIME_ACCOUNTING
>  	depends on SMP
>  
> +config HAVE_SCHED_THERMAL_PRESSURE
> +	bool "Enable periodic averaging of thermal pressure"

This prompt string makes this symbol user-configurable, but
I don't think that's what you want here.

> +	depends on SMP
> +
>  config BSD_PROCESS_ACCT
>  	bool "BSD Process Accounting"
>  	depends on MULTIUSER


-- 
~Randy


* Re: [Patch v10 1/9] sched/pelt: Add support to track thermal pressure
  2020-02-22  0:59   ` Randy Dunlap
@ 2020-02-22 18:27     ` Thara Gopinath
  2020-02-22 18:50       ` Randy Dunlap
  0 siblings, 1 reply; 28+ messages in thread
From: Thara Gopinath @ 2020-02-22 18:27 UTC (permalink / raw)
  To: Randy Dunlap, mingo, peterz, ionela.voinescu, vincent.guittot,
	dietmar.eggemann, rui.zhang, qperret, daniel.lezcano,
	viresh.kumar, rostedt, will, catalin.marinas, sudeep.holla,
	juri.lelli, corbet
  Cc: linux-kernel, amit.kachhap, javi.merino, amit.kucheria



On 2/21/20 7:59 PM, Randy Dunlap wrote:
> On 2/21/20 4:52 PM, Thara Gopinath wrote:
>> diff --git a/init/Kconfig b/init/Kconfig
>> index 2a25c769eaaa..8d56902efa70 100644
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -464,6 +464,10 @@ config HAVE_SCHED_AVG_IRQ
>>   	depends on IRQ_TIME_ACCOUNTING || PARAVIRT_TIME_ACCOUNTING
>>   	depends on SMP
>>   
>> +config HAVE_SCHED_THERMAL_PRESSURE
>> +	bool "Enable periodic averaging of thermal pressure"
> 
> This prompt string makes this symbol user-configurable, but
> I don't think that's what you want here.

Hi Randy,
Thank you for the review.
Actually I thought being user-configurable is a good idea as it will 
allow users to easily enable it and see if it benefits their systems.
(I used menuconfig while developing, to enable it).
Do you see a reason why this should not be so?

> 
>> +	depends on SMP
>> +
>>   config BSD_PROCESS_ACCT
>>   	bool "BSD Process Accounting"
>>   	depends on MULTIUSER
> 
> 

-- 
Warm Regards
Thara

* Re: [Patch v10 1/9] sched/pelt: Add support to track thermal pressure
  2020-02-22 18:27     ` Thara Gopinath
@ 2020-02-22 18:50       ` Randy Dunlap
  2020-02-24 14:33         ` Thara Gopinath
  0 siblings, 1 reply; 28+ messages in thread
From: Randy Dunlap @ 2020-02-22 18:50 UTC (permalink / raw)
  To: Thara Gopinath, mingo, peterz, ionela.voinescu, vincent.guittot,
	dietmar.eggemann, rui.zhang, qperret, daniel.lezcano,
	viresh.kumar, rostedt, will, catalin.marinas, sudeep.holla,
	juri.lelli, corbet
  Cc: linux-kernel, amit.kachhap, javi.merino, amit.kucheria

On 2/22/20 10:27 AM, Thara Gopinath wrote:
> 
> 
> On 2/21/20 7:59 PM, Randy Dunlap wrote:
>> On 2/21/20 4:52 PM, Thara Gopinath wrote:
>>> diff --git a/init/Kconfig b/init/Kconfig
>>> index 2a25c769eaaa..8d56902efa70 100644
>>> --- a/init/Kconfig
>>> +++ b/init/Kconfig
>>> @@ -464,6 +464,10 @@ config HAVE_SCHED_AVG_IRQ
>>>       depends on IRQ_TIME_ACCOUNTING || PARAVIRT_TIME_ACCOUNTING
>>>       depends on SMP
>>>   +config HAVE_SCHED_THERMAL_PRESSURE
>>> +    bool "Enable periodic averaging of thermal pressure"
>>
>> This prompt string makes this symbol user-configurable, but
>> I don't think that's what you want here.
> 
> Hi Randy,
> Thank you for the review.
> Actually I thought being user-configurable is a good idea as it will allow users to easily enable it and see if it benefits their systems. (I used menuconfig while developing, to enable it).
> Do you see a reason why this should not be so?
> 
>>
>>> +    depends on SMP
>>> +
>>>   config BSD_PROCESS_ACCT
>>>       bool "BSD Process Accounting"
>>>       depends on MULTIUSER

Hi Thara,
Is there some other way that HAVE_SCHED_THERMAL_PRESSURE can become
set/enabled?  For example, is it selected by any other options?

The Kconfig symbols that begin with HAVE_ are usually something that
are platform-specific and are usually set (selected) by other options,
or they are "default y".

In init/Kconfig, I see 15 other HAVE_ Kconfig symbols,
and none of them have user prompt strings.  They are either selected
elsewhere or set inside their Kconfig block.

Maybe you just want to rename the Kconfig symbol so that it does not
begin with HAVE_.


-- 
~Randy


* Re: [Patch v10 1/9] sched/pelt: Add support to track thermal pressure
  2020-02-22 18:50       ` Randy Dunlap
@ 2020-02-24 14:33         ` Thara Gopinath
  2020-02-25 15:47           ` Peter Zijlstra
  0 siblings, 1 reply; 28+ messages in thread
From: Thara Gopinath @ 2020-02-24 14:33 UTC (permalink / raw)
  To: Randy Dunlap, mingo, peterz, ionela.voinescu, vincent.guittot,
	dietmar.eggemann, rui.zhang, qperret, daniel.lezcano,
	viresh.kumar, rostedt, will, catalin.marinas, sudeep.holla,
	juri.lelli, corbet
  Cc: linux-kernel, amit.kachhap, javi.merino, amit.kucheria



On 2/22/20 1:50 PM, Randy Dunlap wrote:
> On 2/22/20 10:27 AM, Thara Gopinath wrote:
>>
>>
>> On 2/21/20 7:59 PM, Randy Dunlap wrote:
>>> On 2/21/20 4:52 PM, Thara Gopinath wrote:
>>>> diff --git a/init/Kconfig b/init/Kconfig
>>>> index 2a25c769eaaa..8d56902efa70 100644
>>>> --- a/init/Kconfig
>>>> +++ b/init/Kconfig
>>>> @@ -464,6 +464,10 @@ config HAVE_SCHED_AVG_IRQ
>>>>        depends on IRQ_TIME_ACCOUNTING || PARAVIRT_TIME_ACCOUNTING
>>>>        depends on SMP
>>>>    +config HAVE_SCHED_THERMAL_PRESSURE
>>>> +    bool "Enable periodic averaging of thermal pressure"
>>>
>>> This prompt string makes this symbol user-configurable, but
>>> I don't think that's what you want here.
>>
>> Hi Randy,
>> Thank you for the review.
>> Actually I thought being user-configurable is a good idea as it will allow users to easily enable it and see if it benefits their systems. (I used menuconfig while developing, to enable it).
>> Do you see a reason why this should not be so?
>>
>>>
>>>> +    depends on SMP
>>>> +
>>>>    config BSD_PROCESS_ACCT
>>>>        bool "BSD Process Accounting"
>>>>        depends on MULTIUSER
> 
> Hi Thara,
> Is there some other way that HAVE_SCHED_THERMAL_PRESSURE can become
> set/enabled?  For example, is it selected by any other options?
> 
> The Kconfig symbols that begin with HAVE_ are usually something that
> are platform-specific and are usually set (selected) by other options,
> or they are "default y".
> 
> In init/Kconfig, I see 15 other HAVE_ Kconfig symbols,
> and none of them have user prompt strings.  They are either selected
> elsewhere or set inside their Kconfig block.
> 
> Maybe you just want to rename the Kconfig symbol so that it does not
> begin with HAVE_.

Hi Randy,

I see what you mean. I will send an update to this patch with HAVE_ 
removed. It is not selected by any other options. Best is for user to 
select it or platform/SoC configs to enable it.
> 
> 

-- 
Warm Regards
Thara

* Re: [Patch v10 1/9] sched/pelt: Add support to track thermal pressure
  2020-02-24 14:33         ` Thara Gopinath
@ 2020-02-25 15:47           ` Peter Zijlstra
  2020-02-25 16:43             ` Thara Gopinath
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2020-02-25 15:47 UTC (permalink / raw)
  To: Thara Gopinath
  Cc: Randy Dunlap, mingo, ionela.voinescu, vincent.guittot,
	dietmar.eggemann, rui.zhang, qperret, daniel.lezcano,
	viresh.kumar, rostedt, will, catalin.marinas, sudeep.holla,
	juri.lelli, corbet, linux-kernel, amit.kachhap, javi.merino,
	amit.kucheria

On Mon, Feb 24, 2020 at 09:33:22AM -0500, Thara Gopinath wrote:
> I see what you mean. I will send an update to this patch with HAVE_ removed.
> It is not selected by any other options. Best is for user to select it or
> platform/SoC configs to enable it.

Done that for you ;-)

* Re: [Patch v10 1/9] sched/pelt: Add support to track thermal pressure
  2020-02-25 15:47           ` Peter Zijlstra
@ 2020-02-25 16:43             ` Thara Gopinath
  0 siblings, 0 replies; 28+ messages in thread
From: Thara Gopinath @ 2020-02-25 16:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Randy Dunlap, mingo, ionela.voinescu, vincent.guittot,
	dietmar.eggemann, rui.zhang, qperret, daniel.lezcano,
	viresh.kumar, rostedt, will, catalin.marinas, sudeep.holla,
	juri.lelli, corbet, linux-kernel, amit.kachhap, javi.merino,
	amit.kucheria



On 2/25/20 10:47 AM, Peter Zijlstra wrote:
> On Mon, Feb 24, 2020 at 09:33:22AM -0500, Thara Gopinath wrote:
>> I see what you mean. I will send an update to this patch with HAVE_ removed.
>> It is not selected by any other options. Best is for user to select it or
>> platform/SoC configs to enable it.
> 
> Done that for you ;-)

Thanks!

> 

-- 
Warm Regards
Thara

* Re: [Patch v10 0/9] Introduce Thermal Pressure
  2020-02-22  0:52 [Patch v10 0/9] Introduce Thermal Pressure Thara Gopinath
                   ` (8 preceding siblings ...)
  2020-02-22  0:52 ` [Patch v10 9/9] sched/fair: Enable tuning of decay period Thara Gopinath
@ 2020-02-27  9:01 ` Amit Kucheria
       [not found] ` <CAP=VYLqWfqOZT6ec9cKyKOsOhu7HhVn2f_eU+ca006i4CV8R-w@mail.gmail.com>
  10 siblings, 0 replies; 28+ messages in thread
From: Amit Kucheria @ 2020-02-27  9:01 UTC (permalink / raw)
  To: Thara Gopinath
  Cc: Ingo Molnar, Peter Zijlstra, ionela.voinescu, Vincent Guittot,
	Dietmar Eggemann, Zhang Rui, qperret, Daniel Lezcano,
	Viresh Kumar, Steven Rostedt, Will Deacon, Catalin Marinas,
	Sudeep Holla, Juri Lelli, Jonathan Corbet, LKML,
	Amit Daniel Kachhap, Javi Merino

On Sat, Feb 22, 2020 at 6:22 AM Thara Gopinath
<thara.gopinath@linaro.org> wrote:
>
> Thermal governors can respond to an overheat event of a cpu by
> capping the cpu's maximum possible frequency. This in turn
> means that the maximum available compute capacity of the
> cpu is restricted. But today in the kernel, task scheduler is
> not notified of capping of maximum frequency of a cpu.
> In other words, scheduler is unaware of maximum capacity
> restrictions placed on a cpu due to thermal activity.
> This patch series attempts to address this issue.
> The benefits identified are better task placement among available
> cpus in event of overheating which in turn leads to better
> performance numbers.
>
> The reduction in the maximum possible capacity of a cpu due to a
> thermal event can be considered as thermal pressure. Instantaneous
> thermal pressure is hard to record and can sometime be erroneous
> as there can be mismatch between the actual capping of capacity
> and scheduler recording it. Thus solution is to have a weighted
> average per cpu value for thermal pressure over time.
> The weight reflects the amount of time the cpu has spent at a
> capped maximum frequency. Since thermal pressure is recorded as
> an average, it must be decayed periodically. Existing algorithm
> in the kernel scheduler pelt framework is re-used to calculate
> the weighted average. This patch series also defines a sysctl
> interface to allow for a configurable decay period.
>
> Regarding testing, basic build, boot and sanity testing have been
> performed on db845c platform with debian file system.
> Further, dhrystone and hackbench tests have been
> run with the thermal pressure algorithm. During testing, due to
> constraints of step wise governor in dealing with big little systems,
> trip point 0 temperature was made asymmetric between cpus in little
> cluster and big cluster; the idea being that
> big core will heat up and cpu cooling device will throttle the
> frequency of the big cores faster, thereby limiting the maximum available
> capacity and the scheduler will spread out tasks to little cores as well.
>
> Test Results
>
> Hackbench: 1 group , 30000 loops, 10 runs
>                                                Result         SD
>                                                (Secs)     (% of mean)
>  No Thermal Pressure                            14.03       2.69%
>  Thermal Pressure PELT Algo. Decay : 32 ms      13.29       0.56%
>  Thermal Pressure PELT Algo. Decay : 64 ms      12.57       1.56%
>  Thermal Pressure PELT Algo. Decay : 128 ms     12.71       1.04%
>  Thermal Pressure PELT Algo. Decay : 256 ms     12.29       1.42%
>  Thermal Pressure PELT Algo. Decay : 512 ms     12.42       1.15%
>
> Dhrystone Run Time  : 20 threads, 3000 MLOOPS
>                                                  Result      SD
>                                                  (Secs)    (% of mean)
>  No Thermal Pressure                              9.452      4.49%
>  Thermal Pressure PELT Algo. Decay : 32 ms        8.793      5.30%
>  Thermal Pressure PELT Algo. Decay : 64 ms        8.981      5.29%
>  Thermal Pressure PELT Algo. Decay : 128 ms       8.647      6.62%
>  Thermal Pressure PELT Algo. Decay : 256 ms       8.774      6.45%
>  Thermal Pressure PELT Algo. Decay : 512 ms       8.603      5.41%
>

I've tested this series with a patch that artificially reduces the
capacity of big cores on the QCOM sdm845 by lowering the temperature
at which it starts throttling (thereby introducing thermal pressure
earlier) and can see the tasks being migrated to the LITTLE cores.

FWIW,
Tested-by: Amit Kucheria <amit.kucheria@linaro.org>

> A Brief History
>
> The first version of this patch-series was posted with reusing
> PELT algorithm to decay thermal pressure signal. The discussions
> that followed were around whether instantaneous thermal pressure
> solution is better and whether a stand-alone algorithm to accumulate
> and decay thermal pressure is more appropriate than re-using the
> PELT framework.
> Tests on Hikey960 showed the stand-alone algorithm performing slightly
> better than reusing PELT algorithm and V2 was posted with the stand
> alone algorithm. Test results were shared as part of this series.
> Discussions were around re-using PELT algorithm and running
> further tests with more granular decay period.
>
> For some time after this, development was impeded due to hardware
> unavailability and some other unforeseen and possibly unfortunate events.
> For this version, h/w was switched from hikey960 to db845c.
> Also Instantaneous thermal pressure was never tested as part of this
> cycle as it is clear that weighted average is a better implementation.
> The non-PELT algorithm never gave any conclusive results to prove that it
> is better than reusing PELT algorithm, in this round of testing.
> Also reusing PELT algorithm means thermal pressure tracks the
> other utilization signals in the scheduler.
>
> v3->v4:
>         - "Patch 3/7:sched: Initialize per cpu thermal pressure structure"
>            is dropped as it is no longer needed following changes in
>            other patches.
>         - rest of the change log mentioned in specific patches.
>
> v5->v6:
>         - Added arch_ interface APIs to access and update thermal pressure.
>            Moved declaration of per cpu thermal_pressure variable and
>            infrastructure to update the variable to topology files.
>
> v6->v7:
>         - Added CONFIG_HAVE_SCHED_THERMAL_PRESSURE to stub out
>           update_thermal_load_avg in unsupported architectures as per
>           review comments from Peter, Dietmar and Quentin.
>         - Renamed arch_scale_thermal_capacity to arch_cpu_thermal_pressure
>           as per review comments from Peter, Dietmar and Ionela.
>         - Changed the input argument in arch_set_thermal_pressure from
>           capped capacity to delta capacity(thermal pressure) as per
>           Ionela's review comments. Hence the calculation for delta
>           capacity(thermal pressure) is moved to cpufreq_cooling.c.
>         - Fixed a bunch of spelling typos.
>
> v7->v8:
>         - Fixed typo in defining update_thermal_load_avg which was
>           causing build errors (reported by kbuild test robot)
>
> v8->v9:
>         - Defined thermal_load_avg to read rq->avg_thermal.load_avg and
>           avoid cacheline miss in unsupported cases as per Peter's
>           suggestion.
>         - Moved periodic triggering of thermal pressure averaging from CFS
>           tick function to generic scheduler core tick function.
>         - Moved rq_clock_thermal from fair.c to sched.h to enable using
>           the function from multiple files.
>         - Initialized the _shift to 0 in setup_sched_thermal_decay_shift
>           as per Quentin's suggestion
>         - Added an extra patch enabling CONFIG_HAVE_SCHED_THERMAL_PRESSURE
>           as per Dietmar's request.
>
> v9->v10:
>         - Renamed arch_cpu_thermal_pressure to arch_scale_thermal_pressure
>           as per review comments from Dietmar.
>         - Split "[Patch v9 3/8] arm,arm64,drivers:Add infrastructure to
>           store and update instantaneous thermal pressure" into 3 thus
>           separating out arch/arm and arch/arm64 specific code into
>           individual patches as suggested by Amit Kucheria.
>         - Added description for sched_thermal_decay_shift in
>           kernel-parameters.txt following Randy's review comments.
>         - Fixed typos in comments as per Amit Kucheria's review comments.
>
> Thara Gopinath (9):
>   sched/pelt: Add support to track thermal pressure
>   sched/topology: Add hook to read per cpu thermal pressure.
>   drivers/base/arch_topology: Add infrastructure to store and update
>     instantaneous thermal pressure
>   arm64/topology: Populate arch_cpu_thermal_pressure for arm64 platforms
>   arm/topology: Populate arch_cpu_thermal_pressure for arm platforms
>   sched/fair: Enable periodic update of average thermal pressure
>   sched/fair: update cpu_capacity to reflect thermal pressure
>   thermal/cpu-cooling: Update thermal pressure in case of a maximum
>     frequency capping
>   sched/fair: Enable tuning of decay period
>
>  .../admin-guide/kernel-parameters.txt         | 16 ++++++++++
>  arch/arm/include/asm/topology.h               |  3 ++
>  arch/arm64/include/asm/topology.h             |  3 ++
>  drivers/base/arch_topology.c                  | 11 +++++++
>  drivers/thermal/cpufreq_cooling.c             | 19 ++++++++++--
>  include/linux/arch_topology.h                 | 10 ++++++
>  include/linux/sched/topology.h                |  8 +++++
>  include/trace/events/sched.h                  |  4 +++
>  init/Kconfig                                  |  4 +++
>  kernel/sched/core.c                           |  3 ++
>  kernel/sched/fair.c                           | 27 ++++++++++++++++
>  kernel/sched/pelt.c                           | 31 +++++++++++++++++++
>  kernel/sched/pelt.h                           | 31 +++++++++++++++++++
>  kernel/sched/sched.h                          | 21 +++++++++++++
>  14 files changed, 189 insertions(+), 2 deletions(-)
>
> --
> 2.20.1
>

* Re: [Patch v10 6/9] sched/fair: Enable periodic update of average thermal pressure
  2020-02-22  0:52 ` [Patch v10 6/9] sched/fair: Enable periodic update of average thermal pressure Thara Gopinath
@ 2020-02-27  9:03   ` Amit Kucheria
  2020-03-06 14:42   ` [tip: sched/core] " tip-bot2 for Thara Gopinath
  1 sibling, 0 replies; 28+ messages in thread
From: Amit Kucheria @ 2020-02-27  9:03 UTC (permalink / raw)
  To: Thara Gopinath
  Cc: Ingo Molnar, Peter Zijlstra, ionela.voinescu, Vincent Guittot,
	Dietmar Eggemann, Zhang Rui, qperret, Daniel Lezcano,
	Viresh Kumar, Steven Rostedt, Will Deacon, Catalin Marinas,
	Sudeep Holla, Juri Lelli, Jonathan Corbet, LKML,
	Amit Daniel Kachhap, Javi Merino

On Sat, Feb 22, 2020 at 6:22 AM Thara Gopinath
<thara.gopinath@linaro.org> wrote:
>
> Introduce support in scheduler periodic tick and other CFS bookkeeping
> apis to trigger the process of computing average thermal pressure for a
> cpu. Also consider avg_thermal.load_avg in others_have_blocked which
> allows for decay of pelt signals.
>
> Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
> ---
> v8->v9:
>         - Moved periodic triggering of thermal pressure averaging from CFS
>           tick function to generic scheduler core tick function as per
>           Peter's review comments.
>
>  kernel/sched/core.c | 3 +++
>  kernel/sched/fair.c | 7 +++++++
>  2 files changed, 10 insertions(+)

Hi Thara,

This patch applies with fuzz to v5.6-rc2. Just FYI.

Regards,
Amit


>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index e94819d573be..160b5e9e8945 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3588,6 +3588,7 @@ void scheduler_tick(void)
>         struct rq *rq = cpu_rq(cpu);
>         struct task_struct *curr = rq->curr;
>         struct rq_flags rf;
> +       unsigned long thermal_pressure;
>
>         arch_scale_freq_tick();
>         sched_clock_tick();
> @@ -3595,6 +3596,8 @@ void scheduler_tick(void)
>         rq_lock(rq, &rf);
>
>         update_rq_clock(rq);
> +       thermal_pressure = arch_scale_thermal_pressure(cpu_of(rq));
> +       update_thermal_load_avg(rq_clock_task(rq), rq, thermal_pressure);
>         curr->sched_class->task_tick(rq, curr, 0);
>         calc_global_load_tick(rq);
>         psi_task_tick(rq);
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index f38ff5a335d3..00b21a5b71f0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7536,6 +7536,9 @@ static inline bool others_have_blocked(struct rq *rq)
>         if (READ_ONCE(rq->avg_dl.util_avg))
>                 return true;
>
> +       if (thermal_load_avg(rq))
> +               return true;
> +
>  #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
>         if (READ_ONCE(rq->avg_irq.util_avg))
>                 return true;
> @@ -7561,6 +7564,7 @@ static bool __update_blocked_others(struct rq *rq, bool *done)
>  {
>         const struct sched_class *curr_class;
>         u64 now = rq_clock_pelt(rq);
> +       unsigned long thermal_pressure;
>         bool decayed;
>
>         /*
> @@ -7569,8 +7573,11 @@ static bool __update_blocked_others(struct rq *rq, bool *done)
>          */
>         curr_class = rq->curr->sched_class;
>
> +       thermal_pressure = arch_scale_thermal_pressure(cpu_of(rq));
> +
>         decayed = update_rt_rq_load_avg(now, rq, curr_class == &rt_sched_class) |
>                   update_dl_rq_load_avg(now, rq, curr_class == &dl_sched_class) |
> +                 update_thermal_load_avg(rq_clock_task(rq), rq, thermal_pressure) |
>                   update_irq_load_avg(rq, 0);
>
>         if (others_have_blocked(rq))
> --
> 2.20.1
>

* [tip: sched/core] thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping
  2020-02-22  0:52 ` [Patch v10 8/9] thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping Thara Gopinath
@ 2020-03-06 14:42   ` tip-bot2 for Thara Gopinath
  0 siblings, 0 replies; 28+ messages in thread
From: tip-bot2 for Thara Gopinath @ 2020-03-06 14:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thara Gopinath, Peter Zijlstra (Intel), Ingo Molnar, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     f12e4f66ab6a31f17da386b682e5fec87ae46537
Gitweb:        https://git.kernel.org/tip/f12e4f66ab6a31f17da386b682e5fec87ae46537
Author:        Thara Gopinath <thara.gopinath@linaro.org>
AuthorDate:    Fri, 21 Feb 2020 19:52:12 -05:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Fri, 06 Mar 2020 12:57:21 +01:00

thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping

Thermal governors can request that a CPU's maximum supported frequency be
capped in case of an overheat event. This in turn means that the
maximum capacity available for tasks to run on the particular CPU is
reduced. The delta between the original maximum capacity and the capped
maximum capacity is known as thermal pressure. Enable the cpufreq cooling
device to update the thermal pressure in the event of a capped maximum
frequency.

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200222005213.3873-9-thara.gopinath@linaro.org
---
 drivers/thermal/cpufreq_cooling.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
index fe83d7a..4ae8c85 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -431,6 +431,10 @@ static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
 				 unsigned long state)
 {
 	struct cpufreq_cooling_device *cpufreq_cdev = cdev->devdata;
+	struct cpumask *cpus;
+	unsigned int frequency;
+	unsigned long max_capacity, capacity;
+	int ret;
 
 	/* Request state should be less than max_level */
 	if (WARN_ON(state > cpufreq_cdev->max_level))
@@ -442,8 +446,19 @@ static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
 
 	cpufreq_cdev->cpufreq_state = state;
 
-	return freq_qos_update_request(&cpufreq_cdev->qos_req,
-				get_state_freq(cpufreq_cdev, state));
+	frequency = get_state_freq(cpufreq_cdev, state);
+
+	ret = freq_qos_update_request(&cpufreq_cdev->qos_req, frequency);
+
+	if (ret > 0) {
+		cpus = cpufreq_cdev->policy->cpus;
+		max_capacity = arch_scale_cpu_capacity(cpumask_first(cpus));
+		capacity = frequency * max_capacity;
+		capacity /= cpufreq_cdev->policy->cpuinfo.max_freq;
+		arch_set_thermal_pressure(cpus, max_capacity - capacity);
+	}
+
+	return ret;
 }
 
 /* Bind cpufreq callbacks to thermal cooling device ops */

^ permalink raw reply related	[flat|nested] 28+ messages in thread
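
For readers following the arithmetic in cpufreq_set_cur_state() above, here
is a minimal standalone sketch of the same calculation. The helper name
thermal_pressure_for_cap() is hypothetical and does not exist in the kernel.
The 64-bit widening is an assumption added for illustration (the patch
multiplies in unsigned long), and the sketch assumes max_freq is non-zero
and capped_freq <= max_freq.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: mirrors the arithmetic that
 * cpufreq_set_cur_state() performs when a frequency cap takes effect.
 *
 *   capped capacity  = capped_freq * max_capacity / max_freq
 *   thermal pressure = max_capacity - capped capacity
 *
 * Assumes max_freq != 0 and capped_freq <= max_freq.
 */
static unsigned long thermal_pressure_for_cap(unsigned long max_capacity,
					      unsigned int capped_freq,
					      unsigned int max_freq)
{
	/* Widen before multiplying (illustrative choice, see lead-in). */
	uint64_t capacity = (uint64_t)capped_freq * max_capacity;

	capacity /= max_freq;
	return max_capacity - (unsigned long)capacity;
}
```

With a CPU of capacity 1024 whose 2.0 GHz maximum is capped to 1.5 GHz, the
capped capacity is 768 and the resulting thermal pressure is 256.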

* [tip: sched/core] sched/fair: Update cpu_capacity to reflect thermal pressure
  2020-02-22  0:52 ` [Patch v10 7/9] sched/fair: update cpu_capacity to reflect " Thara Gopinath
@ 2020-03-06 14:42   ` tip-bot2 for Thara Gopinath
  0 siblings, 0 replies; 28+ messages in thread
From: tip-bot2 for Thara Gopinath @ 2020-03-06 14:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thara Gopinath, Peter Zijlstra (Intel), Ingo Molnar, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     467b7d01c469dc6aa492c17d1f1d1952632728f1
Gitweb:        https://git.kernel.org/tip/467b7d01c469dc6aa492c17d1f1d1952632728f1
Author:        Thara Gopinath <thara.gopinath@linaro.org>
AuthorDate:    Fri, 21 Feb 2020 19:52:11 -05:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Fri, 06 Mar 2020 12:57:20 +01:00

sched/fair: Update cpu_capacity to reflect thermal pressure

cpu_capacity initially reflects the maximum possible capacity of a CPU.
Thermal pressure on a CPU means this maximum possible capacity is
unavailable due to thermal events. This patch subtracts the average
thermal pressure for a CPU from its maximum possible capacity so that
cpu_capacity reflects the remaining maximum capacity.

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200222005213.3873-8-thara.gopinath@linaro.org
---
 kernel/sched/fair.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 11f8488..aa51286 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7984,8 +7984,15 @@ static unsigned long scale_rt_capacity(struct sched_domain *sd, int cpu)
 	if (unlikely(irq >= max))
 		return 1;
 
+	/*
+	 * avg_rt.util_avg and avg_dl.util_avg track binary signals
+	 * (running and not running) with weights 1024 and 0 respectively.
+	 * avg_thermal.load_avg tracks thermal pressure and the weighted
+	 * average uses the actual delta max capacity(load).
+	 */
 	used = READ_ONCE(rq->avg_rt.util_avg);
 	used += READ_ONCE(rq->avg_dl.util_avg);
+	used += thermal_load_avg(rq);
 
 	if (unlikely(used >= max))
 		return 1;

^ permalink raw reply related	[flat|nested] 28+ messages in thread
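
The effect of the hunk above can be sketched in isolation as follows. This
is a simplified illustration, not the kernel function: remaining_capacity()
is a hypothetical name, the inputs are passed as plain integers instead of
being read from struct rq, and the IRQ handling that surrounds this code in
scale_rt_capacity() is deliberately left out.

```c
/*
 * Hypothetical, simplified view of the "used" accounting in
 * scale_rt_capacity(): RT utilization, DL utilization and the averaged
 * thermal pressure are summed and subtracted from the maximum capacity.
 * As in the patch context, a fully consumed CPU reports 1, not 0.
 */
static unsigned long remaining_capacity(unsigned long max,
					unsigned long rt_util,
					unsigned long dl_util,
					unsigned long thermal_avg)
{
	unsigned long used = rt_util + dl_util + thermal_avg;

	if (used >= max)
		return 1;

	return max - used;
}
```

For example, with max = 1024, rt = 100, dl = 50 and an average thermal
pressure of 256, 618 units of capacity remain for CFS tasks.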

* [tip: sched/core] sched/fair: Enable tuning of decay period
  2020-02-22  0:52 ` [Patch v10 9/9] sched/fair: Enable tuning of decay period Thara Gopinath
@ 2020-03-06 14:42   ` tip-bot2 for Thara Gopinath
  0 siblings, 0 replies; 28+ messages in thread
From: tip-bot2 for Thara Gopinath @ 2020-03-06 14:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thara Gopinath, Peter Zijlstra (Intel), Ingo Molnar, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     05289b90c2e40ae80f5c70431cd0be4cc8a6038d
Gitweb:        https://git.kernel.org/tip/05289b90c2e40ae80f5c70431cd0be4cc8a6038d
Author:        Thara Gopinath <thara.gopinath@linaro.org>
AuthorDate:    Fri, 21 Feb 2020 19:52:13 -05:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Fri, 06 Mar 2020 12:57:21 +01:00

sched/fair: Enable tuning of decay period

Thermal pressure follows PELT signals, which means the decay period for
thermal pressure is the default PELT decay period. Depending on SoC
characteristics and thermal activity, it might be beneficial to decay
thermal pressure more slowly, but still in tune with the PELT signals. One
way to achieve this is to provide a command line parameter that sets a
decay shift to an integer between 0 and 10.

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200222005213.3873-10-thara.gopinath@linaro.org
---
 Documentation/admin-guide/kernel-parameters.txt | 16 ++++++++++++++-
 kernel/sched/core.c                             |  2 +-
 kernel/sched/fair.c                             | 15 ++++++++++++-
 kernel/sched/sched.h                            | 18 ++++++++++++++++-
 4 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c07815d..dac8245 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4392,6 +4392,22 @@
 			incurs a small amount of overhead in the scheduler
 			but is useful for debugging and performance tuning.
 
+	sched_thermal_decay_shift=
+			[KNL, SMP] Set a decay shift for scheduler thermal
+			pressure signal. Thermal pressure signal follows the
+			default decay period of other scheduler pelt
+			signals (usually 32 ms but configurable). Setting
+			sched_thermal_decay_shift will left shift the decay
+			period for the thermal pressure signal by the shift
+			value.
+			e.g. with the default pelt decay period of 32 ms
+			sched_thermal_decay_shift   thermal pressure decay period
+				1			64 ms
+				2			128 ms
+			and so on.
+			Format: integer between 0 and 10
+			Default is 0.
+
 	skew_tick=	[KNL] Offset the periodic timer tick per cpu to mitigate
 			xtime_lock contention on larger systems, and/or RCU lock
 			contention on all systems with CONFIG_MAXSMP set.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3e620fe..4d76df3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3595,7 +3595,7 @@ void scheduler_tick(void)
 
 	update_rq_clock(rq);
 	thermal_pressure = arch_scale_thermal_pressure(cpu_of(rq));
-	update_thermal_load_avg(rq_clock_task(rq), rq, thermal_pressure);
+	update_thermal_load_avg(rq_clock_thermal(rq), rq, thermal_pressure);
 	curr->sched_class->task_tick(rq, curr, 0);
 	calc_global_load_tick(rq);
 	psi_task_tick(rq);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index aa51286..79bb423 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -86,6 +86,19 @@ static unsigned int normalized_sysctl_sched_wakeup_granularity	= 1000000UL;
 
 const_debug unsigned int sysctl_sched_migration_cost	= 500000UL;
 
+int sched_thermal_decay_shift;
+static int __init setup_sched_thermal_decay_shift(char *str)
+{
+	int _shift = 0;
+
+	if (kstrtoint(str, 0, &_shift))
+		pr_warn("Unable to set scheduler thermal pressure decay shift parameter\n");
+
+	sched_thermal_decay_shift = clamp(_shift, 0, 10);
+	return 1;
+}
+__setup("sched_thermal_decay_shift=", setup_sched_thermal_decay_shift);
+
 #ifdef CONFIG_SMP
 /*
  * For asym packing, by default the lower numbered CPU has higher priority.
@@ -7760,7 +7773,7 @@ static bool __update_blocked_others(struct rq *rq, bool *done)
 
 	decayed = update_rt_rq_load_avg(now, rq, curr_class == &rt_sched_class) |
 		  update_dl_rq_load_avg(now, rq, curr_class == &dl_sched_class) |
-		  update_thermal_load_avg(rq_clock_task(rq), rq, thermal_pressure) |
+		  update_thermal_load_avg(rq_clock_thermal(rq), rq, thermal_pressure) |
 		  update_irq_load_avg(rq, 0);
 
 	if (others_have_blocked(rq))
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 6c839f8..7f1a85b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1127,6 +1127,24 @@ static inline u64 rq_clock_task(struct rq *rq)
 	return rq->clock_task;
 }
 
+/**
+ * By default the decay is the default pelt decay period.
+ * The decay shift scales the decay period by a power of
+ * two: period = 32 ms << shift.
+ *  Decay shift		Decay period(ms)
+ *	0			32
+ *	1			64
+ *	2			128
+ *	3			256
+ *	4			512
+ */
+extern int sched_thermal_decay_shift;
+
+static inline u64 rq_clock_thermal(struct rq *rq)
+{
+	return rq_clock_task(rq) >> sched_thermal_decay_shift;
+}
+
 static inline void rq_clock_skip_update(struct rq *rq)
 {
 	lockdep_assert_held(&rq->lock);

^ permalink raw reply related	[flat|nested] 28+ messages in thread
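
The relationship between the shift value, the clamping done at boot and the
resulting decay period can be sketched outside the kernel like this. All
function names below are hypothetical. The 32 ms base is the default PELT
decay period stated in the documentation hunk above. The sketch only
mirrors the clamp(_shift, 0, 10) and the right shift in rq_clock_thermal().

```c
#include <stdint.h>

#define PELT_DECAY_PERIOD_MS 32u	/* default stated in the patch */

/* Mirrors clamp(_shift, 0, 10) from setup_sched_thermal_decay_shift(). */
static int clamp_decay_shift(int shift)
{
	if (shift < 0)
		return 0;
	if (shift > 10)
		return 10;
	return shift;
}

/* Effective thermal pressure decay period for a given shift. */
static unsigned int thermal_decay_period_ms(int shift)
{
	return PELT_DECAY_PERIOD_MS << clamp_decay_shift(shift);
}

/*
 * Mirrors rq_clock_thermal(): slowing the clock seen by the PELT code by
 * 2^shift is what stretches the decay period by the same factor.
 */
static uint64_t clock_thermal(uint64_t clock_task, int shift)
{
	return clock_task >> clamp_decay_shift(shift);
}
```

Booting with sched_thermal_decay_shift=2 on the kernel command line would
therefore give a 128 ms decay period under these assumptions.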

* [tip: sched/core] sched/fair: Enable periodic update of average thermal pressure
  2020-02-22  0:52 ` [Patch v10 6/9] sched/fair: Enable periodic update of average thermal pressure Thara Gopinath
  2020-02-27  9:03   ` Amit Kucheria
@ 2020-03-06 14:42   ` tip-bot2 for Thara Gopinath
  1 sibling, 0 replies; 28+ messages in thread
From: tip-bot2 for Thara Gopinath @ 2020-03-06 14:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thara Gopinath, Peter Zijlstra (Intel), Ingo Molnar, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     b4eccf5f8e1dcade112d97be86ad455a94501a0f
Gitweb:        https://git.kernel.org/tip/b4eccf5f8e1dcade112d97be86ad455a94501a0f
Author:        Thara Gopinath <thara.gopinath@linaro.org>
AuthorDate:    Fri, 21 Feb 2020 19:52:10 -05:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Fri, 06 Mar 2020 12:57:20 +01:00

sched/fair: Enable periodic update of average thermal pressure

Introduce support in scheduler periodic tick and other CFS bookkeeping
APIs to trigger the process of computing average thermal pressure for a
CPU. Also consider avg_thermal.load_avg in others_have_blocked which
allows for decay of pelt signals.

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200222005213.3873-7-thara.gopinath@linaro.org
---
 kernel/sched/core.c | 3 +++
 kernel/sched/fair.c | 7 +++++++
 2 files changed, 10 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8e6f380..3e620fe 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3586,6 +3586,7 @@ void scheduler_tick(void)
 	struct rq *rq = cpu_rq(cpu);
 	struct task_struct *curr = rq->curr;
 	struct rq_flags rf;
+	unsigned long thermal_pressure;
 
 	arch_scale_freq_tick();
 	sched_clock_tick();
@@ -3593,6 +3594,8 @@ void scheduler_tick(void)
 	rq_lock(rq, &rf);
 
 	update_rq_clock(rq);
+	thermal_pressure = arch_scale_thermal_pressure(cpu_of(rq));
+	update_thermal_load_avg(rq_clock_task(rq), rq, thermal_pressure);
 	curr->sched_class->task_tick(rq, curr, 0);
 	calc_global_load_tick(rq);
 	psi_task_tick(rq);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4b5d5e5..11f8488 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7719,6 +7719,9 @@ static inline bool others_have_blocked(struct rq *rq)
 	if (READ_ONCE(rq->avg_dl.util_avg))
 		return true;
 
+	if (thermal_load_avg(rq))
+		return true;
+
 #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
 	if (READ_ONCE(rq->avg_irq.util_avg))
 		return true;
@@ -7744,6 +7747,7 @@ static bool __update_blocked_others(struct rq *rq, bool *done)
 {
 	const struct sched_class *curr_class;
 	u64 now = rq_clock_pelt(rq);
+	unsigned long thermal_pressure;
 	bool decayed;
 
 	/*
@@ -7752,8 +7756,11 @@ static bool __update_blocked_others(struct rq *rq, bool *done)
 	 */
 	curr_class = rq->curr->sched_class;
 
+	thermal_pressure = arch_scale_thermal_pressure(cpu_of(rq));
+
 	decayed = update_rt_rq_load_avg(now, rq, curr_class == &rt_sched_class) |
 		  update_dl_rq_load_avg(now, rq, curr_class == &dl_sched_class) |
+		  update_thermal_load_avg(rq_clock_task(rq), rq, thermal_pressure) |
 		  update_irq_load_avg(rq, 0);
 
 	if (others_have_blocked(rq))

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip: sched/core] arm/topology: Populate arch_scale_thermal_pressure() for ARM platforms
  2020-02-22  0:52 ` [Patch v10 5/9] arm/topology: Populate arch_scale_thermal_pressure for arm platforms Thara Gopinath
@ 2020-03-06 14:42   ` tip-bot2 for Thara Gopinath
  0 siblings, 0 replies; 28+ messages in thread
From: tip-bot2 for Thara Gopinath @ 2020-03-06 14:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thara Gopinath, Peter Zijlstra (Intel), Ingo Molnar, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     8eab879c5463d1a16a314790764c4c9d6c74c64c
Gitweb:        https://git.kernel.org/tip/8eab879c5463d1a16a314790764c4c9d6c74c64c
Author:        Thara Gopinath <thara.gopinath@linaro.org>
AuthorDate:    Fri, 21 Feb 2020 19:52:09 -05:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Fri, 06 Mar 2020 12:57:19 +01:00

arm/topology: Populate arch_scale_thermal_pressure() for ARM platforms

Hook up topology_get_thermal_pressure to arch_scale_thermal_pressure thus
enabling scheduler to retrieve instantaneous thermal pressure of a CPU.

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200222005213.3873-6-thara.gopinath@linaro.org
---
 arch/arm/include/asm/topology.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h
index 8a0fae9..435aba2 100644
--- a/arch/arm/include/asm/topology.h
+++ b/arch/arm/include/asm/topology.h
@@ -16,6 +16,9 @@
 /* Enable topology flag updates */
 #define arch_update_cpu_topology topology_update_cpu_topology
 
+/* Replace task scheduler's default thermal pressure retrieve API */
+#define arch_scale_thermal_pressure topology_get_thermal_pressure
+
 #else
 
 static inline void init_cpu_topology(void) { }

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip: sched/core] sched/topology: Add callback to read per CPU thermal pressure
  2020-02-22  0:52 ` [Patch v10 2/9] sched/topology: Add hook to read per cpu " Thara Gopinath
@ 2020-03-06 14:42   ` tip-bot2 for Thara Gopinath
  0 siblings, 0 replies; 28+ messages in thread
From: tip-bot2 for Thara Gopinath @ 2020-03-06 14:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thara Gopinath, Peter Zijlstra (Intel), Ingo Molnar, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     36a0df85d2e85e1929e8cd607e19243e5a2754e7
Gitweb:        https://git.kernel.org/tip/36a0df85d2e85e1929e8cd607e19243e5a2754e7
Author:        Thara Gopinath <thara.gopinath@linaro.org>
AuthorDate:    Fri, 21 Feb 2020 19:52:06 -05:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Fri, 06 Mar 2020 12:57:17 +01:00

sched/topology: Add callback to read per CPU thermal pressure

Introduce the arch_scale_thermal_pressure() callback to retrieve per CPU thermal
pressure.

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200222005213.3873-3-thara.gopinath@linaro.org
---
 include/linux/sched/topology.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index f341163..af9319e 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -225,6 +225,14 @@ unsigned long arch_scale_cpu_capacity(int cpu)
 }
 #endif
 
+#ifndef arch_scale_thermal_pressure
+static __always_inline
+unsigned long arch_scale_thermal_pressure(int cpu)
+{
+	return 0;
+}
+#endif
+
 static inline int task_node(const struct task_struct *p)
 {
 	return cpu_to_node(task_cpu(p));

^ permalink raw reply related	[flat|nested] 28+ messages in thread
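
The #ifndef default used above, combined with the #define in the arm and
arm64 topology headers, is an override-by-macro pattern. A minimal sketch of
how the two pieces interact in a single translation unit follows.
my_arch_thermal_pressure() and its return values are made up for
illustration. In the series the override is topology_get_thermal_pressure().

```c
/* --- what an architecture header would provide (hypothetical) --- */
static inline unsigned long my_arch_thermal_pressure(int cpu)
{
	/* Made-up data: pretend only CPU 0 is currently capped. */
	return cpu == 0 ? 256 : 0;
}
#define arch_scale_thermal_pressure my_arch_thermal_pressure

/* --- what the generic header provides, as in the patch --- */
#ifndef arch_scale_thermal_pressure
static inline unsigned long arch_scale_thermal_pressure(int cpu)
{
	return 0;
}
#endif
```

Because the macro is defined before the generic header is seen, the default
that returns 0 is compiled out and every caller of
arch_scale_thermal_pressure() reaches the architecture's implementation.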

* [tip: sched/core] drivers/base/arch_topology: Add infrastructure to store and update instantaneous thermal pressure
  2020-02-22  0:52 ` [Patch v10 3/9] drivers/base/arch_topology: Add infrastructure to store and update instantaneous " Thara Gopinath
@ 2020-03-06 14:42   ` tip-bot2 for Thara Gopinath
  0 siblings, 0 replies; 28+ messages in thread
From: tip-bot2 for Thara Gopinath @ 2020-03-06 14:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thara Gopinath, Peter Zijlstra (Intel), Ingo Molnar, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     ad58cc5cc50ca8423cf630778594bd38252a0a58
Gitweb:        https://git.kernel.org/tip/ad58cc5cc50ca8423cf630778594bd38252a0a58
Author:        Thara Gopinath <thara.gopinath@linaro.org>
AuthorDate:    Fri, 21 Feb 2020 19:52:07 -05:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Fri, 06 Mar 2020 12:57:18 +01:00

drivers/base/arch_topology: Add infrastructure to store and update instantaneous thermal pressure

Add architecture specific APIs to update and track thermal pressure on a
per CPU basis. A per CPU variable thermal_pressure is introduced to keep
track of instantaneous per CPU thermal pressure. Thermal pressure is the
delta between maximum capacity and capped capacity due to a thermal event.

topology_get_thermal_pressure can be hooked into the scheduler specified
arch_scale_thermal_pressure to retrieve instantaneous thermal pressure of
a CPU.

arch_set_thermal_pressure can be used to update the thermal pressure.

Considering topology_get_thermal_pressure reads thermal_pressure and
arch_set_thermal_pressure writes into thermal_pressure, one can argue for
some sort of locking mechanism to avoid a stale value.  But considering
topology_get_thermal_pressure can be called from a system critical path
like scheduler tick function, a locking mechanism is not ideal. This means
that it is possible the thermal_pressure value used to calculate average
thermal pressure for a CPU can be stale for up to 1 tick period.

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200222005213.3873-4-thara.gopinath@linaro.org
---
 drivers/base/arch_topology.c  | 11 +++++++++++
 include/linux/arch_topology.h | 10 ++++++++++
 2 files changed, 21 insertions(+)

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 6119e11..68dfa49 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -42,6 +42,17 @@ void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity)
 	per_cpu(cpu_scale, cpu) = capacity;
 }
 
+DEFINE_PER_CPU(unsigned long, thermal_pressure);
+
+void arch_set_thermal_pressure(struct cpumask *cpus,
+			       unsigned long th_pressure)
+{
+	int cpu;
+
+	for_each_cpu(cpu, cpus)
+		WRITE_ONCE(per_cpu(thermal_pressure, cpu), th_pressure);
+}
+
 static ssize_t cpu_capacity_show(struct device *dev,
 				 struct device_attribute *attr,
 				 char *buf)
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index 3015ecb..88a115e 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -33,6 +33,16 @@ unsigned long topology_get_freq_scale(int cpu)
 	return per_cpu(freq_scale, cpu);
 }
 
+DECLARE_PER_CPU(unsigned long, thermal_pressure);
+
+static inline unsigned long topology_get_thermal_pressure(int cpu)
+{
+	return per_cpu(thermal_pressure, cpu);
+}
+
+void arch_set_thermal_pressure(struct cpumask *cpus,
+			       unsigned long th_pressure);
+
 struct cpu_topology {
 	int thread_id;
 	int core_id;

^ permalink raw reply related	[flat|nested] 28+ messages in thread
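
The lockless writer/reader arrangement described in the commit message can
be approximated in portable C11 as below. This is only an analogy: the
kernel uses per-CPU variables and WRITE_ONCE(), not <stdatomic.h>, and every
name here (NR_CPUS_SKETCH, the array, both functions, the bitmask standing
in for struct cpumask) is hypothetical.

```c
#include <stdatomic.h>

#define NR_CPUS_SKETCH 8

/* Stand-in for DEFINE_PER_CPU(unsigned long, thermal_pressure). */
static _Atomic unsigned long thermal_pressure_sketch[NR_CPUS_SKETCH];

/* Writer side: one relaxed store per CPU in the mask, no lock taken. */
static void set_thermal_pressure_sketch(unsigned long cpu_bits,
					unsigned long pressure)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		if (cpu_bits & (1UL << cpu))
			atomic_store_explicit(&thermal_pressure_sketch[cpu],
					      pressure, memory_order_relaxed);
	}
}

/*
 * Reader side: a relaxed load. The value may lag behind a concurrent
 * writer, which is the staleness the commit message accepts.
 */
static unsigned long get_thermal_pressure_sketch(int cpu)
{
	return atomic_load_explicit(&thermal_pressure_sketch[cpu],
				    memory_order_relaxed);
}
```

The design trade-off is the one stated above: the tick path never blocks,
at the cost of possibly averaging a value that is up to one tick old.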

* [tip: sched/core] arm64/topology: Populate arch_scale_thermal_pressure() for arm64 platforms
  2020-02-22  0:52 ` [Patch v10 4/9] arm64/topology: Populate arch_scale_thermal_pressure for arm64 platforms Thara Gopinath
@ 2020-03-06 14:42   ` tip-bot2 for Thara Gopinath
  0 siblings, 0 replies; 28+ messages in thread
From: tip-bot2 for Thara Gopinath @ 2020-03-06 14:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thara Gopinath, Peter Zijlstra (Intel), Ingo Molnar, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     ae1677c0bbe23fe30d634ac0d9f5c147ee4adbc1
Gitweb:        https://git.kernel.org/tip/ae1677c0bbe23fe30d634ac0d9f5c147ee4adbc1
Author:        Thara Gopinath <thara.gopinath@linaro.org>
AuthorDate:    Fri, 21 Feb 2020 19:52:08 -05:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Fri, 06 Mar 2020 12:57:19 +01:00

arm64/topology: Populate arch_scale_thermal_pressure() for arm64 platforms

Hook up topology_get_thermal_pressure to arch_scale_thermal_pressure thus
enabling scheduler to retrieve instantaneous thermal pressure of a CPU.

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200222005213.3873-5-thara.gopinath@linaro.org
---
 arch/arm64/include/asm/topology.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index a4d945d..cbd70d7 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -25,6 +25,9 @@ int pcibus_to_node(struct pci_bus *bus);
 /* Enable topology flag updates */
 #define arch_update_cpu_topology topology_update_cpu_topology
 
+/* Replace task scheduler's default thermal pressure retrieve API */
+#define arch_scale_thermal_pressure topology_get_thermal_pressure
+
 #include <asm-generic/topology.h>
 
 #endif /* _ASM_ARM_TOPOLOGY_H */

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [tip: sched/core] sched/pelt: Add support to track thermal pressure
  2020-02-22  0:52 ` [Patch v10 1/9] sched/pelt: Add support to track thermal pressure Thara Gopinath
  2020-02-22  0:59   ` Randy Dunlap
@ 2020-03-06 14:42   ` tip-bot2 for Thara Gopinath
  1 sibling, 0 replies; 28+ messages in thread
From: tip-bot2 for Thara Gopinath @ 2020-03-06 14:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Vincent Guittot, Thara Gopinath, Peter Zijlstra (Intel),
	Ingo Molnar, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     765047932f153265db6ef15be208d6cbfc03dc62
Gitweb:        https://git.kernel.org/tip/765047932f153265db6ef15be208d6cbfc03dc62
Author:        Thara Gopinath <thara.gopinath@linaro.org>
AuthorDate:    Fri, 21 Feb 2020 19:52:05 -05:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Fri, 06 Mar 2020 12:57:17 +01:00

sched/pelt: Add support to track thermal pressure

Extrapolating on the existing framework to track rt/dl utilization using
PELT signals, add a similar mechanism to track thermal pressure. The
difference from rt/dl utilization tracking is that, instead of
tracking time spent by a CPU running an RT/DL task through util_avg, the
average thermal pressure is tracked through load_avg. This is because the
thermal pressure signal is a time-weighted "delta" capacity, unlike util_avg
which is binary. "Delta capacity" here means the delta between the actual
capacity of a CPU and the decreased capacity of the CPU due to a thermal event.

In order to track average thermal pressure, a new sched_avg variable
avg_thermal is introduced. Function update_thermal_load_avg can be called
to do the periodic bookkeeping (accumulate, decay and average) of the
thermal pressure.

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200222005213.3873-2-thara.gopinath@linaro.org
---
 include/trace/events/sched.h |  4 ++++
 init/Kconfig                 |  4 ++++
 kernel/sched/pelt.c          | 31 +++++++++++++++++++++++++++++++
 kernel/sched/pelt.h          | 31 +++++++++++++++++++++++++++++++
 kernel/sched/sched.h         |  3 +++
 5 files changed, 73 insertions(+)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 9c3ebb7..ed168b0 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -618,6 +618,10 @@ DECLARE_TRACE(pelt_dl_tp,
 	TP_PROTO(struct rq *rq),
 	TP_ARGS(rq));
 
+DECLARE_TRACE(pelt_thermal_tp,
+	TP_PROTO(struct rq *rq),
+	TP_ARGS(rq));
+
 DECLARE_TRACE(pelt_irq_tp,
 	TP_PROTO(struct rq *rq),
 	TP_ARGS(rq));
diff --git a/init/Kconfig b/init/Kconfig
index 20a6ac3..275c848 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -451,6 +451,10 @@ config HAVE_SCHED_AVG_IRQ
 	depends on IRQ_TIME_ACCOUNTING || PARAVIRT_TIME_ACCOUNTING
 	depends on SMP
 
+config SCHED_THERMAL_PRESSURE
+	bool "Enable periodic averaging of thermal pressure"
+	depends on SMP
+
 config BSD_PROCESS_ACCT
 	bool "BSD Process Accounting"
 	depends on MULTIUSER
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index c40d57a..b647d04 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -368,6 +368,37 @@ int update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
 	return 0;
 }
 
+#ifdef CONFIG_SCHED_THERMAL_PRESSURE
+/*
+ * thermal:
+ *
+ *   load_sum = \Sum se->avg.load_sum but se->avg.load_sum is not tracked
+ *
+ *   util_avg and runnable_load_avg are not supported and meaningless.
+ *
+ * Unlike rt/dl utilization tracking that tracks time spent by a cpu
+ * running a rt/dl task through util_avg, the average thermal pressure is
+ * tracked through load_avg. This is because thermal pressure signal is
+ * time weighted "delta" capacity unlike util_avg which is binary.
+ * "delta capacity" =  actual capacity  -
+ *			capped capacity of a cpu due to a thermal event.
+ */
+
+int update_thermal_load_avg(u64 now, struct rq *rq, u64 capacity)
+{
+	if (___update_load_sum(now, &rq->avg_thermal,
+			       capacity,
+			       capacity,
+			       capacity)) {
+		___update_load_avg(&rq->avg_thermal, 1);
+		trace_pelt_thermal_tp(rq);
+		return 1;
+	}
+
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
 /*
  * irq:
diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h
index afff644..eb034d9 100644
--- a/kernel/sched/pelt.h
+++ b/kernel/sched/pelt.h
@@ -7,6 +7,26 @@ int __update_load_avg_cfs_rq(u64 now, struct cfs_rq *cfs_rq);
 int update_rt_rq_load_avg(u64 now, struct rq *rq, int running);
 int update_dl_rq_load_avg(u64 now, struct rq *rq, int running);
 
+#ifdef CONFIG_SCHED_THERMAL_PRESSURE
+int update_thermal_load_avg(u64 now, struct rq *rq, u64 capacity);
+
+static inline u64 thermal_load_avg(struct rq *rq)
+{
+	return READ_ONCE(rq->avg_thermal.load_avg);
+}
+#else
+static inline int
+update_thermal_load_avg(u64 now, struct rq *rq, u64 capacity)
+{
+	return 0;
+}
+
+static inline u64 thermal_load_avg(struct rq *rq)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
 int update_irq_load_avg(struct rq *rq, u64 running);
 #else
@@ -159,6 +179,17 @@ update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
 }
 
 static inline int
+update_thermal_load_avg(u64 now, struct rq *rq, u64 capacity)
+{
+	return 0;
+}
+
+static inline u64 thermal_load_avg(struct rq *rq)
+{
+	return 0;
+}
+
+static inline int
 update_irq_load_avg(struct rq *rq, u64 running)
 {
 	return 0;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2a0caf3..6c839f8 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -961,6 +961,9 @@ struct rq {
 #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
 	struct sched_avg	avg_irq;
 #endif
+#ifdef CONFIG_SCHED_THERMAL_PRESSURE
+	struct sched_avg	avg_thermal;
+#endif
 	u64			idle_stamp;
 	u64			avg_idle;
 

^ permalink raw reply related	[flat|nested] 28+ messages in thread
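
To give a feel for the "accumulate, decay and average" bookkeeping that
update_thermal_load_avg() delegates to the PELT helpers, here is a
floating-point toy model. It is not the kernel's fixed-point
implementation. It assumes one update per period and a decay factor y
chosen so that y^32 = 0.5, which matches the 32-period half-life mentioned
in the decay-shift documentation. All names and the rounded constant are
illustrative.

```c
/* Approximation of 2^(-1/32): a sample's weight halves every 32 periods. */
#define PELT_Y_SKETCH 0.97857206

/* One period: decay the old average, blend in the current sample. */
static double thermal_avg_step(double avg, double sample)
{
	return avg * PELT_Y_SKETCH + sample * (1.0 - PELT_Y_SKETCH);
}

/* Apply the same instantaneous pressure for a number of periods. */
static double thermal_avg_run(double avg, double sample, int periods)
{
	for (int i = 0; i < periods; i++)
		avg = thermal_avg_step(avg, sample);

	return avg;
}
```

Starting from zero, a constant pressure of 256 brings the average to about
128 after 32 periods. Once the cap is lifted, an average of 256 falls back
to about 128 after another 32 periods.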

* Re: [Patch v10 0/9] Introduce Thermal Pressure
       [not found] ` <CAP=VYLqWfqOZT6ec9cKyKOsOhu7HhVn2f_eU+ca006i4CV8R-w@mail.gmail.com>
@ 2020-04-16 13:40   ` Thara Gopinath
  0 siblings, 0 replies; 28+ messages in thread
From: Thara Gopinath @ 2020-04-16 13:40 UTC (permalink / raw)
  To: Paul Gortmaker
  Cc: Ingo Molnar, Peter Zijlstra, ionela.voinescu, vincent.guittot,
	dietmar.eggemann, rui.zhang, qperret, daniel.lezcano,
	viresh.kumar, rostedt, will, catalin.marinas, sudeep.holla,
	juri.lelli, Jonathan Corbet, LKML, amit.kachhap, javi.merino,
	amit.kucheria



On 4/14/20 11:57 AM, Paul Gortmaker wrote:
> On Fri, Feb 21, 2020 at 7:52 PM Thara Gopinath <thara.gopinath@linaro.org>
> wrote:
> 
>> Thermal governors can respond to an overheat event of a cpu by
>> capping the cpu's maximum possible frequency. This in turn
>> means that the maximum available compute capacity of the
>> cpu is restricted. But today in the kernel, task scheduler is
>> not notified of capping of maximum frequency of a cpu.
>> In other words, scheduler is unaware of maximum capacity
>> restrictions placed on a cpu due to thermal activity.
>> This patch series attempts to address this issue.
>>
> 
> I'm just seeing this now via -rc1 and "make oldconfig".
> 
> I'd suggest taking some of the above info and using it to
> create a Kconfig help text for the new option that was added.
Hi Paul,

I will send a patch adding some details to the Kconfig text.

-- 
Warm Regards
Thara

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2020-04-16 14:00 UTC | newest]

Thread overview: 28+ messages
2020-02-22  0:52 [Patch v10 0/9] Introduce Thermal Pressure Thara Gopinath
2020-02-22  0:52 ` [Patch v10 1/9] sched/pelt: Add support to track thermal pressure Thara Gopinath
2020-02-22  0:59   ` Randy Dunlap
2020-02-22 18:27     ` Thara Gopinath
2020-02-22 18:50       ` Randy Dunlap
2020-02-24 14:33         ` Thara Gopinath
2020-02-25 15:47           ` Peter Zijlstra
2020-02-25 16:43             ` Thara Gopinath
2020-03-06 14:42   ` [tip: sched/core] " tip-bot2 for Thara Gopinath
2020-02-22  0:52 ` [Patch v10 2/9] sched/topology: Add hook to read per cpu " Thara Gopinath
2020-03-06 14:42   ` [tip: sched/core] sched/topology: Add callback to read per CPU " tip-bot2 for Thara Gopinath
2020-02-22  0:52 ` [Patch v10 3/9] drivers/base/arch_topology: Add infrastructure to store and update instantaneous " Thara Gopinath
2020-03-06 14:42   ` [tip: sched/core] " tip-bot2 for Thara Gopinath
2020-02-22  0:52 ` [Patch v10 4/9] arm64/topology: Populate arch_scale_thermal_pressure for arm64 platforms Thara Gopinath
2020-03-06 14:42   ` [tip: sched/core] arm64/topology: Populate arch_scale_thermal_pressure() " tip-bot2 for Thara Gopinath
2020-02-22  0:52 ` [Patch v10 5/9] arm/topology: Populate arch_scale_thermal_pressure for arm platforms Thara Gopinath
2020-03-06 14:42   ` [tip: sched/core] arm/topology: Populate arch_scale_thermal_pressure() for ARM platforms tip-bot2 for Thara Gopinath
2020-02-22  0:52 ` [Patch v10 6/9] sched/fair: Enable periodic update of average thermal pressure Thara Gopinath
2020-02-27  9:03   ` Amit Kucheria
2020-03-06 14:42   ` [tip: sched/core] " tip-bot2 for Thara Gopinath
2020-02-22  0:52 ` [Patch v10 7/9] sched/fair: update cpu_capacity to reflect " Thara Gopinath
2020-03-06 14:42   ` [tip: sched/core] sched/fair: Update " tip-bot2 for Thara Gopinath
2020-02-22  0:52 ` [Patch v10 8/9] thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping Thara Gopinath
2020-03-06 14:42   ` [tip: sched/core] " tip-bot2 for Thara Gopinath
2020-02-22  0:52 ` [Patch v10 9/9] sched/fair: Enable tuning of decay period Thara Gopinath
2020-03-06 14:42   ` [tip: sched/core] " tip-bot2 for Thara Gopinath
2020-02-27  9:01 ` [Patch v10 0/9] Introduce Thermal Pressure Amit Kucheria
     [not found] ` <CAP=VYLqWfqOZT6ec9cKyKOsOhu7HhVn2f_eU+ca006i4CV8R-w@mail.gmail.com>
2020-04-16 13:40   ` Thara Gopinath
