linux-arm-msm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] Refactor thermal pressure update to avoid code duplication
@ 2021-10-07  8:07 Lukasz Luba
  2021-10-07  8:07 ` [PATCH 1/5] arch_topology: Introduce thermal pressure update function Lukasz Luba
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Lukasz Luba @ 2021-10-07  8:07 UTC (permalink / raw)
  To: linux-kernel, linux-pm
  Cc: linux-arm-kernel, linux-arm-msm, lukasz.luba, sudeep.holla, will,
	catalin.marinas, linux, gregkh, rafael, viresh.kumar, amitk,
	daniel.lezcano, amit.kachhap, thara.gopinath, bjorn.andersson,
	agross

Hi all,

This patch set aims to refactor the thermal pressure update
code. There are already two clients which do similar thing:
convert the capped frequency value into the capacity of
affected CPU and call the 'set' function to store the
reduced capacity into the per-cpu variable.
There might be more than two of these users. In near future
it will be scmi-cpufreq driver, which receives notification
from FW about reduced frequency due to thermal. Other vendors
might follow. Let's avoid code duplication and potential
conversion bugs. Move the conversion code into the arch_topology.c
where the capacity calculation setup code and thermal pressure sit.

Apart from that $subject patches, there is one patch (3/5) which fixes
issue in qcom-cpufreq-hw.c when the thermal pressure is not
updated for offline CPUs. It's similar fix that has been merged
recently for cpufreq_cooling.c:
2ad8ccc17d1e4270cf65a3f2

Regards,
Lukasz Luba

Lukasz Luba (5):
  arch_topology: Introduce thermal pressure update function
  thermal: cpufreq_cooling: Use new thermal pressure update function
  cpufreq: qcom-cpufreq-hw: Update offline CPUs per-cpu thermal pressure
  cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function
  arch_topology: Remove unused topology_set_thermal_pressure() and
    related

 arch/arm/include/asm/topology.h   |  2 +-
 arch/arm64/include/asm/topology.h |  2 +-
 drivers/base/arch_topology.c      | 35 +++++++++++++++++++++++++++----
 drivers/cpufreq/qcom-cpufreq-hw.c | 14 +++++--------
 drivers/thermal/cpufreq_cooling.c |  6 +-----
 include/linux/arch_topology.h     |  4 ++--
 include/linux/sched/topology.h    |  6 +++---
 init/Kconfig                      |  2 +-
 8 files changed, 45 insertions(+), 26 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/5] arch_topology: Introduce thermal pressure update function
  2021-10-07  8:07 [PATCH 0/5] Refactor thermal pressure update to avoid code duplication Lukasz Luba
@ 2021-10-07  8:07 ` Lukasz Luba
  2021-10-11 21:15   ` Thara Gopinath
  2021-10-07  8:07 ` [PATCH 2/5] thermal: cpufreq_cooling: Use new " Lukasz Luba
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Lukasz Luba @ 2021-10-07  8:07 UTC (permalink / raw)
  To: linux-kernel, linux-pm
  Cc: linux-arm-kernel, linux-arm-msm, lukasz.luba, sudeep.holla, will,
	catalin.marinas, linux, gregkh, rafael, viresh.kumar, amitk,
	daniel.lezcano, amit.kachhap, thara.gopinath, bjorn.andersson,
	agross

The thermal pressure is a mechanism which is used for providing
information about reduced CPU performance to the scheduler. Usually code
has to convert the value from frequency units into capacity units,
which are understandable by the scheduler. Create a common conversion code
which can be just used via a handy API.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 arch/arm/include/asm/topology.h   |  1 +
 arch/arm64/include/asm/topology.h |  1 +
 drivers/base/arch_topology.c      | 36 ++++++++++++++++++++++++++++++-
 include/linux/arch_topology.h     |  3 +++
 include/linux/sched/topology.h    |  7 ++++++
 5 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h
index 470299ee2fba..aee6c456c085 100644
--- a/arch/arm/include/asm/topology.h
+++ b/arch/arm/include/asm/topology.h
@@ -24,6 +24,7 @@
 /* Replace task scheduler's default thermal pressure API */
 #define arch_scale_thermal_pressure topology_get_thermal_pressure
 #define arch_set_thermal_pressure   topology_set_thermal_pressure
+#define arch_thermal_pressure_update	topology_thermal_pressure_update
 
 #else
 
diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index ec2db3419c41..c997015402bc 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -33,6 +33,7 @@ void update_freq_counters_refs(void);
 /* Replace task scheduler's default thermal pressure API */
 #define arch_scale_thermal_pressure topology_get_thermal_pressure
 #define arch_set_thermal_pressure   topology_set_thermal_pressure
+#define arch_thermal_pressure_update	topology_thermal_pressure_update
 
 #include <asm-generic/topology.h>
 
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 43407665918f..ad31513d0104 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -25,6 +25,7 @@
 static DEFINE_PER_CPU(struct scale_freq_data __rcu *, sft_data);
 static struct cpumask scale_freq_counters_mask;
 static bool scale_freq_invariant;
+static DEFINE_PER_CPU(u32, freq_factor) = 1;
 
 static bool supports_scale_freq_counters(const struct cpumask *cpus)
 {
@@ -168,6 +169,40 @@ void topology_set_thermal_pressure(const struct cpumask *cpus,
 }
 EXPORT_SYMBOL_GPL(topology_set_thermal_pressure);
 
+/**
+ * topology_thermal_pressure_update() - Update thermal pressure for CPUs
+ * @cpus	: The related CPUs which capacity has been reduced
+ * @capped_freq	: The maximum allowed frequency that CPUs can run at
+ *
+ * Update the value of thermal pressure for all @cpus in the mask. The
+ * cpumask should include all (online+offline) affected CPUs, to avoid
+ * operating on stale data when hot-plug is used for some CPUs. The
+ * @capped_freq must be less or equal to the max possible frequency and
+ * reflects the currently allowed max CPUs frequency due to thermal capping.
+ * The @capped_freq must be provided in kHz.
+ */
+void topology_thermal_pressure_update(const struct cpumask *cpus,
+				      unsigned long capped_freq)
+{
+	unsigned long max_capacity, capacity;
+	int cpu;
+
+	if (!cpus)
+		return;
+
+	cpu = cpumask_first(cpus);
+	max_capacity = arch_scale_cpu_capacity(cpu);
+
+	/* Convert to MHz scale which is used in 'freq_factor' */
+	capped_freq /= 1000;
+
+	capacity = capped_freq * max_capacity;
+	capacity /= per_cpu(freq_factor, cpu);
+
+	arch_set_thermal_pressure(cpus, max_capacity - capacity);
+}
+EXPORT_SYMBOL_GPL(topology_thermal_pressure_update);
+
 static ssize_t cpu_capacity_show(struct device *dev,
 				 struct device_attribute *attr,
 				 char *buf)
@@ -220,7 +255,6 @@ static void update_topology_flags_workfn(struct work_struct *work)
 	update_topology = 0;
 }
 
-static DEFINE_PER_CPU(u32, freq_factor) = 1;
 static u32 *raw_capacity;
 
 static int free_raw_capacity(void)
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index f180240dc95f..9e183621a59b 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -59,6 +59,9 @@ static inline unsigned long topology_get_thermal_pressure(int cpu)
 void topology_set_thermal_pressure(const struct cpumask *cpus,
 				   unsigned long th_pressure);
 
+void topology_thermal_pressure_update(const struct cpumask *cpus,
+				      unsigned long capped_freq);
+
 struct cpu_topology {
 	int thread_id;
 	int core_id;
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 8f0f778b7c91..990d14814427 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -266,6 +266,13 @@ void arch_set_thermal_pressure(const struct cpumask *cpus,
 { }
 #endif
 
+#ifndef arch_thermal_pressure_update
+static __always_inline
+void arch_thermal_pressure_update(const struct cpumask *cpus,
+				      unsigned long capped_frequency)
+{ }
+#endif
+
 static inline int task_node(const struct task_struct *p)
 {
 	return cpu_to_node(task_cpu(p));
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/5] thermal: cpufreq_cooling: Use new thermal pressure update function
  2021-10-07  8:07 [PATCH 0/5] Refactor thermal pressure update to avoid code duplication Lukasz Luba
  2021-10-07  8:07 ` [PATCH 1/5] arch_topology: Introduce thermal pressure update function Lukasz Luba
@ 2021-10-07  8:07 ` Lukasz Luba
  2021-10-07  8:07 ` [PATCH 3/5] cpufreq: qcom-cpufreq-hw: Update offline CPUs per-cpu thermal pressure Lukasz Luba
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Lukasz Luba @ 2021-10-07  8:07 UTC (permalink / raw)
  To: linux-kernel, linux-pm
  Cc: linux-arm-kernel, linux-arm-msm, lukasz.luba, sudeep.holla, will,
	catalin.marinas, linux, gregkh, rafael, viresh.kumar, amitk,
	daniel.lezcano, amit.kachhap, thara.gopinath, bjorn.andersson,
	agross

Thermal pressure provides a new API, which allows to use CPU frequency
as an argument. That removes the need of local conversion to capacity.
Use this new function and remove old conversion code.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/thermal/cpufreq_cooling.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
index 43b1ae8a7789..835c091ce818 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -462,7 +462,6 @@ static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
 	struct cpufreq_cooling_device *cpufreq_cdev = cdev->devdata;
 	struct cpumask *cpus;
 	unsigned int frequency;
-	unsigned long max_capacity, capacity;
 	int ret;
 
 	/* Request state should be less than max_level */
@@ -479,10 +478,7 @@ static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
 	if (ret >= 0) {
 		cpufreq_cdev->cpufreq_state = state;
 		cpus = cpufreq_cdev->policy->related_cpus;
-		max_capacity = arch_scale_cpu_capacity(cpumask_first(cpus));
-		capacity = frequency * max_capacity;
-		capacity /= cpufreq_cdev->policy->cpuinfo.max_freq;
-		arch_set_thermal_pressure(cpus, max_capacity - capacity);
+		arch_thermal_pressure_update(cpus, frequency);
 		ret = 0;
 	}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/5] cpufreq: qcom-cpufreq-hw: Update offline CPUs per-cpu thermal pressure
  2021-10-07  8:07 [PATCH 0/5] Refactor thermal pressure update to avoid code duplication Lukasz Luba
  2021-10-07  8:07 ` [PATCH 1/5] arch_topology: Introduce thermal pressure update function Lukasz Luba
  2021-10-07  8:07 ` [PATCH 2/5] thermal: cpufreq_cooling: Use new " Lukasz Luba
@ 2021-10-07  8:07 ` Lukasz Luba
  2021-10-11 21:17   ` Thara Gopinath
  2021-10-07  8:07 ` [PATCH 4/5] cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function Lukasz Luba
  2021-10-07  8:07 ` [PATCH 5/5] arch_topology: Remove unused topology_set_thermal_pressure() and related Lukasz Luba
  4 siblings, 1 reply; 10+ messages in thread
From: Lukasz Luba @ 2021-10-07  8:07 UTC (permalink / raw)
  To: linux-kernel, linux-pm
  Cc: linux-arm-kernel, linux-arm-msm, lukasz.luba, sudeep.holla, will,
	catalin.marinas, linux, gregkh, rafael, viresh.kumar, amitk,
	daniel.lezcano, amit.kachhap, thara.gopinath, bjorn.andersson,
	agross

The thermal pressure signal gives information to the scheduler about
reduced CPU capacity due to thermal. It is based on a value stored in
a per-cpu 'thermal_pressure' variable. The online CPUs will get the
new value there, while the offline won't. Unfortunately, when the CPU
is back online, the value read from per-cpu variable might be wrong
(stale data).  This might affect the scheduler decisions, since it
sees the CPU capacity differently than what is actually available.

Fix it by making sure that all online+offline CPUs would get the
proper value in their per-cpu variable when there is throttling
or throttling is removed.

Fixes: 275157b367f479 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/cpufreq/qcom-cpufreq-hw.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index a2be0df7e174..0138b2ec406d 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -304,7 +304,8 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
 	if (capacity > max_capacity)
 		capacity = max_capacity;
 
-	arch_set_thermal_pressure(policy->cpus, max_capacity - capacity);
+	arch_set_thermal_pressure(policy->related_cpus,
+				  max_capacity - capacity);
 
 	/*
 	 * In the unlikely case policy is unregistered do not enable
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/5] cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function
  2021-10-07  8:07 [PATCH 0/5] Refactor thermal pressure update to avoid code duplication Lukasz Luba
                   ` (2 preceding siblings ...)
  2021-10-07  8:07 ` [PATCH 3/5] cpufreq: qcom-cpufreq-hw: Update offline CPUs per-cpu thermal pressure Lukasz Luba
@ 2021-10-07  8:07 ` Lukasz Luba
  2021-10-07  8:07 ` [PATCH 5/5] arch_topology: Remove unused topology_set_thermal_pressure() and related Lukasz Luba
  4 siblings, 0 replies; 10+ messages in thread
From: Lukasz Luba @ 2021-10-07  8:07 UTC (permalink / raw)
  To: linux-kernel, linux-pm
  Cc: linux-arm-kernel, linux-arm-msm, lukasz.luba, sudeep.holla, will,
	catalin.marinas, linux, gregkh, rafael, viresh.kumar, amitk,
	daniel.lezcano, amit.kachhap, thara.gopinath, bjorn.andersson,
	agross

Thermal pressure provides a new API, which allows to use CPU frequency
as an argument. That removes the need of local conversion to capacity.
Use this new API and remove old local conversion code.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/cpufreq/qcom-cpufreq-hw.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index 0138b2ec406d..bf7871c2a4c9 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -275,10 +275,10 @@ static unsigned int qcom_lmh_get_throttle_freq(struct qcom_cpufreq_data *data)
 
 static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
 {
-	unsigned long max_capacity, capacity, freq_hz, throttled_freq;
 	struct cpufreq_policy *policy = data->policy;
 	int cpu = cpumask_first(policy->cpus);
 	struct device *dev = get_cpu_device(cpu);
+	unsigned long freq_hz, throttled_freq;
 	struct dev_pm_opp *opp;
 	unsigned int freq;
 
@@ -295,17 +295,12 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
 
 	throttled_freq = freq_hz / HZ_PER_KHZ;
 
-	/* Update thermal pressure */
-
-	max_capacity = arch_scale_cpu_capacity(cpu);
-	capacity = mult_frac(max_capacity, throttled_freq, policy->cpuinfo.max_freq);
-
 	/* Don't pass boost capacity to scheduler */
-	if (capacity > max_capacity)
-		capacity = max_capacity;
+	if (throttled_freq > policy->cpuinfo.max_freq)
+		throttled_freq = policy->cpuinfo.max_freq;
 
-	arch_set_thermal_pressure(policy->related_cpus,
-				  max_capacity - capacity);
+	/* Update thermal pressure */
+	arch_thermal_pressure_update(policy->related_cpus, throttled_freq);
 
 	/*
 	 * In the unlikely case policy is unregistered do not enable
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 5/5] arch_topology: Remove unused topology_set_thermal_pressure() and related
  2021-10-07  8:07 [PATCH 0/5] Refactor thermal pressure update to avoid code duplication Lukasz Luba
                   ` (3 preceding siblings ...)
  2021-10-07  8:07 ` [PATCH 4/5] cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function Lukasz Luba
@ 2021-10-07  8:07 ` Lukasz Luba
  4 siblings, 0 replies; 10+ messages in thread
From: Lukasz Luba @ 2021-10-07  8:07 UTC (permalink / raw)
  To: linux-kernel, linux-pm
  Cc: linux-arm-kernel, linux-arm-msm, lukasz.luba, sudeep.holla, will,
	catalin.marinas, linux, gregkh, rafael, viresh.kumar, amitk,
	daniel.lezcano, amit.kachhap, thara.gopinath, bjorn.andersson,
	agross

There is no need of this function (and related) since code has been
converted to use the new arch_thermal_pressure_update() API. The old
code can be removed.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 arch/arm/include/asm/topology.h   |  1 -
 arch/arm64/include/asm/topology.h |  1 -
 drivers/base/arch_topology.c      | 17 +++++------------
 include/linux/arch_topology.h     |  3 ---
 include/linux/sched/topology.h    |  7 -------
 init/Kconfig                      |  2 +-
 6 files changed, 6 insertions(+), 25 deletions(-)

diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h
index aee6c456c085..5e51fdcfcbd4 100644
--- a/arch/arm/include/asm/topology.h
+++ b/arch/arm/include/asm/topology.h
@@ -23,7 +23,6 @@
 
 /* Replace task scheduler's default thermal pressure API */
 #define arch_scale_thermal_pressure topology_get_thermal_pressure
-#define arch_set_thermal_pressure   topology_set_thermal_pressure
 #define arch_thermal_pressure_update	topology_thermal_pressure_update
 
 #else
diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index c997015402bc..92cd1288906f 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -32,7 +32,6 @@ void update_freq_counters_refs(void);
 
 /* Replace task scheduler's default thermal pressure API */
 #define arch_scale_thermal_pressure topology_get_thermal_pressure
-#define arch_set_thermal_pressure   topology_set_thermal_pressure
 #define arch_thermal_pressure_update	topology_thermal_pressure_update
 
 #include <asm-generic/topology.h>
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index ad31513d0104..84f093146ea5 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -159,16 +159,6 @@ void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity)
 
 DEFINE_PER_CPU(unsigned long, thermal_pressure);
 
-void topology_set_thermal_pressure(const struct cpumask *cpus,
-			       unsigned long th_pressure)
-{
-	int cpu;
-
-	for_each_cpu(cpu, cpus)
-		WRITE_ONCE(per_cpu(thermal_pressure, cpu), th_pressure);
-}
-EXPORT_SYMBOL_GPL(topology_set_thermal_pressure);
-
 /**
  * topology_thermal_pressure_update() - Update thermal pressure for CPUs
  * @cpus	: The related CPUs which capacity has been reduced
@@ -184,7 +174,7 @@ EXPORT_SYMBOL_GPL(topology_set_thermal_pressure);
 void topology_thermal_pressure_update(const struct cpumask *cpus,
 				      unsigned long capped_freq)
 {
-	unsigned long max_capacity, capacity;
+	unsigned long max_capacity, capacity, th_pressure;
 	int cpu;
 
 	if (!cpus)
@@ -199,7 +189,10 @@ void topology_thermal_pressure_update(const struct cpumask *cpus,
 	capacity = capped_freq * max_capacity;
 	capacity /= per_cpu(freq_factor, cpu);
 
-	arch_set_thermal_pressure(cpus, max_capacity - capacity);
+	th_pressure = max_capacity - capacity;
+
+	for_each_cpu(cpu, cpus)
+		WRITE_ONCE(per_cpu(thermal_pressure, cpu), th_pressure);
 }
 EXPORT_SYMBOL_GPL(topology_thermal_pressure_update);
 
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index 9e183621a59b..9b95e5b29ee9 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -56,9 +56,6 @@ static inline unsigned long topology_get_thermal_pressure(int cpu)
 	return per_cpu(thermal_pressure, cpu);
 }
 
-void topology_set_thermal_pressure(const struct cpumask *cpus,
-				   unsigned long th_pressure);
-
 void topology_thermal_pressure_update(const struct cpumask *cpus,
 				      unsigned long capped_freq);
 
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 990d14814427..f31da5454baa 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -259,13 +259,6 @@ unsigned long arch_scale_thermal_pressure(int cpu)
 }
 #endif
 
-#ifndef arch_set_thermal_pressure
-static __always_inline
-void arch_set_thermal_pressure(const struct cpumask *cpus,
-			       unsigned long th_pressure)
-{ }
-#endif
-
 #ifndef arch_thermal_pressure_update
 static __always_inline
 void arch_thermal_pressure_update(const struct cpumask *cpus,
diff --git a/init/Kconfig b/init/Kconfig
index f494e405c156..334c302e588f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -559,7 +559,7 @@ config SCHED_THERMAL_PRESSURE
 	  i.e. put less load on throttled CPUs than on non/less throttled ones.
 
 	  This requires the architecture to implement
-	  arch_set_thermal_pressure() and arch_scale_thermal_pressure().
+	  arch_thermal_pressure_update() and arch_scale_thermal_pressure().
 
 config BSD_PROCESS_ACCT
 	bool "BSD Process Accounting"
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/5] arch_topology: Introduce thermal pressure update function
  2021-10-07  8:07 ` [PATCH 1/5] arch_topology: Introduce thermal pressure update function Lukasz Luba
@ 2021-10-11 21:15   ` Thara Gopinath
  2021-10-15 12:13     ` Lukasz Luba
  0 siblings, 1 reply; 10+ messages in thread
From: Thara Gopinath @ 2021-10-11 21:15 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel, linux-pm
  Cc: linux-arm-kernel, linux-arm-msm, sudeep.holla, will,
	catalin.marinas, linux, gregkh, rafael, viresh.kumar, amitk,
	daniel.lezcano, amit.kachhap, bjorn.andersson, agross



On 10/7/21 4:07 AM, Lukasz Luba wrote:
> The thermal pressure is a mechanism which is used for providing
> information about reduced CPU performance to the scheduler. Usually code
> has to convert the value from frequency units into capacity units,
> which are understandable by the scheduler. Create a common conversion code
> which can be just used via a handy API.
> 
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> ---
>   arch/arm/include/asm/topology.h   |  1 +
>   arch/arm64/include/asm/topology.h |  1 +
>   drivers/base/arch_topology.c      | 36 ++++++++++++++++++++++++++++++-
>   include/linux/arch_topology.h     |  3 +++
>   include/linux/sched/topology.h    |  7 ++++++
>   5 files changed, 47 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h
> index 470299ee2fba..aee6c456c085 100644
> --- a/arch/arm/include/asm/topology.h
> +++ b/arch/arm/include/asm/topology.h
> @@ -24,6 +24,7 @@
>   /* Replace task scheduler's default thermal pressure API */
>   #define arch_scale_thermal_pressure topology_get_thermal_pressure
>   #define arch_set_thermal_pressure   topology_set_thermal_pressure
> +#define arch_thermal_pressure_update	topology_thermal_pressure_update
>   
>   #else
>   
> diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
> index ec2db3419c41..c997015402bc 100644
> --- a/arch/arm64/include/asm/topology.h
> +++ b/arch/arm64/include/asm/topology.h
> @@ -33,6 +33,7 @@ void update_freq_counters_refs(void);
>   /* Replace task scheduler's default thermal pressure API */
>   #define arch_scale_thermal_pressure topology_get_thermal_pressure
>   #define arch_set_thermal_pressure   topology_set_thermal_pressure
> +#define arch_thermal_pressure_update	topology_thermal_pressure_update
>   
>   #include <asm-generic/topology.h>
>   
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index 43407665918f..ad31513d0104 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -25,6 +25,7 @@
>   static DEFINE_PER_CPU(struct scale_freq_data __rcu *, sft_data);
>   static struct cpumask scale_freq_counters_mask;
>   static bool scale_freq_invariant;
> +static DEFINE_PER_CPU(u32, freq_factor) = 1;
>   
>   static bool supports_scale_freq_counters(const struct cpumask *cpus)
>   {
> @@ -168,6 +169,40 @@ void topology_set_thermal_pressure(const struct cpumask *cpus,
>   }
>   EXPORT_SYMBOL_GPL(topology_set_thermal_pressure);
>   
> +/**
> + * topology_thermal_pressure_update() - Update thermal pressure for CPUs
> + * @cpus	: The related CPUs which capacity has been reduced

The related CPUs "for" which

> + * @capped_freq	: The maximum allowed frequency that CPUs can run at
> + *
> + * Update the value of thermal pressure for all @cpus in the mask. The
> + * cpumask should include all (online+offline) affected CPUs, to avoid
> + * operating on stale data when hot-plug is used for some CPUs. The
> + * @capped_freq must be less or equal to the max possible frequency and
> + * reflects the currently allowed max CPUs frequency due to thermal capping.
> + * The @capped_freq must be provided in kHz.
> + */
> +void topology_thermal_pressure_update(const struct cpumask *cpus,
> +				      unsigned long capped_freq)
> +{
> +	unsigned long max_capacity, capacity;
> +	int cpu;
> +
> +	if (!cpus)
> +		return;
> +
> +	cpu = cpumask_first(cpus);
> +	max_capacity = arch_scale_cpu_capacity(cpu);
> +
> +	/* Convert to MHz scale which is used in 'freq_factor' */
> +	capped_freq /= 1000;
> +
> +	capacity = capped_freq * max_capacity;
> +	capacity /= per_cpu(freq_factor, cpu);

use mult_frac as used in other implementations ?

-- 
Warm Regards
Thara (She/Her/Hers)

> +
> +	arch_set_thermal_pressure(cpus, max_capacity - capacity);
> +}
> +EXPORT_SYMBOL_GPL(topology_thermal_pressure_update);
> +
>   static ssize_t cpu_capacity_show(struct device *dev,
>   				 struct device_attribute *attr,
>   				 char *buf)
> @@ -220,7 +255,6 @@ static void update_topology_flags_workfn(struct work_struct *work)
>   	update_topology = 0;
>   }
>   
> -static DEFINE_PER_CPU(u32, freq_factor) = 1;
>   static u32 *raw_capacity;
>   
>   static int free_raw_capacity(void)
> diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
> index f180240dc95f..9e183621a59b 100644
> --- a/include/linux/arch_topology.h
> +++ b/include/linux/arch_topology.h
> @@ -59,6 +59,9 @@ static inline unsigned long topology_get_thermal_pressure(int cpu)
>   void topology_set_thermal_pressure(const struct cpumask *cpus,
>   				   unsigned long th_pressure);
>   
> +void topology_thermal_pressure_update(const struct cpumask *cpus,
> +				      unsigned long capped_freq);
> +
>   struct cpu_topology {
>   	int thread_id;
>   	int core_id;
> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> index 8f0f778b7c91..990d14814427 100644
> --- a/include/linux/sched/topology.h
> +++ b/include/linux/sched/topology.h
> @@ -266,6 +266,13 @@ void arch_set_thermal_pressure(const struct cpumask *cpus,
>   { }
>   #endif
>   
> +#ifndef arch_thermal_pressure_update
> +static __always_inline
> +void arch_thermal_pressure_update(const struct cpumask *cpus,
> +				      unsigned long capped_frequency)
> +{ }
> +#endif
> +
>   static inline int task_node(const struct task_struct *p)
>   {
>   	return cpu_to_node(task_cpu(p));
> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/5] cpufreq: qcom-cpufreq-hw: Update offline CPUs per-cpu thermal pressure
  2021-10-07  8:07 ` [PATCH 3/5] cpufreq: qcom-cpufreq-hw: Update offline CPUs per-cpu thermal pressure Lukasz Luba
@ 2021-10-11 21:17   ` Thara Gopinath
  2021-10-15 12:13     ` Lukasz Luba
  0 siblings, 1 reply; 10+ messages in thread
From: Thara Gopinath @ 2021-10-11 21:17 UTC (permalink / raw)
  To: Lukasz Luba, linux-kernel, linux-pm
  Cc: linux-arm-kernel, linux-arm-msm, sudeep.holla, will,
	catalin.marinas, linux, gregkh, rafael, viresh.kumar, amitk,
	daniel.lezcano, amit.kachhap, bjorn.andersson, agross



On 10/7/21 4:07 AM, Lukasz Luba wrote:
> The thermal pressure signal gives information to the scheduler about
> reduced CPU capacity due to thermal. It is based on a value stored in
> a per-cpu 'thermal_pressure' variable. The online CPUs will get the
> new value there, while the offline won't. Unfortunately, when the CPU
> is back online, the value read from per-cpu variable might be wrong
> (stale data).  This might affect the scheduler decisions, since it
> sees the CPU capacity differently than what is actually available.
> 
> Fix it by making sure that all online+offline CPUs would get the
> proper value in their per-cpu variable when there is throttling
> or throttling is removed.
> 
> Fixes: 275157b367f479 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>

Thanks for the fix.

Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org>

> ---
>   drivers/cpufreq/qcom-cpufreq-hw.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
> index a2be0df7e174..0138b2ec406d 100644
> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> @@ -304,7 +304,8 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
>   	if (capacity > max_capacity)
>   		capacity = max_capacity;
>   
> -	arch_set_thermal_pressure(policy->cpus, max_capacity - capacity);
> +	arch_set_thermal_pressure(policy->related_cpus,
> +				  max_capacity - capacity);
>   
>   	/*
>   	 * In the unlikely case policy is unregistered do not enable
> 

-- 
Warm Regards
Thara (She/Her/Hers)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/5] arch_topology: Introduce thermal pressure update function
  2021-10-11 21:15   ` Thara Gopinath
@ 2021-10-15 12:13     ` Lukasz Luba
  0 siblings, 0 replies; 10+ messages in thread
From: Lukasz Luba @ 2021-10-15 12:13 UTC (permalink / raw)
  To: Thara Gopinath
  Cc: linux-kernel, linux-pm, linux-arm-kernel, linux-arm-msm,
	sudeep.holla, will, catalin.marinas, linux, gregkh, rafael,
	viresh.kumar, amitk, daniel.lezcano, amit.kachhap,
	bjorn.andersson, agross

Hi Thara,

Thank you having a look at these patches.
My apologies for the delay, I haven't felt well
recently.

On 10/11/21 10:15 PM, Thara Gopinath wrote:
> 
> 
> On 10/7/21 4:07 AM, Lukasz Luba wrote:

[snip]

>> +/**
>> + * topology_thermal_pressure_update() - Update thermal pressure for CPUs
>> + * @cpus    : The related CPUs which capacity has been reduced
> 
> The related CPUs "for" which

thanks

> 
>> + * @capped_freq    : The maximum allowed frequency that CPUs can run at

[snip]

>> +
>> +    capacity = capped_freq * max_capacity;
>> +    capacity /= per_cpu(freq_factor, cpu);
> 
> use mult_frac as used in other implementations ?
> 

Good point. Let me check that. In that arch_topology file we never
use such trick, but in this case it makes sense IMO.

Regards,
Lukasz

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/5] cpufreq: qcom-cpufreq-hw: Update offline CPUs per-cpu thermal pressure
  2021-10-11 21:17   ` Thara Gopinath
@ 2021-10-15 12:13     ` Lukasz Luba
  0 siblings, 0 replies; 10+ messages in thread
From: Lukasz Luba @ 2021-10-15 12:13 UTC (permalink / raw)
  To: Thara Gopinath
  Cc: linux-kernel, linux-pm, linux-arm-kernel, linux-arm-msm,
	sudeep.holla, will, catalin.marinas, linux, gregkh, rafael,
	viresh.kumar, amitk, daniel.lezcano, amit.kachhap,
	bjorn.andersson, agross



On 10/11/21 10:17 PM, Thara Gopinath wrote:
> 
> 
> On 10/7/21 4:07 AM, Lukasz Luba wrote:
>> The thermal pressure signal gives information to the scheduler about
>> reduced CPU capacity due to thermal. It is based on a value stored in
>> a per-cpu 'thermal_pressure' variable. The online CPUs will get the
>> new value there, while the offline won't. Unfortunately, when the CPU
>> is back online, the value read from per-cpu variable might be wrong
>> (stale data).  This might affect the scheduler decisions, since it
>> sees the CPU capacity differently than what is actually available.
>>
>> Fix it by making sure that all online+offline CPUs would get the
>> proper value in their per-cpu variable when there is throttling
>> or throttling is removed.
>>
>> Fixes: 275157b367f479 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt 
>> support")
>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
> 
> Thanks for the fix.
> 
> Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org>
> 

Thank you for the review!

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-10-15 12:13 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-07  8:07 [PATCH 0/5] Refactor thermal pressure update to avoid code duplication Lukasz Luba
2021-10-07  8:07 ` [PATCH 1/5] arch_topology: Introduce thermal pressure update function Lukasz Luba
2021-10-11 21:15   ` Thara Gopinath
2021-10-15 12:13     ` Lukasz Luba
2021-10-07  8:07 ` [PATCH 2/5] thermal: cpufreq_cooling: Use new " Lukasz Luba
2021-10-07  8:07 ` [PATCH 3/5] cpufreq: qcom-cpufreq-hw: Update offline CPUs per-cpu thermal pressure Lukasz Luba
2021-10-11 21:17   ` Thara Gopinath
2021-10-15 12:13     ` Lukasz Luba
2021-10-07  8:07 ` [PATCH 4/5] cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function Lukasz Luba
2021-10-07  8:07 ` [PATCH 5/5] arch_topology: Remove unused topology_set_thermal_pressure() and related Lukasz Luba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).