* [PATCH V3 0/2] cpufreq_cooling: Get effective CPU utilization from scheduler
@ 2020-11-19  7:38 Viresh Kumar
  2020-11-19  7:38 ` [PATCH V3 1/2] sched/core: Rename and move schedutil_cpu_util() to core.c Viresh Kumar
  2020-11-19  7:38 ` [PATCH V3 2/2] thermal: cpufreq_cooling: Reuse sched_cpu_util() for SMP platforms Viresh Kumar
  0 siblings, 2 replies; 13+ messages in thread
From: Viresh Kumar @ 2020-11-19  7:38 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Vincent Guittot,
	Amit Daniel Kachhap, Amit Kucheria, Ben Segall,
	Daniel Bristot de Oliveira, Daniel Lezcano, Dietmar Eggemann,
	Javi Merino, Juri Lelli, Mel Gorman, Rafael J. Wysocki,
	Steven Rostedt, Viresh Kumar, Zhang Rui
  Cc: linux-kernel, Quentin Perret, Lukasz Luba, linux-pm

Hi,

This patchset makes the cpufreq_cooling driver reuse the CPU utilization
metric provided by the scheduler instead of depending on the idle and
busy times of a CPU, which aren't an accurate measure of how busy a CPU
will be in the next cycle. More details can be found in the commit log
of patch 2/2.
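
As a quick illustration of the idea (a rough sketch only, the real code
is in patch 2/2), the SMP load computation moves from the idle-time
delta to the scheduler's utilization signal:

  /* roughly what the old, idle time based get_load() computes */
  load = div64_u64(100 * (delta_time - delta_idle), delta_time);

  /* the new SMP implementation, also in the 0..100 range */
  max = arch_scale_cpu_capacity(cpu);
  load = sched_cpu_util(cpu, ENERGY_UTIL, max) * 100 / max;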

V2->V3:
- Put the scheduler helpers within ifdef CONFIG_SMP.
- Keep both SMP and !SMP implementations in the cpufreq_cooling driver.
- Improved commit log with testing related information.

--
Viresh

Viresh Kumar (2):
  sched/core: Rename and move schedutil_cpu_util() to core.c
  thermal: cpufreq_cooling: Reuse sched_cpu_util() for SMP platforms

 drivers/thermal/cpufreq_cooling.c |  68 ++++++++++++++----
 include/linux/sched.h             |  21 ++++++
 kernel/sched/core.c               | 115 +++++++++++++++++++++++++++++
 kernel/sched/cpufreq_schedutil.c  | 116 +-----------------------------
 kernel/sched/fair.c               |   6 +-
 kernel/sched/sched.h              |  31 +-------
 6 files changed, 199 insertions(+), 158 deletions(-)


base-commit: 3650b228f83adda7e5ee532e2b90429c03f7b9ec
-- 
2.25.0.rc1.19.g042ed3e048af



* [PATCH V3 1/2] sched/core: Rename and move schedutil_cpu_util() to core.c
  2020-11-19  7:38 [PATCH V3 0/2] cpufreq_cooling: Get effective CPU utilization from scheduler Viresh Kumar
@ 2020-11-19  7:38 ` Viresh Kumar
  2020-11-19 12:30   ` Rafael J. Wysocki
  2020-11-19  7:38 ` [PATCH V3 2/2] thermal: cpufreq_cooling: Reuse sched_cpu_util() for SMP platforms Viresh Kumar
  1 sibling, 1 reply; 13+ messages in thread
From: Viresh Kumar @ 2020-11-19  7:38 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Rafael J. Wysocki, Viresh Kumar
  Cc: linux-kernel, Quentin Perret, Lukasz Luba, linux-pm

There is nothing schedutil-specific in schedutil_cpu_util(), so move it
to core.c and rename it to sched_cpu_util(), allowing it to be used from
other parts of the kernel as well.

The cpufreq_cooling stuff will make use of this in a later commit.
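
For reference, callers outside the scheduler then only need something
like this (illustrative snippet, not part of this patch):

	unsigned long max = arch_scale_cpu_capacity(cpu);
	unsigned long util = sched_cpu_util(cpu, ENERGY_UTIL, max);

And to make the aggregation concrete, a worked FREQUENCY_UTIL example
with made-up numbers (uclamp left aside):

	max = 1024, irq = 100, util_cfs = 300, cpu_util_rt() = 100,
	cpu_util_dl() = 50, cpu_bw_dl() = 80

	util = 300 + 100 = 400			/* CFS + RT */
	400 + 50 < 1024				/* not saturated */
	util = 400 * (1024 - 100) / 1024 = 360	/* scale_irq_capacity() */
	util = 360 + 100 = 460			/* + irq */
	util = 460 + 80 = 540			/* + DL bandwidth */

	min(1024, 540) = 540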

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 include/linux/sched.h            |  21 ++++++
 kernel/sched/core.c              | 115 ++++++++++++++++++++++++++++++
 kernel/sched/cpufreq_schedutil.c | 116 +------------------------------
 kernel/sched/fair.c              |   6 +-
 kernel/sched/sched.h             |  31 +--------
 5 files changed, 145 insertions(+), 144 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 063cd120b459..926b944dae5e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1926,6 +1926,27 @@ extern long sched_getaffinity(pid_t pid, struct cpumask *mask);
 #define TASK_SIZE_OF(tsk)	TASK_SIZE
 #endif
 
+#ifdef CONFIG_SMP
+/**
+ * enum cpu_util_type - CPU utilization type
+ * @FREQUENCY_UTIL:	Utilization used to select frequency
+ * @ENERGY_UTIL:	Utilization used during energy calculation
+ *
+ * The utilization signals of all scheduling classes (CFS/RT/DL) and IRQ time
+ * need to be aggregated differently depending on the usage made of them. This
+ * enum is used within sched_cpu_util() to differentiate the types of
+ * utilization expected by the callers, and adjust the aggregation accordingly.
+ */
+enum cpu_util_type {
+	FREQUENCY_UTIL,
+	ENERGY_UTIL,
+};
+
+/* Returns effective CPU utilization, as seen by the scheduler */
+unsigned long sched_cpu_util(int cpu, enum cpu_util_type type,
+			     unsigned long max);
+#endif /* CONFIG_SMP */
+
 #ifdef CONFIG_RSEQ
 
 /*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d2003a7d5ab5..845c976ccd53 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5117,6 +5117,121 @@ struct task_struct *idle_task(int cpu)
 	return cpu_rq(cpu)->idle;
 }
 
+#ifdef CONFIG_SMP
+/*
+ * This function computes an effective utilization for the given CPU, to be
+ * used for frequency selection given the linear relation: f = u * f_max.
+ *
+ * The scheduler tracks the following metrics:
+ *
+ *   cpu_util_{cfs,rt,dl,irq}()
+ *   cpu_bw_dl()
+ *
+ * Where the cfs,rt and dl util numbers are tracked with the same metric and
+ * synchronized windows and are thus directly comparable.
+ *
+ * The cfs,rt,dl utilization are the running times measured with rq->clock_task
+ * which excludes things like IRQ and steal-time. These latter are then accrued
+ * in the irq utilization.
+ *
+ * The DL bandwidth number otoh is not a measured metric but a value computed
+ * based on the task model parameters and gives the minimal utilization
+ * required to meet deadlines.
+ */
+unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
+				 unsigned long max, enum cpu_util_type type,
+				 struct task_struct *p)
+{
+	unsigned long dl_util, util, irq;
+	struct rq *rq = cpu_rq(cpu);
+
+	if (!uclamp_is_used() &&
+	    type == FREQUENCY_UTIL && rt_rq_is_runnable(&rq->rt)) {
+		return max;
+	}
+
+	/*
+	 * Early check to see if IRQ/steal time saturates the CPU, can be
+	 * because of inaccuracies in how we track these -- see
+	 * update_irq_load_avg().
+	 */
+	irq = cpu_util_irq(rq);
+	if (unlikely(irq >= max))
+		return max;
+
+	/*
+	 * Because the time spend on RT/DL tasks is visible as 'lost' time to
+	 * CFS tasks and we use the same metric to track the effective
+	 * utilization (PELT windows are synchronized) we can directly add them
+	 * to obtain the CPU's actual utilization.
+	 *
+	 * CFS and RT utilization can be boosted or capped, depending on
+	 * utilization clamp constraints requested by currently RUNNABLE
+	 * tasks.
+	 * When there are no CFS RUNNABLE tasks, clamps are released and
+	 * frequency will be gracefully reduced with the utilization decay.
+	 */
+	util = util_cfs + cpu_util_rt(rq);
+	if (type == FREQUENCY_UTIL)
+		util = uclamp_rq_util_with(rq, util, p);
+
+	dl_util = cpu_util_dl(rq);
+
+	/*
+	 * For frequency selection we do not make cpu_util_dl() a permanent part
+	 * of this sum because we want to use cpu_bw_dl() later on, but we need
+	 * to check if the CFS+RT+DL sum is saturated (ie. no idle time) such
+	 * that we select f_max when there is no idle time.
+	 *
+	 * NOTE: numerical errors or stop class might cause us to not quite hit
+	 * saturation when we should -- something for later.
+	 */
+	if (util + dl_util >= max)
+		return max;
+
+	/*
+	 * OTOH, for energy computation we need the estimated running time, so
+	 * include util_dl and ignore dl_bw.
+	 */
+	if (type == ENERGY_UTIL)
+		util += dl_util;
+
+	/*
+	 * There is still idle time; further improve the number by using the
+	 * irq metric. Because IRQ/steal time is hidden from the task clock we
+	 * need to scale the task numbers:
+	 *
+	 *              max - irq
+	 *   U' = irq + --------- * U
+	 *                 max
+	 */
+	util = scale_irq_capacity(util, irq, max);
+	util += irq;
+
+	/*
+	 * Bandwidth required by DEADLINE must always be granted while, for
+	 * FAIR and RT, we use blocked utilization of IDLE CPUs as a mechanism
+	 * to gracefully reduce the frequency when no tasks show up for longer
+	 * periods of time.
+	 *
+	 * Ideally we would like to set bw_dl as min/guaranteed freq and util +
+	 * bw_dl as requested freq. However, cpufreq is not yet ready for such
+	 * an interface. So, we only do the latter for now.
+	 */
+	if (type == FREQUENCY_UTIL)
+		util += cpu_bw_dl(rq);
+
+	return min(max, util);
+}
+
+unsigned long sched_cpu_util(int cpu, enum cpu_util_type type,
+			     unsigned long max)
+{
+	return effective_cpu_util(cpu, cpu_util_cfs(cpu_rq(cpu)), max, type,
+				  NULL);
+}
+#endif /* CONFIG_SMP */
+
 /**
  * find_process_by_pid - find a process with a matching PID value.
  * @pid: the pid in question.
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index e254745a82cb..a6de75c8b984 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -169,122 +169,12 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
 	return cpufreq_driver_resolve_freq(policy, freq);
 }
 
-/*
- * This function computes an effective utilization for the given CPU, to be
- * used for frequency selection given the linear relation: f = u * f_max.
- *
- * The scheduler tracks the following metrics:
- *
- *   cpu_util_{cfs,rt,dl,irq}()
- *   cpu_bw_dl()
- *
- * Where the cfs,rt and dl util numbers are tracked with the same metric and
- * synchronized windows and are thus directly comparable.
- *
- * The cfs,rt,dl utilization are the running times measured with rq->clock_task
- * which excludes things like IRQ and steal-time. These latter are then accrued
- * in the irq utilization.
- *
- * The DL bandwidth number otoh is not a measured metric but a value computed
- * based on the task model parameters and gives the minimal utilization
- * required to meet deadlines.
- */
-unsigned long schedutil_cpu_util(int cpu, unsigned long util_cfs,
-				 unsigned long max, enum schedutil_type type,
-				 struct task_struct *p)
-{
-	unsigned long dl_util, util, irq;
-	struct rq *rq = cpu_rq(cpu);
-
-	if (!uclamp_is_used() &&
-	    type == FREQUENCY_UTIL && rt_rq_is_runnable(&rq->rt)) {
-		return max;
-	}
-
-	/*
-	 * Early check to see if IRQ/steal time saturates the CPU, can be
-	 * because of inaccuracies in how we track these -- see
-	 * update_irq_load_avg().
-	 */
-	irq = cpu_util_irq(rq);
-	if (unlikely(irq >= max))
-		return max;
-
-	/*
-	 * Because the time spend on RT/DL tasks is visible as 'lost' time to
-	 * CFS tasks and we use the same metric to track the effective
-	 * utilization (PELT windows are synchronized) we can directly add them
-	 * to obtain the CPU's actual utilization.
-	 *
-	 * CFS and RT utilization can be boosted or capped, depending on
-	 * utilization clamp constraints requested by currently RUNNABLE
-	 * tasks.
-	 * When there are no CFS RUNNABLE tasks, clamps are released and
-	 * frequency will be gracefully reduced with the utilization decay.
-	 */
-	util = util_cfs + cpu_util_rt(rq);
-	if (type == FREQUENCY_UTIL)
-		util = uclamp_rq_util_with(rq, util, p);
-
-	dl_util = cpu_util_dl(rq);
-
-	/*
-	 * For frequency selection we do not make cpu_util_dl() a permanent part
-	 * of this sum because we want to use cpu_bw_dl() later on, but we need
-	 * to check if the CFS+RT+DL sum is saturated (ie. no idle time) such
-	 * that we select f_max when there is no idle time.
-	 *
-	 * NOTE: numerical errors or stop class might cause us to not quite hit
-	 * saturation when we should -- something for later.
-	 */
-	if (util + dl_util >= max)
-		return max;
-
-	/*
-	 * OTOH, for energy computation we need the estimated running time, so
-	 * include util_dl and ignore dl_bw.
-	 */
-	if (type == ENERGY_UTIL)
-		util += dl_util;
-
-	/*
-	 * There is still idle time; further improve the number by using the
-	 * irq metric. Because IRQ/steal time is hidden from the task clock we
-	 * need to scale the task numbers:
-	 *
-	 *              max - irq
-	 *   U' = irq + --------- * U
-	 *                 max
-	 */
-	util = scale_irq_capacity(util, irq, max);
-	util += irq;
-
-	/*
-	 * Bandwidth required by DEADLINE must always be granted while, for
-	 * FAIR and RT, we use blocked utilization of IDLE CPUs as a mechanism
-	 * to gracefully reduce the frequency when no tasks show up for longer
-	 * periods of time.
-	 *
-	 * Ideally we would like to set bw_dl as min/guaranteed freq and util +
-	 * bw_dl as requested freq. However, cpufreq is not yet ready for such
-	 * an interface. So, we only do the latter for now.
-	 */
-	if (type == FREQUENCY_UTIL)
-		util += cpu_bw_dl(rq);
-
-	return min(max, util);
-}
-
 static unsigned long sugov_get_util(struct sugov_cpu *sg_cpu)
 {
-	struct rq *rq = cpu_rq(sg_cpu->cpu);
-	unsigned long util = cpu_util_cfs(rq);
-	unsigned long max = arch_scale_cpu_capacity(sg_cpu->cpu);
-
-	sg_cpu->max = max;
-	sg_cpu->bw_dl = cpu_bw_dl(rq);
+	sg_cpu->max = arch_scale_cpu_capacity(sg_cpu->cpu);
+	sg_cpu->bw_dl = cpu_bw_dl(cpu_rq(sg_cpu->cpu));
 
-	return schedutil_cpu_util(sg_cpu->cpu, util, max, FREQUENCY_UTIL, NULL);
+	return sched_cpu_util(sg_cpu->cpu, FREQUENCY_UTIL, sg_cpu->max);
 }
 
 /**
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 290f9e38378c..0e1c8eb7ad53 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6499,7 +6499,7 @@ compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
 		 * is already enough to scale the EM reported power
 		 * consumption at the (eventually clamped) cpu_capacity.
 		 */
-		sum_util += schedutil_cpu_util(cpu, util_cfs, cpu_cap,
+		sum_util += effective_cpu_util(cpu, util_cfs, cpu_cap,
 					       ENERGY_UTIL, NULL);
 
 		/*
@@ -6509,7 +6509,7 @@ compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
 		 * NOTE: in case RT tasks are running, by default the
 		 * FREQUENCY_UTIL's utilization can be max OPP.
 		 */
-		cpu_util = schedutil_cpu_util(cpu, util_cfs, cpu_cap,
+		cpu_util = effective_cpu_util(cpu, util_cfs, cpu_cap,
 					      FREQUENCY_UTIL, tsk);
 		max_util = max(max_util, cpu_util);
 	}
@@ -6607,7 +6607,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 			 * IOW, placing the task there would make the CPU
 			 * overutilized. Take uclamp into account to see how
 			 * much capacity we can get out of the CPU; this is
-			 * aligned with schedutil_cpu_util().
+			 * aligned with sched_cpu_util().
 			 */
 			util = uclamp_rq_util_with(cpu_rq(cpu), util, p);
 			if (!fits_capacity(util, cpu_cap))
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index df80bfcea92e..4fab3b930ace 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2484,27 +2484,9 @@ static inline unsigned long capacity_orig_of(int cpu)
 {
 	return cpu_rq(cpu)->cpu_capacity_orig;
 }
-#endif
-
-/**
- * enum schedutil_type - CPU utilization type
- * @FREQUENCY_UTIL:	Utilization used to select frequency
- * @ENERGY_UTIL:	Utilization used during energy calculation
- *
- * The utilization signals of all scheduling classes (CFS/RT/DL) and IRQ time
- * need to be aggregated differently depending on the usage made of them. This
- * enum is used within schedutil_freq_util() to differentiate the types of
- * utilization expected by the callers, and adjust the aggregation accordingly.
- */
-enum schedutil_type {
-	FREQUENCY_UTIL,
-	ENERGY_UTIL,
-};
 
-#ifdef CONFIG_CPU_FREQ_GOV_SCHEDUTIL
-
-unsigned long schedutil_cpu_util(int cpu, unsigned long util_cfs,
-				 unsigned long max, enum schedutil_type type,
+unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
+				 unsigned long max, enum cpu_util_type type,
 				 struct task_struct *p);
 
 static inline unsigned long cpu_bw_dl(struct rq *rq)
@@ -2533,14 +2515,7 @@ static inline unsigned long cpu_util_rt(struct rq *rq)
 {
 	return READ_ONCE(rq->avg_rt.util_avg);
 }
-#else /* CONFIG_CPU_FREQ_GOV_SCHEDUTIL */
-static inline unsigned long schedutil_cpu_util(int cpu, unsigned long util_cfs,
-				 unsigned long max, enum schedutil_type type,
-				 struct task_struct *p)
-{
-	return 0;
-}
-#endif /* CONFIG_CPU_FREQ_GOV_SCHEDUTIL */
+#endif
 
 #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
 static inline unsigned long cpu_util_irq(struct rq *rq)
-- 
2.25.0.rc1.19.g042ed3e048af



* [PATCH V3 2/2] thermal: cpufreq_cooling: Reuse sched_cpu_util() for SMP platforms
  2020-11-19  7:38 [PATCH V3 0/2] cpufreq_cooling: Get effective CPU utilization from scheduler Viresh Kumar
  2020-11-19  7:38 ` [PATCH V3 1/2] sched/core: Rename and move schedutil_cpu_util() to core.c Viresh Kumar
@ 2020-11-19  7:38 ` Viresh Kumar
  2020-11-20 14:51   ` Lukasz Luba
                     ` (2 more replies)
  1 sibling, 3 replies; 13+ messages in thread
From: Viresh Kumar @ 2020-11-19  7:38 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Vincent Guittot,
	Amit Daniel Kachhap, Daniel Lezcano, Viresh Kumar, Javi Merino,
	Zhang Rui, Amit Kucheria
  Cc: linux-kernel, Quentin Perret, Lukasz Luba, linux-pm

Several parts of the kernel already use the effective CPU utilization
(as seen by the scheduler) to get the current load on the CPU. Do the
same here instead of depending on the idle time of the CPU, which isn't
as accurate by comparison.

This is also the right thing to do as it makes the cpufreq_cooling
driver align better with the cpufreq governor (schedutil): the power
requested by the cpufreq_cooling driver will exactly match the next
frequency requested by the schedutil governor, since they are both using
the same metric to calculate load.

Note that this (and CPU frequency scaling in general) doesn't work that
well with idle injection, as that is done from RT threads and is counted
as load while it actually tries to do quite the opposite. That should be
solved separately though.

This was tested on the ARM Hikey6220 platform with hackbench, sysbench
and schbench. None of them showed any regression or significant
improvement. Schbench is the most important one of these, as it creates
the scenario where the utilization numbers provide a better estimate of
the future.

Scenario 1: The CPUs were mostly idle in the previous polling window of
the IPA governor as the tasks were sleeping and here are the details
from traces (load is in %):

 Old: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=203 load={{0x35,0x1,0x0,0x31,0x0,0x0,0x64,0x0}} dynamic_power=1339
 New: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=600 load={{0x60,0x46,0x45,0x45,0x48,0x3b,0x61,0x44}} dynamic_power=3960

Here, the "Old" line gives the load and requested_power (dynamic_power
here) numbers calculated using the idle time based implementation, while
"New" is based on the CPU utilization from scheduler.

As can be clearly seen, the load and requested_power numbers are simply
incorrect in the idle time based approach and the numbers collected from
CPU's utilization are much closer to the reality.
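
(The per-CPU loads are percentages in hex and total_load is their sum:
the "Old" numbers are 0x35+0x1+0x0+0x31+0x0+0x0+0x64+0x0 =
53+1+0+49+0+0+100+0 = 203, while the "New" ones are
0x60+0x46+0x45+0x45+0x48+0x3b+0x61+0x44 = 96+70+69+69+72+59+97+68 = 600,
i.e. an average load of 75% over the 8 CPUs.)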

Scenario 2: The CPUs were busy in the previous polling window of the IPA
governor:

 Old: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=800 load={{0x64,0x64,0x64,0x64,0x64,0x64,0x64,0x64}} dynamic_power=5280
 New: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=708 load={{0x4d,0x5c,0x5c,0x5b,0x5c,0x5c,0x51,0x5b}} dynamic_power=4672

As can be seen, the idle time based load is 100% for all the CPUs as it
took only the last window into account, but in reality the CPUs aren't
that loaded as shown by the utilization numbers.
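
(Decoded the same way: the "Old" loads sum to 8 * 0x64 = 800, i.e. 100%
on every CPU, while the "New" ones sum to
0x4d+0x5c+0x5c+0x5b+0x5c+0x5c+0x51+0x5b = 708, i.e. ~88% on average.)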

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/thermal/cpufreq_cooling.c | 68 ++++++++++++++++++++++++-------
 1 file changed, 54 insertions(+), 14 deletions(-)

diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
index cc2959f22f01..a364a2fd84b1 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -76,7 +76,9 @@ struct cpufreq_cooling_device {
 	struct em_perf_domain *em;
 	struct cpufreq_policy *policy;
 	struct list_head node;
+#ifndef CONFIG_SMP
 	struct time_in_idle *idle_time;
+#endif
 	struct freq_qos_request qos_req;
 };
 
@@ -132,14 +134,35 @@ static u32 cpu_power_to_freq(struct cpufreq_cooling_device *cpufreq_cdev,
 }
 
 /**
- * get_load() - get load for a cpu since last updated
- * @cpufreq_cdev:	&struct cpufreq_cooling_device for this cpu
- * @cpu:	cpu number
- * @cpu_idx:	index of the cpu in time_in_idle*
+ * get_load() - get load for a cpu
+ * @cpufreq_cdev: struct cpufreq_cooling_device for the cpu
+ * @cpu: cpu number
+ * @cpu_idx: index of the cpu in time_in_idle array
  *
  * Return: The average load of cpu @cpu in percentage since this
  * function was last called.
  */
+#ifdef CONFIG_SMP
+static u32 get_load(struct cpufreq_cooling_device *cpufreq_cdev, int cpu,
+		    int cpu_idx)
+{
+	unsigned long max = arch_scale_cpu_capacity(cpu);
+	unsigned long util;
+
+	util = sched_cpu_util(cpu, ENERGY_UTIL, max);
+	return (util * 100) / max;
+}
+
+static inline int allocate_idle_time(struct cpufreq_cooling_device *cpufreq_cdev)
+{
+	return 0;
+}
+
+static inline void free_idle_time(struct cpufreq_cooling_device *cpufreq_cdev)
+{
+}
+
+#else /* !CONFIG_SMP */
 static u32 get_load(struct cpufreq_cooling_device *cpufreq_cdev, int cpu,
 		    int cpu_idx)
 {
@@ -162,6 +185,26 @@ static u32 get_load(struct cpufreq_cooling_device *cpufreq_cdev, int cpu,
 	return load;
 }
 
+static int allocate_idle_time(struct cpufreq_cooling_device *cpufreq_cdev)
+{
+	unsigned int num_cpus = cpumask_weight(cpufreq_cdev->policy->related_cpus);
+
+	cpufreq_cdev->idle_time = kcalloc(num_cpus,
+					 sizeof(*cpufreq_cdev->idle_time),
+					 GFP_KERNEL);
+	if (!cpufreq_cdev->idle_time)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void free_idle_time(struct cpufreq_cooling_device *cpufreq_cdev)
+{
+	kfree(cpufreq_cdev->idle_time);
+	cpufreq_cdev->idle_time = NULL;
+}
+#endif /* CONFIG_SMP */
+
 /**
  * get_dynamic_power() - calculate the dynamic power
  * @cpufreq_cdev:	&cpufreq_cooling_device for this cdev
@@ -487,7 +530,7 @@ __cpufreq_cooling_register(struct device_node *np,
 	struct thermal_cooling_device *cdev;
 	struct cpufreq_cooling_device *cpufreq_cdev;
 	char dev_name[THERMAL_NAME_LENGTH];
-	unsigned int i, num_cpus;
+	unsigned int i;
 	struct device *dev;
 	int ret;
 	struct thermal_cooling_device_ops *cooling_ops;
@@ -498,7 +541,6 @@ __cpufreq_cooling_register(struct device_node *np,
 		return ERR_PTR(-ENODEV);
 	}
 
-
 	if (IS_ERR_OR_NULL(policy)) {
 		pr_err("%s: cpufreq policy isn't valid: %p\n", __func__, policy);
 		return ERR_PTR(-EINVAL);
@@ -516,12 +558,10 @@ __cpufreq_cooling_register(struct device_node *np,
 		return ERR_PTR(-ENOMEM);
 
 	cpufreq_cdev->policy = policy;
-	num_cpus = cpumask_weight(policy->related_cpus);
-	cpufreq_cdev->idle_time = kcalloc(num_cpus,
-					 sizeof(*cpufreq_cdev->idle_time),
-					 GFP_KERNEL);
-	if (!cpufreq_cdev->idle_time) {
-		cdev = ERR_PTR(-ENOMEM);
+
+	ret = allocate_idle_time(cpufreq_cdev);
+	if (ret) {
+		cdev = ERR_PTR(ret);
 		goto free_cdev;
 	}
 
@@ -581,7 +621,7 @@ __cpufreq_cooling_register(struct device_node *np,
 remove_ida:
 	ida_simple_remove(&cpufreq_ida, cpufreq_cdev->id);
 free_idle_time:
-	kfree(cpufreq_cdev->idle_time);
+	free_idle_time(cpufreq_cdev);
 free_cdev:
 	kfree(cpufreq_cdev);
 	return cdev;
@@ -674,7 +714,7 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
 	thermal_cooling_device_unregister(cdev);
 	freq_qos_remove_request(&cpufreq_cdev->qos_req);
 	ida_simple_remove(&cpufreq_ida, cpufreq_cdev->id);
-	kfree(cpufreq_cdev->idle_time);
+	free_idle_time(cpufreq_cdev);
 	kfree(cpufreq_cdev);
 }
 EXPORT_SYMBOL_GPL(cpufreq_cooling_unregister);
-- 
2.25.0.rc1.19.g042ed3e048af



* Re: [PATCH V3 1/2] sched/core: Rename and move schedutil_cpu_util() to core.c
  2020-11-19  7:38 ` [PATCH V3 1/2] sched/core: Rename and move schedutil_cpu_util() to core.c Viresh Kumar
@ 2020-11-19 12:30   ` Rafael J. Wysocki
  2020-11-23 10:04     ` Viresh Kumar
  0 siblings, 1 reply; 13+ messages in thread
From: Rafael J. Wysocki @ 2020-11-19 12:30 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Rafael J. Wysocki,
	Linux Kernel Mailing List, Quentin Perret, Lukasz Luba, Linux PM

On Thu, Nov 19, 2020 at 8:38 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> There is nothing schedutil-specific in schedutil_cpu_util(), so move it
> to core.c and rename it to sched_cpu_util(), allowing it to be used from
> other parts of the kernel as well.

The patch does more than this, though.

I would do that in two patches: (1) move the function as is and (2)
rename it and rearrange the users.

> The cpufreq_cooling stuff will make use of this in a later commit.
>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>
> [...]
>
>  static unsigned long sugov_get_util(struct sugov_cpu *sg_cpu)
>  {
> -       struct rq *rq = cpu_rq(sg_cpu->cpu);
> -       unsigned long util = cpu_util_cfs(rq);
> -       unsigned long max = arch_scale_cpu_capacity(sg_cpu->cpu);
> -
> -       sg_cpu->max = max;
> -       sg_cpu->bw_dl = cpu_bw_dl(rq);
> +       sg_cpu->max = arch_scale_cpu_capacity(sg_cpu->cpu);
> +       sg_cpu->bw_dl = cpu_bw_dl(cpu_rq(sg_cpu->cpu));
>
> -       return schedutil_cpu_util(sg_cpu->cpu, util, max, FREQUENCY_UTIL, NULL);
> +       return sched_cpu_util(sg_cpu->cpu, FREQUENCY_UTIL, sg_cpu->max);

I don't see much value in using this wrapper here TBH and it
introduces an otherwise redundant cpu_rq() computation.

>  }
>
>  /**


* Re: [PATCH V3 2/2] thermal: cpufreq_cooling: Reuse sched_cpu_util() for SMP platforms
  2020-11-19  7:38 ` [PATCH V3 2/2] thermal: cpufreq_cooling: Reuse sched_cpu_util() for SMP platforms Viresh Kumar
@ 2020-11-20 14:51   ` Lukasz Luba
  2020-11-23 10:41     ` Viresh Kumar
  2020-11-23 15:32   ` Lukasz Luba
  2020-12-01 17:25   ` Valentin Schneider
  2 siblings, 1 reply; 13+ messages in thread
From: Lukasz Luba @ 2020-11-20 14:51 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot,
	Amit Daniel Kachhap, Daniel Lezcano, Javi Merino, Zhang Rui,
	Amit Kucheria, linux-kernel, Quentin Perret, linux-pm

Hi Viresh,

On 11/19/20 7:38 AM, Viresh Kumar wrote:
> Several parts of the kernel already use the effective CPU utilization
> (as seen by the scheduler) to get the current load on the CPU. Do the
> same here instead of depending on the idle time of the CPU, which isn't
> as accurate by comparison.
> 
> This is also the right thing to do as it makes the cpufreq_cooling
> driver align better with the cpufreq governor (schedutil): the power
> requested by the cpufreq_cooling driver will exactly match the next
> frequency requested by the schedutil governor, since they are both using
> the same metric to calculate load.
> 
> Note that this (and CPU frequency scaling in general) doesn't work that
> well with idle injection, as that is done from RT threads and is counted
> as load while it actually tries to do quite the opposite. That should be
> solved separately though.

I think cpuidle cooling is not used with IPA, but Daniel might correct
me here.

> 
> This was tested on the ARM Hikey6220 platform with hackbench, sysbench
> and schbench. None of them showed any regression or significant
> improvement. Schbench is the most important one of these, as it creates
> the scenario where the utilization numbers provide a better estimate of
> the future.
> 
> Scenario 1: The CPUs were mostly idle in the previous polling window of
> the IPA governor as the tasks were sleeping and here are the details
> from traces (load is in %):
> 
>   Old: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=203 load={{0x35,0x1,0x0,0x31,0x0,0x0,0x64,0x0}} dynamic_power=1339
>   New: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=600 load={{0x60,0x46,0x45,0x45,0x48,0x3b,0x61,0x44}} dynamic_power=3960
> 
> Here, the "Old" line gives the load and requested_power (dynamic_power
> here) numbers calculated using the idle time based implementation, while
> "New" is based on the CPU utilization from scheduler.
> 
> As can be clearly seen, the load and requested_power numbers are simply
> incorrect in the idle time based approach and the numbers collected from
> CPU's utilization are much closer to the reality.

This contradicts what you have put in the 'Scenario 1' description,
doesn't it?
Frequency at 1.2GHz, 75% total_load, power 4W... I'd say if the CPUs
were mostly idle then 1.3W would better reflect that state.

What was the IPA period in your setup?

It depends on your platform's IPA period (e.g. 100ms) and the current
runqueue state (at that sampling point in time). The PELT decay/rise
period is different. I am not sure these signals reflect the system's
average load over the last e.g. 100ms. Maybe the IPA period is too
short/long and couldn't catch up with the PELT signals?
But we don't want too short an averaging period, since 16ms is a display
tick.

IMHO based on this result it looks like the util could have lost older
information from the past or hasn't converged to this low load yet.
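
For scale, with the default 32ms PELT half-life a fully loaded CPU that
goes idle decays to 0.5^(100/32) ~= 11% of max after 100ms, so a single
IPA window can land on very different points of that curve.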

> 
> Scenario 2: The CPUs were busy in the previous polling window of the IPA
> governor:
> 
>   Old: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=800 load={{0x64,0x64,0x64,0x64,0x64,0x64,0x64,0x64}} dynamic_power=5280
>   New: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=708 load={{0x4d,0x5c,0x5c,0x5b,0x5c,0x5c,0x51,0x5b}} dynamic_power=4672
> 
> As can be seen, the idle time based load is 100% for all the CPUs as it
> took only the last window into account, but in reality the CPUs aren't
> that loaded as shown by the utilization numbers.

This is also odd. At ~88% total_load, it looks like the signal had
started decaying, hadn't converged to 100% yet, or some task vanished?

Regards,
Lukasz


* Re: [PATCH V3 1/2] sched/core: Rename and move schedutil_cpu_util() to core.c
  2020-11-19 12:30   ` Rafael J. Wysocki
@ 2020-11-23 10:04     ` Viresh Kumar
  2020-11-23 10:29       ` Rafael J. Wysocki
  0 siblings, 1 reply; 13+ messages in thread
From: Viresh Kumar @ 2020-11-23 10:04 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Rafael J. Wysocki,
	Linux Kernel Mailing List, Quentin Perret, Lukasz Luba, Linux PM

On 19-11-20, 13:30, Rafael J. Wysocki wrote:
> On Thu, Nov 19, 2020 at 8:38 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > There is nothing schedutil-specific in schedutil_cpu_util(), so move it
> > to core.c and rename it to sched_cpu_util(), allowing it to be used from
> > other parts of the kernel as well.
> 
> The patch does more than this, though.
> 
> I would do that in two patches: (1) move the function as is and (2)
> rename it and rearrange the users.

Sure.

> >  static unsigned long sugov_get_util(struct sugov_cpu *sg_cpu)
> >  {
> > -       struct rq *rq = cpu_rq(sg_cpu->cpu);
> > -       unsigned long util = cpu_util_cfs(rq);
> > -       unsigned long max = arch_scale_cpu_capacity(sg_cpu->cpu);
> > -
> > -       sg_cpu->max = max;
> > -       sg_cpu->bw_dl = cpu_bw_dl(rq);
> > +       sg_cpu->max = arch_scale_cpu_capacity(sg_cpu->cpu);
> > +       sg_cpu->bw_dl = cpu_bw_dl(cpu_rq(sg_cpu->cpu));
> >
> > -       return schedutil_cpu_util(sg_cpu->cpu, util, max, FREQUENCY_UTIL, NULL);
> > +       return sched_cpu_util(sg_cpu->cpu, FREQUENCY_UTIL, sg_cpu->max);
> 
> I don't see much value in using this wrapper here TBH and it
> introduces an otherwise redundant cpu_rq() computation.

You want to call effective_cpu_util() here instead, right ?
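
Something like this, I suppose (untested sketch):

	struct rq *rq = cpu_rq(sg_cpu->cpu);

	sg_cpu->max = arch_scale_cpu_capacity(sg_cpu->cpu);
	sg_cpu->bw_dl = cpu_bw_dl(rq);

	return effective_cpu_util(sg_cpu->cpu, cpu_util_cfs(rq), sg_cpu->max,
				  FREQUENCY_UTIL, NULL);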

-- 
viresh


* Re: [PATCH V3 1/2] sched/core: Rename and move schedutil_cpu_util() to core.c
  2020-11-23 10:04     ` Viresh Kumar
@ 2020-11-23 10:29       ` Rafael J. Wysocki
  0 siblings, 0 replies; 13+ messages in thread
From: Rafael J. Wysocki @ 2020-11-23 10:29 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Ingo Molnar, Peter Zijlstra, Vincent Guittot,
	Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Rafael J. Wysocki,
	Linux Kernel Mailing List, Quentin Perret, Lukasz Luba, Linux PM

On Mon, Nov 23, 2020 at 11:05 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 19-11-20, 13:30, Rafael J. Wysocki wrote:
> > On Thu, Nov 19, 2020 at 8:38 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > There is nothing schedutil-specific in schedutil_cpu_util(), so move it
> > > to core.c and rename it to sched_cpu_util(), allowing it to be used from
> > > other parts of the kernel as well.
> >
> > The patch does more than this, though.
> >
> > I would do that in two patches: (1) move the function as is and (2)
> > rename it and rearrange the users.
>
> Sure.
>
> > >  static unsigned long sugov_get_util(struct sugov_cpu *sg_cpu)
> > >  {
> > > -       struct rq *rq = cpu_rq(sg_cpu->cpu);
> > > -       unsigned long util = cpu_util_cfs(rq);
> > > -       unsigned long max = arch_scale_cpu_capacity(sg_cpu->cpu);
> > > -
> > > -       sg_cpu->max = max;
> > > -       sg_cpu->bw_dl = cpu_bw_dl(rq);
> > > +       sg_cpu->max = arch_scale_cpu_capacity(sg_cpu->cpu);
> > > +       sg_cpu->bw_dl = cpu_bw_dl(cpu_rq(sg_cpu->cpu));
> > >
> > > -       return schedutil_cpu_util(sg_cpu->cpu, util, max, FREQUENCY_UTIL, NULL);
> > > +       return sched_cpu_util(sg_cpu->cpu, FREQUENCY_UTIL, sg_cpu->max);
> >
> > I don't see much value in using this wrapper here TBH and it
> > introduces an otherwise redundant cpu_rq() computation.
>
> You want to call effective_cpu_util() here instead, right ?

Right.


* Re: [PATCH V3 2/2] thermal: cpufreq_cooling: Reuse sched_cpu_util() for SMP platforms
  2020-11-20 14:51   ` Lukasz Luba
@ 2020-11-23 10:41     ` Viresh Kumar
  2020-11-23 11:34       ` Lukasz Luba
  0 siblings, 1 reply; 13+ messages in thread
From: Viresh Kumar @ 2020-11-23 10:41 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot,
	Amit Daniel Kachhap, Daniel Lezcano, Javi Merino, Zhang Rui,
	Amit Kucheria, linux-kernel, Quentin Perret, linux-pm

On 20-11-20, 14:51, Lukasz Luba wrote:
> On 11/19/20 7:38 AM, Viresh Kumar wrote:
> > Scenario 1: The CPUs were mostly idle in the previous polling window of
> > the IPA governor as the tasks were sleeping and here are the details
> > from traces (load is in %):
> > 
> >   Old: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=203 load={{0x35,0x1,0x0,0x31,0x0,0x0,0x64,0x0}} dynamic_power=1339
> >   New: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=600 load={{0x60,0x46,0x45,0x45,0x48,0x3b,0x61,0x44}} dynamic_power=3960
> > 
> > Here, the "Old" line gives the load and requested_power (dynamic_power
> > here) numbers calculated using the idle time based implementation, while
> > "New" is based on the CPU utilization from scheduler.
> > 
> > As can be clearly seen, the load and requested_power numbers are simply
> > incorrect in the idle time based approach and the numbers collected from
> > CPU's utilization are much closer to the reality.
> 
> This contradicts what you have put in the 'Scenario 1' description,
> doesn't it?

At least I didn't think so when I wrote this and am still not sure :)

> Frequency at 1.2GHz, 75% total_load, power 4W... I'd say if the CPUs
> were mostly idle then 1.3W would better reflect that state.

The CPUs were idle because the tasks were sleeping, but once the tasks
resume work, we need a frequency that matches the real load of the
tasks. This is exactly what schedutil would ask for as well, since it
uses the same metric, and so we should be asking for the same power
budget.

> What was the IPA period in your setup?

It is 100 ms by default, though I remember that I tried with 10 ms as
well.

> It depends on your platform's IPA period (e.g. 100ms) and the current
> runqueue state (at that sampling point in time). The PELT decay/rise
> period is different. I am not sure these signals reflect the system's
> average load over the last e.g. 100ms. Maybe the IPA period is too
> short/long and couldn't catch up with the PELT signals?
> But we don't want too short an averaging period, since 16ms is a display
> tick.
> 
> IMHO based on this result it looks like the util could have lost older
> information from the past or hasn't converged to this low load yet.
> 
> > 
> > Scenario 2: The CPUs were busy in the previous polling window of the IPA
> > governor:
> > 
> >   Old: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=800 load={{0x64,0x64,0x64,0x64,0x64,0x64,0x64,0x64}} dynamic_power=5280
> >   New: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=708 load={{0x4d,0x5c,0x5c,0x5b,0x5c,0x5c,0x51,0x5b}} dynamic_power=4672
> > 
> > As can be seen, the idle time based load is 100% for all the CPUs as it
> > took only the last window into account, but in reality the CPUs aren't
> > that loaded as shown by the utilization numbers.
> 
> This is also odd. At ~88% total_load, it looks like the signal had
> started decaying, hadn't converged to 100% yet, or some task vanished?

They must have decayed a bit because of the idle period, so it looks
okay that way.
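
(With the 32ms PELT half-life, ~4ms of idle time is already enough to
take a saturated signal down to 0.5^(4/32) ~= 92% of max, which would
match the 0x5c values above.)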

-- 
viresh


* Re: [PATCH V3 2/2] thermal: cpufreq_cooling: Reuse sched_cpu_util() for SMP platforms
  2020-11-23 10:41     ` Viresh Kumar
@ 2020-11-23 11:34       ` Lukasz Luba
  0 siblings, 0 replies; 13+ messages in thread
From: Lukasz Luba @ 2020-11-23 11:34 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot,
	Amit Daniel Kachhap, Daniel Lezcano, Javi Merino, Zhang Rui,
	Amit Kucheria, linux-kernel, Quentin Perret, linux-pm



On 11/23/20 10:41 AM, Viresh Kumar wrote:
> On 20-11-20, 14:51, Lukasz Luba wrote:
>> On 11/19/20 7:38 AM, Viresh Kumar wrote:
>>> Scenario 1: The CPUs were mostly idle in the previous polling window of
>>> the IPA governor as the tasks were sleeping and here are the details
>>> from traces (load is in %):
>>>
>>>    Old: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=203 load={{0x35,0x1,0x0,0x31,0x0,0x0,0x64,0x0}} dynamic_power=1339
>>>    New: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=600 load={{0x60,0x46,0x45,0x45,0x48,0x3b,0x61,0x44}} dynamic_power=3960
>>>
>>> Here, the "Old" line gives the load and requested_power (dynamic_power
>>> here) numbers calculated using the idle time based implementation, while
>>> "New" is based on the CPU utilization from scheduler.
>>>
>>> As can be clearly seen, the load and requested_power numbers are simply
>>> incorrect in the idle time based approach and the numbers collected from
>>> CPU's utilization are much closer to the reality.
>>
>> It is contradicting to what you have put in 'Scenario 1' description,
>> isn't it?
> 
> At least I didn't think so when I wrote this and am still not sure :)
> 
>> Frequency at 1.2GHz, 75% total_load, power 4W... I'd say if CPUs were
>> mostly idle than 1.3W would better reflect that state.
> 
> The CPUs were idle because the tasks were sleeping, but once the tasks
> resume work, we need a frequency that matches the real load of the
> tasks. This is exactly what schedutil would ask for as well, since it
> uses the same metric, and so we should be asking for the same power
> budget.

Yes, agree.

> 
>> What was the IPA period in your setup?
> 
> It is 100 ms by default, though I remember that I tried with 10 ms as
> well.
> 
>> It depends on your platform IPA period (e.g. 100ms) and your current
>> runqueues state (at that sampling point in time). The PELT decay/rise
>> period is different. I am not sure if you observe the system avg load
>> for last e.g. 100ms looking at these signals. Maybe IPA period is too
>> short/long and couldn't catch up with PELT signals?
>> But we won't too short averaging, since 16ms is a display tick.
>>
>> IMHO based on this result it looks like the util could lost older
>> information from the past or didn't converge yet to this low load yet.
>>
>>>
>>> Scenario 2: The CPUs were busy in the previous polling window of the IPA
>>> governor:
>>>
>>>    Old: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=800 load={{0x64,0x64,0x64,0x64,0x64,0x64,0x64,0x64}} dynamic_power=5280
>>>    New: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=708 load={{0x4d,0x5c,0x5c,0x5b,0x5c,0x5c,0x51,0x5b}} dynamic_power=4672
>>>
>>> As can be seen, the idle time based load is 100% for all the CPUs as it
>>> took only the last window into account, but in reality the CPUs aren't
>>> that loaded as shown by the utilization numbers.
>>
>> This is also odd. The ~88% of total_load, looks like started decaying or
>> didn't converge yet to 100% or some task vanished?
> 
> They must have decayed a bit because of the idle period, so it looks
> okay that way.
> 

I have experimented with this new estimation and compared it against a
real power meter and other models. It looks good, better than the
current mainline. I will continue the experiments, but this patch LGTM
and I will add my Reviewed-by today (after finishing them).

It would make more sense to adjust the IPA period to the util signal
than the opposite. I have to play with this a bit...

Regards,
Lukasz


* Re: [PATCH V3 2/2] thermal: cpufreq_cooling: Reuse sched_cpu_util() for SMP platforms
  2020-11-19  7:38 ` [PATCH V3 2/2] thermal: cpufreq_cooling: Reuse sched_cpu_util() for SMP platforms Viresh Kumar
  2020-11-20 14:51   ` Lukasz Luba
@ 2020-11-23 15:32   ` Lukasz Luba
  2020-11-24  4:56     ` Viresh Kumar
  2020-12-01 17:25   ` Valentin Schneider
  2 siblings, 1 reply; 13+ messages in thread
From: Lukasz Luba @ 2020-11-23 15:32 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot,
	Amit Daniel Kachhap, Daniel Lezcano, Javi Merino, Zhang Rui,
	Amit Kucheria, linux-kernel, Quentin Perret, linux-pm



On 11/19/20 7:38 AM, Viresh Kumar wrote:
> Several parts of the kernel already use the effective CPU utilization
> (as seen by the scheduler) to get the current load on the CPU. Do the
> same here instead of depending on the idle time of the CPU, which isn't
> as accurate by comparison.
> 
> This is also the right thing to do as it makes the cpufreq_cooling
> driver align better with the cpufreq governor (schedutil): the power
> requested by the cpufreq_cooling driver will exactly match the next
> frequency requested by the schedutil governor, since they are both using
> the same metric to calculate load.
> 
> Note that this (and CPU frequency scaling in general) doesn't work that
> well with idle injection, as that is done from RT threads and is counted
> as load while it actually tries to do quite the opposite. That should be
> solved separately though.
> 
> This was tested on the ARM Hikey6220 platform with hackbench, sysbench
> and schbench. None of them showed any regression or significant
> improvement. Schbench is the most important one of these, as it creates
> the scenario where the utilization numbers provide a better estimate of
> the future.
> 
> Scenario 1: The CPUs were mostly idle in the previous polling window of
> the IPA governor as the tasks were sleeping and here are the details
> from traces (load is in %):
> 
>   Old: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=203 load={{0x35,0x1,0x0,0x31,0x0,0x0,0x64,0x0}} dynamic_power=1339
>   New: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=600 load={{0x60,0x46,0x45,0x45,0x48,0x3b,0x61,0x44}} dynamic_power=3960
> 
> Here, the "Old" line gives the load and requested_power (dynamic_power
> here) numbers calculated using the idle time based implementation, while
> "New" is based on the CPU utilization from scheduler.
> 
> As can be clearly seen, the load and requested_power numbers are simply
> incorrect in the idle time based approach and the numbers collected from
> CPU's utilization are much closer to the reality.
> 
> Scenario 2: The CPUs were busy in the previous polling window of the IPA
> governor:
> 
>   Old: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=800 load={{0x64,0x64,0x64,0x64,0x64,0x64,0x64,0x64}} dynamic_power=5280
>   New: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=708 load={{0x4d,0x5c,0x5c,0x5b,0x5c,0x5c,0x51,0x5b}} dynamic_power=4672
> 
> As can be seen, the idle time based load is 100% for all the CPUs as it
> took only the last window into account, but in reality the CPUs aren't
> that loaded as shown by the utilization numbers.
> 
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>
> [...]
> +	free_idle_time(cpufreq_cdev);
>   free_cdev:
>   	kfree(cpufreq_cdev);
>   	return cdev;
> @@ -674,7 +714,7 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
>   	thermal_cooling_device_unregister(cdev);
>   	freq_qos_remove_request(&cpufreq_cdev->qos_req);
>   	ida_simple_remove(&cpufreq_ida, cpufreq_cdev->id);
> -	kfree(cpufreq_cdev->idle_time);
> +	free_idle_time(cpufreq_cdev);
>   	kfree(cpufreq_cdev);
>   }
>   EXPORT_SYMBOL_GPL(cpufreq_cooling_unregister);
> 


LGTM. It has potential. We will see how far we can improve IPA with this
model.

Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

Regards,
Lukasz

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V3 2/2] thermal: cpufreq_cooling: Reuse sched_cpu_util() for SMP platforms
  2020-11-23 15:32   ` Lukasz Luba
@ 2020-11-24  4:56     ` Viresh Kumar
  0 siblings, 0 replies; 13+ messages in thread
From: Viresh Kumar @ 2020-11-24  4:56 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot,
	Amit Daniel Kachhap, Daniel Lezcano, Javi Merino, Zhang Rui,
	Amit Kucheria, linux-kernel, Quentin Perret, linux-pm

On 23-11-20, 15:32, Lukasz Luba wrote:
> LGTM. It has potential. We will see how far we can improve IPA with this
> model.
> 
> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

Thanks Lukasz for your review :)

-- 
viresh

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V3 2/2] thermal: cpufreq_cooling: Reuse sched_cpu_util() for SMP platforms
  2020-11-19  7:38 ` [PATCH V3 2/2] thermal: cpufreq_cooling: Reuse sched_cpu_util() for SMP platforms Viresh Kumar
  2020-11-20 14:51   ` Lukasz Luba
  2020-11-23 15:32   ` Lukasz Luba
@ 2020-12-01 17:25   ` Valentin Schneider
  2020-12-07  9:13     ` Viresh Kumar
  2 siblings, 1 reply; 13+ messages in thread
From: Valentin Schneider @ 2020-12-01 17:25 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot,
	Amit Daniel Kachhap, Daniel Lezcano, Javi Merino, Zhang Rui,
	Amit Kucheria, linux-kernel, Quentin Perret, Lukasz Luba,
	linux-pm


Hi Viresh,

On 19/11/20 07:38, Viresh Kumar wrote:
> As can be clearly seen, the load and requested_power numbers are
> simply incorrect in the idle-time-based approach, and the numbers
> derived from the CPU's utilization are much closer to reality.
>

PELT time-scaling can make the util signals behave strangely from an
external PoV. For instance, on a big.LITTLE system, the rq util of a LITTLE
CPU may suddenly drop if it was stuck on a too-low OPP for some time and
eventually reached the "right" OPP (i.e. got idle time). 

Also, as Peter pointed out in [1], task migrations can easily confuse an
external observer that considers util to be "an image of the recent past".

This will need testing on asymmetric CPU capacity systems, IMO.

[1]: https://lore.kernel.org/r/20201120075527.GB2414@hirez.programming.kicks-ass.net


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V3 2/2] thermal: cpufreq_cooling: Reuse sched_cpu_util() for SMP platforms
  2020-12-01 17:25   ` Valentin Schneider
@ 2020-12-07  9:13     ` Viresh Kumar
  0 siblings, 0 replies; 13+ messages in thread
From: Viresh Kumar @ 2020-12-07  9:13 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot,
	Amit Daniel Kachhap, Daniel Lezcano, Javi Merino, Zhang Rui,
	Amit Kucheria, linux-kernel, Quentin Perret, Lukasz Luba,
	linux-pm

Hi Valentin,

On 01-12-20, 17:25, Valentin Schneider wrote:
> PELT time-scaling can make the util signals behave strangely from an
> external PoV. For instance, on a big.LITTLE system, the rq util of a LITTLE
> CPU may suddenly drop if it was stuck on a too-low OPP for some time and
> eventually reached the "right" OPP (i.e. got idle time). 
> 
> Also, as Peter pointed out in [1], task migrations can easily confuse an
> external observer that considers util to be "an image of the recent past".

I agree with what you wrote, and such issues may happen here just as
they can with schedutil. The idea behind this patchset is to get the
allocator (IPA) and the consumer (schedutil) in sync with respect to
frequency and power. It is better to allocate the power that schedutil
is going to request than to allocate something based on a different
metric. If there is a problem with the PELT signal, both entities will
suffer from it alike.
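
To spell out the "in sync" part (a simplified sketch; the cooling
lines below are from the diff in patch 2/2, while schedutil's usage is
paraphrased):

  /* cpufreq_cooling's per-CPU load, as per patch 2/2: */
  unsigned long max  = arch_scale_cpu_capacity(cpu);
  unsigned long util = sched_cpu_util(cpu, ENERGY_UTIL, max);
  u32 load = (util * 100) / max;

schedutil starts from the same underlying helper when picking the next
frequency, only with FREQUENCY_UTIL semantics, so at any given instant
both consumers see (nearly) the same busyness.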

-- 
viresh

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-12-07  9:14 UTC | newest]

Thread overview: 13+ messages
2020-11-19  7:38 [PATCH V3 0/2] cpufreq_cooling: Get effective CPU utilization from scheduler Viresh Kumar
2020-11-19  7:38 ` [PATCH V3 1/2] sched/core: Rename and move schedutil_cpu_util() to core.c Viresh Kumar
2020-11-19 12:30   ` Rafael J. Wysocki
2020-11-23 10:04     ` Viresh Kumar
2020-11-23 10:29       ` Rafael J. Wysocki
2020-11-19  7:38 ` [PATCH V3 2/2] thermal: cpufreq_cooling: Reuse sched_cpu_util() for SMP platforms Viresh Kumar
2020-11-20 14:51   ` Lukasz Luba
2020-11-23 10:41     ` Viresh Kumar
2020-11-23 11:34       ` Lukasz Luba
2020-11-23 15:32   ` Lukasz Luba
2020-11-24  4:56     ` Viresh Kumar
2020-12-01 17:25   ` Valentin Schneider
2020-12-07  9:13     ` Viresh Kumar
