* [PATCH RFC 0/5] sched and cpufreq fixes/cleanups
@ 2017-10-28  9:59 Joel Fernandes
  2017-10-28  9:59 ` [PATCH RFC 1/5] Revert "sched/fair: Drop always true parameter of update_cfs_rq_load_avg()" Joel Fernandes
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Joel Fernandes @ 2017-10-28  9:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes, Cc: Srinivas Pandruvada, Cc: Len Brown,
	Cc: Rafael J. Wysocki, Cc: Viresh Kumar, Cc: Ingo Molnar,
	Cc: Peter Zijlstra, Cc: Juri Lelli, Cc: Patrick Bellasi,
	Cc: Steve Muckle, Cc: Brendan Jackman, Cc: Chris Redpath,
	Cc: Atish Patra, Cc: Dietmar Eggemann, Cc: Vincent Guittot,
	Cc: Morten Ramussen, Cc: Frederic Weisbecker,
	Cc: Thomas Gleixner, Cc: EAS Dev, Cc: Android Kernel

Here are some patches with generally minor changes that I am posting
together. Patches 1/5 and 2/5 are related to skipping cpufreq updates for
the dequeue of the last task before the CPU enters idle; that's mostly just
a rebase of [1]. Patches 3/5 and 4/5 fix some minor things I noticed after
the remote cpufreq update work, and patch 5/5 is just a small cleanup of
find_idlest_group_cpu. Let me know your thoughts, and thanks. I've based
these patches on peterz's queue.git master branch.

[1] https://patchwork.kernel.org/patch/9936555/

Joel Fernandes (5):
  Revert "sched/fair: Drop always true parameter of
    update_cfs_rq_load_avg()"
  sched/fair: Skip frequency update if CPU about to idle
  cpufreq: schedutil: Use idle_calls counter of the remote CPU
  sched/fair: Correct obsolete comment about cpufreq_update_util
  sched/fair: remove impossible condition from find_idlest_group_cpu

 include/linux/tick.h             |  1 +
 kernel/sched/cpufreq_schedutil.c |  2 +-
 kernel/sched/fair.c              | 44 ++++++++++++++++++++++++++++------------
 kernel/sched/sched.h             |  1 +
 kernel/time/tick-sched.c         | 13 ++++++++++++
 5 files changed, 47 insertions(+), 14 deletions(-)

-- 
2.15.0.rc2.357.g7e34df9404-goog

* [PATCH RFC 1/5] Revert "sched/fair: Drop always true parameter of update_cfs_rq_load_avg()"
  2017-10-28  9:59 [PATCH RFC 0/5] sched and cpufreq fixes/cleanups Joel Fernandes
@ 2017-10-28  9:59 ` Joel Fernandes
  2017-10-28  9:59 ` [PATCH RFC 2/5] sched/fair: Skip frequency update if CPU about to idle Joel Fernandes
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Joel Fernandes @ 2017-10-28  9:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes, Rafael J . Wysocki, Viresh Kumar, Ingo Molnar,
	Peter Zijlstra, Cc: Srinivas Pandruvada, Cc: Len Brown,
	Cc: Juri Lelli, Cc: Patrick Bellasi, Cc: Steve Muckle,
	Cc: Brendan Jackman, Cc: Chris Redpath, Cc: Atish Patra,
	Cc: Dietmar Eggemann, Cc: Vincent Guittot, Cc: Morten Ramussen,
	Cc: Frederic Weisbecker, Cc: Thomas Gleixner, Cc: EAS Dev,
	Cc: Android Kernel

This reverts commit 3a123bbbb10d54dbdde6ccbbd519c74c91ba2f52.

The update_freq parameter needs to be brought back so that we can control
whether cpufreq is notified about a frequency update when the utilization
is updated; a later patch in this series uses it to skip the notification
on the last dequeue before a CPU enters idle.

Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
---
 kernel/sched/fair.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 56f343b8e749..f97693fe8b6e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -784,7 +784,7 @@ void post_init_entity_util_avg(struct sched_entity *se)
 			/*
 			 * For !fair tasks do:
 			 *
-			update_cfs_rq_load_avg(now, cfs_rq);
+			update_cfs_rq_load_avg(now, cfs_rq, false);
 			attach_entity_load_avg(cfs_rq, se);
 			switched_from_fair(rq, p);
 			 *
@@ -3596,6 +3596,7 @@ static inline void add_tg_cfs_propagate(struct cfs_rq *cfs_rq, long runnable_sum
  * update_cfs_rq_load_avg - update the cfs_rq's load/util averages
  * @now: current time, as per cfs_rq_clock_task()
  * @cfs_rq: cfs_rq to update
+ * @update_freq: should we call cfs_rq_util_change() or will the call do so
  *
  * The cfs_rq avg is the direct sum of all its entities (blocked and runnable)
  * avg. The immediate corollary is that all (fair) tasks must be attached, see
@@ -3609,7 +3610,7 @@ static inline void add_tg_cfs_propagate(struct cfs_rq *cfs_rq, long runnable_sum
  * call update_tg_load_avg() when this function returns true.
  */
 static inline int
-update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
+update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
 {
 	unsigned long removed_load = 0, removed_util = 0, removed_runnable_sum = 0;
 	struct sched_avg *sa = &cfs_rq->avg;
@@ -3646,7 +3647,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 	cfs_rq->load_last_update_time_copy = sa->last_update_time;
 #endif
 
-	if (decayed)
+	if (update_freq && decayed)
 		cfs_rq_util_change(cfs_rq);
 
 	return decayed;
@@ -3740,7 +3741,7 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 	if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD))
 		__update_load_avg_se(now, cpu, cfs_rq, se);
 
-	decayed  = update_cfs_rq_load_avg(now, cfs_rq);
+	decayed  = update_cfs_rq_load_avg(now, cfs_rq, true);
 	decayed |= propagate_entity_load_avg(se);
 
 	if (!se->avg.last_update_time && (flags & DO_ATTACH)) {
@@ -3830,7 +3831,7 @@ static int idle_balance(struct rq *this_rq, struct rq_flags *rf);
 #else /* CONFIG_SMP */
 
 static inline int
-update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
+update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
 {
 	return 0;
 }
@@ -7318,7 +7319,7 @@ static void update_blocked_averages(int cpu)
 		if (throttled_hierarchy(cfs_rq))
 			continue;
 
-		if (update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq))
+		if (update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true))
 			update_tg_load_avg(cfs_rq, 0);
 
 		/* Propagate pending load changes to the parent, if any: */
@@ -7391,7 +7392,7 @@ static inline void update_blocked_averages(int cpu)
 
 	rq_lock_irqsave(rq, &rf);
 	update_rq_clock(rq);
-	update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq);
+	update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true);
 	rq_unlock_irqrestore(rq, &rf);
 }
 
-- 
2.15.0.rc2.357.g7e34df9404-goog

* [PATCH RFC 2/5] sched/fair: Skip frequency update if CPU about to idle
  2017-10-28  9:59 [PATCH RFC 0/5] sched and cpufreq fixes/cleanups Joel Fernandes
  2017-10-28  9:59 ` [PATCH RFC 1/5] Revert "sched/fair: Drop always true parameter of update_cfs_rq_load_avg()" Joel Fernandes
@ 2017-10-28  9:59 ` Joel Fernandes
  2017-10-30 12:07   ` Viresh Kumar
  2017-10-28  9:59 ` [PATCH RFC 3/5] cpufreq: schedutil: Use idle_calls counter of the remote CPU Joel Fernandes
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Joel Fernandes @ 2017-10-28  9:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes, Rafael J . Wysocki, Viresh Kumar, Ingo Molnar,
	Peter Zijlstra, Cc: Srinivas Pandruvada, Cc: Len Brown,
	Cc: Juri Lelli, Cc: Patrick Bellasi, Cc: Steve Muckle,
	Cc: Brendan Jackman, Cc: Chris Redpath, Cc: Atish Patra,
	Cc: Dietmar Eggemann, Cc: Vincent Guittot, Cc: Morten Ramussen,
	Cc: Frederic Weisbecker, Cc: Thomas Gleixner, Cc: EAS Dev,
	Cc: Android Kernel

Updating CPU frequency on the last dequeue of a CPU is useless. Because the
utilization since the CPU came out of idle can increase till the last dequeue,
this means we are requesting a higher frequency just before entering idle,
which is not very meaningful or useful. It causes unwanted wakeups of the
schedutil governor kthread in slow-switch systems, resulting in a large
number of wakeups that could have been avoided. In an Android application
playing music, where the music app's thread wakes up and sleeps periodically
on an Android device, it's seen that the frequency increases slightly on the
dequeue and is reduced when the task wakes up again. This oscillation
continues between 300MHz and 350MHz, and while the task is running, it's at
300MHz the whole time. This is pointless. Adding to that, these are
unnecessary wakeups. In fact, most of the time when the sugov thread wakes
up, all the CPUs are idle - so it can hurt power by disturbing the cluster
when it is idling.

This patch prevents a frequency update on the last dequeue. With this, the
number of schedutil governor thread wakeups is reduced by more than 2x
(1389 -> 527).
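
For background on where the kthread wakeups come from: on slow-switch
systems schedutil cannot change the frequency from scheduler context and
punts the request to a per-policy kthread. Roughly, paraphrasing from
memory schedutil's commit path of this era (a sketch, not the verbatim
source):

	static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
					unsigned int next_freq)
	{
		struct cpufreq_policy *policy = sg_policy->policy;

		if (sg_policy->next_freq == next_freq)
			return;

		sg_policy->next_freq = next_freq;
		sg_policy->last_freq_update_time = time;

		if (policy->fast_switch_enabled) {
			/* Fast switch: set the frequency from this context. */
			cpufreq_driver_fast_switch(policy, next_freq);
		} else {
			/* Slow switch: wake the per-policy sugov kthread. */
			sg_policy->work_in_progress = true;
			irq_work_queue(&sg_policy->irq_work);
		}
	}

Every update skipped on the last dequeue is therefore one avoided kthread
wakeup on such systems.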

Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
---
 kernel/sched/fair.c  | 25 ++++++++++++++++++++++---
 kernel/sched/sched.h |  1 +
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f97693fe8b6e..4c06e52935d3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3725,6 +3725,7 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 #define UPDATE_TG	0x1
 #define SKIP_AGE_LOAD	0x2
 #define DO_ATTACH	0x4
+#define SKIP_CPUFREQ	0x8
 
 /* Update task and its cfs_rq load average */
 static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
@@ -3741,7 +3742,7 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 	if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD))
 		__update_load_avg_se(now, cpu, cfs_rq, se);
 
-	decayed  = update_cfs_rq_load_avg(now, cfs_rq, true);
+	decayed  = update_cfs_rq_load_avg(now, cfs_rq, !(flags & SKIP_CPUFREQ));
 	decayed |= propagate_entity_load_avg(se);
 
 	if (!se->avg.last_update_time && (flags & DO_ATTACH)) {
@@ -3839,6 +3840,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
 #define UPDATE_TG	0x0
 #define SKIP_AGE_LOAD	0x0
 #define DO_ATTACH	0x0
+#define SKIP_CPUFREQ	0x0
 
 static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int not_used1)
 {
@@ -4060,6 +4062,8 @@ static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq);
 static void
 dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 {
+	int update_flags;
+
 	/*
 	 * Update run-time statistics of the 'current'.
 	 */
@@ -4073,7 +4077,12 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 *   - For group entity, update its weight to reflect the new share
 	 *     of its group cfs_rq.
 	 */
-	update_load_avg(cfs_rq, se, UPDATE_TG);
+	update_flags = UPDATE_TG;
+
+	if (flags & DEQUEUE_IDLE)
+		update_flags |= SKIP_CPUFREQ;
+
+	update_load_avg(cfs_rq, se, update_flags);
 	dequeue_runnable_load_avg(cfs_rq, se);
 
 	update_stats_dequeue(cfs_rq, se, flags);
@@ -5220,6 +5229,9 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	struct sched_entity *se = &p->se;
 	int task_sleep = flags & DEQUEUE_SLEEP;
 
+	if (task_sleep && rq->nr_running == 1)
+		flags |= DEQUEUE_IDLE;
+
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);
 		dequeue_entity(cfs_rq, se, flags);
@@ -5250,13 +5262,20 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	}
 
 	for_each_sched_entity(se) {
+		int update_flags;
+
 		cfs_rq = cfs_rq_of(se);
 		cfs_rq->h_nr_running--;
 
 		if (cfs_rq_throttled(cfs_rq))
 			break;
 
-		update_load_avg(cfs_rq, se, UPDATE_TG);
+		update_flags = UPDATE_TG;
+
+		if (flags & DEQUEUE_IDLE)
+			update_flags |= SKIP_CPUFREQ;
+
+		update_load_avg(cfs_rq, se, update_flags);
 		update_cfs_group(se);
 	}
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 8aa24b41f652..68f5cd102744 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1394,6 +1394,7 @@ extern const u32 sched_prio_to_wmult[40];
 #define DEQUEUE_SAVE		0x02 /* matches ENQUEUE_RESTORE */
 #define DEQUEUE_MOVE		0x04 /* matches ENQUEUE_MOVE */
 #define DEQUEUE_NOCLOCK		0x08 /* matches ENQUEUE_NOCLOCK */
+#define DEQUEUE_IDLE		0x10
 
 #define ENQUEUE_WAKEUP		0x01
 #define ENQUEUE_RESTORE		0x02
-- 
2.15.0.rc2.357.g7e34df9404-goog

* [PATCH RFC 3/5] cpufreq: schedutil: Use idle_calls counter of the remote CPU
  2017-10-28  9:59 [PATCH RFC 0/5] sched and cpufreq fixes/cleanups Joel Fernandes
  2017-10-28  9:59 ` [PATCH RFC 1/5] Revert "sched/fair: Drop always true parameter of update_cfs_rq_load_avg()" Joel Fernandes
  2017-10-28  9:59 ` [PATCH RFC 2/5] sched/fair: Skip frequency update if CPU about to idle Joel Fernandes
@ 2017-10-28  9:59 ` Joel Fernandes
  2017-10-30  9:18   ` Viresh Kumar
  2017-10-28  9:59 ` [PATCH RFC 4/5] sched/fair: Correct obsolete comment about cpufreq_update_util Joel Fernandes
  2017-10-28  9:59 ` [PATCH RFC 5/5] sched/fair: remove impossible condition from find_idlest_group_cpu Joel Fernandes
  4 siblings, 1 reply; 20+ messages in thread
From: Joel Fernandes @ 2017-10-28  9:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes, Rafael J . Wysocki, Viresh Kumar, Ingo Molnar,
	Peter Zijlstra, Frederic Weisbecker, Thomas Gleixner,
	Cc: Srinivas Pandruvada, Cc: Len Brown, Cc: Juri Lelli,
	Cc: Patrick Bellasi, Cc: Steve Muckle, Cc: Brendan Jackman,
	Cc: Chris Redpath, Cc: Atish Patra, Cc: Dietmar Eggemann,
	Cc: Vincent Guittot, Cc: Morten Ramussen, Cc: EAS Dev,
	Cc: Android Kernel

Since the recent remote cpufreq callback work, it's possible that a cpufreq
update is triggered from a remote CPU. For single policies, however, the current
code uses the local CPU when trying to determine if the remote sg_cpu entered
idle or is busy. This is incorrect. To remedy this, compare with the nohz tick
idle_calls counter of the remote CPU.

Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Joel Fernandes <joelaf@google.com>
---
 include/linux/tick.h             |  1 +
 kernel/sched/cpufreq_schedutil.c |  2 +-
 kernel/time/tick-sched.c         | 13 +++++++++++++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index fe01e68bf520..31bf2ae3e9a7 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -118,6 +118,7 @@ extern void tick_nohz_idle_exit(void);
 extern void tick_nohz_irq_exit(void);
 extern ktime_t tick_nohz_get_sleep_length(void);
 extern unsigned long tick_nohz_get_idle_calls(void);
+extern unsigned long tick_nohz_get_idle_calls_cpu(int cpu);
 extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
 extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
 #else /* !CONFIG_NO_HZ_COMMON */
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 9209d83ecdcf..1383aa9efb4f 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -244,7 +244,7 @@ static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
 #ifdef CONFIG_NO_HZ_COMMON
 static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
 {
-	unsigned long idle_calls = tick_nohz_get_idle_calls();
+	unsigned long idle_calls = tick_nohz_get_idle_calls_cpu(sg_cpu->cpu);
 	bool ret = idle_calls == sg_cpu->saved_idle_calls;
 
 	sg_cpu->saved_idle_calls = idle_calls;
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 7b258c59d78a..91d5e7be805b 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1011,6 +1011,19 @@ ktime_t tick_nohz_get_sleep_length(void)
 	return ts->sleep_length;
 }
 
+/**
+ * tick_nohz_get_idle_calls_cpu - return the current idle calls counter value
+ * for a particular CPU.
+ *
+ * Called from the schedutil frequency scaling governor in scheduler context.
+ */
+unsigned long tick_nohz_get_idle_calls_cpu(int cpu)
+{
+	struct tick_sched *ts = tick_get_tick_sched(cpu);
+
+	return ts->idle_calls;
+}
+
 /**
  * tick_nohz_get_idle_calls - return the current idle calls counter value
  *
-- 
2.15.0.rc2.357.g7e34df9404-goog

* [PATCH RFC 4/5] sched/fair: Correct obsolete comment about cpufreq_update_util
  2017-10-28  9:59 [PATCH RFC 0/5] sched and cpufreq fixes/cleanups Joel Fernandes
                   ` (2 preceding siblings ...)
  2017-10-28  9:59 ` [PATCH RFC 3/5] cpufreq: schedutil: Use idle_calls counter of the remote CPU Joel Fernandes
@ 2017-10-28  9:59 ` Joel Fernandes
  2017-10-30  9:22   ` Viresh Kumar
  2017-10-28  9:59 ` [PATCH RFC 5/5] sched/fair: remove impossible condition from find_idlest_group_cpu Joel Fernandes
  4 siblings, 1 reply; 20+ messages in thread
From: Joel Fernandes @ 2017-10-28  9:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes, Viresh Kumar, Rafael J . Wysocki, Ingo Molnar,
	Peter Zijlstra, Cc: Srinivas Pandruvada, Cc: Len Brown,
	Cc: Juri Lelli, Cc: Patrick Bellasi, Cc: Steve Muckle,
	Cc: Brendan Jackman, Cc: Chris Redpath, Cc: Atish Patra,
	Cc: Dietmar Eggemann, Cc: Vincent Guittot, Cc: Morten Ramussen,
	Cc: Frederic Weisbecker, Cc: Thomas Gleixner, Cc: EAS Dev,
	Cc: Android Kernel

Since the remote cpufreq callback work, the cpufreq_update_util call can happen
from remote CPUs. The comment about local CPUs is thus obsolete. Update it
accordingly.

Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
---
 kernel/sched/fair.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4c06e52935d3..5c49fdb4c508 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3018,9 +3018,7 @@ static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
 		/*
 		 * There are a few boundary cases this might miss but it should
 		 * get called often enough that that should (hopefully) not be
-		 * a real problem -- added to that it only calls on the local
-		 * CPU, so if we enqueue remotely we'll miss an update, but
-		 * the next tick/schedule should update.
+		 * a real problem.
 		 *
 		 * It will not get called when we go idle, because the idle
 		 * thread is a different class (!fair), nor will the utilization
-- 
2.15.0.rc2.357.g7e34df9404-goog

* [PATCH RFC 5/5] sched/fair: remove impossible condition from find_idlest_group_cpu
  2017-10-28  9:59 [PATCH RFC 0/5] sched and cpufreq fixes/cleanups Joel Fernandes
                   ` (3 preceding siblings ...)
  2017-10-28  9:59 ` [PATCH RFC 4/5] sched/fair: Correct obsolete comment about cpufreq_update_util Joel Fernandes
@ 2017-10-28  9:59 ` Joel Fernandes
  2017-10-30 15:41   ` Brendan Jackman
  2017-10-30 16:00   ` Vincent Guittot
  4 siblings, 2 replies; 20+ messages in thread
From: Joel Fernandes @ 2017-10-28  9:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: Joel Fernandes, Ingo Molnar, Peter Zijlstra, Brendan Jackman,
	Dietmar, Cc: Srinivas Pandruvada, Cc: Len Brown,
	Cc: Rafael J. Wysocki, Cc: Viresh Kumar, Cc: Juri Lelli,
	Cc: Patrick Bellasi, Cc: Steve Muckle, Cc: Chris Redpath,
	Cc: Atish Patra, Cc: Vincent Guittot, Cc: Morten Ramussen,
	Cc: Frederic Weisbecker, Cc: Thomas Gleixner, Cc: EAS Dev,
	Cc: Android Kernel

find_idlest_group_cpu goes through the CPUs of a group previously selected by
find_idlest_group. find_idlest_group returns NULL if the local group is the
selected one, in which case find_idlest_group_cpu isn't executed for the group
to which 'cpu' belongs. So we're always guaranteed to call
find_idlest_group_cpu with a group to which cpu is non-local. This makes one of
the conditions in find_idlest_group_cpu an impossible one, which we can get
rid of.
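
To see why, here is a rough sketch of the caller, find_idlest_cpu()
(paraphrased, not the verbatim kernel code):

	while (sd) {
		struct sched_group *group;

		group = find_idlest_group(sd, p, cpu, sd_flag);
		if (!group) {
			/* The local group was best: descend into it without
			 * calling find_idlest_group_cpu() at all. */
			sd = sd->child;
			continue;
		}

		/* Here 'group' is never the group 'cpu' belongs to, so
		 * i == this_cpu can't hold inside find_idlest_group_cpu(). */
		new_cpu = find_idlest_group_cpu(group, p, cpu);
		/* ... descend further or return new_cpu ... */
	}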

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brendan Jackman <brendan.jackman@arm.com>
Cc: Dietmar <dietmar.eggemann@arm.com>
Signed-off-by: Joel Fernandes <joelaf@google.com>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5c49fdb4c508..740602ce799f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5922,7 +5922,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 			}
 		} else if (shallowest_idle_cpu == -1) {
 			load = weighted_cpuload(cpu_rq(i));
-			if (load < min_load || (load == min_load && i == this_cpu)) {
+			if (load < min_load) {
 				min_load = load;
 				least_loaded_cpu = i;
 			}
-- 
2.15.0.rc2.357.g7e34df9404-goog

* Re: [PATCH RFC 3/5] cpufreq: schedutil: Use idle_calls counter of the remote CPU
  2017-10-28  9:59 ` [PATCH RFC 3/5] cpufreq: schedutil: Use idle_calls counter of the remote CPU Joel Fernandes
@ 2017-10-30  9:18   ` Viresh Kumar
  0 siblings, 0 replies; 20+ messages in thread
From: Viresh Kumar @ 2017-10-30  9:18 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Rafael J . Wysocki, Ingo Molnar, Peter Zijlstra,
	Frederic Weisbecker, Thomas Gleixner, Cc: Srinivas Pandruvada,
	Cc: Len Brown, Cc: Juri Lelli, Cc: Patrick Bellasi,
	Cc: Steve Muckle, Cc: Brendan Jackman, Cc: Chris Redpath,
	Cc: Atish Patra, Cc: Dietmar Eggemann, Cc: Vincent Guittot,
	Cc: Morten Ramussen, Cc: EAS Dev, Cc: Android Kernel

On 28-10-17, 02:59, Joel Fernandes wrote:
> Since the recent remote cpufreq callback work, it's possible that a cpufreq
> update is triggered from a remote CPU. For single policies, however, the current
> code uses the local CPU when trying to determine if the remote sg_cpu entered
> idle or is busy. This is incorrect. To remedy this, compare with the nohz tick
> idle_calls counter of the remote CPU.
> 
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>

Fixes: 674e75411fc2 ("sched: cpufreq: Allow remote cpufreq callbacks")

> Signed-off-by: Joel Fernandes <joelaf@google.com>
> ---
>  include/linux/tick.h             |  1 +
>  kernel/sched/cpufreq_schedutil.c |  2 +-
>  kernel/time/tick-sched.c         | 13 +++++++++++++
>  3 files changed, 15 insertions(+), 1 deletion(-)

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

-- 
viresh

* Re: [PATCH RFC 4/5] sched/fair: Correct obsolete comment about cpufreq_update_util
  2017-10-28  9:59 ` [PATCH RFC 4/5] sched/fair: Correct obsolete comment about cpufreq_update_util Joel Fernandes
@ 2017-10-30  9:22   ` Viresh Kumar
  2017-10-30 19:16     ` Joel Fernandes
  0 siblings, 1 reply; 20+ messages in thread
From: Viresh Kumar @ 2017-10-30  9:22 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Rafael J . Wysocki, Ingo Molnar, Peter Zijlstra,
	Cc: Srinivas Pandruvada, Cc: Len Brown, Cc: Juri Lelli,
	Cc: Patrick Bellasi, Cc: Steve Muckle, Cc: Brendan Jackman,
	Cc: Chris Redpath, Cc: Atish Patra, Cc: Dietmar Eggemann,
	Cc: Vincent Guittot, Cc: Morten Ramussen,
	Cc: Frederic Weisbecker, Cc: Thomas Gleixner, Cc: EAS Dev,
	Cc: Android Kernel

You have prefixed most of the Cc'd names with "Cc: " somehow :)

On 28-10-17, 02:59, Joel Fernandes wrote:
> Since the remote cpufreq callback work, the cpufreq_update_util call can happen
> from remote CPUs. The comment about local CPUs is thus obsolete. Update it
> accordingly.

We normally keep the column width as 72 in commit logs instead of 80,
as with 'git log' this is indented by a tab and then we would cross 80
columns.

> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Joel Fernandes <joelaf@google.com>
> ---
>  kernel/sched/fair.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4c06e52935d3..5c49fdb4c508 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3018,9 +3018,7 @@ static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
>  		/*
>  		 * There are a few boundary cases this might miss but it should
>  		 * get called often enough that that should (hopefully) not be
> -		 * a real problem -- added to that it only calls on the local
> -		 * CPU, so if we enqueue remotely we'll miss an update, but
> -		 * the next tick/schedule should update.
> +		 * a real problem.
>  		 *
>  		 * It will not get called when we go idle, because the idle
>  		 * thread is a different class (!fair), nor will the utilization

Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>

-- 
viresh

* Re: [PATCH RFC 2/5] sched/fair: Skip frequency update if CPU about to idle
  2017-10-28  9:59 ` [PATCH RFC 2/5] sched/fair: Skip frequency update if CPU about to idle Joel Fernandes
@ 2017-10-30 12:07   ` Viresh Kumar
  2017-10-30 19:02     ` Joel Fernandes
  0 siblings, 1 reply; 20+ messages in thread
From: Viresh Kumar @ 2017-10-30 12:07 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Rafael J . Wysocki, Ingo Molnar, Peter Zijlstra,
	Cc: Srinivas Pandruvada, Cc: Len Brown, Cc: Juri Lelli,
	Cc: Patrick Bellasi, Cc: Steve Muckle, Cc: Brendan Jackman,
	Cc: Chris Redpath, Cc: Atish Patra, Cc: Dietmar Eggemann,
	Cc: Vincent Guittot, Cc: Morten Ramussen,
	Cc: Frederic Weisbecker, Cc: Thomas Gleixner, Cc: EAS Dev,
	Cc: Android Kernel

On 28-10-17, 02:59, Joel Fernandes wrote:
> Updating CPU frequency on last dequeue of a CPU is useless. Because the
> utilization since CPU came out of idle can increase till the last dequeue, this
> means we are requesting for a higher frequency before entering idle which is
> not very meaningful or useful. It causes unwanted wakeups of the schedutil
> governor kthread in slow-switch systems resulting in large number of wake ups
> that could have been avoided. In an Android application playing music where the
> music app's thread wakes up and sleeps periodically on an Android device, its
> seen that the frequency increases slightly on the dequeue and is reduced when
> the task wakes up again. This oscillation continues between 300Mhz and 350Mhz,
> and while the task is running, its at 300MHz the whole time. This is pointless.
> Adding to that, these are unnecessary wake ups. Infact most of the time when
> the sugov thread wakes up, all the CPUs are idle - so it can hurt power by
> disturbing the cluster when it is idling.
> 
> This patch prevents a frequency update on the last dequeue. With this the
> number of schedutil governor thread wake ups are reduces more than 2 times
> (1389 -> 527).
> 
> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Joel Fernandes <joelaf@google.com>
> ---
>  kernel/sched/fair.c  | 25 ++++++++++++++++++++++---
>  kernel/sched/sched.h |  1 +
>  2 files changed, 23 insertions(+), 3 deletions(-)

So you are doing this only for CFS - isn't that required for RT/DL as
well?

Also, this looks more like a policy decision. Will it be better to
put that directly into schedutil? Like this:

        if (cpu_idle())
                "Don't change the freq";

Will something like that work?
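
Concretely, something like the below in sugov_update_single(); the exact
placement is only illustrative, idle_cpu() being the existing helper:

	static void sugov_update_single(struct update_util_data *hook, u64 time,
					unsigned int flags)
	{
		struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu,
							update_util);

		if (idle_cpu(sg_cpu->cpu))
			return;		/* CPU is idle: don't change the freq */

		/* ... existing frequency evaluation ... */
	}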

-- 
viresh

* Re: [PATCH RFC 5/5] sched/fair: remove impossible condition from find_idlest_group_cpu
  2017-10-28  9:59 ` [PATCH RFC 5/5] sched/fair: remove impossible condition from find_idlest_group_cpu Joel Fernandes
@ 2017-10-30 15:41   ` Brendan Jackman
  2017-10-30 16:00   ` Vincent Guittot
  1 sibling, 0 replies; 20+ messages in thread
From: Brendan Jackman @ 2017-10-30 15:41 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Dietmar,
	Cc: Srinivas Pandruvada, Cc: Len Brown, Cc: Rafael J. Wysocki,
	Cc: Viresh Kumar, Cc: Juri Lelli, Cc: Patrick Bellasi,
	Cc: Steve Muckle, Cc: Chris Redpath, Cc: Atish Patra,
	Cc: Vincent Guittot, Cc: Morten Ramussen,
	Cc: Frederic Weisbecker, Cc: Thomas Gleixner, Cc: EAS Dev,
	Cc: Android Kernel


On Sat, Oct 28 2017 at 09:59, Joel Fernandes wrote:
> find_idlest_group_cpu goes through the CPUs of a group previously selected by
> find_idlest_group. find_idlest_group returns NULL if the local group is the
> selected one, in which case find_idlest_group_cpu isn't executed for the group
> to which 'cpu' belongs. So we're always guaranteed to call
> find_idlest_group_cpu with a group to which cpu is non-local. This makes one of
> the conditions in find_idlest_group_cpu an impossible one, which we can get
> rid of.
>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Brendan Jackman <brendan.jackman@arm.com>
> Cc: Dietmar <dietmar.eggemann@arm.com>
> Signed-off-by: Joel Fernandes <joelaf@google.com>

FWIW:

Reviewed-by: Brendan Jackman <brendan.jackman@arm.com>

> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 5c49fdb4c508..740602ce799f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5922,7 +5922,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
>  			}
>  		} else if (shallowest_idle_cpu == -1) {
>  			load = weighted_cpuload(cpu_rq(i));
> -			if (load < min_load || (load == min_load && i == this_cpu)) {
> +			if (load < min_load) {
>  				min_load = load;
>  				least_loaded_cpu = i;
>  			}

* Re: [PATCH RFC 5/5] sched/fair: remove impossible condition from find_idlest_group_cpu
  2017-10-28  9:59 ` [PATCH RFC 5/5] sched/fair: remove impossible condition from find_idlest_group_cpu Joel Fernandes
  2017-10-30 15:41   ` Brendan Jackman
@ 2017-10-30 16:00   ` Vincent Guittot
  2017-10-30 16:19     ` Joel Fernandes
  1 sibling, 1 reply; 20+ messages in thread
From: Vincent Guittot @ 2017-10-30 16:00 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Brendan Jackman,
	Dietmar, Cc: Srinivas Pandruvada, Cc: Len Brown,
	Cc: Rafael J. Wysocki, Cc: Viresh Kumar, Cc: Juri Lelli,
	Cc: Patrick Bellasi, Cc: Steve Muckle, Cc: Chris Redpath,
	Cc: Atish Patra, Cc: Morten Ramussen, Cc: Frederic Weisbecker,
	Cc: Thomas Gleixner, Cc: EAS Dev, Cc: Android Kernel

On 28 October 2017 at 11:59, Joel Fernandes <joelaf@google.com> wrote:
> find_idlest_group_cpu goes through the CPUs of a group previously selected by
> find_idlest_group. find_idlest_group returns NULL if the local group is the
> selected one, in which case find_idlest_group_cpu isn't executed for the group
> to which 'cpu' belongs. So we're always guaranteed to call
> find_idlest_group_cpu with a group to which cpu is non-local. This makes one of

Is this still true in case of overlapping topology?

> the conditions in find_idlest_group_cpu an impossible one, which we can get
> rid of.
>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Brendan Jackman <brendan.jackman@arm.com>
> Cc: Dietmar <dietmar.eggemann@arm.com>
> Signed-off-by: Joel Fernandes <joelaf@google.com>
> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 5c49fdb4c508..740602ce799f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5922,7 +5922,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
>                         }
>                 } else if (shallowest_idle_cpu == -1) {
>                         load = weighted_cpuload(cpu_rq(i));
> -                       if (load < min_load || (load == min_load && i == this_cpu)) {
> +                       if (load < min_load) {
>                                 min_load = load;
>                                 least_loaded_cpu = i;
>                         }
> --
> 2.15.0.rc2.357.g7e34df9404-goog
>

* Re: [PATCH RFC 5/5] sched/fair: remove impossible condition from find_idlest_group_cpu
  2017-10-30 16:00   ` Vincent Guittot
@ 2017-10-30 16:19     ` Joel Fernandes
  2017-10-30 16:26       ` Vincent Guittot
  0 siblings, 1 reply; 20+ messages in thread
From: Joel Fernandes @ 2017-10-30 16:19 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Brendan Jackman,
	Dietmar, Cc: Srinivas Pandruvada, Cc: Len Brown,
	Cc: Rafael J. Wysocki, Cc: Viresh Kumar, Cc: Juri Lelli,
	Cc: Patrick Bellasi, Cc: Steve Muckle, Cc: Chris Redpath,
	Cc: Atish Patra, Cc: Morten Ramussen, Cc: Frederic Weisbecker,
	Cc: Thomas Gleixner, Cc: EAS Dev, Cc: Android Kernel

Hi Vincent,

On Mon, Oct 30, 2017 at 9:00 AM, Vincent Guittot
<vincent.guittot@linaro.org> wrote:
> On 28 October 2017 at 11:59, Joel Fernandes <joelaf@google.com> wrote:
>> find_idlest_group_cpu goes through the CPUs of a group previously selected by
>> find_idlest_group. find_idlest_group returns NULL if the local group is the
>> selected one, in which case find_idlest_group_cpu isn't executed for the group
>> to which 'cpu' belongs. So we're always guaranteed to call
>> find_idlest_group_cpu with a group to which cpu is non-local. This makes one of
>
> Is this still true in case of overlapping topology?

Yes, I believe so. As per the code, find_idlest_group will only return
a group to which this_cpu doesn't belong. So in case an overlapping
group was returned by find_idlest_group (instead of NULL), none
of the groups (among the set of overlapping groups) is local to
this_cpu. In case NULL is returned, find_idlest_group_cpu doesn't
execute at all.

Do you agree?

thanks,

- Joel

[..]

* Re: [PATCH RFC 5/5] sched/fair: remove impossible condition from find_idlest_group_cpu
  2017-10-30 16:19     ` Joel Fernandes
@ 2017-10-30 16:26       ` Vincent Guittot
  0 siblings, 0 replies; 20+ messages in thread
From: Vincent Guittot @ 2017-10-30 16:26 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Brendan Jackman,
	Dietmar, Cc: Srinivas Pandruvada, Cc: Len Brown,
	Cc: Rafael J. Wysocki, Cc: Viresh Kumar, Cc: Juri Lelli,
	Cc: Patrick Bellasi, Cc: Steve Muckle, Cc: Chris Redpath,
	Cc: Atish Patra, Cc: Morten Ramussen, Cc: Frederic Weisbecker,
	Cc: Thomas Gleixner, Cc: EAS Dev, Cc: Android Kernel

On 30 October 2017 at 17:19, Joel Fernandes <joelaf@google.com> wrote:
> Hi Vincent,
>
> On Mon, Oct 30, 2017 at 9:00 AM, Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
>> On 28 October 2017 at 11:59, Joel Fernandes <joelaf@google.com> wrote:
>>> find_idlest_group_cpu goes through the CPUs of a group previously selected by
>>> find_idlest_group. find_idlest_group returns NULL if the local group is the
>>> selected one, in which case find_idlest_group_cpu isn't executed for the group
>>> to which 'cpu' belongs. So we're always guaranteed to call
>>> find_idlest_group_cpu with a group to which cpu is non-local. This makes one of
>>
>> Is this still true in case of overlapping topology?
>
> Yes, I believe so. As per the code, find_idlest_group will only return
> a group to which this_cpu doesn't belong. So in case an overlapping
> group was returned by find_idlest_group (instead of NULL), none
> of the groups (among the set of overlapping groups) is local to
> this_cpu. In case NULL is returned, find_idlest_group_cpu doesn't
> execute at all.
>
> Do you agree?

Yes, you're right. We skip all groups to which this_cpu belongs.

>
> thanks,
>
> - Joel
>
> [..]

* Re: [PATCH RFC 2/5] sched/fair: Skip frequency update if CPU about to idle
  2017-10-30 12:07   ` Viresh Kumar
@ 2017-10-30 19:02     ` Joel Fernandes
  2017-11-01 19:35       ` Steve Muckle
  0 siblings, 1 reply; 20+ messages in thread
From: Joel Fernandes @ 2017-10-30 19:02 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: LKML, Rafael J . Wysocki, Ingo Molnar, Peter Zijlstra,
	Cc: Srinivas Pandruvada, Cc: Len Brown, Cc: Juri Lelli,
	Cc: Patrick Bellasi, Cc: Steve Muckle, Cc: Brendan Jackman,
	Cc: Chris Redpath, Cc: Atish Patra, Cc: Dietmar Eggemann,
	Cc: Vincent Guittot, Cc: Morten Ramussen,
	Cc: Frederic Weisbecker, Cc: Thomas Gleixner, Cc: EAS Dev,
	Cc: Android Kernel, Steven Rostedt

Hi Viresh,

On Mon, Oct 30, 2017 at 5:07 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 28-10-17, 02:59, Joel Fernandes wrote:
>> Updating CPU frequency on the last dequeue of a CPU is useless. Because the
>> utilization since the CPU came out of idle can increase till the last dequeue,
>> this means we are requesting a higher frequency just before entering idle,
>> which is not very meaningful or useful. It causes unwanted wakeups of the
>> schedutil governor kthread in slow-switch systems, resulting in a large
>> number of wakeups that could have been avoided. In an Android application
>> playing music, where the music app's thread wakes up and sleeps periodically
>> on an Android device, it's seen that the frequency increases slightly on the
>> dequeue and is reduced when the task wakes up again. This oscillation
>> continues between 300MHz and 350MHz, and while the task is running, it's at
>> 300MHz the whole time. This is pointless. Adding to that, these are
>> unnecessary wakeups. In fact, most of the time when the sugov thread wakes
>> up, all the CPUs are idle - so it can hurt power by disturbing the cluster
>> when it is idling.
>>
>> This patch prevents a frequency update on the last dequeue. With this, the
>> number of schedutil governor thread wakeups is reduced by more than 2x
>> (1389 -> 527).
>>
>> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Signed-off-by: Joel Fernandes <joelaf@google.com>
>> ---
>>  kernel/sched/fair.c  | 25 ++++++++++++++++++++++---
>>  kernel/sched/sched.h |  1 +
>>  2 files changed, 23 insertions(+), 3 deletions(-)
>
> So you are doing this only for CFS - isn't that required for RT/DL as
> well?

Yes, I agree we should. I remember this patch from Patrick doing
something similar for RT:
https://patchwork.kernel.org/patch/9825461/

That patch prevents the cpufreq update from the dequeue_task_rt path
since we would no longer trigger it from update_curr_rt all the time.
Is this an acceptable solution for RT?

>
> Also, this looks more like a policy decision. Will it be better to
> put that directly into schedutil? Like this:
>
>         if (cpu_idle())
>                 "Don't change the freq";
>
> Will something like that work?

I thought about this and I think it won't work very well. In the
dequeue path we're still running the task being dequeued so the CPU is
not yet idle. What is needed here IMO is a notion that the CPU is
possibly about to idle, and we can predict that from the dequeue
path of the CFS class.
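
(That is exactly the hint patch 2/5 derives in dequeue_task_fair();
quoting the relevant hunk:

	if (task_sleep && rq->nr_running == 1)
		flags |= DEQUEUE_IDLE;

i.e. the last runnable CFS task is being dequeued to sleep, so the CPU is
very likely about to enter idle.)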

Also, just looking at whether the CPU is currently idle or not in the
governor doesn't help to differentiate between, say, the dequeue path
and the tick path, both of which can occur when the CPU is not idle.

Any thoughts about this?

thanks,

- Joel

* Re: [PATCH RFC 4/5] sched/fair: Correct obsolete comment about cpufreq_update_util
  2017-10-30  9:22   ` Viresh Kumar
@ 2017-10-30 19:16     ` Joel Fernandes
  0 siblings, 0 replies; 20+ messages in thread
From: Joel Fernandes @ 2017-10-30 19:16 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: LKML, Rafael J . Wysocki, Ingo Molnar, Peter Zijlstra,
	Cc: Srinivas Pandruvada, Cc: Len Brown, Cc: Juri Lelli,
	Cc: Patrick Bellasi, Cc: Steve Muckle, Cc: Brendan Jackman,
	Cc: Chris Redpath, Cc: Atish Patra, Cc: Dietmar Eggemann,
	Cc: Vincent Guittot, Cc: Morten Ramussen,
	Cc: Frederic Weisbecker, Cc: Thomas Gleixner, Cc: EAS Dev,
	Cc: Android Kernel

Hi Viresh,

On Mon, Oct 30, 2017 at 2:22 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> You have prefixed most of the Cc'd names with "Cc: " somehow :)

Yeah :( What happened is I decided to play with using a text file as
input to --cc-cmd for git send-email. Turns out I was careless and
forgot to remove the "Cc: " prefix when I created the CC list.
Anyway, lesson learnt :)

>
> On 28-10-17, 02:59, Joel Fernandes wrote:
>> Since the remote cpufreq callback work, the cpufreq_update_util call can happen
>> from remote CPUs. The comment about local CPUs is thus obsolete. Update it
>> accordingly.
>
> We normally keep the column width as 72 in commit logs instead of 80,
> as with 'git log' this is indented by a tab and then we would cross 80
> columns.

Ok, I'll keep that in mind and use 72 characters for future patches.

>
>> Cc: Viresh Kumar <viresh.kumar@linaro.org>
>> Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Signed-off-by: Joel Fernandes <joelaf@google.com>
>> ---
>>  kernel/sched/fair.c | 4 +---
>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 4c06e52935d3..5c49fdb4c508 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -3018,9 +3018,7 @@ static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
>>               /*
>>                * There are a few boundary cases this might miss but it should
>>                * get called often enough that that should (hopefully) not be
>> -              * a real problem -- added to that it only calls on the local
>> -              * CPU, so if we enqueue remotely we'll miss an update, but
>> -              * the next tick/schedule should update.
>> +              * a real problem.
>>                *
>>                * It will not get called when we go idle, because the idle
>>                * thread is a different class (!fair), nor will the utilization
>
> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>

Thank you!

- Joel

* Re: [PATCH RFC 2/5] sched/fair: Skip frequency update if CPU about to idle
  2017-10-30 19:02     ` Joel Fernandes
@ 2017-11-01 19:35       ` Steve Muckle
  2017-11-04  5:44         ` Joel Fernandes
  0 siblings, 1 reply; 20+ messages in thread
From: Steve Muckle @ 2017-11-01 19:35 UTC (permalink / raw)
  To: Joel Fernandes, Viresh Kumar
  Cc: LKML, Rafael J . Wysocki, Ingo Molnar, Peter Zijlstra,
	Cc: Srinivas Pandruvada, Cc: Len Brown, Cc: Juri Lelli,
	Cc: Patrick Bellasi, Cc: Brendan Jackman, Cc: Chris Redpath,
	Cc: Atish Patra, Cc: Dietmar Eggemann, Cc: Vincent Guittot,
	Cc: Morten Ramussen, Cc: Frederic Weisbecker,
	Cc: Thomas Gleixner, Cc: EAS Dev, Cc: Android Kernel,
	Steven Rostedt

On 10/30/2017 12:02 PM, Joel Fernandes wrote:
>> Also, this looks more like a policy decision. Will it be better to
>> put that directly into schedutil? Like this:
>>
>>          if (cpu_idle())
>>                  "Don't change the freq";
>>
>> Will something like that work?
> 
> I thought about this and I think it won't work very well. In the
> dequeue path we're still running the task being dequeued so the CPU is
> not yet idle. What is needed here IMO is a notion that the CPU is
> possibly about to idle, and we can predict that from the dequeue
> path of the CFS class.
> 
> Also, just looking at whether the CPU is currently idle or not in the
> governor doesn't help to differentiate between, say, the dequeue path
> and the tick path, both of which can occur when the CPU is not idle.
> 
> Any thoughts about this?

Also if it really is the case that this bit of policy is universally 
desirable, I'd think it is better to do this in the scheduler and avoid 
a needless trip through a fn pointer out to schedutil for performance 
reasons.

* Re: [PATCH RFC 2/5] sched/fair: Skip frequency update if CPU about to idle
  2017-11-01 19:35       ` Steve Muckle
@ 2017-11-04  5:44         ` Joel Fernandes
  2017-11-06  8:00           ` Vincent Guittot
  0 siblings, 1 reply; 20+ messages in thread
From: Joel Fernandes @ 2017-11-04  5:44 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Viresh Kumar, LKML, Rafael J . Wysocki, Ingo Molnar,
	Peter Zijlstra, Cc: Srinivas Pandruvada, Cc: Len Brown,
	Cc: Juri Lelli, Cc: Patrick Bellasi, Cc: Brendan Jackman,
	Cc: Chris Redpath, Cc: Atish Patra, Cc: Dietmar Eggemann,
	Cc: Vincent Guittot, Cc: Morten Ramussen,
	Cc: Frederic Weisbecker, Cc: Thomas Gleixner, Cc: EAS Dev,
	Cc: Android Kernel, Steven Rostedt

On Wed, Nov 1, 2017 at 12:35 PM, Steve Muckle <smuckle@google.com> wrote:
> On 10/30/2017 12:02 PM, Joel Fernandes wrote:
>>>
>>> Also, this looks more like a policy decision. Will it be better to
>>> put that directly into schedutil? Like this:
>>>
>>>          if (cpu_idle())
>>>                  "Don't change the freq";
>>>
>>> Will something like that work?
>>
>>
>> I thought about this and I think it won't work very well. In the
>> dequeue path we're still running the task being dequeued so the CPU is
>> not yet idle. What is needed here IMO is a notion that the CPU is
>> possibly about to idle, and we can predict that from the dequeue
>> path of the CFS class.
>>
>> Also, just looking at whether the CPU is currently idle or not in the
>> governor doesn't help to differentiate between, say, the dequeue path
>> and the tick path, both of which can occur when the CPU is not idle.
>>
>> Any thoughts about this?
>
>
> Also if it really is the case that this bit of policy is universally
> desirable, I'd think it is better to do this in the scheduler and avoid a
> needless trip through a fn pointer out to schedutil for performance reasons.

I agree.

Peter, what do you think? Are you OK with the approach of this patch
(preventing the cpufreq update call during dequeue)?

thanks,

- Joel

* Re: [PATCH RFC 2/5] sched/fair: Skip frequency update if CPU about to idle
  2017-11-04  5:44         ` Joel Fernandes
@ 2017-11-06  8:00           ` Vincent Guittot
  2017-11-06  9:29             ` Uladzislau Rezki
  2017-11-08  5:11             ` Joel Fernandes
  0 siblings, 2 replies; 20+ messages in thread
From: Vincent Guittot @ 2017-11-06  8:00 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Steve Muckle, Viresh Kumar, LKML, Rafael J . Wysocki,
	Ingo Molnar, Peter Zijlstra, Cc: Srinivas Pandruvada,
	Cc: Len Brown, Cc: Juri Lelli, Cc: Patrick Bellasi,
	Cc: Brendan Jackman, Cc: Chris Redpath, Cc: Atish Patra,
	Cc: Dietmar Eggemann, Cc: Morten Ramussen,
	Cc: Frederic Weisbecker, Cc: Thomas Gleixner, Cc: EAS Dev,
	Cc: Android Kernel, Steven Rostedt

Hi Joel,

On 4 November 2017 at 06:44, Joel Fernandes <joelaf@google.com> wrote:
> On Wed, Nov 1, 2017 at 12:35 PM, Steve Muckle <smuckle@google.com> wrote:
>> On 10/30/2017 12:02 PM, Joel Fernandes wrote:
>>>>
>>>> Also, this looks more like a policy decision. Will it be better to
>>>> put that directly into schedutil? Like this:
>>>>
>>>>          if (cpu_idle())
>>>>                  "Don't change the freq";
>>>>
>>>> Will something like that work?
>>>
>>>
>>> I thought about this and I think it won't work very well. In the
>>> dequeue path we're still running the task being dequeued so the CPU is
>>> not yet idle. What is needed here IMO is a notion that the CPU is
>>> possibly about to idle, and we can predict that from the dequeue
>>> path of the CFS class.
>>>
>>> Also, just looking at whether the CPU is currently idle or not in the
>>> governor doesn't help to differentiate between, say, the dequeue path
>>> and the tick path, both of which can occur when the CPU is not idle.
>>>
>>> Any thoughts about this?
>>
>>
>> Also if it really is the case that this bit of policy is universally
>> desirable, I'd think it is better to do this in the scheduler and avoid a
>> needless trip through a fn pointer out to schedutil for performance reasons.
>
> I agree.
>
> Peter, what do you think? Are you OK with the approach of this patch
> (preventing the cpufreq update call during dequeue)?

I'm not really convinced that we should do change OPP at dequeue.
Although I agree that it makes perfect sense to prevent increasing the OPP
just before going idle for mp3 playback, I'm not sure that this is
always the case, especially for performance-oriented use cases, because
we delay the OPP increase and as a result hurt the responsiveness of the
CPU.
In fact this decision really depends on how long we are going to
sleep. If the cpu wakes up in a few ms, it's worth increasing the
frequency already when utilization is above the threshold, because
it will not decrease enough to go back to a lower OPP. Conversely,
if the cpu will not wake up shortly, skipping the OPP change can
make sense.

Regarding the reduction of the number of OPP changes, will the
util_est feature provide the same amount of reduction by directly
providing the estimated max utilization?

Just to say that IMHO skipping the OPP change at dequeue (or not) is a
policy decision and not a generic one

>
> thanks,
>
> - Joel

* Re: [PATCH RFC 2/5] sched/fair: Skip frequency update if CPU about to idle
  2017-11-06  8:00           ` Vincent Guittot
@ 2017-11-06  9:29             ` Uladzislau Rezki
  2017-11-08  5:11             ` Joel Fernandes
  1 sibling, 0 replies; 20+ messages in thread
From: Uladzislau Rezki @ 2017-11-06  9:29 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Joel Fernandes, Steve Muckle, Viresh Kumar, LKML,
	Rafael J . Wysocki, Ingo Molnar, Peter Zijlstra,
	Cc: Srinivas Pandruvada, Cc: Len Brown, Cc: Juri Lelli,
	Cc: Patrick Bellasi, Cc: Brendan Jackman, Cc: Chris Redpath,
	Cc: Atish Patra, Cc: Dietmar Eggemann, Cc: Morten Ramussen,
	Cc: Frederic Weisbecker, Cc: Thomas Gleixner, Cc: EAS Dev,
	Cc: Android Kernel, Steven Rostedt

On Mon, Nov 06, 2017 at 09:00:45AM +0100, Vincent Guittot wrote:
> Hi Joel,
> 
> On 4 November 2017 at 06:44, Joel Fernandes <joelaf@google.com> wrote:
> > On Wed, Nov 1, 2017 at 12:35 PM, Steve Muckle <smuckle@google.com> wrote:
> >> On 10/30/2017 12:02 PM, Joel Fernandes wrote:
> >>>>
> >>>> Also, this looks more like a policy decision. Will it be better to
> >>>> put that directly into schedutil? Like this:
> >>>>
> >>>>          if (cpu_idle())
> >>>>                  "Don't change the freq";
> >>>>
> >>>> Will something like that work?
> >>>
> >>>
> >>> I thought about this and I think it won't work very well. In the
> >>> dequeue path we're still running the task being dequeued so the CPU is
> >>> not yet idle. What is needed here IMO is a notion that the CPU is
> >>> possibly about to idle, and we can predict that from the dequeue
> >>> path of the CFS class.
> >>>
> >>> Also, just looking at whether the CPU is currently idle or not in the
> >>> governor doesn't help to differentiate between, say, the dequeue path
> >>> and the tick path, both of which can occur when the CPU is not idle.
> >>>
> >>> Any thoughts about this?
> >>
> >>
> >> Also if it really is the case that this bit of policy is universally
> >> desirable, I'd think it is better to do this in the scheduler and avoid a
> >> needless trip through a fn pointer out to schedutil for performance reasons.
> >
> > I agree.
> >
> > Peter, what do you think? Are you OK with the approach of this patch
> > (preventing the cpufreq update call during dequeue)?
> 
> I'm not really convinced that we should do change OPP at dequeue.
> Although I agree that it makes perfect sense to prevent increasing the OPP
> just before going idle for mp3 playback, I'm not sure that this is
> always the case, especially for performance-oriented use cases, because
> we delay the OPP increase and as a result hurt the responsiveness of the
> CPU.
> In fact this decision really depends on how long we are going to
> sleep. If the cpu wakes up in a few ms, it's worth increasing the
> frequency already when utilization is above the threshold, because
> it will not decrease enough to go back to a lower OPP. Conversely,
> if the cpu will not wake up shortly, skipping the OPP change can
> make sense.
> 
> Regarding the reduction of the number of OPP changes, will the
> util_est feature provide the same amount of reduction by directly
> providing the estimated max utilization?
> 
> Just to say that IMHO skipping the OPP change at dequeue (or not) is a
> policy decision and not a generic one
> 
Indeed. Otherwise we may end up handling corner cases in the scheduler
when it comes to OPP updates. I also agree that it is up to policy
whether to do the update or not.

--
Uladzislau Rezki

* Re: [PATCH RFC 2/5] sched/fair: Skip frequency update if CPU about to idle
  2017-11-06  8:00           ` Vincent Guittot
  2017-11-06  9:29             ` Uladzislau Rezki
@ 2017-11-08  5:11             ` Joel Fernandes
  1 sibling, 0 replies; 20+ messages in thread
From: Joel Fernandes @ 2017-11-08  5:11 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Steve Muckle, Viresh Kumar, LKML, Rafael J . Wysocki,
	Ingo Molnar, Peter Zijlstra, Cc: Srinivas Pandruvada,
	Cc: Len Brown, Cc: Juri Lelli, Cc: Patrick Bellasi,
	Cc: Brendan Jackman, Cc: Chris Redpath, Cc: Atish Patra,
	Cc: Dietmar Eggemann, Cc: Morten Ramussen,
	Cc: Frederic Weisbecker, Cc: Thomas Gleixner, Cc: EAS Dev,
	Cc: Android Kernel, Steven Rostedt, Kannan, Saravana

Hi Vincent,

On Mon, Nov 6, 2017 at 12:00 AM, Vincent Guittot
<vincent.guittot@linaro.org> wrote:
> Hi Joel,
>
> On 4 November 2017 at 06:44, Joel Fernandes <joelaf@google.com> wrote:
>> On Wed, Nov 1, 2017 at 12:35 PM, Steve Muckle <smuckle@google.com> wrote:
>>> On 10/30/2017 12:02 PM, Joel Fernandes wrote:
>>>>>
>>>>> Also, this looks more like a policy decision. Will it be better to
>>>>> put that directly into schedutil? Like this:
>>>>>
>>>>>          if (cpu_idle())
>>>>>                  "Don't change the freq";
>>>>>
>>>>> Will something like that work?
>>>>
>>>>
>>>> I thought about this and I think it won't work very well. In the
>>>> dequeue path we're still running the task being dequeued so the CPU is
>>>> not yet idle. What is needed here IMO is a notion that the CPU is
>>>> possibly about to idle, and we can predict that from the dequeue
>>>> path of the CFS class.
>>>>
>>>> Also, just looking at whether the CPU is currently idle or not in the
>>>> governor doesn't help to differentiate between, say, the dequeue path
>>>> and the tick path, both of which can occur when the CPU is not idle.
>>>>
>>>> Any thoughts about this?
>>>
>>>
>>> Also if it really is the case that this bit of policy is universally
>>> desirable, I'd think it is better to do this in the scheduler and avoid a
>>> needless trip through a fn pointer out to schedutil for performance reasons.
>>
>> I agree.
>>
>> Peter, what do you think? Are you OK with the approach of this patch
>> (preventing the cpufreq update call during dequeue)?
>
> I'm not really convinced that we should do change OPP at dequeue.

You mean "should not do", right?

> Although I agree that it makes perfect sense to prevent increasing the OPP
> just before going idle for mp3 playback, I'm not sure that this is
> always the case, especially for performance-oriented use cases, because
> we delay the OPP increase and as a result hurt the responsiveness of the
> CPU.

Actually, I think another way to look at it is that this is no worse
performance-wise than if the task were running all the time (didn't
sleep) - the next opportunity to increase the frequency would then
be the next tick (e.g. 4ms away with HZ=250). So I am not sure it
practically makes any difference to the performance of the task. Even if
the sleep was short, we would update the frequency on the next
enqueue/tick, so I think we're still fine from a time-granularity
standpoint.

> In fact this decision really depends on how long we are going to
> sleep. If the cpu wakes up in a few ms, it's worth increasing the
> frequency already when utilization is above the threshold, because
> it will not decrease enough to go back to a lower OPP. Conversely,
> if the cpu will not wake up shortly, skipping the OPP change can
> make sense.
>
> Regarding the reduction of the number of OPP changes, will the
> util_est feature provide the same amount of reduction by directly
> providing the estimated max utilization?

Yes, I think util_est can help reduce the oscillations that cause
this issue, but apart from the fact that util_est is not currently
mainlined, it would still have the same issue if its estimation
itself oscillates. So I think util_est is a mitigation rather than
a solution. Do you agree?

> Just to say that IMHO skipping the OPP change at dequeue (or not) is a
> policy decision and not a generic one

Yes, I agree, but this also means we have to expose scheduler internals
to the governor to detect this case. Are you suggesting, if this were to
be implemented, that we pass a flag to the governor and let it make the
decision there based on policy?
As I was discussing in the earlier thread with Viresh, simply checking
whether the CPU is idle in the governor isn't good enough, and since this
issue is about the scheduler making a cpufreq update call when it didn't
need to, avoiding such a call makes sense to me.

thanks,

- Joel
