linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
@ 2017-11-30 11:47 Patrick Bellasi
  2017-11-30 11:47 ` [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter Patrick Bellasi
                   ` (6 more replies)
  0 siblings, 7 replies; 80+ messages in thread
From: Patrick Bellasi @ 2017-11-30 11:47 UTC (permalink / raw)
  To: linux-kernel, linux-pm
  Cc: Ingo Molnar, Peter Zijlstra, Rafael J . Wysocki, Viresh Kumar,
	Vincent Guittot, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
	Todd Kjos, Joel Fernandes

This is a re-spin of a previous posting [1], rebased on v4.15-rc1.

A detailed description of the series, as well as experimental results,
is available in the cover letter of the previous posting.
Here is a summary of the main results:

- reduces energy consumption by ~50%, by ensuring that a small task always
  runs at the minimum OPP even when the sugov's RT kthread is used to change
  frequencies in the same cluster

- increases from 4% to 98% the chances for an RT task to complete its
  activations while running at the max OPP

- makes the sequence of tracing events reported in a trace much more
  deterministic and consistent with the expected system behavior.  For
  example, as soon as an RT task wakes up we ask for an OPP switch to the
  max frequency

A detailed report for these results is available here [2].

According to the discussion in the previous posting, one patch:

   cpufreq: schedutil: ignore the sugov kthread for frequencies selections

was considered the most controversial.
That's why in this posting I'm still proposing that patch, since it
still makes sense to me, but I've moved it to the end of the series,
after the less controversial ones.

Cheers,
Patrick

.:: References
==============

[1] https://lkml.org/lkml/2017/7/4/495
[2] https://gist.github.com/derkling/0cd7210e4fa6f2ec3558073006e5ad70

Patrick Bellasi (6):
  cpufreq: schedutil: reset sg_cpus's flags at IDLE enter
  cpufreq: schedutil: ensure max frequency while running RT/DL tasks
  cpufreq: schedutil: update CFS util only if used
  sched/rt: fast switch to maximum frequency when RT tasks are scheduled
  cpufreq: schedutil: relax rate-limiting while running RT/DL tasks
  cpufreq: schedutil: ignore sugov kthreads

 include/linux/sched/cpufreq.h    |  1 +
 kernel/sched/cpufreq_schedutil.c | 61 ++++++++++++++++++++++++++++++++--------
 kernel/sched/idle_task.c         |  4 +++
 kernel/sched/rt.c                | 15 ++++++++--
 4 files changed, 67 insertions(+), 14 deletions(-)

-- 
2.14.1

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter
  2017-11-30 11:47 [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates Patrick Bellasi
@ 2017-11-30 11:47 ` Patrick Bellasi
  2017-11-30 13:12   ` Juri Lelli
                     ` (2 more replies)
  2017-11-30 11:47 ` [PATCH v3 2/6] cpufreq: schedutil: ensure max frequency while running RT/DL tasks Patrick Bellasi
                   ` (5 subsequent siblings)
  6 siblings, 3 replies; 80+ messages in thread
From: Patrick Bellasi @ 2017-11-30 11:47 UTC (permalink / raw)
  To: linux-kernel, linux-pm
  Cc: Ingo Molnar, Peter Zijlstra, Rafael J . Wysocki, Viresh Kumar,
	Vincent Guittot, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
	Todd Kjos, Joel Fernandes

Currently, sg_cpu's flags are set to the value defined by the last call
to cpufreq_update_util(); for the RT/DL classes this means the
SCHED_CPUFREQ_{RT/DL} flags are always set.

When multiple CPUs share the same frequency domain it might happen that
a CPU which executed an RT task, right before entering IDLE, has one of
the SCHED_CPUFREQ_RT_DL flags set, permanently, until it exits IDLE.

Although such an idle CPU is _going to be_ ignored by
sugov_next_freq_shared():
  1. this kind of "useless RT request" is ignored only if more than
     TICK_NSEC has elapsed since the last update
  2. we can still potentially trigger an already too late switch to
     MAX, which also starts a new throttling interval
  3. the internal state machine is not consistent with what the
     scheduler knows, i.e. the CPU is now actually idle

Thus, in sugov_next_freq_shared(), where utilisation and flags are
aggregated across all the CPUs of a frequency domain, it can turn out
that all the CPUs of that domain run unnecessarily at the maximum OPP
until another event on the idle CPU eventually clears the
SCHED_CPUFREQ_{RT/DL} flag, or until the idle CPU gets ignored,
TICK_NSEC [ns] after it entered IDLE.

Such a behaviour can harm the energy efficiency of systems where RT
workloads are not so frequent and other CPUs in the same frequency
domain are running small utilisation workloads, which is a quite common
scenario in mobile embedded systems.

This patch proposes a solution aligned with the current principle of
updating the flags each time a scheduling event happens. The scheduling
of the idle_task on a CPU is considered one such meaningful event.
That's why, when the idle_task is selected for execution, we poke the
schedutil policy to reset the flags for that CPU.

No frequency transitions are activated at that point, which is fair in
case the RT workload should come back in the future. However, this still
allows other CPUs in the same frequency domain to scale down the
frequency in case that should be possible.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org

---
Changes from v2:
- use cpufreq_update_util() instead of cpufreq_update_this_cpu()
- rebased on v4.15-rc1

Changes from v1:
- added "unlikely()" around the statement (SteveR)

Change-Id: I1192ca9a3acb767cb3a745967a7a23a17e1af7b7
---
 include/linux/sched/cpufreq.h    | 1 +
 kernel/sched/cpufreq_schedutil.c | 7 +++++++
 kernel/sched/idle_task.c         | 4 ++++
 3 files changed, 12 insertions(+)

diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
index d1ad3d825561..bb5f778db023 100644
--- a/include/linux/sched/cpufreq.h
+++ b/include/linux/sched/cpufreq.h
@@ -11,6 +11,7 @@
 #define SCHED_CPUFREQ_RT	(1U << 0)
 #define SCHED_CPUFREQ_DL	(1U << 1)
 #define SCHED_CPUFREQ_IOWAIT	(1U << 2)
+#define SCHED_CPUFREQ_IDLE	(1U << 3)
 
 #define SCHED_CPUFREQ_RT_DL	(SCHED_CPUFREQ_RT | SCHED_CPUFREQ_DL)
 
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 2f52ec0f1539..67339ccb5595 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -347,6 +347,12 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 
 	sg_cpu->util = util;
 	sg_cpu->max = max;
+
+	/* CPU is entering IDLE, reset flags without triggering an update */
+	if (unlikely(flags & SCHED_CPUFREQ_IDLE)) {
+		sg_cpu->flags = 0;
+		goto done;
+	}
 	sg_cpu->flags = flags;
 
 	sugov_set_iowait_boost(sg_cpu, time, flags);
@@ -361,6 +367,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 		sugov_update_commit(sg_policy, time, next_f);
 	}
 
+done:
 	raw_spin_unlock(&sg_policy->update_lock);
 }
 
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index d518664cce4f..6e8ae2aa7a13 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -30,6 +30,10 @@ pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
 	put_prev_task(rq, prev);
 	update_idle_core(rq);
 	schedstat_inc(rq->sched_goidle);
+
+	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
+	cpufreq_update_util(rq, SCHED_CPUFREQ_IDLE);
+
 	return rq->idle;
 }
 
-- 
2.14.1


* [PATCH v3 2/6] cpufreq: schedutil: ensure max frequency while running RT/DL tasks
  2017-11-30 11:47 [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates Patrick Bellasi
  2017-11-30 11:47 ` [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter Patrick Bellasi
@ 2017-11-30 11:47 ` Patrick Bellasi
  2017-11-30 13:17   ` Juri Lelli
  2017-12-07  5:05   ` Viresh Kumar
  2017-11-30 11:47 ` [PATCH v3 3/6] cpufreq: schedutil: update CFS util only if used Patrick Bellasi
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 80+ messages in thread
From: Patrick Bellasi @ 2017-11-30 11:47 UTC (permalink / raw)
  To: linux-kernel, linux-pm
  Cc: Ingo Molnar, Peter Zijlstra, Rafael J . Wysocki, Viresh Kumar,
	Vincent Guittot, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
	Todd Kjos, Joel Fernandes, Steve Muckle

The policy in use for RT/DL tasks sets the maximum frequency when a task
in these classes triggers a cpufreq_update_util() call.  However, the
current implementation might cause a frequency drop while an RT/DL task
is still running, just because, for example, a FAIR task wakes up and is
enqueued on the same CPU.

This issue is due to the sg_cpu's flags being overwritten at each call
of sugov_update_*. Thus, the wakeup of a FAIR task resets the flags and
can trigger a frequency update, thereby affecting the currently running
RT/DL task.

This can be fixed, in shared frequency domains, by ORing (instead of
overwriting) the new flags before triggering a frequency update.  This
guarantees that we stay at least at the frequency requested by the RT/DL
class, which for the time being is the maximum one.

This patch does the flag aggregation in the schedutil governor, where
it's easy to verify whether we currently have RT/DL workload on a CPU.
This approach is aligned with the current schedutil API design, where
the core scheduler does not interact directly with schedutil; instead,
the scheduling classes call directly into the governor via
cpufreq_update_util(). Thus, it makes more sense to do the flag
aggregation in the schedutil code rather than in the core scheduler.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Steve Muckle <smuckle.linux@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org

---
Changes from v2:
- rebased on v4.15-rc1

Changes from v1:
- use "current" to check for RT/DL tasks (PeterZ)

Change-Id: Ia4bd6ae09ae034a954d37cd38ffea86396ac1257
---
 kernel/sched/cpufreq_schedutil.c | 34 +++++++++++++++++++++++++++-------
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 67339ccb5595..448f49de5335 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -262,6 +262,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	struct cpufreq_policy *policy = sg_policy->policy;
 	unsigned long util, max;
 	unsigned int next_f;
+	bool rt_mode;
 	bool busy;
 
 	sugov_set_iowait_boost(sg_cpu, time, flags);
@@ -272,7 +273,15 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 
 	busy = sugov_cpu_is_busy(sg_cpu);
 
-	if (flags & SCHED_CPUFREQ_RT_DL) {
+	/*
+	 * While RT/DL tasks are running we do not want FAIR tasks to
+	 * overwrite this CPU's flags, still we can update utilization and
+	 * frequency (if required/possible) to be fair with these tasks.
+	 */
+	rt_mode = task_has_dl_policy(current) ||
+		  task_has_rt_policy(current) ||
+		  (flags & SCHED_CPUFREQ_RT_DL);
+	if (rt_mode) {
 		next_f = policy->cpuinfo.max_freq;
 	} else {
 		sugov_get_util(&util, &max, sg_cpu->cpu);
@@ -340,6 +349,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	unsigned long util, max;
 	unsigned int next_f;
+	bool rt_mode;
 
 	sugov_get_util(&util, &max, sg_cpu->cpu);
 
@@ -353,17 +363,27 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 		sg_cpu->flags = 0;
 		goto done;
 	}
-	sg_cpu->flags = flags;
+
+	/*
+	 * While RT/DL tasks are running we do not want FAIR tasks to
+	 * overwrite this CPU's flags, still we can update utilization and
+	 * frequency (if required/possible) to be fair with these tasks.
+	 */
+	rt_mode = task_has_dl_policy(current) ||
+		  task_has_rt_policy(current) ||
+		  (flags & SCHED_CPUFREQ_RT_DL);
+	if (rt_mode)
+		sg_cpu->flags |= flags;
+	else
+		sg_cpu->flags = flags;
 
 	sugov_set_iowait_boost(sg_cpu, time, flags);
 	sg_cpu->last_update = time;
 
 	if (sugov_should_update_freq(sg_policy, time)) {
-		if (flags & SCHED_CPUFREQ_RT_DL)
-			next_f = sg_policy->policy->cpuinfo.max_freq;
-		else
-			next_f = sugov_next_freq_shared(sg_cpu, time);
-
+		next_f = rt_mode
+			? sg_policy->policy->cpuinfo.max_freq
+			: sugov_next_freq_shared(sg_cpu, time);
 		sugov_update_commit(sg_policy, time, next_f);
 	}
 
-- 
2.14.1


* [PATCH v3 3/6] cpufreq: schedutil: update CFS util only if used
  2017-11-30 11:47 [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates Patrick Bellasi
  2017-11-30 11:47 ` [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter Patrick Bellasi
  2017-11-30 11:47 ` [PATCH v3 2/6] cpufreq: schedutil: ensure max frequency while running RT/DL tasks Patrick Bellasi
@ 2017-11-30 11:47 ` Patrick Bellasi
  2017-11-30 13:22   ` Juri Lelli
  2017-11-30 11:47 ` [PATCH v3 4/6] sched/rt: fast switch to maximum frequency when RT tasks are scheduled Patrick Bellasi
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 80+ messages in thread
From: Patrick Bellasi @ 2017-11-30 11:47 UTC (permalink / raw)
  To: linux-kernel, linux-pm
  Cc: Ingo Molnar, Peter Zijlstra, Rafael J . Wysocki, Viresh Kumar,
	Vincent Guittot, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
	Todd Kjos, Joel Fernandes

Currently the utilization of the FAIR class is collected before locking
the policy. Although that should not be a big issue for most cases, we
don't really know how much latency there can be between the utilization
reading and its usage.

Let's get the FAIR utilization right before its usage, to be better in
sync with the current status of a CPU.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org

---
Changes from v2:
- rebased on v4.15-rc1

Change-Id: I9291a560bcad7db76894e3f0fcdb917511d0479e
---
 kernel/sched/cpufreq_schedutil.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 448f49de5335..40521d59630b 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -351,10 +351,9 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 	unsigned int next_f;
 	bool rt_mode;
 
-	sugov_get_util(&util, &max, sg_cpu->cpu);
-
 	raw_spin_lock(&sg_policy->update_lock);
 
+	sugov_get_util(&util, &max, sg_cpu->cpu);
 	sg_cpu->util = util;
 	sg_cpu->max = max;
 
-- 
2.14.1


* [PATCH v3 4/6] sched/rt: fast switch to maximum frequency when RT tasks are scheduled
  2017-11-30 11:47 [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates Patrick Bellasi
                   ` (2 preceding siblings ...)
  2017-11-30 11:47 ` [PATCH v3 3/6] cpufreq: schedutil: update CFS util only if used Patrick Bellasi
@ 2017-11-30 11:47 ` Patrick Bellasi
  2017-11-30 13:28   ` Juri Lelli
  2017-12-06  9:39   ` Vincent Guittot
  2017-11-30 11:47 ` [PATCH v3 5/6] cpufreq: schedutil: relax rate-limiting while running RT/DL tasks Patrick Bellasi
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 80+ messages in thread
From: Patrick Bellasi @ 2017-11-30 11:47 UTC (permalink / raw)
  To: linux-kernel, linux-pm
  Cc: Ingo Molnar, Peter Zijlstra, Rafael J . Wysocki, Viresh Kumar,
	Vincent Guittot, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
	Todd Kjos, Joel Fernandes

Currently schedutil updates for the RT class are triggered from a single
call site, in update_curr_rt(), which is used by:

- dequeue_task_rt:
  but it does not make sense to set schedutil's SCHED_CPUFREQ_RT flag in
  case the next task is not an RT one

- put_prev_task_rt:
  likewise, we set the SCHED_CPUFREQ_RT flag without knowing whether it
  is required by the next task

- pick_next_task_rt:
  likewise, schedutil's SCHED_CPUFREQ_RT flag is set in case the previous
  task was RT, while we don't yet know whether the next one will be RT

- task_tick_rt:
  this is the only really useful call, which can ramp up the frequency in
  case an RT task started executing without a chance to order a
  frequency switch (e.g. because of the schedutil rate limit)

Apart from the last call in task_tick_rt, the others are at best useless.
Thus, although calling from update_curr_rt() is a simple solution, not
all of its call sites are relevant for triggering a frequency switch,
while some of the most relevant points are not covered by that call.
For example, a task switched to RT has to wait until the next tick to
get the frequency boost.

This patch fixes these issues by explicitly placing the schedutil update
calls in the only sensible places, which are:
- when an RT task wakes up and is enqueued on a CPU
- when we actually pick an RT task for execution
- at each tick
- when a task is switched to RT

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org

---
Changes from v2:
- rebased on v4.15-rc1
- use cpufreq_update_util() instead of cpufreq_update_this_cpu()

Change-Id: I3794615819270fe175cb118eef3f7edd61f602ba
---
 kernel/sched/rt.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 4056c19ca3f0..6984032598a6 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -959,9 +959,6 @@ static void update_curr_rt(struct rq *rq)
 	if (unlikely((s64)delta_exec <= 0))
 		return;
 
-	/* Kick cpufreq (see the comment in kernel/sched/sched.h). */
-	cpufreq_update_util(rq, SCHED_CPUFREQ_RT);
-
 	schedstat_set(curr->se.statistics.exec_max,
 		      max(curr->se.statistics.exec_max, delta_exec));
 
@@ -1327,6 +1324,9 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)
 
 	if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
 		enqueue_pushable_task(rq, p);
+
+	/* Kick cpufreq (see the comment in kernel/sched/sched.h). */
+	cpufreq_update_util(rq, SCHED_CPUFREQ_RT);
 }
 
 static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags)
@@ -1564,6 +1564,9 @@ pick_next_task_rt(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 
 	p = _pick_next_task_rt(rq);
 
+	/* Kick cpufreq (see the comment in kernel/sched/sched.h). */
+	cpufreq_update_util(rq, SCHED_CPUFREQ_RT);
+
 	/* The running task is never eligible for pushing */
 	dequeue_pushable_task(rq, p);
 
@@ -2282,6 +2285,9 @@ static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued)
 {
 	struct sched_rt_entity *rt_se = &p->rt;
 
+	/* Kick cpufreq (see the comment in kernel/sched/sched.h). */
+	cpufreq_update_util(rq, SCHED_CPUFREQ_RT);
+
 	update_curr_rt(rq);
 
 	watchdog(rq, p);
@@ -2317,6 +2323,9 @@ static void set_curr_task_rt(struct rq *rq)
 
 	p->se.exec_start = rq_clock_task(rq);
 
+	/* Kick cpufreq (see the comment in kernel/sched/sched.h). */
+	cpufreq_update_util(rq, SCHED_CPUFREQ_RT);
+
 	/* The running task is never eligible for pushing */
 	dequeue_pushable_task(rq, p);
 }
-- 
2.14.1


* [PATCH v3 5/6] cpufreq: schedutil: relax rate-limiting while running RT/DL tasks
  2017-11-30 11:47 [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates Patrick Bellasi
                   ` (3 preceding siblings ...)
  2017-11-30 11:47 ` [PATCH v3 4/6] sched/rt: fast switch to maximum frequency when RT tasks are scheduled Patrick Bellasi
@ 2017-11-30 11:47 ` Patrick Bellasi
  2017-11-30 13:36   ` Juri Lelli
  2017-11-30 11:47 ` [PATCH v3 6/6] cpufreq: schedutil: ignore sugov kthreads Patrick Bellasi
  2017-12-20 15:30 ` [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates Peter Zijlstra
  6 siblings, 1 reply; 80+ messages in thread
From: Patrick Bellasi @ 2017-11-30 11:47 UTC (permalink / raw)
  To: linux-kernel, linux-pm
  Cc: Ingo Molnar, Peter Zijlstra, Rafael J . Wysocki, Viresh Kumar,
	Vincent Guittot, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
	Todd Kjos, Joel Fernandes

The policy in use for RT/DL tasks sets the maximum frequency when a task
in these classes triggers a cpufreq_update_util() call.  However, the
current implementation still enforces frequency switch rate limiting
while these tasks are running.
This potentially works against the goal of switching to the maximum OPP
when RT tasks are running. In certain unfortunate cases it can even
happen that an RT task completes almost its entire activation at a
lower OPP.

This patch deliberately overrides the rate-limiting configuration to
better serve RT/DL tasks. As long as a frequency scaling operation is
not in progress, a frequency switch is always authorized when running
in "rt_mode", i.e. when the current task on a CPU belongs to the RT/DL
class.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org

---
Changes from v2:
- rebased on v4.15-rc1

Change-Id: I733d47b9e265cebb2e3e5e71a3cd468e9be002d1
---
 kernel/sched/cpufreq_schedutil.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 40521d59630b..3eea8884e61b 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -74,7 +74,8 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
 
 /************************ Governor internals ***********************/
 
-static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
+static bool sugov_should_update_freq(struct sugov_policy *sg_policy,
+				     u64 time, bool rt_mode)
 {
 	s64 delta_ns;
 
@@ -111,6 +112,10 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
 		return true;
 	}
 
+	/* Always update if a RT/DL task is running */
+	if (rt_mode)
+		return true;
+
 	delta_ns = time - sg_policy->last_freq_update_time;
 	return delta_ns >= sg_policy->freq_update_delay_ns;
 }
@@ -268,11 +273,6 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	sugov_set_iowait_boost(sg_cpu, time, flags);
 	sg_cpu->last_update = time;
 
-	if (!sugov_should_update_freq(sg_policy, time))
-		return;
-
-	busy = sugov_cpu_is_busy(sg_cpu);
-
 	/*
 	 * While RT/DL tasks are running we do not want FAIR tasks to
 	 * overwrite this CPU's flags, still we can update utilization and
@@ -281,6 +281,11 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	rt_mode = task_has_dl_policy(current) ||
 		  task_has_rt_policy(current) ||
 		  (flags & SCHED_CPUFREQ_RT_DL);
+	if (!sugov_should_update_freq(sg_policy, time, rt_mode))
+		return;
+
+	busy = sugov_cpu_is_busy(sg_cpu);
+
 	if (rt_mode) {
 		next_f = policy->cpuinfo.max_freq;
 	} else {
@@ -379,7 +384,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 	sugov_set_iowait_boost(sg_cpu, time, flags);
 	sg_cpu->last_update = time;
 
-	if (sugov_should_update_freq(sg_policy, time)) {
+	if (sugov_should_update_freq(sg_policy, time, rt_mode)) {
 		next_f = rt_mode
 			? sg_policy->policy->cpuinfo.max_freq
 			: sugov_next_freq_shared(sg_cpu, time);
-- 
2.14.1


* [PATCH v3 6/6] cpufreq: schedutil: ignore sugov kthreads
  2017-11-30 11:47 [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates Patrick Bellasi
                   ` (4 preceding siblings ...)
  2017-11-30 11:47 ` [PATCH v3 5/6] cpufreq: schedutil: relax rate-limiting while running RT/DL tasks Patrick Bellasi
@ 2017-11-30 11:47 ` Patrick Bellasi
  2017-11-30 13:41   ` Juri Lelli
  2017-12-20 15:30 ` [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates Peter Zijlstra
  6 siblings, 1 reply; 80+ messages in thread
From: Patrick Bellasi @ 2017-11-30 11:47 UTC (permalink / raw)
  To: linux-kernel, linux-pm
  Cc: Ingo Molnar, Peter Zijlstra, Rafael J . Wysocki, Viresh Kumar,
	Vincent Guittot, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
	Todd Kjos, Joel Fernandes

In systems where multiple CPUs share the same frequency domain, a small
workload on one CPU can still be subject to frequency spikes generated
by activations of the sugov kthread.

Since the sugov kthread is a special RT task, whose only goal is to
trigger a frequency transition, it does not make sense for it to bias
schedutil's frequency selection policy.

This patch exploits the information about the current task to silently
ignore cpufreq_update_this_cpu() calls coming from the RT scheduler
while the sugov kthread is running.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org

---
Changes from v2:
- rebased on v4.15-rc1
- moved at the end of the stack since considered more controversial
Changes from v1:
- move check before policy spinlock (JuriL)

Change-Id: I4d749458229b6496dd24a8c357be42cd35a739fd
---
 kernel/sched/cpufreq_schedutil.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 3eea8884e61b..a93ad5b0c40d 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -270,6 +270,10 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	bool rt_mode;
 	bool busy;
 
+	/* Skip updates generated by sugov kthreads */
+	if (unlikely(current == sg_policy->thread))
+		return;
+
 	sugov_set_iowait_boost(sg_cpu, time, flags);
 	sg_cpu->last_update = time;
 
@@ -356,6 +360,10 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 	unsigned int next_f;
 	bool rt_mode;
 
+	/* Skip updates generated by sugov kthreads */
+	if (unlikely(current == sg_policy->thread))
+		return;
+
 	raw_spin_lock(&sg_policy->update_lock);
 
 	sugov_get_util(&util, &max, sg_cpu->cpu);
-- 
2.14.1


* Re: [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter
  2017-11-30 11:47 ` [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter Patrick Bellasi
@ 2017-11-30 13:12   ` Juri Lelli
  2017-11-30 15:41     ` Patrick Bellasi
  2017-12-07  5:01   ` Viresh Kumar
  2017-12-20 14:33   ` Peter Zijlstra
  2 siblings, 1 reply; 80+ messages in thread
From: Juri Lelli @ 2017-11-30 13:12 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

Hi,

On 30/11/17 11:47, Patrick Bellasi wrote:

[...]

> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 2f52ec0f1539..67339ccb5595 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -347,6 +347,12 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>  
>  	sg_cpu->util = util;
>  	sg_cpu->max = max;
> +
> +	/* CPU is entering IDLE, reset flags without triggering an update */
> +	if (unlikely(flags & SCHED_CPUFREQ_IDLE)) {
> +		sg_cpu->flags = 0;
> +		goto done;
> +	}

Looks good for now. I'm just wondering what will happen for DL, as a
CPU that still "has" a sleeping task is not going to be really idle
until the 0-lag time. I guess we could move this to that point in time?

>  	sg_cpu->flags = flags;
>  
>  	sugov_set_iowait_boost(sg_cpu, time, flags);
> @@ -361,6 +367,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>  		sugov_update_commit(sg_policy, time, next_f);
>  	}
>  
> +done:
>  	raw_spin_unlock(&sg_policy->update_lock);
>  }
>  
> diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
> index d518664cce4f..6e8ae2aa7a13 100644
> --- a/kernel/sched/idle_task.c
> +++ b/kernel/sched/idle_task.c
> @@ -30,6 +30,10 @@ pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
>  	put_prev_task(rq, prev);
>  	update_idle_core(rq);
>  	schedstat_inc(rq->sched_goidle);
> +
> +	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
> +	cpufreq_update_util(rq, SCHED_CPUFREQ_IDLE);

Don't know if it makes things any cleaner, but you could add to the
comment that we don't actually trigger a frequency update with this
call.

Best,

Juri


* Re: [PATCH v3 2/6] cpufreq: schedutil: ensure max frequency while running RT/DL tasks
  2017-11-30 11:47 ` [PATCH v3 2/6] cpufreq: schedutil: ensure max frequency while running RT/DL tasks Patrick Bellasi
@ 2017-11-30 13:17   ` Juri Lelli
  2017-11-30 15:45     ` Patrick Bellasi
  2017-12-07  5:05   ` Viresh Kumar
  1 sibling, 1 reply; 80+ messages in thread
From: Juri Lelli @ 2017-11-30 13:17 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes,
	Steve Muckle

Hi,

On 30/11/17 11:47, Patrick Bellasi wrote:

[...]

> @@ -340,6 +349,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>  	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>  	unsigned long util, max;
>  	unsigned int next_f;
> +	bool rt_mode;
>  
>  	sugov_get_util(&util, &max, sg_cpu->cpu);
>  
> @@ -353,17 +363,27 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>  		sg_cpu->flags = 0;
>  		goto done;
>  	}
> -	sg_cpu->flags = flags;
> +
> +	/*
> +	 * While RT/DL tasks are running we do not want FAIR tasks to
> +	 * overwrite this CPU's flags, still we can update utilization and
> +	 * frequency (if required/possible) to be fair with these tasks.
> +	 */
> +	rt_mode = task_has_dl_policy(current) ||
> +		  task_has_rt_policy(current) ||
> +		  (flags & SCHED_CPUFREQ_RT_DL);
> +	if (rt_mode)
> +		sg_cpu->flags |= flags;
> +	else
> +		sg_cpu->flags = flags;
>  
>  	sugov_set_iowait_boost(sg_cpu, time, flags);
>  	sg_cpu->last_update = time;
>  
>  	if (sugov_should_update_freq(sg_policy, time)) {
> -		if (flags & SCHED_CPUFREQ_RT_DL)
> -			next_f = sg_policy->policy->cpuinfo.max_freq;
> -		else
> -			next_f = sugov_next_freq_shared(sg_cpu, time);
> -
> +		next_f = rt_mode
> +			? sg_policy->policy->cpuinfo.max_freq
> +			: sugov_next_freq_shared(sg_cpu, time);
>  		sugov_update_commit(sg_policy, time, next_f);

Aren't we already at max_freq at this point (since an RT/DL task is
running)? Do we need to trigger a frequency update?

Best,

Juri


* Re: [PATCH v3 3/6] cpufreq: schedutil: update CFS util only if used
  2017-11-30 11:47 ` [PATCH v3 3/6] cpufreq: schedutil: update CFS util only if used Patrick Bellasi
@ 2017-11-30 13:22   ` Juri Lelli
  2017-11-30 15:57     ` Patrick Bellasi
  0 siblings, 1 reply; 80+ messages in thread
From: Juri Lelli @ 2017-11-30 13:22 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

Hi,

On 30/11/17 11:47, Patrick Bellasi wrote:
> Currently the utilization of the FAIR class is collected before locking
> the policy. Although that should not be a big issue for most cases, we
> also don't really know how much latency there can be between the
> utilization reading and its usage.
> 
> Let's get the FAIR utilization right before its usage to be better in
> sync with the current status of a CPU.
> 
> Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-pm@vger.kernel.org
> 
> ---
> Changes from v2:
> - rebased on v4.15-rc1
> 
> Change-Id: I9291a560bcad7db76894e3f0fcdb917511d0479e
> ---
>  kernel/sched/cpufreq_schedutil.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 448f49de5335..40521d59630b 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -351,10 +351,9 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>  	unsigned int next_f;
>  	bool rt_mode;
>  
> -	sugov_get_util(&util, &max, sg_cpu->cpu);
> -
>  	raw_spin_lock(&sg_policy->update_lock);
>  
> +	sugov_get_util(&util, &max, sg_cpu->cpu);
>  	sg_cpu->util = util;
>  	sg_cpu->max = max;

Patch looks good.

Reviewed-by: Juri Lelli <juri.lelli@redhat.com>

However, not sure $SUBJECT is really in sync with what the patch does?
CFS gets "used" before and after the patch... last paragraph of the
changelog looks more like it. :)

Best,

Juri


* Re: [PATCH v3 4/6] sched/rt: fast switch to maximum frequency when RT tasks are scheduled
  2017-11-30 11:47 ` [PATCH v3 4/6] sched/rt: fast switch to maximum frequency when RT tasks are scheduled Patrick Bellasi
@ 2017-11-30 13:28   ` Juri Lelli
  2017-12-06  9:39   ` Vincent Guittot
  1 sibling, 0 replies; 80+ messages in thread
From: Juri Lelli @ 2017-11-30 13:28 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

Hi,

On 30/11/17 11:47, Patrick Bellasi wrote:
> Currently schedutil updates are triggered for the RT class from a single
> call site, which is part of rt::update_curr_rt() and used in:
> 
> - dequeue_task_rt:
>   but it does not make sense to set the schedutil's SCHED_CPUFREQ_RT in
>   case the next task should not be an RT one
> 
> - put_prev_task_rt:
>   likewise, we set the SCHED_CPUFREQ_RT flag without knowing if required
>   by the next task
> 
> - pick_next_task_rt:
>   likewise, the schedutil's SCHED_CPUFREQ_RT is set in case the prev task
>   was RT, while we don't yet know if the next will be RT
> 
> - task_tick_rt:
>   that's the only really useful call, which can ramp up the frequency in
>   case an RT task started its execution without a chance to request a
>   frequency switch (e.g. because of the schedutil rate limit)
> 
> Apart from the last call in task_tick_rt, the others are at best useless.
> Thus, although it is a simple solution, not all the call sites of
> update_curr_rt() are useful for triggering a frequency switch, while some
> of the most interesting points are not covered by that call.
> For example, a task set to RT has to wait for the next tick to get the
> frequency boost.
> 
> This patch fixes these issues by explicitly placing the schedutil
> update calls in the only sensible places, which are:
> - when an RT task wakes up and is enqueued on a CPU
> - when we actually pick an RT task for execution
> - at each tick time
> - when a task is set to RT
> 
> Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

Reviewed-by: Juri Lelli <juri.lelli@redhat.com>

Best,

Juri


* Re: [PATCH v3 5/6] cpufreq: schedutil: relax rate-limiting while running RT/DL tasks
  2017-11-30 11:47 ` [PATCH v3 5/6] cpufreq: schedutil: relax rate-limiting while running RT/DL tasks Patrick Bellasi
@ 2017-11-30 13:36   ` Juri Lelli
  2017-11-30 15:54     ` Patrick Bellasi
  0 siblings, 1 reply; 80+ messages in thread
From: Juri Lelli @ 2017-11-30 13:36 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

Hi,

On 30/11/17 11:47, Patrick Bellasi wrote:
> The policy in use for RT/DL tasks sets the maximum frequency when a task
> in these classes calls for a cpufreq_update_util().  However, the
> current implementation is still enforcing a frequency switch rate
> limiting when these tasks are running.
> This potentially works against the goal of switching to the maximum OPP
> when RT tasks are running. In certain unfortunate cases it can also happen
> that an RT task almost completes its activation at a lower OPP.
> 
> This patch overrides on purpose the rate limiting configuration
> to better serve RT/DL tasks. As long as a frequency scaling operation
> is not in progress, a frequency switch is always authorized when
> running in "rt_mode", i.e. the current task in a CPU belongs to the
> RT/DL class.
> 
> Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-pm@vger.kernel.org
> 
> ---
> Changes from v2:
> - rebased on v4.15-rc1
> 
> Change-Id: I733d47b9e265cebb2e3e5e71a3cd468e9be002d1

Luckily this gets ignored... :)

> ---
>  kernel/sched/cpufreq_schedutil.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 40521d59630b..3eea8884e61b 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -74,7 +74,8 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
>  
>  /************************ Governor internals ***********************/
>  
> -static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> +static bool sugov_should_update_freq(struct sugov_policy *sg_policy,
> +				     u64 time, bool rt_mode)
>  {
>  	s64 delta_ns;
>  
> @@ -111,6 +112,10 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
>  		return true;
>  	}
>  
> +	/* Always update if a RT/DL task is running */
> +	if (rt_mode)
> +		return true;
> +
>  	delta_ns = time - sg_policy->last_freq_update_time;
>  	return delta_ns >= sg_policy->freq_update_delay_ns;
>  }
> @@ -268,11 +273,6 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>  	sugov_set_iowait_boost(sg_cpu, time, flags);
>  	sg_cpu->last_update = time;
>  
> -	if (!sugov_should_update_freq(sg_policy, time))
> -		return;
> -
> -	busy = sugov_cpu_is_busy(sg_cpu);
> -
>  	/*
>  	 * While RT/DL tasks are running we do not want FAIR tasks to
>  	 * overvrite this CPU's flags, still we can update utilization and
> @@ -281,6 +281,11 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>  	rt_mode = task_has_dl_policy(current) ||
>  		  task_has_rt_policy(current) ||
>  		  (flags & SCHED_CPUFREQ_RT_DL);
> +	if (!sugov_should_update_freq(sg_policy, time, rt_mode))
> +		return;
> +
> +	busy = sugov_cpu_is_busy(sg_cpu);
> +
>  	if (rt_mode) {
>  		next_f = policy->cpuinfo.max_freq;
>  	} else {
> @@ -379,7 +384,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>  	sugov_set_iowait_boost(sg_cpu, time, flags);
>  	sg_cpu->last_update = time;
>  
> -	if (sugov_should_update_freq(sg_policy, time)) {
> +	if (sugov_should_update_freq(sg_policy, time, rt_mode)) {
>  		next_f = rt_mode
>  			? sg_policy->policy->cpuinfo.max_freq
>  			: sugov_next_freq_shared(sg_cpu, time);

Reviewed-by: Juri Lelli <juri.lelli@redhat.com>

I wonder if we would also need some way to trigger a back to back update
as soon as a currently running one finishes and an RT/DL task asked for
an update (without waiting for the next tick).

Best,

Juri


* Re: [PATCH v3 6/6] cpufreq: schedutil: ignore sugov kthreads
  2017-11-30 11:47 ` [PATCH v3 6/6] cpufreq: schedutil: ignore sugov kthreads Patrick Bellasi
@ 2017-11-30 13:41   ` Juri Lelli
  2017-11-30 16:02     ` Patrick Bellasi
  0 siblings, 1 reply; 80+ messages in thread
From: Juri Lelli @ 2017-11-30 13:41 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

Hi,

On 30/11/17 11:47, Patrick Bellasi wrote:
> In systems where multiple CPUs share the same frequency domain, a small
> workload on a CPU can still be subject to frequency spikes generated by
> the activation of the sugov kthread.
> 
> Since the sugov kthread is a special RT task, whose only goal is to
> trigger a frequency transition, it does not make sense for it to bias
> the schedutil frequency selection policy.
> 
> This patch exploits the information related to the current task to silently
> ignore cpufreq_update_this_cpu() calls coming from the RT scheduler while
> the sugov kthread is running.
> 
> Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-pm@vger.kernel.org
> 
> ---
> Changes from v2:
> - rebased on v4.15-rc1
> - moved at the end of the stack since considered more controversial
> Changes from v1:
> - move check before policy spinlock (JuriL)
> 
> Change-Id: I4d749458229b6496dd24a8c357be42cd35a739fd
> ---
>  kernel/sched/cpufreq_schedutil.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 3eea8884e61b..a93ad5b0c40d 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -270,6 +270,10 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>  	bool rt_mode;
>  	bool busy;
>  
> +	/* Skip updates generated by sugov kthreads */
> +	if (unlikely(current == sg_policy->thread))
> +		return;
> +
>  	sugov_set_iowait_boost(sg_cpu, time, flags);
>  	sg_cpu->last_update = time;
>  
> @@ -356,6 +360,10 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>  	unsigned int next_f;
>  	bool rt_mode;
>  
> +	/* Skip updates generated by sugov kthreads */
> +	if (unlikely(current == sg_policy->thread))
> +		return;
> +
>  	raw_spin_lock(&sg_policy->update_lock);
>  
>  	sugov_get_util(&util, &max, sg_cpu->cpu);

If the DL changes (which I shall post again as soon as tip/sched/core is
bumped up to 4.15-rc1) get in first, this is going to be useless (as the
DL kthread gets ignored by the scheduling class itself). But, this looks
good to me "in the meantime".

Reviewed-by: Juri Lelli <juri.lelli@redhat.com>

Best,

Juri


* Re: [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter
  2017-11-30 13:12   ` Juri Lelli
@ 2017-11-30 15:41     ` Patrick Bellasi
  2017-11-30 16:02       ` Juri Lelli
  0 siblings, 1 reply; 80+ messages in thread
From: Patrick Bellasi @ 2017-11-30 15:41 UTC (permalink / raw)
  To: Juri Lelli
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On 30-Nov 14:12, Juri Lelli wrote:
> Hi,
> 
> On 30/11/17 11:47, Patrick Bellasi wrote:
> 
> [...]
> 
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 2f52ec0f1539..67339ccb5595 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -347,6 +347,12 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> >  
> >  	sg_cpu->util = util;
> >  	sg_cpu->max = max;
> > +
> > +	/* CPU is entering IDLE, reset flags without triggering an update */
> > +	if (unlikely(flags & SCHED_CPUFREQ_IDLE)) {
> > +		sg_cpu->flags = 0;
> > +		goto done;
> > +	}
> 
> Looks good for now. I'm just thinking about what will happen for DL, as a
> CPU that still "has" a sleeping task is not going to be really idle
> until the 0-lag time.

AFAIU, for the time being, DL already cannot really rely on this flag
for its behavior to be correct. Indeed, the flags are reset as soon as
a FAIR task wakes up and is enqueued.

Only once your DL integration patches are in will we no longer depend on
the flags, since DL will report a certain utilization up to the
0-lag time, right?

If that's the case, I would say that the flags will be used only to
jump to the max OPP for RT tasks. Thus, this patch should still be valid.

> I guess we could move this at that point in time?

Not sure what you mean here. Right now the new SCHED_CPUFREQ_IDLE flag
is notified only by idle tasks. That's the only code path where we are
sure the CPU is entering IDLE.

> >  	sg_cpu->flags = flags;
> >  
> >  	sugov_set_iowait_boost(sg_cpu, time, flags);
> > @@ -361,6 +367,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> >  		sugov_update_commit(sg_policy, time, next_f);
> >  	}
> >  
> > +done:
> >  	raw_spin_unlock(&sg_policy->update_lock);
> >  }
> >  
> > diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
> > index d518664cce4f..6e8ae2aa7a13 100644
> > --- a/kernel/sched/idle_task.c
> > +++ b/kernel/sched/idle_task.c
> > @@ -30,6 +30,10 @@ pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
> >  	put_prev_task(rq, prev);
> >  	update_idle_core(rq);
> >  	schedstat_inc(rq->sched_goidle);
> > +
> > +	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
> > +	cpufreq_update_util(rq, SCHED_CPUFREQ_IDLE);
> 
> Don't know if it make things any cleaner, but you could add to the
> comment that we don't actually trigger a frequency update with this
> call.

Right, will add on next posting.

> Best,
> 
> Juri

Cheers Patrick

-- 
#include <best/regards.h>

Patrick Bellasi


* Re: [PATCH v3 2/6] cpufreq: schedutil: ensure max frequency while running RT/DL tasks
  2017-11-30 13:17   ` Juri Lelli
@ 2017-11-30 15:45     ` Patrick Bellasi
  2017-11-30 16:03       ` Juri Lelli
  0 siblings, 1 reply; 80+ messages in thread
From: Patrick Bellasi @ 2017-11-30 15:45 UTC (permalink / raw)
  To: Juri Lelli
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes,
	Steve Muckle

On 30-Nov 14:17, Juri Lelli wrote:
> Hi,
> 
> On 30/11/17 11:47, Patrick Bellasi wrote:
> 
> [...]
> 
> > @@ -340,6 +349,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> >  	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
> >  	unsigned long util, max;
> >  	unsigned int next_f;
> > +	bool rt_mode;
> >  
> >  	sugov_get_util(&util, &max, sg_cpu->cpu);
> >  
> > @@ -353,17 +363,27 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> >  		sg_cpu->flags = 0;
> >  		goto done;
> >  	}
> > -	sg_cpu->flags = flags;
> > +
> > +	/*
> > +	 * While RT/DL tasks are running we do not want FAIR tasks to
> > +	 * overwrite this CPU's flags, still we can update utilization and
> > +	 * frequency (if required/possible) to be fair with these tasks.
> > +	 */
> > +	rt_mode = task_has_dl_policy(current) ||
> > +		  task_has_rt_policy(current) ||
> > +		  (flags & SCHED_CPUFREQ_RT_DL);
> > +	if (rt_mode)
> > +		sg_cpu->flags |= flags;
> > +	else
> > +		sg_cpu->flags = flags;
> >  
> >  	sugov_set_iowait_boost(sg_cpu, time, flags);
> >  	sg_cpu->last_update = time;
> >  
> >  	if (sugov_should_update_freq(sg_policy, time)) {
> > -		if (flags & SCHED_CPUFREQ_RT_DL)
> > -			next_f = sg_policy->policy->cpuinfo.max_freq;
> > -		else
> > -			next_f = sugov_next_freq_shared(sg_cpu, time);
> > -
> > +		next_f = rt_mode
> > +			? sg_policy->policy->cpuinfo.max_freq
> > +			: sugov_next_freq_shared(sg_cpu, time);
> >  		sugov_update_commit(sg_policy, time, next_f);
> 
> Aren't we already at max_freq at this point (since an RT/DL task is
> running)? Do we need to trigger a frequency update?

I think that's required to cover the first time we enter rt_mode in
order to jump to max OPP.

If we are already at max OPP, sugov_update_commit() will just bail out
without doing anything.

Am I missing something?

> 
> Best,
> 
> Juri

-- 
#include <best/regards.h>

Patrick Bellasi


* Re: [PATCH v3 5/6] cpufreq: schedutil: relax rate-limiting while running RT/DL tasks
  2017-11-30 13:36   ` Juri Lelli
@ 2017-11-30 15:54     ` Patrick Bellasi
  2017-11-30 16:06       ` Juri Lelli
  0 siblings, 1 reply; 80+ messages in thread
From: Patrick Bellasi @ 2017-11-30 15:54 UTC (permalink / raw)
  To: Juri Lelli
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On 30-Nov 14:36, Juri Lelli wrote:
> Hi,
> 
> On 30/11/17 11:47, Patrick Bellasi wrote:
> > The policy in use for RT/DL tasks sets the maximum frequency when a task
> > in these classes calls for a cpufreq_update_util().  However, the
> > current implementation is still enforcing a frequency switch rate
> > limiting when these tasks are running.
> > This potentially works against the goal of switching to the maximum OPP
> > when RT tasks are running. In certain unfortunate cases it can also happen
> > that an RT task almost completes its activation at a lower OPP.
> > 
> > This patch overrides on purpose the rate limiting configuration
> > to better serve RT/DL tasks. As long as a frequency scaling operation
> > is not in progress, a frequency switch is always authorized when
> > running in "rt_mode", i.e. the current task in a CPU belongs to the
> > RT/DL class.
> > 
> > Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
> > Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Viresh Kumar <viresh.kumar@linaro.org>
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux-pm@vger.kernel.org
> > 
> > ---
> > Changes from v2:
> > - rebased on v4.15-rc1
> > 
> > Change-Id: I733d47b9e265cebb2e3e5e71a3cd468e9be002d1
> 
> Luckily this gets ignored... :)

Indeed, and it was intended... just to verify whether ML folks can tolerate
the idea of having Change-Ids in the "notes section".

This would help some internal review workflows, which are sometimes
based... yes... ehm... on gerrit :-]

> > ---
> >  kernel/sched/cpufreq_schedutil.c | 19 ++++++++++++-------
> >  1 file changed, 12 insertions(+), 7 deletions(-)
> > 
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 40521d59630b..3eea8884e61b 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -74,7 +74,8 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
> >  
> >  /************************ Governor internals ***********************/
> >  
> > -static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> > +static bool sugov_should_update_freq(struct sugov_policy *sg_policy,
> > +				     u64 time, bool rt_mode)
> >  {
> >  	s64 delta_ns;
> >  
> > @@ -111,6 +112,10 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> >  		return true;
> >  	}
> >  
> > +	/* Always update if a RT/DL task is running */
> > +	if (rt_mode)
> > +		return true;
> > +
> >  	delta_ns = time - sg_policy->last_freq_update_time;
> >  	return delta_ns >= sg_policy->freq_update_delay_ns;
> >  }
> > @@ -268,11 +273,6 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> >  	sugov_set_iowait_boost(sg_cpu, time, flags);
> >  	sg_cpu->last_update = time;
> >  
> > -	if (!sugov_should_update_freq(sg_policy, time))
> > -		return;
> > -
> > -	busy = sugov_cpu_is_busy(sg_cpu);
> > -
> >  	/*
> >  	 * While RT/DL tasks are running we do not want FAIR tasks to
> >  	 * overvrite this CPU's flags, still we can update utilization and
> > @@ -281,6 +281,11 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> >  	rt_mode = task_has_dl_policy(current) ||
> >  		  task_has_rt_policy(current) ||
> >  		  (flags & SCHED_CPUFREQ_RT_DL);
> > +	if (!sugov_should_update_freq(sg_policy, time, rt_mode))
> > +		return;
> > +
> > +	busy = sugov_cpu_is_busy(sg_cpu);
> > +
> >  	if (rt_mode) {
> >  		next_f = policy->cpuinfo.max_freq;
> >  	} else {
> > @@ -379,7 +384,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> >  	sugov_set_iowait_boost(sg_cpu, time, flags);
> >  	sg_cpu->last_update = time;
> >  
> > -	if (sugov_should_update_freq(sg_policy, time)) {
> > +	if (sugov_should_update_freq(sg_policy, time, rt_mode)) {
> >  		next_f = rt_mode
> >  			? sg_policy->policy->cpuinfo.max_freq
> >  			: sugov_next_freq_shared(sg_cpu, time);
> 
> Reviewed-by: Juri Lelli <juri.lelli@redhat.com>

Tks ;-)
 
> I wonder if we would also need some way to trigger a back to back update
> as soon as a currently running one finishes and an RT/DL task asked for
> an update (without waiting for the next tick).

Good point, I think that would actually be an interesting
optimization. We already discussed it in the past, but refrained from
adding more on top of this already "substantial" set of changes.

Can we think about that once we've decided on some patches of this
series?

> Best,
> 
> Juri

Cheers Patrick

-- 
#include <best/regards.h>

Patrick Bellasi


* Re: [PATCH v3 3/6] cpufreq: schedutil: update CFS util only if used
  2017-11-30 13:22   ` Juri Lelli
@ 2017-11-30 15:57     ` Patrick Bellasi
  2017-12-07  5:15       ` Viresh Kumar
  0 siblings, 1 reply; 80+ messages in thread
From: Patrick Bellasi @ 2017-11-30 15:57 UTC (permalink / raw)
  To: Juri Lelli
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On 30-Nov 14:22, Juri Lelli wrote:
> Hi,
> 
> On 30/11/17 11:47, Patrick Bellasi wrote:
> > Currently the utilization of the FAIR class is collected before locking
> > the policy. Although that should not be a big issue for most cases, we
> > also don't really know how much latency there can be between the
> > utilization reading and its usage.
> > 
> > Let's get the FAIR utilization right before its usage to be better in
> > sync with the current status of a CPU.
> > 
> > Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
> > Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Viresh Kumar <viresh.kumar@linaro.org>
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux-pm@vger.kernel.org
> > 
> > ---
> > Changes from v2:
> > - rebased on v4.15-rc1
> > 
> > Change-Id: I9291a560bcad7db76894e3f0fcdb917511d0479e
> > ---
> >  kernel/sched/cpufreq_schedutil.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 448f49de5335..40521d59630b 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -351,10 +351,9 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> >  	unsigned int next_f;
> >  	bool rt_mode;
> >  
> > -	sugov_get_util(&util, &max, sg_cpu->cpu);
> > -
> >  	raw_spin_lock(&sg_policy->update_lock);
> >  
> > +	sugov_get_util(&util, &max, sg_cpu->cpu);
> >  	sg_cpu->util = util;
> >  	sg_cpu->max = max;
> 
> Patch looks good.
> 
> Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
> 
> However, not sure $SUBJECT is really in sync with what the patch does?
> CFS gets "used" before and after the patch... last paragraph of the
> changelog looks more like it. :)

Yes, that's a pretty trivial update with a confusing changelog.

If we think it's worth keeping (and correcting as well) I'll update the
commit message.

Thanks for pointing that out.

> Best,
> 
> Juri

Cheers Patrick

-- 
#include <best/regards.h>

Patrick Bellasi


* Re: [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter
  2017-11-30 15:41     ` Patrick Bellasi
@ 2017-11-30 16:02       ` Juri Lelli
  2017-11-30 16:19         ` Patrick Bellasi
  0 siblings, 1 reply; 80+ messages in thread
From: Juri Lelli @ 2017-11-30 16:02 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On 30/11/17 15:41, Patrick Bellasi wrote:
> On 30-Nov 14:12, Juri Lelli wrote:
> > Hi,
> > 
> > On 30/11/17 11:47, Patrick Bellasi wrote:
> > 
> > [...]
> > 
> > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > index 2f52ec0f1539..67339ccb5595 100644
> > > --- a/kernel/sched/cpufreq_schedutil.c
> > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > @@ -347,6 +347,12 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> > >  
> > >  	sg_cpu->util = util;
> > >  	sg_cpu->max = max;
> > > +
> > > +	/* CPU is entering IDLE, reset flags without triggering an update */
> > > +	if (unlikely(flags & SCHED_CPUFREQ_IDLE)) {
> > > +		sg_cpu->flags = 0;
> > > +		goto done;
> > > +	}
> > 
> > Looks good for now. I'm just thinking about what will happen for DL, as a
> > CPU that still "has" a sleeping task is not going to be really idle
> > until the 0-lag time.
> 
> AFAIU, for the time being, DL already cannot really rely on this flag
> for its behavior to be correct. Indeed, the flags are reset as soon as
> a FAIR task wakes up and is enqueued.

Right, and your flags ORing patch should help with this.

> 
> Only once your DL integration patches are in will we no longer depend on
> the flags, since DL will report a certain utilization up to the
> 0-lag time, right?

Utilization won't decrease until 0-lag time, correct. I was just
wondering if resetting flags before that time (when a CPU enters idle)
might be an issue.

> 
> If that's the case, I would say that the flags will be used only to
> jump to the max OPP for RT tasks. Thus, this patch should still be valid.
> 
> > I guess we could move this at that point in time?
> 
> Not sure what you mean here. Right now the new SCHED_CPUFREQ_IDLE flag
> is notified only by idle tasks. That's the only code path where we are
> sure the CPU is entering IDLE.
> 

W.r.t. the possible issue above, I was thinking that we might want to
reset flags at 0-lag time for DL (if CPU is still idle). Anyway, two
distinct set of patches. Who gets in last will have to ponder the thing
a little bit more. :)

Best,

Juri


* Re: [PATCH v3 6/6] cpufreq: schedutil: ignore sugov kthreads
  2017-11-30 13:41   ` Juri Lelli
@ 2017-11-30 16:02     ` Patrick Bellasi
  2017-11-30 16:12       ` Juri Lelli
  0 siblings, 1 reply; 80+ messages in thread
From: Patrick Bellasi @ 2017-11-30 16:02 UTC (permalink / raw)
  To: Juri Lelli
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On 30-Nov 14:41, Juri Lelli wrote:
> Hi,
> 
> On 30/11/17 11:47, Patrick Bellasi wrote:
> > In systems where multiple CPUs share the same frequency domain, a small
> > workload on a CPU can still be subject to frequency spikes generated by
> > the activation of the sugov kthread.
> > 
> > Since the sugov kthread is a special RT task, whose only goal is to
> > trigger a frequency transition, it does not make sense for it to bias
> > the schedutil frequency selection policy.
> > 
> > This patch exploits the information related to the current task to silently
> > ignore cpufreq_update_this_cpu() calls coming from the RT scheduler while
> > the sugov kthread is running.
> > 
> > Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
> > Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Cc: Viresh Kumar <viresh.kumar@linaro.org>
> > Cc: linux-kernel@vger.kernel.org
> > Cc: linux-pm@vger.kernel.org
> > 
> > ---
> > Changes from v2:
> > - rebased on v4.15-rc1
> > - moved at the end of the stack since considered more controversial
> > Changes from v1:
> > - move check before policy spinlock (JuriL)
> > 
> > Change-Id: I4d749458229b6496dd24a8c357be42cd35a739fd
> > ---
> >  kernel/sched/cpufreq_schedutil.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 3eea8884e61b..a93ad5b0c40d 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -270,6 +270,10 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> >  	bool rt_mode;
> >  	bool busy;
> >  
> > +	/* Skip updates generated by sugov kthreads */
> > +	if (unlikely(current == sg_policy->thread))
> > +		return;
> > +
> >  	sugov_set_iowait_boost(sg_cpu, time, flags);
> >  	sg_cpu->last_update = time;
> >  
> > @@ -356,6 +360,10 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> >  	unsigned int next_f;
> >  	bool rt_mode;
> >  
> > +	/* Skip updates generated by sugov kthreads */
> > +	if (unlikely(current == sg_policy->thread))
> > +		return;
> > +
> >  	raw_spin_lock(&sg_policy->update_lock);
> >  
> >  	sugov_get_util(&util, &max, sg_cpu->cpu);
> 
> If the DL changes (which I shall post again as soon as tip/sched/core is
> bumped up to 4.15-rc1) get in first, this is going to be useless (as the
> DL kthread gets ignored by the scheduling class itself). But, this looks
> good to me "in the meantime".

Just to better understand, you mean that the DL kthread does not send
out schedutil updates?

If that's the case I agree we can discard this patch... that's also
one of the reasons why I moved it to the end of this series.

> Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
> 
> Best,
> 
> Juri

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 80+ messages in thread
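[Editor's note] The skip condition discussed in this message can be modelled with a tiny self-contained sketch. The names below are simplified and hypothetical; in the actual patch the check compares `current` against `sg_policy->thread` in kernel/sched/cpufreq_schedutil.c:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct task_model { int pid; };
struct sugov_policy_model {
	struct task_model *thread;	/* the sugov kthread of this policy */
};

/*
 * Model of the early bail-out: updates generated while the sugov
 * kthread itself is running are silently ignored, so the kthread's
 * own (RT) activity does not influence frequency selection.
 */
static bool sugov_skip_update(const struct sugov_policy_model *sg_policy,
			      const struct task_model *curr)
{
	return curr == sg_policy->thread;
}
```

Under this model, only the update whose "current" task is the policy's own kthread is dropped; all other callers still reach the frequency-selection path.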

* Re: [PATCH v3 2/6] cpufreq: schedutil: ensure max frequency while running RT/DL tasks
  2017-11-30 15:45     ` Patrick Bellasi
@ 2017-11-30 16:03       ` Juri Lelli
  0 siblings, 0 replies; 80+ messages in thread
From: Juri Lelli @ 2017-11-30 16:03 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes,
	Steve Muckle

On 30/11/17 15:45, Patrick Bellasi wrote:
> On 30-Nov 14:17, Juri Lelli wrote:
> > Hi,
> > 
> > On 30/11/17 11:47, Patrick Bellasi wrote:
> > 
> > [...]
> > 
> > > @@ -340,6 +349,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> > >  	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
> > >  	unsigned long util, max;
> > >  	unsigned int next_f;
> > > +	bool rt_mode;
> > >  
> > >  	sugov_get_util(&util, &max, sg_cpu->cpu);
> > >  
> > > @@ -353,17 +363,27 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> > >  		sg_cpu->flags = 0;
> > >  		goto done;
> > >  	}
> > > -	sg_cpu->flags = flags;
> > > +
> > > +	/*
> > > +	 * While RT/DL tasks are running we do not want FAIR tasks to
> > > +	 * overwrite this CPU's flags, still we can update utilization and
> > > +	 * frequency (if required/possible) to be fair with these tasks.
> > > +	 */
> > > +	rt_mode = task_has_dl_policy(current) ||
> > > +		  task_has_rt_policy(current) ||
> > > +		  (flags & SCHED_CPUFREQ_RT_DL);
> > > +	if (rt_mode)
> > > +		sg_cpu->flags |= flags;
> > > +	else
> > > +		sg_cpu->flags = flags;
> > >  
> > >  	sugov_set_iowait_boost(sg_cpu, time, flags);
> > >  	sg_cpu->last_update = time;
> > >  
> > >  	if (sugov_should_update_freq(sg_policy, time)) {
> > > -		if (flags & SCHED_CPUFREQ_RT_DL)
> > > -			next_f = sg_policy->policy->cpuinfo.max_freq;
> > > -		else
> > > -			next_f = sugov_next_freq_shared(sg_cpu, time);
> > > -
> > > +		next_f = rt_mode
> > > +			? sg_policy->policy->cpuinfo.max_freq
> > > +			: sugov_next_freq_shared(sg_cpu, time);
> > >  		sugov_update_commit(sg_policy, time, next_f);
> > 
> > Aren't we already at max_freq at this point (since an RT/DL task is
> > running)? Do we need to trigger a frequency update?
> 
> I think that's required to cover the first time we enter rt_mode in
> order to jump to max OPP.
> 
> If we are already at max OPP, sugov_update_commit() will just bail out
> without doing anything.

Indeed!

Best,

Juri

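[Editor's note] The flag-aggregation semantics discussed in this exchange can be modelled with a small self-contained sketch (simplified, hypothetical names; the real logic lives in sugov_update_shared()): while an RT/DL task is current, a FAIR update may add flags but never clear the pending RT/DL ones.

```c
#include <assert.h>
#include <stdbool.h>

#define SCHED_CPUFREQ_RT	(1U << 0)
#define SCHED_CPUFREQ_DL	(1U << 1)
#define SCHED_CPUFREQ_RT_DL	(SCHED_CPUFREQ_RT | SCHED_CPUFREQ_DL)

/* Hypothetical, reduced model of the per-CPU schedutil state. */
struct sg_cpu_model { unsigned int flags; };

/* Model of the patched update: OR while in rt_mode, overwrite otherwise. */
static void update_flags(struct sg_cpu_model *sg_cpu, unsigned int flags,
			 bool curr_is_rt_dl)
{
	bool rt_mode = curr_is_rt_dl || (flags & SCHED_CPUFREQ_RT_DL);

	if (rt_mode)
		sg_cpu->flags |= flags;	/* keep the pending RT/DL request */
	else
		sg_cpu->flags = flags;	/* plain FAIR update overwrites */
}
```

With this model, a FAIR-originated update (flags == 0) arriving while an RT task is current leaves SCHED_CPUFREQ_RT set, which is the behaviour the patch is after.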

* Re: [PATCH v3 5/6] cpufreq: schedutil: relax rate-limiting while running RT/DL tasks
  2017-11-30 15:54     ` Patrick Bellasi
@ 2017-11-30 16:06       ` Juri Lelli
  0 siblings, 0 replies; 80+ messages in thread
From: Juri Lelli @ 2017-11-30 16:06 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On 30/11/17 15:54, Patrick Bellasi wrote:
> On 30-Nov 14:36, Juri Lelli wrote:

[...]

> > I wonder if we would also need some way to trigger a back to back update
> > as soon as a currently running one finishes and an RT/DL task asked for
> > an update (without waiting for the next tick).
> 
> Good point, I think that would actually be an interesting
> optimization. We already discussed that in the past, but refrained by
> adding more on top of this already "substantial" set of changes.
> 
> Can we think about that once we decided about some patches of this
> series?

Definitely yes for me. One bit at a time. :)

Best,

Juri


* Re: [PATCH v3 6/6] cpufreq: schedutil: ignore sugov kthreads
  2017-11-30 16:02     ` Patrick Bellasi
@ 2017-11-30 16:12       ` Juri Lelli
  2017-11-30 16:42         ` Patrick Bellasi
  0 siblings, 1 reply; 80+ messages in thread
From: Juri Lelli @ 2017-11-30 16:12 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On 30/11/17 16:02, Patrick Bellasi wrote:
> On 30-Nov 14:41, Juri Lelli wrote:

[...]

> > If the DL changes (which I shall post again as soon as tip/sched/core is
> > bumped up to 4.15-rc1) get in first, this is going to be useless (as the
> > DL kthread gets ignored by the scheduling class itself). But, this looks
> > good to me "in the meantime".
> 
> Just to better understand, you mean that the DL kthread does not send
> out schedutil updates?

It doesn't have a "proper" bandwidth (utilization) :/. So it gets
"unnoticed" and schedutil updates are not triggered.

> 
> If that's the case I agree we can discard this patch... that's also
> one of the reasons why I moved it to the end of this series.

Not sure about this though, not my call :). I guess this still helps
until we get the DL changes in.

Best,

Juri


* Re: [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter
  2017-11-30 16:02       ` Juri Lelli
@ 2017-11-30 16:19         ` Patrick Bellasi
  2017-11-30 16:45           ` Juri Lelli
  0 siblings, 1 reply; 80+ messages in thread
From: Patrick Bellasi @ 2017-11-30 16:19 UTC (permalink / raw)
  To: Juri Lelli
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On 30-Nov 17:02, Juri Lelli wrote:
> On 30/11/17 15:41, Patrick Bellasi wrote:
> > On 30-Nov 14:12, Juri Lelli wrote:
> > > Hi,
> > > 
> > > On 30/11/17 11:47, Patrick Bellasi wrote:
> > > 
> > > [...]
> > > 
> > > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > > index 2f52ec0f1539..67339ccb5595 100644
> > > > --- a/kernel/sched/cpufreq_schedutil.c
> > > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > > @@ -347,6 +347,12 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> > > >  
> > > >  	sg_cpu->util = util;
> > > >  	sg_cpu->max = max;
> > > > +
> > > > +	/* CPU is entering IDLE, reset flags without triggering an update */
> > > > +	if (unlikely(flags & SCHED_CPUFREQ_IDLE)) {
> > > > +		sg_cpu->flags = 0;
> > > > +		goto done;
> > > > +	}
> > > 
> > > Looks good for now. I'm just wondering what will happen for DL, as a
> > > CPU that still "has" a sleeping task is not going to be really idle
> > > until the 0-lag time.
> > 
> > AFAIU, for the time being, DL already cannot really rely on this flag
> > for its behavior to be correct. Indeed, flags are reset as soon as
> > a FAIR task wakes up and is enqueued.
> 
> Right, and your flags ORing patch should help with this.
> 
> > 
> > Only once your DL integration patches are in, we no longer depend on
> > flags, since DL will report a certain utilization up to the
> > 0-lag time, isn't it?
> 
> Utilization won't decrease until 0-lag time, correct.

Then IMO, with your DL patches, the DL class doesn't need the flags
anymore, since schedutil will know (and account for) the
utilization required by DL tasks. Isn't it?

> I was just wondering if resetting flags before that time (when a CPU
> enters idle) might be an issue.

If the above is correct, then the flags will be used only for the RT class
(and IO boosting)... and thus this patch will still be useful as it is now:
meaning that once the idle task is selected we no longer care
about RT and IO boosting.

> > If that's the case, I would say that the flags will be used only to
> > jump to the max OPP for RT tasks. Thus, this patch should still be valid.
> > 
> > > I guess we could move this at that point in time?
> > 
> > Not sure what you mean here. Right now the new SCHED_CPUFREQ_IDLE flag
> > is notified only by idle tasks. That's the only code path where we are
> > sure the CPU is entering IDLE.
> > 
> 
> W.r.t. the possible issue above, I was thinking that we might want to
> reset flags at 0-lag time for DL (if CPU is still idle). Anyway, two
> distinct set of patches. Who gets in last will have to ponder the thing
> a little bit more. :)

Perhaps I'm still a bit confused but, to me, it seems that with your
patches we completely fix DL, while we can still use this exact same
patch just for RT tasks.

> Best,
> 
> Juri

-- 
#include <best/regards.h>

Patrick Bellasi


* Re: [PATCH v3 6/6] cpufreq: schedutil: ignore sugov kthreads
  2017-11-30 16:12       ` Juri Lelli
@ 2017-11-30 16:42         ` Patrick Bellasi
  2017-12-07  9:24           ` Viresh Kumar
  0 siblings, 1 reply; 80+ messages in thread
From: Patrick Bellasi @ 2017-11-30 16:42 UTC (permalink / raw)
  To: Juri Lelli
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On 30-Nov 17:12, Juri Lelli wrote:
> On 30/11/17 16:02, Patrick Bellasi wrote:
> > On 30-Nov 14:41, Juri Lelli wrote:
> 
> [...]
> 
> > > If the DL changes (which I shall post again as soon as tip/sched/core is
> > > bumped up to 4.15-rc1) get in first, this is going to be useless (as the
> > > DL kthread gets ignored by the scheduling class itself). But, this looks
> > > good to me "in the meantime".
> > 
> > Just to better understand, you mean that the DL kthread does not send
> > out schedutil updates?
> 
> It doesn't have a "proper" bandwidth (utilization) :/. So it gets
> "unnoticed" and schedutil updates are not triggered.

Ah, right... that was the "let CBS ignore this DL task" solution! ;-)

> > 
> > If that's the case I agree we can discard this patch... that's also
> > one of the reasons why I moved it to the end of this series.
> 
> Not sure about this though, not my call :). I guess this still helps
> until we get the DL changes in.

Yes, agreed... well, Viresh had some concerns about this patch.
Let's see what he thinks.
 
> Best,
> 
> Juri

-- 
#include <best/regards.h>

Patrick Bellasi


* Re: [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter
  2017-11-30 16:19         ` Patrick Bellasi
@ 2017-11-30 16:45           ` Juri Lelli
  0 siblings, 0 replies; 80+ messages in thread
From: Juri Lelli @ 2017-11-30 16:45 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On 30/11/17 16:19, Patrick Bellasi wrote:
> On 30-Nov 17:02, Juri Lelli wrote:
> > On 30/11/17 15:41, Patrick Bellasi wrote:
> > > On 30-Nov 14:12, Juri Lelli wrote:
> > > > Hi,
> > > > 
> > > > On 30/11/17 11:47, Patrick Bellasi wrote:
> > > > 
> > > > [...]
> > > > 
> > > > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > > > index 2f52ec0f1539..67339ccb5595 100644
> > > > > --- a/kernel/sched/cpufreq_schedutil.c
> > > > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > > > @@ -347,6 +347,12 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> > > > >  
> > > > >  	sg_cpu->util = util;
> > > > >  	sg_cpu->max = max;
> > > > > +
> > > > > +	/* CPU is entering IDLE, reset flags without triggering an update */
> > > > > +	if (unlikely(flags & SCHED_CPUFREQ_IDLE)) {
> > > > > +		sg_cpu->flags = 0;
> > > > > +		goto done;
> > > > > +	}
> > > > 
> > > > Looks good for now. I'm just wondering what will happen for DL, as a
> > > > CPU that still "has" a sleeping task is not going to be really idle
> > > > until the 0-lag time.
> > > 
> > > AFAIU, for the time being, DL already cannot really rely on this flag
> > > for its behavior to be correct. Indeed, flags are reset as soon as
> > > a FAIR task wakes up and is enqueued.
> > 
> > Right, and your flags ORing patch should help with this.
> > 
> > > 
> > > Only once your DL integration patches are in, we no longer depend on
> > > flags, since DL will report a certain utilization up to the
> > > 0-lag time, isn't it?
> > 
> > Utilization won't decrease until 0-lag time, correct.
> 
> Then IMO, with your DL patches, the DL class doesn't need the flags
> anymore, since schedutil will know (and account for) the
> utilization required by DL tasks. Isn't it?
> 
> > I was just wondering if resetting flags before that time (when a CPU
> > enters idle) might be an issue.
> 
> If the above is correct, then the flags will be used only for the RT class
> (and IO boosting)... and thus this patch will still be useful as it is now:
> meaning that once the idle task is selected we no longer care
> about RT and IO boosting.
> 
> > > If that's the case, I would say that the flags will be used only to
> > > jump to the max OPP for RT tasks. Thus, this patch should still be valid.
> > > 
> > > > I guess we could move this at that point in time?
> > > 
> > > Not sure what you mean here. Right now the new SCHED_CPUFREQ_IDLE flag
> > > is notified only by idle tasks. That's the only code path where we are
> > > sure the CPU is entering IDLE.
> > > 
> > 
> > W.r.t. the possible issue above, I was thinking that we might want to
> > reset flags at 0-lag time for DL (if CPU is still idle). Anyway, two
> > distinct set of patches. Who gets in last will have to ponder the thing
> > a little bit more. :)
> 
> Perhaps I'm still a bit confused but, to me, it seems that with your
> patches we completely fix DL, while we can still use this exact same
> patch just for RT tasks.

We don't use the flags for bailing out during aggregation, so it should
be OK for DL, yes.

Thanks,

Juri

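[Editor's note] The idle-entry handling introduced by patch 1/6, which the discussion above concludes remains useful for RT and IO boosting, can be sketched as a standalone model (hypothetical, simplified names):

```c
#include <assert.h>
#include <stdbool.h>

#define SCHED_CPUFREQ_RT	(1U << 0)
#define SCHED_CPUFREQ_IDLE	(1U << 3)

/* Hypothetical, reduced model of the per-CPU schedutil state. */
struct sg_cpu_model { unsigned int flags; };

/*
 * Model of sugov_update_shared() on idle entry: the stale flags are
 * cleared without triggering a frequency update, so an old RT/iowait
 * hint cannot keep the cluster at a high OPP once the CPU goes idle.
 * Returns true when a frequency update should still be evaluated.
 */
static bool handle_update(struct sg_cpu_model *sg_cpu, unsigned int flags)
{
	if (flags & SCHED_CPUFREQ_IDLE) {
		sg_cpu->flags = 0;	/* forget RT/iowait hints */
		return false;		/* skip the OPP change */
	}
	sg_cpu->flags = flags;
	return true;
}
```

In this model the idle notification is purely a bookkeeping event: it never causes a frequency switch itself, it only prevents stale flags from biasing the next shared-policy evaluation.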

* Re: [PATCH v3 4/6] sched/rt: fast switch to maximum frequency when RT tasks are scheduled
  2017-11-30 11:47 ` [PATCH v3 4/6] sched/rt: fast switch to maximum frequency when RT tasks are scheduled Patrick Bellasi
  2017-11-30 13:28   ` Juri Lelli
@ 2017-12-06  9:39   ` Vincent Guittot
  2017-12-06 11:38     ` Patrick Bellasi
  1 sibling, 1 reply; 80+ messages in thread
From: Vincent Guittot @ 2017-12-06  9:39 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Viresh Kumar, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes

Hi Patrick,

On 30 November 2017 at 12:47, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> Currently schedutil updates are triggered for the RT class using a single
> call place, which is part of the rt::update_curr_rt() used in:
>
> - dequeue_task_rt:
>   but it does not make sense to set the schedutil's SCHED_CPUFREQ_RT in
>   case the next task should not be an RT one
>
> - put_prev_task_rt:
>   likewise, we set the SCHED_CPUFREQ_RT flag without knowing if required
>   by the next task
>
> - pick_next_task_rt:
>   likewise, the schedutil's SCHED_CPUFREQ_RT is set in case the prev task
>   was RT, while we don't yet know if the next will be RT
>
> - task_tick_rt:
>   that's the only really useful call, which can ramp up the frequency in
>   case a RT task started its execution without a chance to order a
>   frequency switch (e.g. because of the schedutil ratelimit)
>
> Apart from the last call in task_tick_rt, the others are useless at best.
> Thus, although it is a simple solution, not all the call sites of
> update_curr_rt() are interesting for triggering a frequency switch, while
> some of the most interesting points are not covered by that call at all.
> For example, a task set to RT has to wait for the next tick to get the
> frequency boost.
>
> This patch fixes these issues by explicitly placing the schedutil
> update calls in the only sensible places, which are:
> - when an RT task wakes up and it's enqueued in a CPU
> - when we actually pick a RT task for execution
> - at each tick time
> - when a task is set to be RT
>
> Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-pm@vger.kernel.org
>
> ---
> Changes from v2:
> - rebased on v4.15-rc1
> - use cpufreq_update_util() instead of cpufreq_update_this_cpu()
>
> Change-Id: I3794615819270fe175cb118eef3f7edd61f602ba
> ---
>  kernel/sched/rt.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 4056c19ca3f0..6984032598a6 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -959,9 +959,6 @@ static void update_curr_rt(struct rq *rq)
>         if (unlikely((s64)delta_exec <= 0))
>                 return;
>
> -       /* Kick cpufreq (see the comment in kernel/sched/sched.h). */
> -       cpufreq_update_util(rq, SCHED_CPUFREQ_RT);
> -
>         schedstat_set(curr->se.statistics.exec_max,
>                       max(curr->se.statistics.exec_max, delta_exec));
>
> @@ -1327,6 +1324,9 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)
>
>         if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
>                 enqueue_pushable_task(rq, p);
> +
> +       /* Kick cpufreq (see the comment in kernel/sched/sched.h). */
> +       cpufreq_update_util(rq, SCHED_CPUFREQ_RT);
>  }
>
>  static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags)
> @@ -1564,6 +1564,9 @@ pick_next_task_rt(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
>
>         p = _pick_next_task_rt(rq);
>
> +       /* Kick cpufreq (see the comment in kernel/sched/sched.h). */

p is null when there is no rt task to pick.
You should test this condition before calling cpufreq_update_util

> +       cpufreq_update_util(rq, SCHED_CPUFREQ_RT);
> +
>         /* The running task is never eligible for pushing */
>         dequeue_pushable_task(rq, p);
>
> @@ -2282,6 +2285,9 @@ static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued)
>  {
>         struct sched_rt_entity *rt_se = &p->rt;
>
> +       /* Kick cpufreq (see the comment in kernel/sched/sched.h). */
> +       cpufreq_update_util(rq, SCHED_CPUFREQ_RT);
> +
>         update_curr_rt(rq);
>
>         watchdog(rq, p);
> @@ -2317,6 +2323,9 @@ static void set_curr_task_rt(struct rq *rq)
>
>         p->se.exec_start = rq_clock_task(rq);
>
> +       /* Kick cpufreq (see the comment in kernel/sched/sched.h). */
> +       cpufreq_update_util(rq, SCHED_CPUFREQ_RT);

Is this change linked to the "- when a task is set to be RT" entry in the
commit message?

I can't see a situation where this is called without the previous one.
AFAICT, enqueue_task_rt will be called before each call to this
function.

> +
>         /* The running task is never eligible for pushing */
>         dequeue_pushable_task(rq, p);
>  }
> --
> 2.14.1
>


* Re: [PATCH v3 4/6] sched/rt: fast switch to maximum frequency when RT tasks are scheduled
  2017-12-06  9:39   ` Vincent Guittot
@ 2017-12-06 11:38     ` Patrick Bellasi
  2017-12-06 12:36       ` Vincent Guittot
  0 siblings, 1 reply; 80+ messages in thread
From: Patrick Bellasi @ 2017-12-06 11:38 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Viresh Kumar, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes

Hi Vincent,

On 06-Dec 10:39, Vincent Guittot wrote:
> Hi Patrick,
> 
> On 30 November 2017 at 12:47, Patrick Bellasi <patrick.bellasi@arm.com> wrote:

[...]

> >  static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags)
> > @@ -1564,6 +1564,9 @@ pick_next_task_rt(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> >
> >         p = _pick_next_task_rt(rq);
> >
> > +       /* Kick cpufreq (see the comment in kernel/sched/sched.h). */
> 
> p is null when there is no rt task to pick.
> You should test this condition before calling cpufreq_update_util

Mmm... from what I see of the above function's implementation,
_pick_next_task_rt() always returns a valid pointer to an RT
task.

The above call does a:

   p->se.exec_start = rq_clock_task(rq);

right before returning, and there is also a BUG_ON(!rt_se) in the
previous RT tasks scanning loop.

Am I missing something?

> > +       cpufreq_update_util(rq, SCHED_CPUFREQ_RT);
> > +
> >         /* The running task is never eligible for pushing */
> >         dequeue_pushable_task(rq, p);

[...]

> > @@ -2317,6 +2323,9 @@ static void set_curr_task_rt(struct rq *rq)
> >
> >         p->se.exec_start = rq_clock_task(rq);
> >
> > +       /* Kick cpufreq (see the comment in kernel/sched/sched.h). */
> > +       cpufreq_update_util(rq, SCHED_CPUFREQ_RT);
> 
> Is this change linked to the "- when a task is set to be RT" entry in the
> commit message?
> 
> I can't see a situation where this is called without the previous one.
> AFAICT, enqueue_task_rt will be called before each call to this
> function.

Yeah, you're right; in core.c the pattern seems to always be:

    if (queued)
            enqueue_task()
    if (running)
            set_curr_task()

I'll remove this chunk from the next version.

> 
> > +
> >         /* The running task is never eligible for pushing */
> >         dequeue_pushable_task(rq, p);
> >  }

Thanks for the review!

-- 
#include <best/regards.h>

Patrick Bellasi

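[Editor's note] The core.c pattern Patrick refers to, which makes the extra cpufreq kick in set_curr_task_rt() redundant, can be modelled minimally (hypothetical, simplified sketch; the real hooks call cpufreq_update_util()):

```c
#include <assert.h>
#include <stdbool.h>

static int cpufreq_kicks;

static void enqueue_task_rt(void)  { cpufreq_kicks++; }	/* kicks cpufreq */
static void set_curr_task_rt(void) { /* no extra kick needed */ }

/*
 * Model of the sched-class-change path in core.c: a task that is both
 * queued and running is re-enqueued first and only then made current,
 * so the hook in enqueue_task_rt() always runs before set_curr_task_rt().
 */
static void switch_to_rt(bool queued, bool running)
{
	if (queued)
		enqueue_task_rt();
	if (running)
		set_curr_task_rt();
}
```

Under this model a queued-and-running task that is switched to RT produces exactly one cpufreq kick, which is why the set_curr_task_rt() chunk can be dropped from the next version.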

* Re: [PATCH v3 4/6] sched/rt: fast switch to maximum frequency when RT tasks are scheduled
  2017-12-06 11:38     ` Patrick Bellasi
@ 2017-12-06 12:36       ` Vincent Guittot
  0 siblings, 0 replies; 80+ messages in thread
From: Vincent Guittot @ 2017-12-06 12:36 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Viresh Kumar, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes

On 6 December 2017 at 12:38, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> Hi Vincent,
>
> On 06-Dec 10:39, Vincent Guittot wrote:
>> Hi Patrick,
>>
>> On 30 November 2017 at 12:47, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
>
> [...]
>
>> >  static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags)
>> > @@ -1564,6 +1564,9 @@ pick_next_task_rt(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
>> >
>> >         p = _pick_next_task_rt(rq);
>> >
>> > +       /* Kick cpufreq (see the comment in kernel/sched/sched.h). */
>>
>> p is null when there is no rt task to pick.
>> You should test this condition before calling cpufreq_update_util
>
> Mmm... from what I see of the above function's implementation,
> _pick_next_task_rt() always returns a valid pointer to an RT
> task.
>
> The above call does a:
>
>    p->se.exec_start = rq_clock_task(rq);
>
> right before returning, and there is also a BUG_ON(!rt_se) in the
> previous RT tasks scanning loop.
>
> Am I missing something?

No, you're right: the NULL return is done earlier, if there is no task.

>
>> > +       cpufreq_update_util(rq, SCHED_CPUFREQ_RT);
>> > +
>> >         /* The running task is never eligible for pushing */
>> >         dequeue_pushable_task(rq, p);
>
> [...]
>
>> > @@ -2317,6 +2323,9 @@ static void set_curr_task_rt(struct rq *rq)
>> >
>> >         p->se.exec_start = rq_clock_task(rq);
>> >
>> > +       /* Kick cpufreq (see the comment in kernel/sched/sched.h). */
>> > +       cpufreq_update_util(rq, SCHED_CPUFREQ_RT);
>>
>> Is this change linked to the "- when a task is set to be RT" entry in the
>> commit message?
>>
>> I can't see a situation where this is called without the previous one.
>> AFAICT, enqueue_task_rt will be called before each call to this
>> function.
>
> Yeah, you're right; in core.c the pattern seems to always be:
>
>     if (queued)
>             enqueue_task()
>     if (running)
>             set_curr_task()
>
> I'll remove this chunk from the next version.
>
>>
>> > +
>> >         /* The running task is never eligible for pushing */
>> >         dequeue_pushable_task(rq, p);
>> >  }
>
> Thanks for the review!
>
> --
> #include <best/regards.h>
>
> Patrick Bellasi


* Re: [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter
  2017-11-30 11:47 ` [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter Patrick Bellasi
  2017-11-30 13:12   ` Juri Lelli
@ 2017-12-07  5:01   ` Viresh Kumar
  2017-12-07 12:45     ` Patrick Bellasi
  2017-12-20 14:33   ` Peter Zijlstra
  2 siblings, 1 reply; 80+ messages in thread
From: Viresh Kumar @ 2017-12-07  5:01 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes

On 30-11-17, 11:47, Patrick Bellasi wrote:
> diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
> index d1ad3d825561..bb5f778db023 100644
> --- a/include/linux/sched/cpufreq.h
> +++ b/include/linux/sched/cpufreq.h
> @@ -11,6 +11,7 @@
>  #define SCHED_CPUFREQ_RT	(1U << 0)
>  #define SCHED_CPUFREQ_DL	(1U << 1)
>  #define SCHED_CPUFREQ_IOWAIT	(1U << 2)
> +#define SCHED_CPUFREQ_IDLE	(1U << 3)
>  
>  #define SCHED_CPUFREQ_RT_DL	(SCHED_CPUFREQ_RT | SCHED_CPUFREQ_DL)
>  
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 2f52ec0f1539..67339ccb5595 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -347,6 +347,12 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>  
>  	sg_cpu->util = util;
>  	sg_cpu->max = max;
> +
> +	/* CPU is entering IDLE, reset flags without triggering an update */
> +	if (unlikely(flags & SCHED_CPUFREQ_IDLE)) {
> +		sg_cpu->flags = 0;
> +		goto done;
> +	}
>  	sg_cpu->flags = flags;
>  
>  	sugov_set_iowait_boost(sg_cpu, time, flags);
> @@ -361,6 +367,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>  		sugov_update_commit(sg_policy, time, next_f);
>  	}
>  
> +done:
>  	raw_spin_unlock(&sg_policy->update_lock);
>  }
>  
> diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
> index d518664cce4f..6e8ae2aa7a13 100644
> --- a/kernel/sched/idle_task.c
> +++ b/kernel/sched/idle_task.c
> @@ -30,6 +30,10 @@ pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
>  	put_prev_task(rq, prev);
>  	update_idle_core(rq);
>  	schedstat_inc(rq->sched_goidle);
> +
> +	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
> +	cpufreq_update_util(rq, SCHED_CPUFREQ_IDLE);

We posted some comments on V2 of this particular patch suggesting
some improvements. The patch hasn't changed at all, and you haven't
replied to a few of those suggestions either. Any particular reason for
that?

For example:
- I suggested to get rid of the conditional expression in
  cpufreq_schedutil.c file that you have added.
- And Joel suggested to clear the RT/DL flags from dequeue path to
  avoid adding SCHED_CPUFREQ_IDLE flag.

-- 
viresh


* Re: [PATCH v3 2/6] cpufreq: schedutil: ensure max frequency while running RT/DL tasks
  2017-11-30 11:47 ` [PATCH v3 2/6] cpufreq: schedutil: ensure max frequency while running RT/DL tasks Patrick Bellasi
  2017-11-30 13:17   ` Juri Lelli
@ 2017-12-07  5:05   ` Viresh Kumar
  2017-12-07 14:18     ` Patrick Bellasi
  1 sibling, 1 reply; 80+ messages in thread
From: Viresh Kumar @ 2017-12-07  5:05 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes,
	Steve Muckle

On 30-11-17, 11:47, Patrick Bellasi wrote:
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 67339ccb5595..448f49de5335 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -262,6 +262,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>  	struct cpufreq_policy *policy = sg_policy->policy;
>  	unsigned long util, max;
>  	unsigned int next_f;
> +	bool rt_mode;
>  	bool busy;
>  
>  	sugov_set_iowait_boost(sg_cpu, time, flags);
> @@ -272,7 +273,15 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>  
>  	busy = sugov_cpu_is_busy(sg_cpu);
>  
> -	if (flags & SCHED_CPUFREQ_RT_DL) {
> +	/*
> +	 * While RT/DL tasks are running we do not want FAIR tasks to
> +	 * overwrite this CPU's flags, still we can update utilization and
> +	 * frequency (if required/possible) to be fair with these tasks.
> +	 */
> +	rt_mode = task_has_dl_policy(current) ||
> +		  task_has_rt_policy(current) ||
> +		  (flags & SCHED_CPUFREQ_RT_DL);
> +	if (rt_mode) {
>  		next_f = policy->cpuinfo.max_freq;
>  	} else {
>  		sugov_get_util(&util, &max, sg_cpu->cpu);
> @@ -340,6 +349,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>  	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>  	unsigned long util, max;
>  	unsigned int next_f;
> +	bool rt_mode;
>  
>  	sugov_get_util(&util, &max, sg_cpu->cpu);
>  
> @@ -353,17 +363,27 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>  		sg_cpu->flags = 0;
>  		goto done;
>  	}
> -	sg_cpu->flags = flags;
> +
> +	/*
> +	 * While RT/DL tasks are running we do not want FAIR tasks to
> +	 * overwrite this CPU's flags, still we can update utilization and
> +	 * frequency (if required/possible) to be fair with these tasks.
> +	 */
> +	rt_mode = task_has_dl_policy(current) ||
> +		  task_has_rt_policy(current) ||
> +		  (flags & SCHED_CPUFREQ_RT_DL);
> +	if (rt_mode)
> +		sg_cpu->flags |= flags;
> +	else
> +		sg_cpu->flags = flags;
>  
>  	sugov_set_iowait_boost(sg_cpu, time, flags);
>  	sg_cpu->last_update = time;
>  
>  	if (sugov_should_update_freq(sg_policy, time)) {
> -		if (flags & SCHED_CPUFREQ_RT_DL)
> -			next_f = sg_policy->policy->cpuinfo.max_freq;
> -		else
> -			next_f = sugov_next_freq_shared(sg_cpu, time);
> -
> +		next_f = rt_mode
> +			? sg_policy->policy->cpuinfo.max_freq
> +			: sugov_next_freq_shared(sg_cpu, time);
>  		sugov_update_commit(sg_policy, time, next_f);
>  	}

Same here. There are pending comments from V2 which no one objected to,
and I was expecting to see those modifications here.

-- 
viresh


* Re: [PATCH v3 3/6] cpufreq: schedutil: update CFS util only if used
  2017-11-30 15:57     ` Patrick Bellasi
@ 2017-12-07  5:15       ` Viresh Kumar
  2017-12-07 14:19         ` Patrick Bellasi
  0 siblings, 1 reply; 80+ messages in thread
From: Viresh Kumar @ 2017-12-07  5:15 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Juri Lelli, linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Todd Kjos, Joel Fernandes

On 30-11-17, 15:57, Patrick Bellasi wrote:
> Yes, that's a pretty trivial update with a confusing changelog.
> 
> If we think it's worth to keep (and correct as well) I'll update the
> commit message.

We also need to update the commit log based on the feedback from Vikram
on V2, which said that the utilization can't change around the lock here,
as we are within an rq-lock section, though max can change (maybe). So
this patch only takes care of locking before reading max.

-- 
viresh

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 6/6] cpufreq: schedutil: ignore sugov kthreads
  2017-11-30 16:42         ` Patrick Bellasi
@ 2017-12-07  9:24           ` Viresh Kumar
  2017-12-07 15:47             ` Patrick Bellasi
  0 siblings, 1 reply; 80+ messages in thread
From: Viresh Kumar @ 2017-12-07  9:24 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Juri Lelli, linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Todd Kjos, Joel Fernandes

On 30-11-17, 16:42, Patrick Bellasi wrote:
> On 30-Nov 17:12, Juri Lelli wrote:
> > Not sure about this though, not my call :). I guess this still helps
> > until we get the DL changes in.
> 
Yes, agree... well, Viresh had some concerns about this patch.
Let's see what he thinks.

And the problem is that we never got to a conclusion in V2 for this
patch as well, as you never responded to the last set of queries I
had. Unless we finish the ongoing conversations, we will have the same
queries again.

IOW, can you explain the exact scenario where this patch will be
helpful? I am not sure if this patch is that helpful at all :)

-- 
viresh

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter
  2017-12-07  5:01   ` Viresh Kumar
@ 2017-12-07 12:45     ` Patrick Bellasi
  2017-12-07 15:55       ` Dietmar Eggemann
  2017-12-12 11:37       ` Viresh Kumar
  0 siblings, 2 replies; 80+ messages in thread
From: Patrick Bellasi @ 2017-12-07 12:45 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes

Hi Viresh,

On 07-Dec 10:31, Viresh Kumar wrote:
> On 30-11-17, 11:47, Patrick Bellasi wrote:
> > diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
> > index d1ad3d825561..bb5f778db023 100644
> > --- a/include/linux/sched/cpufreq.h
> > +++ b/include/linux/sched/cpufreq.h
> > @@ -11,6 +11,7 @@
> >  #define SCHED_CPUFREQ_RT	(1U << 0)
> >  #define SCHED_CPUFREQ_DL	(1U << 1)
> >  #define SCHED_CPUFREQ_IOWAIT	(1U << 2)
> > +#define SCHED_CPUFREQ_IDLE	(1U << 3)
> >  
> >  #define SCHED_CPUFREQ_RT_DL	(SCHED_CPUFREQ_RT | SCHED_CPUFREQ_DL)
> >  
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 2f52ec0f1539..67339ccb5595 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -347,6 +347,12 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> >  
> >  	sg_cpu->util = util;
> >  	sg_cpu->max = max;
> > +
> > +	/* CPU is entering IDLE, reset flags without triggering an update */
> > +	if (unlikely(flags & SCHED_CPUFREQ_IDLE)) {
> > +		sg_cpu->flags = 0;
> > +		goto done;
> > +	}
> >  	sg_cpu->flags = flags;
> >  
> >  	sugov_set_iowait_boost(sg_cpu, time, flags);
> > @@ -361,6 +367,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> >  		sugov_update_commit(sg_policy, time, next_f);
> >  	}
> >  
> > +done:
> >  	raw_spin_unlock(&sg_policy->update_lock);
> >  }
> >  
> > diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
> > index d518664cce4f..6e8ae2aa7a13 100644
> > --- a/kernel/sched/idle_task.c
> > +++ b/kernel/sched/idle_task.c
> > @@ -30,6 +30,10 @@ pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
> >  	put_prev_task(rq, prev);
> >  	update_idle_core(rq);
> >  	schedstat_inc(rq->sched_goidle);
> > +
> > +	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
> > +	cpufreq_update_util(rq, SCHED_CPUFREQ_IDLE);
> 
> We posted some comments on V2 for this particular patch suggesting
> some improvements. The patch hasn't changed at all and you haven't
> replied to a few of those suggestions either. Any particular reason for
> that?

You're right; since the previous posting was a long time ago, with
this one I mainly wanted to refresh the discussion. Thanks for
highlighting below which ones were the main discussion points.


> For example:
> - I suggested to get rid of the conditional expression in
>   cpufreq_schedutil.c file that you have added.

We can probably set flags to SCHED_CPUFREQ_IDLE (instead of resetting
them); however, I think we still need an if condition somewhere.

Indeed, when SCHED_CPUFREQ_IDLE is asserted we don't want to trigger
an OPP change (reasons described in the changelog).

If that's still a goal, then we need to check this flag and bail out
from sugov_update_shared() straight away. That's why I've added a check
at the beginning, and also marked it as unlikely so that it has no
impact on all the cases where we call a schedutil update with runnable
tasks.

Does this make sense?

> - And Joel suggested to clear the RT/DL flags from dequeue path to
>   avoid adding SCHED_CPUFREQ_IDLE flag.

I had a thought about Joel's proposal:

>> wouldn't another way be to just clear the flag from the RT scheduling
>> class with an extra call to cpufreq_update_util with flags = 0 during
>> dequeue_rt_entity?

The main concern for me was that the current API says nothing about
which scheduling class is calling schedutil for updates.

Thus, at dequeue time of an RT task we cannot really clear all the
flags (e.g. the IOWAIT flag of a fair task); we should clear only the
RT-related flags.

This means that we likely need to implement Joel's idea by:

1. adding a new set of flags like:
   SCHED_CPUFREQ_RT_IDLE, SCHED_CPUFREQ_DL_IDLE, etc...

2. adding an operation flag, e.g.
   SCHED_CPUFREQ_SET, SCHED_CPUFREQ_RESET, to be ORed with the class
   flag, e.g.
   cpufreq_update_util(rq, SCHED_CPUFREQ_SET|SCHED_CPUFREQ_RT);

3. changing the API to carry the operation required for a flag, e.g.:
   cpufreq_update_util(rq, flag, set={true, false});

To be honest I don't like any of those, especially compared to the
simplicity of the one proposed by this patch. :)
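For what it's worth, option 2 above could be sketched roughly as a
user-space toy like the one below. Note that the SCHED_CPUFREQ_SET /
SCHED_CPUFREQ_RESET symbols and the sugov_apply_flags() helper are
hypothetical, purely illustrating the proposed semantics, not existing
kernel code:

```c
#include <assert.h>

/* Hypothetical flag layout for the "operation flag" idea (option 2):
 * the class bits stay as today, and callers OR in a SET or RESET
 * operation bit. None of the SET/RESET symbols exist in the kernel;
 * this only illustrates the proposed semantics. */
#define SCHED_CPUFREQ_RT	(1U << 0)
#define SCHED_CPUFREQ_DL	(1U << 1)
#define SCHED_CPUFREQ_IOWAIT	(1U << 2)
#define SCHED_CPUFREQ_SET	(1U << 30)
#define SCHED_CPUFREQ_RESET	(1U << 31)
#define SCHED_CPUFREQ_OP_MASK	(SCHED_CPUFREQ_SET | SCHED_CPUFREQ_RESET)

/* Governor side: apply the requested operation to the per-CPU flags,
 * touching only the class bits named in this update. */
static unsigned int sugov_apply_flags(unsigned int cur, unsigned int flags)
{
	unsigned int class_bits = flags & ~SCHED_CPUFREQ_OP_MASK;

	if (flags & SCHED_CPUFREQ_RESET)
		return cur & ~class_bits;
	return cur | class_bits;
}
```

Call sites would then pass e.g. SCHED_CPUFREQ_SET|SCHED_CPUFREQ_RT at
enqueue and the RESET variant when the last RT task is dequeued, so the
bits of the other scheduling classes stay untouched.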

IMO, the only pitfall of this patch is that (as Juri pointed out in
v2) for DL it can happen that we do not want to reset the flag right
when a CPU enters IDLE. We need instead a specific call to reset the
DL flag at the 0-lag time.

However, AFAIU, this special case for DL will disappear once we have
Juri's latest set [1] in. Indeed, at that point, schedutil will always
and only need to know the utilization required by DL.

[1] https://lkml.org/lkml/2017/12/4/173

Cheers Patrick

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 2/6] cpufreq: schedutil: ensure max frequency while running RT/DL tasks
  2017-12-07  5:05   ` Viresh Kumar
@ 2017-12-07 14:18     ` Patrick Bellasi
  0 siblings, 0 replies; 80+ messages in thread
From: Patrick Bellasi @ 2017-12-07 14:18 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes,
	Steve Muckle

On 07-Dec 10:35, Viresh Kumar wrote:
> On 30-11-17, 11:47, Patrick Bellasi wrote:
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 67339ccb5595..448f49de5335 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -262,6 +262,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> >  	struct cpufreq_policy *policy = sg_policy->policy;
> >  	unsigned long util, max;
> >  	unsigned int next_f;
> > +	bool rt_mode;
> >  	bool busy;
> >  
> >  	sugov_set_iowait_boost(sg_cpu, time, flags);
> > @@ -272,7 +273,15 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> >  
> >  	busy = sugov_cpu_is_busy(sg_cpu);
> >  
> > -	if (flags & SCHED_CPUFREQ_RT_DL) {
> > +	/*
> > +	 * While RT/DL tasks are running we do not want FAIR tasks to
> > +	 * overwrite this CPU's flags, still we can update utilization and
> > +	 * frequency (if required/possible) to be fair with these tasks.
> > +	 */
> > +	rt_mode = task_has_dl_policy(current) ||
> > +		  task_has_rt_policy(current) ||
> > +		  (flags & SCHED_CPUFREQ_RT_DL);
> > +	if (rt_mode) {
> >  		next_f = policy->cpuinfo.max_freq;
> >  	} else {
> >  		sugov_get_util(&util, &max, sg_cpu->cpu);
> > @@ -340,6 +349,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> >  	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
> >  	unsigned long util, max;
> >  	unsigned int next_f;
> > +	bool rt_mode;
> >  
> >  	sugov_get_util(&util, &max, sg_cpu->cpu);
> >  
> > @@ -353,17 +363,27 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> >  		sg_cpu->flags = 0;
> >  		goto done;
> >  	}
> > -	sg_cpu->flags = flags;
> > +
> > +	/*
> > +	 * While RT/DL tasks are running we do not want FAIR tasks to
> > +	 * overwrite this CPU's flags, still we can update utilization and
> > +	 * frequency (if required/possible) to be fair with these tasks.
> > +	 */
> > +	rt_mode = task_has_dl_policy(current) ||
> > +		  task_has_rt_policy(current) ||
> > +		  (flags & SCHED_CPUFREQ_RT_DL);
> > +	if (rt_mode)
> > +		sg_cpu->flags |= flags;
> > +	else
> > +		sg_cpu->flags = flags;
> >  
> >  	sugov_set_iowait_boost(sg_cpu, time, flags);
> >  	sg_cpu->last_update = time;
> >  
> >  	if (sugov_should_update_freq(sg_policy, time)) {
> > -		if (flags & SCHED_CPUFREQ_RT_DL)
> > -			next_f = sg_policy->policy->cpuinfo.max_freq;
> > -		else
> > -			next_f = sugov_next_freq_shared(sg_cpu, time);
> > -
> > +		next_f = rt_mode
> > +			? sg_policy->policy->cpuinfo.max_freq
> > +			: sugov_next_freq_shared(sg_cpu, time);
> >  		sugov_update_commit(sg_policy, time, next_f);
> >  	}
> 
> Same here. There are pending comments from V2 which no one objected to,
> and I was expecting to see those modifications here.

So, your proposal here was actually to add additional flags to clear
the RT and DL ones.

My past comment was instead that we never had "clear bit" semantics
for flags updates. However, it seems that the most common opinion was
that we should try to add such flags.

Thus, I think I have to refresh this patch by adding the new flags as
you proposed and give it a try.

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: schedutil: update CFS util only if used
  2017-12-07  5:15       ` Viresh Kumar
@ 2017-12-07 14:19         ` Patrick Bellasi
  2017-12-14  4:45           ` Viresh Kumar
  0 siblings, 1 reply; 80+ messages in thread
From: Patrick Bellasi @ 2017-12-07 14:19 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Juri Lelli, linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Todd Kjos, Joel Fernandes

On 07-Dec 10:45, Viresh Kumar wrote:
> On 30-11-17, 15:57, Patrick Bellasi wrote:
> > Yes, that's a pretty trivial update with a confusing changelog.
> > 
> > If we think it's worth keeping (and correcting as well) I'll update the
> > commit message.
> 
> We also need to update the commit log based on the feedback from Vikram
> on V2, which said that the utilization can't change around the lock here,
> as we are within an rq-lock section, though max can change (maybe). So
> this patch only takes care of locking before reading max.

Ok, right... will do.

So, are you still of the opinion that this patch should be kept in the
series?

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 6/6] cpufreq: schedutil: ignore sugov kthreads
  2017-12-07  9:24           ` Viresh Kumar
@ 2017-12-07 15:47             ` Patrick Bellasi
  0 siblings, 0 replies; 80+ messages in thread
From: Patrick Bellasi @ 2017-12-07 15:47 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Juri Lelli, linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Todd Kjos, Joel Fernandes

On 07-Dec 14:54, Viresh Kumar wrote:
> On 30-11-17, 16:42, Patrick Bellasi wrote:
> > On 30-Nov 17:12, Juri Lelli wrote:
> > > Not sure about this though, not my call :). I guess this still helps
> > > until we get the DL changes in.
> > 
> > Yes, agree... well, Viresh had some concerns about this patch.
> > Let's see what he thinks.
> 
> And the problem is that we never got to a conclusion in V2 for this
> patch as well, as you never responded to the last set of queries I
> had. Unless we finish the ongoing conversations, we will have the same
> queries again.
> 
> IOW, can you explain the exact scenario where this patch will be
> helpful? I am not sure if this patch is that helpful at all :)

My use case considers a small FAIR task which triggers an OPP change
while running on a CPU different from the one running the schedutil
kthread.

If we go for "clear flags" semantics, where the RT class clears its
flag as soon as there are no more RT tasks runnable, then this patch is
likely no longer required.

Otherwise, there can be corner cases in which the RT flag remains set on
the kthread's CPU, thus affecting OPP decisions even when there are no
RT tasks around. Consider for example this scenario:


CPU0:               | SUGOV |            | SUGOV |
         small FAIR |       | small FAIR |       |
                      flags = RT          ^
                    ^                     |
                    | (A)                 | (B)
                    | sugov_update_util() |
                    |                     | sugov_update_util()
CPU1:    medium growing FAIR task | another FAIR enqueued


In this case:
- in A we set the RT flag
- in B we pick the MAX OPP even if there are only small tasks.


This will not happen with "clear flags" semantics, because we ensure
the RT flag is released every time an RT task exits the CPU.
Thus, if we go for those semantics, we can skip this patch.
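To make the scenario above concrete, here is a small user-space toy
model of the shared-policy decision (a rough sketch only: toy_next_freq()
is a hypothetical stand-in for the real sugov_next_freq_shared() path,
and the numbers are arbitrary). With a stale RT flag left over from (A),
the small FAIR task in (B) is forced to the max OPP; clearing the flag
at RT dequeue lets it pick a proportional frequency instead:

```c
#include <assert.h>

#define SCHED_CPUFREQ_RT	(1U << 0)
#define MAX_FREQ		2000U	/* arbitrary max frequency */

/* Toy model of the shared-policy frequency decision described above:
 * a CPU with the RT flag set forces the max OPP, otherwise the
 * frequency scales with utilization. toy_next_freq() is a hypothetical
 * stand-in for sugov_next_freq_shared(), only used to illustrate the
 * stale-flag problem, not real governor code. */
static unsigned int toy_next_freq(unsigned int flags, unsigned int util,
				  unsigned int max)
{
	if (flags & SCHED_CPUFREQ_RT)
		return MAX_FREQ;
	return MAX_FREQ * util / max;
}
```

This models only the flag semantics, of course, not the actual governor
implementation.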

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter
  2017-12-07 12:45     ` Patrick Bellasi
@ 2017-12-07 15:55       ` Dietmar Eggemann
  2017-12-12 11:37       ` Viresh Kumar
  1 sibling, 0 replies; 80+ messages in thread
From: Dietmar Eggemann @ 2017-12-07 15:55 UTC (permalink / raw)
  To: Patrick Bellasi, Viresh Kumar
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Vincent Guittot, Morten Rasmussen,
	Juri Lelli, Todd Kjos, Joel Fernandes

On 12/07/2017 01:45 PM, Patrick Bellasi wrote:
> Hi Viresh,
> 
> On 07-Dec 10:31, Viresh Kumar wrote:
>> On 30-11-17, 11:47, Patrick Bellasi wrote:

[...]

>> We posted some comments on V2 for this particular patch suggesting
>> some improvements. The patch hasn't changed at all and you haven't
>> replied to a few of those suggestions either. Any particular reason for
>> that?
> 
> You're right; since the previous posting was a long time ago, with
> this one I mainly wanted to refresh the discussion. Thanks for
> highlighting below which ones were the main discussion points.
> 
> 
>> For example:
>> - I suggested to get rid of the conditional expression in
>>    cpufreq_schedutil.c file that you have added.
> 
> We can probably set flags to SCHED_CPUFREQ_IDLE (instead of resetting
> them); however, I think we still need an if condition somewhere.
> 
> Indeed, when SCHED_CPUFREQ_IDLE is asserted we don't want to trigger
> an OPP change (reasons described in the changelog).
> 
> If that's still a goal, then we need to check this flag and bail out
> from sugov_update_shared() straight away. That's why I've added a check
> at the beginning, and also marked it as unlikely so that it has no
> impact on all the cases where we call a schedutil update with runnable
> tasks.
> 
> Does this make sense?

IIRC, there was also this question of doing this not only in the shared 
but also in the single case ...

[...]

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter
  2017-12-07 12:45     ` Patrick Bellasi
  2017-12-07 15:55       ` Dietmar Eggemann
@ 2017-12-12 11:37       ` Viresh Kumar
  2017-12-12 13:38         ` Juri Lelli
  2017-12-12 15:16         ` Patrick Bellasi
  1 sibling, 2 replies; 80+ messages in thread
From: Viresh Kumar @ 2017-12-12 11:37 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes

On 07-12-17, 12:45, Patrick Bellasi wrote:
> On 07-Dec 10:31, Viresh Kumar wrote:
> > We posted some comments on V2 for this particular patch suggesting
> > some improvements. The patch hasn't changed at all and you haven't
> > replied to a few of those suggestions either. Any particular reason for
> > that?
> 
> You're right; since the previous posting was a long time ago, with
> this one I mainly wanted to refresh the discussion.

But posting again without making the proposed changes or finishing the
earlier discussion is normally not appreciated, especially by the very
busy maintainers (I am not one of those though :)). For example, I had
to go look at the previous discussions now to see which comments I gave
and what has changed with respect to them. That wasted lots of time, as
nothing really changed.

If you want to refresh the discussion, then it's probably better to
finish the discussions on the earlier threads first, and only then send
a new version.

> > For example:
> > - I suggested to get rid of the conditional expression in
> >   cpufreq_schedutil.c file that you have added.
> 
> We can probably set flags to SCHED_CPUFREQ_IDLE (instead of resetting
> them); however, I think we still need an if condition somewhere.

Yeah, my point was to have similar behavior in single and shared policy cases.

> > - And Joel suggested to clear the RT/DL flags from dequeue path to
> >   avoid adding SCHED_CPUFREQ_IDLE flag.
> 
> I had a thought about Joel's proposal:
> 
> >> wouldn't another way be to just clear the flag from the RT scheduling
> >> class with an extra call to cpufreq_update_util with flags = 0 during
> >> dequeue_rt_entity?
> 
> The main concern for me was that the current API says nothing about
> which scheduling class is calling schedutil for updates.

I think it's more important to fix the basic mechanism of util updates
than to fix corner cases with workarounds. I attempted a simpler
approach (at least according to me :)). Please share your feedback on
it. You can include it as part of your series, or I can send it
separately if everyone finds it okay.

-- 
viresh

-------------------------8<-------------------------
From: Viresh Kumar <viresh.kumar@linaro.org>
Date: Tue, 12 Dec 2017 15:43:26 +0530
Subject: [PATCH] sched: Keep track of cpufreq utilization update flags

Currently the schedutil governor overwrites the sg_cpu->flags field on
every call to the utilization handler. That was fine for the initial
implementation of the utilization handlers, but it has several
drawbacks.

The biggest drawback is that the sg_cpu->flags field doesn't always
represent the correct type of tasks that are enqueued on a CPU's rq. For
example, if a fair task is enqueued while a RT or DL task is running, we
will overwrite the flags with value 0 and that may take the CPU to lower
OPPs unintentionally. There can be other corner cases as well which we
aren't aware of currently.

This patch changes the current implementation to keep track of all the
task types that are currently enqueued on the CPU's rq. There are two
flags for every scheduling class now: one to set the flag and another
to clear it. The flag is set by the scheduling classes from the existing
set of calls to cpufreq_update_util(), and the flag is cleared when the
last task of the scheduling class is dequeued. For now, the util update
handlers return immediately if they were called to clear the flag.

We can add more optimizations over this patch separately.

The last parameter of sugov_set_iowait_boost() is also dropped as the
function can get it from sg_cpu anyway.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 include/linux/sched/cpufreq.h    |  7 ++++++-
 kernel/sched/cpufreq_schedutil.c | 32 +++++++++++++++++++++++---------
 kernel/sched/deadline.c          |  4 ++++
 kernel/sched/fair.c              |  8 ++++++--
 kernel/sched/rt.c                |  4 ++++
 5 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
index d1ad3d825561..dc2470affea4 100644
--- a/include/linux/sched/cpufreq.h
+++ b/include/linux/sched/cpufreq.h
@@ -8,9 +8,14 @@
  * Interface between cpufreq drivers and the scheduler:
  */
 
+#define SCHED_CPUFREQ_CLEAR	(1U << 31)
 #define SCHED_CPUFREQ_RT	(1U << 0)
+#define SCHED_CPUFREQ_RT_CLEAR	(SCHED_CPUFREQ_RT | SCHED_CPUFREQ_CLEAR)
 #define SCHED_CPUFREQ_DL	(1U << 1)
-#define SCHED_CPUFREQ_IOWAIT	(1U << 2)
+#define SCHED_CPUFREQ_DL_CLEAR	(SCHED_CPUFREQ_DL | SCHED_CPUFREQ_CLEAR)
+#define SCHED_CPUFREQ_CFS	(1U << 2)
+#define SCHED_CPUFREQ_CFS_CLEAR	(SCHED_CPUFREQ_CFS | SCHED_CPUFREQ_CLEAR)
+#define SCHED_CPUFREQ_IOWAIT	(1U << 3)
 
 #define SCHED_CPUFREQ_RT_DL	(SCHED_CPUFREQ_RT | SCHED_CPUFREQ_DL)
 
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 2f52ec0f1539..7edfdc59ee8f 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -187,10 +187,11 @@ static void sugov_get_util(unsigned long *util, unsigned long *max, int cpu)
 	*max = cfs_max;
 }
 
-static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
-				   unsigned int flags)
+static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time)
 {
-	if (flags & SCHED_CPUFREQ_IOWAIT) {
+	if (sg_cpu->flags & SCHED_CPUFREQ_IOWAIT) {
+		sg_cpu->flags &= ~SCHED_CPUFREQ_IOWAIT;
+
 		if (sg_cpu->iowait_boost_pending)
 			return;
 
@@ -264,7 +265,14 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	unsigned int next_f;
 	bool busy;
 
-	sugov_set_iowait_boost(sg_cpu, time, flags);
+	if (unlikely(flags & SCHED_CPUFREQ_CLEAR)) {
+		sg_cpu->flags &= ~flags;
+		return;
+	}
+
+	sg_cpu->flags |= flags;
+
+	sugov_set_iowait_boost(sg_cpu, time);
 	sg_cpu->last_update = time;
 
 	if (!sugov_should_update_freq(sg_policy, time))
@@ -272,7 +280,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 
 	busy = sugov_cpu_is_busy(sg_cpu);
 
-	if (flags & SCHED_CPUFREQ_RT_DL) {
+	if (sg_cpu->flags & SCHED_CPUFREQ_RT_DL) {
 		next_f = policy->cpuinfo.max_freq;
 	} else {
 		sugov_get_util(&util, &max, sg_cpu->cpu);
@@ -345,15 +353,20 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 
 	raw_spin_lock(&sg_policy->update_lock);
 
+	if (unlikely(flags & SCHED_CPUFREQ_CLEAR)) {
+		sg_cpu->flags &= ~flags;
+		goto unlock;
+	}
+
 	sg_cpu->util = util;
 	sg_cpu->max = max;
-	sg_cpu->flags = flags;
+	sg_cpu->flags |= flags;
 
-	sugov_set_iowait_boost(sg_cpu, time, flags);
+	sugov_set_iowait_boost(sg_cpu, time);
 	sg_cpu->last_update = time;
 
 	if (sugov_should_update_freq(sg_policy, time)) {
-		if (flags & SCHED_CPUFREQ_RT_DL)
+		if (sg_cpu->flags & SCHED_CPUFREQ_RT_DL)
 			next_f = sg_policy->policy->cpuinfo.max_freq;
 		else
 			next_f = sugov_next_freq_shared(sg_cpu, time);
@@ -361,6 +374,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 		sugov_update_commit(sg_policy, time, next_f);
 	}
 
+unlock:
 	raw_spin_unlock(&sg_policy->update_lock);
 }
 
@@ -655,7 +669,7 @@ static int sugov_start(struct cpufreq_policy *policy)
 		memset(sg_cpu, 0, sizeof(*sg_cpu));
 		sg_cpu->cpu = cpu;
 		sg_cpu->sg_policy = sg_policy;
-		sg_cpu->flags = SCHED_CPUFREQ_RT;
+		sg_cpu->flags = 0;
 		sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq;
 	}
 
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 2473736c7616..d9c7c6887493 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1472,6 +1472,10 @@ static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 	 */
 	if (flags & DEQUEUE_SLEEP)
 		task_non_contending(p);
+
+	/* Clear cpufreq flags after last deadline task is dequeued */
+	if (!rq->dl.dl_nr_running)
+		cpufreq_update_util(rq, SCHED_CPUFREQ_DL_CLEAR);
 }
 
 /*
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2915c0d95107..492188c3ee2d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3033,7 +3033,7 @@ static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
 		 *
 		 * See cpu_util().
 		 */
-		cpufreq_update_util(rq, 0);
+		cpufreq_update_util(rq, SCHED_CPUFREQ_CFS);
 	}
 }
 
@@ -5214,7 +5214,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	 * passed.
 	 */
 	if (p->in_iowait)
-		cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT);
+		cpufreq_update_util(rq, SCHED_CPUFREQ_CFS | SCHED_CPUFREQ_IOWAIT);
 
 	for_each_sched_entity(se) {
 		if (se->on_rq)
@@ -5309,6 +5309,10 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		sub_nr_running(rq, 1);
 
 	hrtick_update(rq);
+
+	/* Clear cpufreq flags after last CFS task is dequeued */
+	if (!rq->cfs.nr_running)
+		cpufreq_update_util(rq, SCHED_CPUFREQ_CFS_CLEAR);
 }
 
 #ifdef CONFIG_SMP
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 4056c19ca3f0..73131abd9df6 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1337,6 +1337,10 @@ static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags)
 	dequeue_rt_entity(rt_se, flags);
 
 	dequeue_pushable_task(rq, p);
+
+	/* Clear cpufreq flags after last rt task is dequeued */
+	if (!rq->rt.rt_nr_running)
+		cpufreq_update_util(rq, SCHED_CPUFREQ_RT_CLEAR);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter
  2017-12-12 11:37       ` Viresh Kumar
@ 2017-12-12 13:38         ` Juri Lelli
  2017-12-12 14:40           ` Viresh Kumar
  2017-12-12 15:16         ` Patrick Bellasi
  1 sibling, 1 reply; 80+ messages in thread
From: Juri Lelli @ 2017-12-12 13:38 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Patrick Bellasi, linux-kernel, linux-pm, Ingo Molnar,
	Peter Zijlstra, Rafael J . Wysocki, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

Hi Viresh,

On 12/12/17 17:07, Viresh Kumar wrote:

[...]

> From: Viresh Kumar <viresh.kumar@linaro.org>
> Date: Tue, 12 Dec 2017 15:43:26 +0530
> Subject: [PATCH] sched: Keep track of cpufreq utilization update flags
> 
> Currently the schedutil governor overwrites the sg_cpu->flags field on
> every call to the utilization handler. That was fine for the initial
> implementation of the utilization handlers, but it has several
> drawbacks.
> 
> The biggest drawback is that the sg_cpu->flags field doesn't always
> represent the correct type of tasks that are enqueued on a CPU's rq. For
> example, if a fair task is enqueued while a RT or DL task is running, we
> will overwrite the flags with value 0 and that may take the CPU to lower
> OPPs unintentionally. There can be other corner cases as well which we
> aren't aware of currently.
> 
> This patch changes the current implementation to keep track of all the
> task types that are currently enqueued on the CPU's rq. There are two
> flags for every scheduling class now: one to set the flag and another
> to clear it. The flag is set by the scheduling classes from the existing
> set of calls to cpufreq_update_util(), and the flag is cleared when the
> last task of the scheduling class is dequeued. For now, the util update
> handlers return immediately if they were called to clear the flag.
> 
> We can add more optimizations over this patch separately.
> 
> The last parameter of sugov_set_iowait_boost() is also dropped as the
> function can get it from sg_cpu anyway.
> 
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

[...]

> @@ -655,7 +669,7 @@ static int sugov_start(struct cpufreq_policy *policy)
>  		memset(sg_cpu, 0, sizeof(*sg_cpu));
>  		sg_cpu->cpu = cpu;
>  		sg_cpu->sg_policy = sg_policy;
> -		sg_cpu->flags = SCHED_CPUFREQ_RT;
> +		sg_cpu->flags = 0;
>  		sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq;
>  	}

Why this change during initialization?

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter
  2017-12-12 13:38         ` Juri Lelli
@ 2017-12-12 14:40           ` Viresh Kumar
  2017-12-12 14:56             ` Juri Lelli
  0 siblings, 1 reply; 80+ messages in thread
From: Viresh Kumar @ 2017-12-12 14:40 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Patrick Bellasi, linux-kernel, linux-pm, Ingo Molnar,
	Peter Zijlstra, Rafael J . Wysocki, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On 12-12-17, 14:38, Juri Lelli wrote:
> Hi Viresh,
> 
> On 12/12/17 17:07, Viresh Kumar wrote:
> 
> [...]
> 
> > From: Viresh Kumar <viresh.kumar@linaro.org>
> > Date: Tue, 12 Dec 2017 15:43:26 +0530
> > Subject: [PATCH] sched: Keep track of cpufreq utilization update flags
> > 
> > Currently the schedutil governor overwrites the sg_cpu->flags field on
> > every call to the utilization handler. That was fine for the initial
> > implementation of the utilization handlers, but it has several
> > drawbacks.
> > 
> > The biggest drawback is that the sg_cpu->flags field doesn't always
> > represent the correct type of tasks that are enqueued on a CPU's rq. For
> > example, if a fair task is enqueued while a RT or DL task is running, we
> > will overwrite the flags with value 0 and that may take the CPU to lower
> > OPPs unintentionally. There can be other corner cases as well which we
> > aren't aware of currently.
> > 
> > This patch changes the current implementation to keep track of all the
> > task types that are currently enqueued on the CPU's rq. There are two
> > flags for every scheduling class now: one to set the flag and another
> > to clear it. The flag is set by the scheduling classes from the existing
> > set of calls to cpufreq_update_util(), and the flag is cleared when the
> > last task of the scheduling class is dequeued. For now, the util update
> > handlers return immediately if they were called to clear the flag.
> > 
> > We can add more optimizations over this patch separately.
> > 
> > The last parameter of sugov_set_iowait_boost() is also dropped as the
> > function can get it from sg_cpu anyway.
> > 
> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> 
> [...]
> 
> > @@ -655,7 +669,7 @@ static int sugov_start(struct cpufreq_policy *policy)
> >  		memset(sg_cpu, 0, sizeof(*sg_cpu));
> >  		sg_cpu->cpu = cpu;
> >  		sg_cpu->sg_policy = sg_policy;
> > -		sg_cpu->flags = SCHED_CPUFREQ_RT;
> > +		sg_cpu->flags = 0;
> >  		sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq;
> >  	}
> 
> Why this change during initialization?

Firstly I am not sure why it was set to SCHED_CPUFREQ_RT, as schedutil wouldn't
change the frequency until the first time the util handler is called. And once
that is called we were updating the flag anyway. So, unless I misunderstood its
purpose, it was doing anything helpful.

I need to remove it otherwise the RT flag may remain set for a very long time
unnecessarily. That would be until the time the last RT task is not dequeued.
Consider this for example: we are at max freq when sugov_start() is called and
it sets the RT flag, but there is no RT task to run. Now, we have tons of CFS
tasks but we always keep running at max because of the flag. Even the schedutil
RT thread doesn't get a chance to run or be dequeued, because we never do a
freq change while the RT flag is set and so we stay at max.

Makes sense?

-- 
viresh

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter
  2017-12-12 14:40           ` Viresh Kumar
@ 2017-12-12 14:56             ` Juri Lelli
  2017-12-12 15:18               ` Patrick Bellasi
  0 siblings, 1 reply; 80+ messages in thread
From: Juri Lelli @ 2017-12-12 14:56 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Patrick Bellasi, linux-kernel, linux-pm, Ingo Molnar,
	Peter Zijlstra, Rafael J . Wysocki, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On 12/12/17 20:10, Viresh Kumar wrote:
> On 12-12-17, 14:38, Juri Lelli wrote:
> > Hi Viresh,
> > 
> > On 12/12/17 17:07, Viresh Kumar wrote:
> > 
> > [...]
> > 
> > > From: Viresh Kumar <viresh.kumar@linaro.org>
> > > Date: Tue, 12 Dec 2017 15:43:26 +0530
> > > Subject: [PATCH] sched: Keep track of cpufreq utilization update flags
> > > 
> > > Currently the schedutil governor overwrites the sg_cpu->flags field on
> > > every call to the utilization handler. It was pretty good as the initial
> > > implementation of utilization handlers; there are several drawbacks,
> > > though.
> > > 
> > > The biggest drawback is that the sg_cpu->flags field doesn't always
> > > represent the correct type of tasks that are enqueued on a CPU's rq. For
> > > example, if a fair task is enqueued while a RT or DL task is running, we
> > > will overwrite the flags with value 0 and that may take the CPU to lower
> > > OPPs unintentionally. There can be other corner cases as well which we
> > > aren't aware of currently.
> > > 
> > > This patch changes the current implementation to keep track of all the
> > > task types that are currently enqueued on the CPU's rq. There are two
> > > flags for every scheduling class now, one to set the flag and the other
> > > to clear it. The flag is set by the scheduling classes from the existing
> > > set of calls to cpufreq_update_util(), and the flag is cleared when the
> > > last task of the scheduling class is dequeued. For now, the util update
> > > handlers return immediately if they were called to clear the flag.
> > > 
> > > We can add more optimizations over this patch separately.
> > > 
> > > The last parameter of sugov_set_iowait_boost() is also dropped as the
> > > function can get it from sg_cpu anyway.
> > > 
> > > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> > 
> > [...]
> > 
> > > @@ -655,7 +669,7 @@ static int sugov_start(struct cpufreq_policy *policy)
> > >  		memset(sg_cpu, 0, sizeof(*sg_cpu));
> > >  		sg_cpu->cpu = cpu;
> > >  		sg_cpu->sg_policy = sg_policy;
> > > -		sg_cpu->flags = SCHED_CPUFREQ_RT;
> > > +		sg_cpu->flags = 0;
> > >  		sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq;
> > >  	}
> > 
> > Why this change during initialization?
> 
> Firstly I am not sure why it was set to SCHED_CPUFREQ_RT, as schedutil wouldn't
> change the frequency until the first time the util handler is called. And once
> that is called we were updating the flag anyway. So, unless I misunderstood its
> purpose, it wasn't doing anything helpful.

That was actually my understanding as well. Your patch made me notice
it.

> 
> I need to remove it otherwise the RT flag may remain set for a very long time
> unnecessarily, i.e. until the last RT task is dequeued.
> Consider this for example: we are at max freq when sugov_start() is called and
> it sets the RT flag, but there is no RT task to run. Now, we have tons of CFS
> tasks but we always keep running at max because of the flag. Even the schedutil
> RT thread doesn't get a chance to run or be dequeued, because we never do a
> freq change while the RT flag is set and so we stay at max.
> 
> Makes sense?

Yes. I guess it's working OK for now because of the very problem this patch,
and Patrick's, address (the flags always being overwritten).

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter
  2017-12-12 11:37       ` Viresh Kumar
  2017-12-12 13:38         ` Juri Lelli
@ 2017-12-12 15:16         ` Patrick Bellasi
  2017-12-13  9:06           ` Viresh Kumar
  1 sibling, 1 reply; 80+ messages in thread
From: Patrick Bellasi @ 2017-12-12 15:16 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes

Hi Viresh,

On 12-Dec 17:07, Viresh Kumar wrote:
> On 07-12-17, 12:45, Patrick Bellasi wrote:
> > On 07-Dec 10:31, Viresh Kumar wrote:

[...]

> I think it's more important to fix the basic mechanism of util updates than to
> fix corner cases with workarounds. I attempted a simpler approach (at least
> according to me :)). Please share your feedback on it. You can include that as
> part of your series, or I can send it separately if everyone finds it okay.

Please go on and post this patch on the list; all the other patches from
my series can follow on top later.

Some inline comments on your patch follow...

> 
> -- 
> viresh
> 
> -------------------------8<-------------------------
> From: Viresh Kumar <viresh.kumar@linaro.org>
> Date: Tue, 12 Dec 2017 15:43:26 +0530
> Subject: [PATCH] sched: Keep track of cpufreq utilization update flags
> 
> Currently the schedutil governor overwrites the sg_cpu->flags field on
> every call to the utilization handler. It was pretty good as the initial
> implementation of utilization handlers; there are several drawbacks,
> though.
> 
> The biggest drawback is that the sg_cpu->flags field doesn't always
> represent the correct type of tasks that are enqueued on a CPU's rq. For
> example, if a fair task is enqueued while a RT or DL task is running, we
> will overwrite the flags with value 0 and that may take the CPU to lower
> OPPs unintentionally. There can be other corner cases as well which we
> aren't aware of currently.
> 
> This patch changes the current implementation to keep track of all the
> task types that are currently enqueued on the CPU's rq. There are two
> flags for every scheduling class now, one to set the flag and the other
> to clear it.

Nit-pick: that's not completely correct; there is only one CLEAR flag,
which is used to clear whatever other flags are passed in.

> The flag is set by the scheduling classes from the existing
> set of calls to cpufreq_update_util(), and the flag is cleared when the
> last task of the scheduling class is dequeued. For now, the util update
> handlers return immediately if they were called to clear the flag.
> 
> We can add more optimizations over this patch separately.
> 
> The last parameter of sugov_set_iowait_boost() is also dropped as the
> function can get it from sg_cpu anyway.

As I comment below, this should be on a different patch IMO.

> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

[...]

> @@ -8,9 +8,14 @@
>   * Interface between cpufreq drivers and the scheduler:
>   */
>  
> +#define SCHED_CPUFREQ_CLEAR	(1U << 31)
>  #define SCHED_CPUFREQ_RT	(1U << 0)
> +#define SCHED_CPUFREQ_RT_CLEAR	(SCHED_CPUFREQ_RT | SCHED_CPUFREQ_CLEAR)
>  #define SCHED_CPUFREQ_DL	(1U << 1)
> -#define SCHED_CPUFREQ_IOWAIT	(1U << 2)
> +#define SCHED_CPUFREQ_DL_CLEAR	(SCHED_CPUFREQ_DL | SCHED_CPUFREQ_CLEAR)
> +#define SCHED_CPUFREQ_CFS	(1U << 2)
> +#define SCHED_CPUFREQ_CFS_CLEAR	(SCHED_CPUFREQ_CFS | SCHED_CPUFREQ_CLEAR)
> +#define SCHED_CPUFREQ_IOWAIT	(1U << 3)
>  
>  #define SCHED_CPUFREQ_RT_DL	(SCHED_CPUFREQ_RT | SCHED_CPUFREQ_DL)

Since you are already changing some flag positions, maybe we can have
a better organization by using the lower bits for "general" flags and
the higher ones for class-specific ones, i.e.:

#define SCHED_CPUFREQ_CLEAR	(1U <<  0)
#define SCHED_CPUFREQ_IOWAIT	(1U <<  1)

#define SCHED_CPUFREQ_CFS	(1U <<  8)
#define SCHED_CPUFREQ_RT	(1U <<  9)
#define SCHED_CPUFREQ_DL	(1U << 10)
#define SCHED_CPUFREQ_RT_DL	(SCHED_CPUFREQ_RT | SCHED_CPUFREQ_DL)

#define SCHED_CPUFREQ_CFS_CLEAR	(SCHED_CPUFREQ_CFS | SCHED_CPUFREQ_CLEAR)
#define SCHED_CPUFREQ_RT_CLEAR	(SCHED_CPUFREQ_RT | SCHED_CPUFREQ_CLEAR)
#define SCHED_CPUFREQ_DL_CLEAR	(SCHED_CPUFREQ_DL | SCHED_CPUFREQ_CLEAR)

>  
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 2f52ec0f1539..7edfdc59ee8f 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -187,10 +187,11 @@ static void sugov_get_util(unsigned long *util, unsigned long *max, int cpu)
>  	*max = cfs_max;
>  }
>  
> -static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
> -				   unsigned int flags)
> +static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time)
>  {
> -	if (flags & SCHED_CPUFREQ_IOWAIT) {
> +	if (sg_cpu->flags & SCHED_CPUFREQ_IOWAIT) {
> +		sg_cpu->flags &= ~SCHED_CPUFREQ_IOWAIT;
> +

This function should still work if we pass in flags as a parameter.
Thus, this looks like a change/optimization of the
sugov_set_iowait_boost() API, which would better be moved into a
separate patch on top of this one.

>  		if (sg_cpu->iowait_boost_pending)
>  			return;
>  

[...]

> @@ -655,7 +669,7 @@ static int sugov_start(struct cpufreq_policy *policy)
>  		memset(sg_cpu, 0, sizeof(*sg_cpu));
>  		sg_cpu->cpu = cpu;
>  		sg_cpu->sg_policy = sg_policy;
> -		sg_cpu->flags = SCHED_CPUFREQ_RT;
> +		sg_cpu->flags = 0;

Juri already pointed out this change: why is it needed?
Perhaps a note in the changelog would be useful.

>  		sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq;
>  	}
>  

[...]

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter
  2017-12-12 14:56             ` Juri Lelli
@ 2017-12-12 15:18               ` Patrick Bellasi
  0 siblings, 0 replies; 80+ messages in thread
From: Patrick Bellasi @ 2017-12-12 15:18 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Viresh Kumar, linux-kernel, linux-pm, Ingo Molnar,
	Peter Zijlstra, Rafael J . Wysocki, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On 12-Dec 15:56, Juri Lelli wrote:
> On 12/12/17 20:10, Viresh Kumar wrote:
> > On 12-12-17, 14:38, Juri Lelli wrote:
> > > Hi Viresh,
> > > 
> > > On 12/12/17 17:07, Viresh Kumar wrote:
> > > 
> > > [...]
> > > 
> > > > From: Viresh Kumar <viresh.kumar@linaro.org>
> > > > Date: Tue, 12 Dec 2017 15:43:26 +0530
> > > > Subject: [PATCH] sched: Keep track of cpufreq utilization update flags
> > > > 
> > > > Currently the schedutil governor overwrites the sg_cpu->flags field on
> > > > every call to the utilization handler. It was pretty good as the initial
> > > > implementation of utilization handlers; there are several drawbacks,
> > > > though.
> > > > 
> > > > The biggest drawback is that the sg_cpu->flags field doesn't always
> > > > represent the correct type of tasks that are enqueued on a CPU's rq. For
> > > > example, if a fair task is enqueued while a RT or DL task is running, we
> > > > will overwrite the flags with value 0 and that may take the CPU to lower
> > > > OPPs unintentionally. There can be other corner cases as well which we
> > > > aren't aware of currently.
> > > > 
> > > > This patch changes the current implementation to keep track of all the
> > > > task types that are currently enqueued on the CPU's rq. There are two
> > > > flags for every scheduling class now, one to set the flag and the other
> > > > to clear it. The flag is set by the scheduling classes from the existing
> > > > set of calls to cpufreq_update_util(), and the flag is cleared when the
> > > > last task of the scheduling class is dequeued. For now, the util update
> > > > handlers return immediately if they were called to clear the flag.
> > > > 
> > > > We can add more optimizations over this patch separately.
> > > > 
> > > > The last parameter of sugov_set_iowait_boost() is also dropped as the
> > > > function can get it from sg_cpu anyway.
> > > > 
> > > > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> > > 
> > > [...]
> > > 
> > > > @@ -655,7 +669,7 @@ static int sugov_start(struct cpufreq_policy *policy)
> > > >  		memset(sg_cpu, 0, sizeof(*sg_cpu));
> > > >  		sg_cpu->cpu = cpu;
> > > >  		sg_cpu->sg_policy = sg_policy;
> > > > -		sg_cpu->flags = SCHED_CPUFREQ_RT;
> > > > +		sg_cpu->flags = 0;
> > > >  		sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq;
> > > >  	}
> > > 
> > > Why this change during initialization?
> > 
> > Firstly I am not sure why it was set to SCHED_CPUFREQ_RT, as schedutil wouldn't
> > change the frequency until the first time the util handler is called. And once
> > that is called we were updating the flag anyway. So, unless I misunderstood its
> > purpose, it wasn't doing anything helpful.
> 
> That was actually my understanding as well. Your patch made me notice
> it.
> 
> > 
> > I need to remove it otherwise the RT flag may remain set for a very long time
> > unnecessarily, i.e. until the last RT task is dequeued.
> > Consider this for example: we are at max freq when sugov_start() is called and
> > it sets the RT flag, but there is no RT task to run. Now, we have tons of CFS
> > tasks but we always keep running at max because of the flag. Even the schedutil
> > RT thread doesn't get a chance to run or be dequeued, because we never do a
> > freq change while the RT flag is set and so we stay at max.
> > 
> > Makes sense?
> 
> Yes. I guess it's working OK for now because of the very problem this patch,
> and Patrick's, address (the flags always being overwritten).

Yes, makes sense to me too ;-)

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter
  2017-12-12 15:16         ` Patrick Bellasi
@ 2017-12-13  9:06           ` Viresh Kumar
  0 siblings, 0 replies; 80+ messages in thread
From: Viresh Kumar @ 2017-12-13  9:06 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes

On 12-12-17, 15:16, Patrick Bellasi wrote:
> Since you are already changing some flag positions, maybe we can have
> a better organization by using the lower bits for "general" flags and
> the higher ones for class-specific ones, i.e.:
> 
> #define SCHED_CPUFREQ_CLEAR	(1U <<  0)
> #define SCHED_CPUFREQ_IOWAIT	(1U <<  1)
> 
> #define SCHED_CPUFREQ_CFS	(1U <<  8)
> #define SCHED_CPUFREQ_RT	(1U <<  9)
> #define SCHED_CPUFREQ_DL	(1U << 10)

Since we aren't normally going to add any more class-specific ones, I would
like to keep them as 0, 1 and 2. And using the topmost bit for CLEAR seems
better to me somehow, just like the sign bit for 32-bit integers.

-- 
viresh

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 3/6] cpufreq: schedutil: update CFS util only if used
  2017-12-07 14:19         ` Patrick Bellasi
@ 2017-12-14  4:45           ` Viresh Kumar
  0 siblings, 0 replies; 80+ messages in thread
From: Viresh Kumar @ 2017-12-14  4:45 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Juri Lelli, linux-kernel, linux-pm, Ingo Molnar, Peter Zijlstra,
	Rafael J . Wysocki, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Todd Kjos, Joel Fernandes

On 07-12-17, 14:19, Patrick Bellasi wrote:
> On 07-Dec 10:45, Viresh Kumar wrote:
> > On 30-11-17, 15:57, Patrick Bellasi wrote:
> > > Yes, that's a pretty trivial update with a confusing changelog.
> > > 
> > > If we think it's worth keeping (and correct as well) I'll update the
> > > commit message.
> > 
> > We also need to update the commit log based on feedback from Vikram on
> > V2, which said that the utilization can't change around the lock here,
> > as we are within an rq lock section, though max can change (maybe). So
> > this patch only takes care of locking before reading max.

I have more doubts about the max reasoning as well. Max isn't protected by the
sg_policy lock of schedutil and it can change at any time. So even after moving
code around with this patch, we wouldn't fix any race and so I am not sure this
patch helps at all. But, I have now sent the same diff for another reason in my
series. Maybe I should have kept you as the author of that patch, but I forgot.
Will do that if I need to send a V2.

> Ok, right... will do.
> 
So, are you still of the opinion that we should keep this patch in the series?

Yes, but we need a good reason for that :)

-- 
viresh

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter
  2017-11-30 11:47 ` [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter Patrick Bellasi
  2017-11-30 13:12   ` Juri Lelli
  2017-12-07  5:01   ` Viresh Kumar
@ 2017-12-20 14:33   ` Peter Zijlstra
  2017-12-20 14:51     ` Patrick Bellasi
  2 siblings, 1 reply; 80+ messages in thread
From: Peter Zijlstra @ 2017-12-20 14:33 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Rafael J . Wysocki,
	Viresh Kumar, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes

On Thu, Nov 30, 2017 at 11:47:18AM +0000, Patrick Bellasi wrote:
> Currently, sg_cpu's flags are set to the value defined by the last call
> of the cpufreq_update_util(); for RT/DL classes this corresponds to the
> SCHED_CPUFREQ_{RT/DL} flags always being set.
> 
> When multiple CPUs share the same frequency domain it might happen that
> a CPU which executed an RT task, right before entering IDLE, has one of
> the SCHED_CPUFREQ_RT_DL flags set, permanently, until it exits IDLE.
> 
> Although such an idle CPU is _going to be_ ignored by the
> sugov_next_freq_shared():
>   1. these kinds of "useless RT requests" are ignored only if more than
>      TICK_NSEC has elapsed since the last update
>   2. we can still potentially trigger an already too late switch to
>      MAX, which starts also a new throttling interval
>   3. the internal state machine is not consistent with what the
>      scheduler knows, i.e. the CPU is now actually idle

So I _really_ hate having to clutter the idle path for this shared case
:/

1. can obviously be fixed by short-circuiting the timeout when idle.

2. not sure it can still happen if you do 1; anybody doing a switch will go
   through sugov_next_freq_shared(), which will poll all relevant CPUs and,
   per 1, will see it's idle, no?

Not sure what that leaves for 3.

> diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
> index d518664cce4f..6e8ae2aa7a13 100644
> --- a/kernel/sched/idle_task.c
> +++ b/kernel/sched/idle_task.c
> @@ -30,6 +30,10 @@ pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
>  	put_prev_task(rq, prev);
>  	update_idle_core(rq);
>  	schedstat_inc(rq->sched_goidle);
> +
> +	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
> +	cpufreq_update_util(rq, SCHED_CPUFREQ_IDLE);
> +
>  	return rq->idle;
>  }
>  
> -- 
> 2.14.1
> 

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter
  2017-12-20 14:33   ` Peter Zijlstra
@ 2017-12-20 14:51     ` Patrick Bellasi
  0 siblings, 0 replies; 80+ messages in thread
From: Patrick Bellasi @ 2017-12-20 14:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-pm, Ingo Molnar, Rafael J . Wysocki,
	Viresh Kumar, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes

On 20-Dec 15:33, Peter Zijlstra wrote:
> On Thu, Nov 30, 2017 at 11:47:18AM +0000, Patrick Bellasi wrote:
> > Currently, sg_cpu's flags are set to the value defined by the last call
> > of the cpufreq_update_util(); for RT/DL classes this corresponds to the
> > SCHED_CPUFREQ_{RT/DL} flags always being set.
> > 
> > When multiple CPUs share the same frequency domain it might happen that
> > a CPU which executed an RT task, right before entering IDLE, has one of
> > the SCHED_CPUFREQ_RT_DL flags set, permanently, until it exits IDLE.
> > 
> > Although such an idle CPU is _going to be_ ignored by the
> > sugov_next_freq_shared():
> >   1. these kinds of "useless RT requests" are ignored only if more than
> >      TICK_NSEC has elapsed since the last update
> >   2. we can still potentially trigger an already too late switch to
> >      MAX, which starts also a new throttling interval
> >   3. the internal state machine is not consistent with what the
> >      scheduler knows, i.e. the CPU is now actually idle
> 
> So I _really_ hate having to clutter the idle path for this shared case
> :/

:)

We would like to have per-CPU frequency domains... but the HW guys
always complain that's too costly from an HW/power standpoint...
and they are likely right :-/

So, here we are, just trying hard to have a SW status matching
the HW status... which is just another pain :-/

> 1. can obviously be fixed by short-circuiting the timeout when idle.

Mmm.. right... it should be possible for schedutil to detect that a
certain CPU is currently idle.

Can we use core.c::idle_cpu() from cpufreq_schedutil?

> 2. not sure it can still happen if you do 1; anybody doing a switch will go
>    through sugov_next_freq_shared(), which will poll all relevant CPUs and,
>    per 1, will see it's idle, no?

Right, that should work...

> Not sure what that leaves for 3.

When a CPU is detected idle, perhaps we can still clear the RT flags...
... just for "consistency" of current status representation.
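
Putting 1. and 3. together, the per-CPU loop in sugov_next_freq_shared()
could perhaps short-circuit idle CPUs along these lines (untested
pseudocode, just to sketch the idea; idle_cpu() is the core.c helper
mentioned above):

```c
/* pseudocode sketch, not a tested patch */
if (idle_cpu(j)) {
	/* 1. short-circuit the TICK_NSEC timeout for idle CPUs */
	j_sg_cpu->iowait_boost = 0;
	j_sg_cpu->iowait_boost_pending = false;
	/* 3. drop stale flags so the SW state matches the idle HW state */
	j_sg_cpu->flags &= ~(SCHED_CPUFREQ_RT | SCHED_CPUFREQ_DL);
	continue;
}
```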

> 
> > diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
> > index d518664cce4f..6e8ae2aa7a13 100644
> > --- a/kernel/sched/idle_task.c
> > +++ b/kernel/sched/idle_task.c
> > @@ -30,6 +30,10 @@ pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
> >  	put_prev_task(rq, prev);
> >  	update_idle_core(rq);
> >  	schedstat_inc(rq->sched_goidle);
> > +
> > +	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
> > +	cpufreq_update_util(rq, SCHED_CPUFREQ_IDLE);
> > +
> >  	return rq->idle;
> >  }
> >  
> > -- 
> > 2.14.1
> > 

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-11-30 11:47 [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates Patrick Bellasi
                   ` (5 preceding siblings ...)
  2017-11-30 11:47 ` [PATCH v3 6/6] cpufreq: schedutil: ignore sugov kthreads Patrick Bellasi
@ 2017-12-20 15:30 ` Peter Zijlstra
  2017-12-20 15:43   ` Peter Zijlstra
                     ` (4 more replies)
  6 siblings, 5 replies; 80+ messages in thread
From: Peter Zijlstra @ 2017-12-20 15:30 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Rafael J . Wysocki,
	Viresh Kumar, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes


So I ended up with the below (on top of Juri's cpufreq-dl patches).

It compiles, but that's about all the testing it had.

--- a/include/linux/sched/cpufreq.h
+++ b/include/linux/sched/cpufreq.h
@@ -8,9 +8,7 @@
  * Interface between cpufreq drivers and the scheduler:
  */
 
-#define SCHED_CPUFREQ_RT	(1U << 0)
-#define SCHED_CPUFREQ_DL	(1U << 1)
-#define SCHED_CPUFREQ_IOWAIT	(1U << 2)
+#define SCHED_CPUFREQ_IOWAIT	(1U << 0)
 
 #ifdef CONFIG_CPU_FREQ
 struct update_util_data {
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -63,7 +63,6 @@ struct sugov_cpu {
 	unsigned long util_cfs;
 	unsigned long util_dl;
 	unsigned long max;
-	unsigned int flags;
 
 	/* The field below is for single-CPU policies only. */
 #ifdef CONFIG_NO_HZ_COMMON
@@ -188,17 +187,23 @@ static void sugov_get_util(struct sugov_
 
 static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
 {
+	unsigned long util = sg_cpu->util_cfs + sg_cpu->util_dl;
+	struct rq *rq = cpu_rq(sg_cpu->cpu);
+
+	if (rq->rt.rt_nr_running)
+		util = sg_cpu->max;
+
 	/*
 	 * Ideally we would like to set util_dl as min/guaranteed freq and
 	 * util_cfs + util_dl as requested freq. However, cpufreq is not yet
 	 * ready for such an interface. So, we only do the latter for now.
 	 */
-	return min(sg_cpu->util_cfs + sg_cpu->util_dl, sg_cpu->max);
+	return min(util, sg_cpu->max);
 }
 
-static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time)
+static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time, unsigned int flags)
 {
-	if (sg_cpu->flags & SCHED_CPUFREQ_IOWAIT) {
+	if (flags & SCHED_CPUFREQ_IOWAIT) {
 		if (sg_cpu->iowait_boost_pending)
 			return;
 
@@ -267,12 +272,11 @@ static void sugov_update_single(struct u
 {
 	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
-	struct cpufreq_policy *policy = sg_policy->policy;
 	unsigned long util, max;
 	unsigned int next_f;
 	bool busy;
 
-	sugov_set_iowait_boost(sg_cpu, time);
+	sugov_set_iowait_boost(sg_cpu, time, flags);
 	sg_cpu->last_update = time;
 
 	if (!sugov_should_update_freq(sg_policy, time))
@@ -280,25 +284,22 @@ static void sugov_update_single(struct u
 
 	busy = sugov_cpu_is_busy(sg_cpu);
 
-	if (flags & SCHED_CPUFREQ_RT) {
-		next_f = policy->cpuinfo.max_freq;
-	} else {
-		sugov_get_util(sg_cpu);
-		max = sg_cpu->max;
-		util = sugov_aggregate_util(sg_cpu);
-		sugov_iowait_boost(sg_cpu, &util, &max);
-		next_f = get_next_freq(sg_policy, util, max);
-		/*
-		 * Do not reduce the frequency if the CPU has not been idle
-		 * recently, as the reduction is likely to be premature then.
-		 */
-		if (busy && next_f < sg_policy->next_freq) {
-			next_f = sg_policy->next_freq;
+	sugov_get_util(sg_cpu);
+	max = sg_cpu->max;
+	util = sugov_aggregate_util(sg_cpu);
+	sugov_iowait_boost(sg_cpu, &util, &max);
+	next_f = get_next_freq(sg_policy, util, max);
+	/*
+	 * Do not reduce the frequency if the CPU has not been idle
+	 * recently, as the reduction is likely to be premature then.
+	 */
+	if (busy && next_f < sg_policy->next_freq) {
+		next_f = sg_policy->next_freq;
 
-			/* Reset cached freq as next_freq has changed */
-			sg_policy->cached_raw_freq = 0;
-		}
+		/* Reset cached freq as next_freq has changed */
+		sg_policy->cached_raw_freq = 0;
 	}
+
 	sugov_update_commit(sg_policy, time, next_f);
 }
 
@@ -314,6 +315,9 @@ static unsigned int sugov_next_freq_shar
 		unsigned long j_util, j_max;
 		s64 delta_ns;
 
+		if (j_sg_cpu != sg_cpu)
+			sugov_get_util(j_sg_cpu);
+
 		/*
 		 * If the CFS CPU utilization was last updated before the
 		 * previous frequency update and the time elapsed between the
@@ -327,12 +331,7 @@ static unsigned int sugov_next_freq_shar
 		if (delta_ns > TICK_NSEC) {
 			j_sg_cpu->iowait_boost = 0;
 			j_sg_cpu->iowait_boost_pending = false;
-			j_sg_cpu->util_cfs = 0;
-			if (j_sg_cpu->util_dl == 0)
-				continue;
 		}
-		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT)
-			return policy->cpuinfo.max_freq;
 
 		j_max = j_sg_cpu->max;
 		j_util = sugov_aggregate_util(j_sg_cpu);
@@ -357,17 +356,11 @@ static void sugov_update_shared(struct u
 	raw_spin_lock(&sg_policy->update_lock);
 
 	sugov_get_util(sg_cpu);
-	sg_cpu->flags = flags;
-
-	sugov_set_iowait_boost(sg_cpu, time);
+	sugov_set_iowait_boost(sg_cpu, time, flags);
 	sg_cpu->last_update = time;
 
 	if (sugov_should_update_freq(sg_policy, time)) {
-		if (flags & SCHED_CPUFREQ_RT)
-			next_f = sg_policy->policy->cpuinfo.max_freq;
-		else
-			next_f = sugov_next_freq_shared(sg_cpu, time);
-
+		next_f = sugov_next_freq_shared(sg_cpu, time);
 		sugov_update_commit(sg_policy, time, next_f);
 	}
 
@@ -678,7 +671,6 @@ static int sugov_start(struct cpufreq_po
 		memset(sg_cpu, 0, sizeof(*sg_cpu));
 		sg_cpu->cpu = cpu;
 		sg_cpu->sg_policy = sg_policy;
-		sg_cpu->flags = 0;
 		sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq;
 	}
 
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -87,7 +87,7 @@ void __add_running_bw(u64 dl_bw, struct
 	SCHED_WARN_ON(dl_rq->running_bw < old); /* overflow */
 	SCHED_WARN_ON(dl_rq->running_bw > dl_rq->this_bw);
 	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
-	cpufreq_update_util(rq_of_dl_rq(dl_rq), SCHED_CPUFREQ_DL);
+	cpufreq_update_util(rq_of_dl_rq(dl_rq), 0);
 }
 
 static inline
@@ -101,7 +101,7 @@ void __sub_running_bw(u64 dl_bw, struct
 	if (dl_rq->running_bw > old)
 		dl_rq->running_bw = 0;
 	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
-	cpufreq_update_util(rq_of_dl_rq(dl_rq), SCHED_CPUFREQ_DL);
+	cpufreq_update_util(rq_of_dl_rq(dl_rq), 0);
 }
 
 static inline
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -959,9 +959,6 @@ static void update_curr_rt(struct rq *rq
 	if (unlikely((s64)delta_exec <= 0))
 		return;
 
-	/* Kick cpufreq (see the comment in kernel/sched/sched.h). */
-	cpufreq_update_util(rq, SCHED_CPUFREQ_RT);
-
 	schedstat_set(curr->se.statistics.exec_max,
 		      max(curr->se.statistics.exec_max, delta_exec));
 
@@ -1003,6 +1000,9 @@ dequeue_top_rt_rq(struct rt_rq *rt_rq)
 
 	sub_nr_running(rq, rt_rq->rt_nr_running);
 	rt_rq->rt_queued = 0;
+
+	/* Kick cpufreq (see the comment in kernel/sched/sched.h). */
+	cpufreq_update_util(rq, 0);
 }
 
 static void
@@ -1019,6 +1019,9 @@ enqueue_top_rt_rq(struct rt_rq *rt_rq)
 
 	add_nr_running(rq, rt_rq->rt_nr_running);
 	rt_rq->rt_queued = 1;
+
+	/* Kick cpufreq (see the comment in kernel/sched/sched.h). */
+	cpufreq_update_util(rq, 0);
 }
 
 #if defined CONFIG_SMP

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-20 15:30 ` [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates Peter Zijlstra
@ 2017-12-20 15:43   ` Peter Zijlstra
  2017-12-21  9:15     ` Viresh Kumar
  2017-12-20 15:56   ` Peter Zijlstra
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 80+ messages in thread
From: Peter Zijlstra @ 2017-12-20 15:43 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Rafael J . Wysocki,
	Viresh Kumar, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes

On Wed, Dec 20, 2017 at 04:30:29PM +0100, Peter Zijlstra wrote:
> @@ -314,6 +315,9 @@ static unsigned int sugov_next_freq_shar
>  		unsigned long j_util, j_max;
>  		s64 delta_ns;
>  
> +		if (j_sg_cpu != sg_cpu)
> +			sugov_get_util(j_sg_cpu);
> +
>  		/*
>  		 * If the CFS CPU utilization was last updated before the
>  		 * previous frequency update and the time elapsed between the
> @@ -327,12 +331,7 @@ static unsigned int sugov_next_freq_shar
>  		if (delta_ns > TICK_NSEC) {
>  			j_sg_cpu->iowait_boost = 0;
>  			j_sg_cpu->iowait_boost_pending = false;
> -			j_sg_cpu->util_cfs = 0;
> -			if (j_sg_cpu->util_dl == 0)
> -				continue;
>  		}
> -		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT)
> -			return policy->cpuinfo.max_freq;
>  
>  		j_max = j_sg_cpu->max;
>  		j_util = sugov_aggregate_util(j_sg_cpu);

The below makes more sense to me too; hmm?

@@ -335,12 +335,11 @@ static unsigned int sugov_next_freq_shar
 
 		j_max = j_sg_cpu->max;
 		j_util = sugov_aggregate_util(j_sg_cpu);
+		sugov_iowait_boost(j_sg_cpu, &util, &max);
 		if (j_util * max > j_max * util) {
 			util = j_util;
 			max = j_max;
 		}
-
-		sugov_iowait_boost(j_sg_cpu, &util, &max);
 	}
 
 	return get_next_freq(sg_policy, util, max);

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-20 15:30 ` [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates Peter Zijlstra
  2017-12-20 15:43   ` Peter Zijlstra
@ 2017-12-20 15:56   ` Peter Zijlstra
  2017-12-31  9:43     ` Claudio Scordino
  2017-12-20 17:38   ` Juri Lelli
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 80+ messages in thread
From: Peter Zijlstra @ 2017-12-20 15:56 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Rafael J . Wysocki,
	Viresh Kumar, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes

On Wed, Dec 20, 2017 at 04:30:29PM +0100, Peter Zijlstra wrote:
> 
> So I ended up with the below (on top of Juri's cpufreq-dl patches).
> 
> It compiles, but that's about all the testing it had.

Should all be available at:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/core

compile tested only, I suppose the 0day bot will let me know soon enough
:-)

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-20 15:30 ` [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates Peter Zijlstra
  2017-12-20 15:43   ` Peter Zijlstra
  2017-12-20 15:56   ` Peter Zijlstra
@ 2017-12-20 17:38   ` Juri Lelli
  2017-12-20 18:16     ` Peter Zijlstra
  2017-12-22 10:06     ` Peter Zijlstra
  2017-12-21  7:34   ` Viresh Kumar
  2018-02-06 10:55   ` Claudio Scordino
  4 siblings, 2 replies; 80+ messages in thread
From: Juri Lelli @ 2017-12-20 17:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Patrick Bellasi, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On 20/12/17 16:30, Peter Zijlstra wrote:

[...]

> @@ -327,12 +331,7 @@ static unsigned int sugov_next_freq_shar
>  		if (delta_ns > TICK_NSEC) {
>  			j_sg_cpu->iowait_boost = 0;
>  			j_sg_cpu->iowait_boost_pending = false;
> -			j_sg_cpu->util_cfs = 0;
> -			if (j_sg_cpu->util_dl == 0)
> -				continue;
>  		}

This goes away because with Brendan/Vincent fix we don't need the
workaround for stale CFS util contribution for idle CPUs anymore?

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-20 17:38   ` Juri Lelli
@ 2017-12-20 18:16     ` Peter Zijlstra
  2017-12-22 10:06     ` Peter Zijlstra
  1 sibling, 0 replies; 80+ messages in thread
From: Peter Zijlstra @ 2017-12-20 18:16 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Patrick Bellasi, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On Wed, Dec 20, 2017 at 06:38:14PM +0100, Juri Lelli wrote:
> On 20/12/17 16:30, Peter Zijlstra wrote:
> 
> [...]
> 
> > @@ -327,12 +331,7 @@ static unsigned int sugov_next_freq_shar
> >  		if (delta_ns > TICK_NSEC) {
> >  			j_sg_cpu->iowait_boost = 0;
> >  			j_sg_cpu->iowait_boost_pending = false;
> > -			j_sg_cpu->util_cfs = 0;
> > -			if (j_sg_cpu->util_dl == 0)
> > -				continue;
> >  		}
> 
> This goes away because with Brendan/Vincent fix we don't need the
> workaround for stale CFS util contribution for idle CPUs anymore?

Oh, good point, no I took that out because of:

@@ -314,6 +315,9 @@ static unsigned int sugov_next_freq_shar

                unsigned long j_util, j_max;
                s64 delta_ns;

+               if (j_sg_cpu != sg_cpu)
+                       sugov_get_util(j_sg_cpu);
+
                /*
                 * If the CFS CPU utilization was last updated before the
                 * previous frequency update and the time elapsed between the


which recomputes the util value all the time. But yes, that still needs
those other patches to stay relevant.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-20 15:30 ` [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates Peter Zijlstra
                     ` (2 preceding siblings ...)
  2017-12-20 17:38   ` Juri Lelli
@ 2017-12-21  7:34   ` Viresh Kumar
  2018-02-06 10:55   ` Claudio Scordino
  4 siblings, 0 replies; 80+ messages in thread
From: Viresh Kumar @ 2017-12-21  7:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Patrick Bellasi, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes

On 20-12-17, 16:30, Peter Zijlstra wrote:
> 
> So I ended up with the below (on top of Juri's cpufreq-dl patches).

Nice :)

There are two things that I noticed in your tree.

Firstly, there is no need for the following patch, as we shouldn't have
the problem mentioned in the commit anymore:

38e19dbe1286 cpufreq: schedutil: ignore sugov kthreads

And maybe we can apply the below patch after that (only compile
tested).

-------------------------8<-------------------------

From: Viresh Kumar <viresh.kumar@linaro.org>
Date: Thu, 21 Dec 2017 12:52:50 +0530
Subject: [PATCH] cpufreq/schedutil: Call sugov_get_util() only when required

sugov_update_shared() doesn't use the updated values of max, util_cfs
and util_dl unless we try to find the next frequency by calling
sugov_next_freq_shared(), so there is no need to call sugov_get_util()
directly from sugov_update_shared(). Rather, postpone it until
sugov_next_freq_shared() is called for one of the CPUs that share the
policy.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/sched/cpufreq_schedutil.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index ab84d2261554..f2f4df26b954 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -315,8 +315,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 		unsigned long j_util, j_max;
 		s64 delta_ns;
 
-		if (j_sg_cpu != sg_cpu)
-			sugov_get_util(j_sg_cpu);
+		sugov_get_util(j_sg_cpu);
 
 		/*
 		 * If the CFS CPU utilization was last updated before the
@@ -354,7 +353,6 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 
 	raw_spin_lock(&sg_policy->update_lock);
 
-	sugov_get_util(sg_cpu);
 	sugov_set_iowait_boost(sg_cpu, time, flags);
 	sg_cpu->last_update = time;
 

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-20 15:43   ` Peter Zijlstra
@ 2017-12-21  9:15     ` Viresh Kumar
  2017-12-21 10:25       ` Peter Zijlstra
  0 siblings, 1 reply; 80+ messages in thread
From: Viresh Kumar @ 2017-12-21  9:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Patrick Bellasi, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes

On 20-12-17, 16:43, Peter Zijlstra wrote:
> The below makes more sense to me too; hmm?
> 
> @@ -335,12 +335,11 @@ static unsigned int sugov_next_freq_shar
>  
>  		j_max = j_sg_cpu->max;
>  		j_util = sugov_aggregate_util(j_sg_cpu);
> +		sugov_iowait_boost(j_sg_cpu, &util, &max);
>  		if (j_util * max > j_max * util) {
>  			util = j_util;
>  			max = j_max;
>  		}
> -
> -		sugov_iowait_boost(j_sg_cpu, &util, &max);

Sorry if I am being a fool here; I came up with 3 different
interpretations of the results of this change in the last 15 minutes.
It was somehow confusing..

Why do you think the above change matters? I think nothing changes with
this diff at all.

We have three different values here:

util/max, j_util/j_max, and j_boost_util/j_boost_max.

And we are trying to find the max among them; changing the order of the
comparisons doesn't change anything.

Am I reading the code correctly?

-- 
viresh

^ permalink raw reply	[flat|nested] 80+ messages in thread
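
The invariance Viresh describes can be checked with a small standalone
sketch (toy C, not kernel code; the helper name and the numbers are
invented for illustration): keeping the dominant util/max pair via the
loop's overflow-free cross-multiplication is a plain maximum, so the
iteration order over the CPUs cannot change the result.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model: each CPU contributes a util/max pair and we keep the pair
 * with the highest ratio, comparing with cross-multiplication exactly
 * as the schedutil loop does:
 *     j_util / j_max > util / max  <=>  j_util * max > j_max * util
 */
static void pick_max_ratio(const unsigned long *utils,
			   const unsigned long *maxs, size_t n,
			   unsigned long *out_util, unsigned long *out_max)
{
	unsigned long util = 0, max = 1;	/* start from ratio 0 */
	size_t i;

	for (i = 0; i < n; i++) {
		if (utils[i] * max > maxs[i] * util) {
			util = utils[i];
			max = maxs[i];
		}
	}
	*out_util = util;
	*out_max = max;
}
```

Running it over the same pairs in two different orders picks the same
dominant pair either way.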

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-21  9:15     ` Viresh Kumar
@ 2017-12-21 10:25       ` Peter Zijlstra
  2017-12-21 10:30         ` Viresh Kumar
  0 siblings, 1 reply; 80+ messages in thread
From: Peter Zijlstra @ 2017-12-21 10:25 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Patrick Bellasi, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes

On Thu, Dec 21, 2017 at 02:45:02PM +0530, Viresh Kumar wrote:
> On 20-12-17, 16:43, Peter Zijlstra wrote:
> > The below makes more sense to me too; hmm?
> > 
> > @@ -335,12 +335,11 @@ static unsigned int sugov_next_freq_shar
> >  
> >  		j_max = j_sg_cpu->max;
> >  		j_util = sugov_aggregate_util(j_sg_cpu);
> > +		sugov_iowait_boost(j_sg_cpu, &util, &max);

This should 'obviously' have been:

		sugov_iowait_boost(j_sg_cpu, &j_util, *j_max);

> >  		if (j_util * max > j_max * util) {
> >  			util = j_util;
> >  			max = j_max;
> >  		}
> > -
> > -		sugov_iowait_boost(j_sg_cpu, &util, &max);

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-21 10:25       ` Peter Zijlstra
@ 2017-12-21 10:30         ` Viresh Kumar
  2017-12-21 10:39           ` Peter Zijlstra
  0 siblings, 1 reply; 80+ messages in thread
From: Viresh Kumar @ 2017-12-21 10:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Patrick Bellasi, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes

On 21-12-17, 11:25, Peter Zijlstra wrote:
> On Thu, Dec 21, 2017 at 02:45:02PM +0530, Viresh Kumar wrote:
> > On 20-12-17, 16:43, Peter Zijlstra wrote:
> > > The below makes more sense to me too; hmm?
> > > 
> > > @@ -335,12 +335,11 @@ static unsigned int sugov_next_freq_shar
> > >  
> > >  		j_max = j_sg_cpu->max;
> > >  		j_util = sugov_aggregate_util(j_sg_cpu);
> > > +		sugov_iowait_boost(j_sg_cpu, &util, &max);
> 
> This should 'obviously' have been:
> 
> 		sugov_iowait_boost(j_sg_cpu, &j_util, *j_max);

Actually it should be:

 		sugov_iowait_boost(j_sg_cpu, &j_util, &j_max);

and this is how it was in the commit I reviewed from your tree. But my
query still stands: what difference will it make?

> > >  		if (j_util * max > j_max * util) {
> > >  			util = j_util;
> > >  			max = j_max;
> > >  		}
> > > -
> > > -		sugov_iowait_boost(j_sg_cpu, &util, &max);

-- 
viresh

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-21 10:30         ` Viresh Kumar
@ 2017-12-21 10:39           ` Peter Zijlstra
  2017-12-21 10:43             ` Viresh Kumar
  0 siblings, 1 reply; 80+ messages in thread
From: Peter Zijlstra @ 2017-12-21 10:39 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Patrick Bellasi, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes

On Thu, Dec 21, 2017 at 04:00:22PM +0530, Viresh Kumar wrote:
> On 21-12-17, 11:25, Peter Zijlstra wrote:
> > On Thu, Dec 21, 2017 at 02:45:02PM +0530, Viresh Kumar wrote:
> > > On 20-12-17, 16:43, Peter Zijlstra wrote:
> > > > The below makes more sense to me too; hmm?
> > > > 
> > > > @@ -335,12 +335,11 @@ static unsigned int sugov_next_freq_shar
> > > >  
> > > >  		j_max = j_sg_cpu->max;
> > > >  		j_util = sugov_aggregate_util(j_sg_cpu);
> > > > +		sugov_iowait_boost(j_sg_cpu, &util, &max);
> > 
> > This should 'obviously' have been:
> > 
> > 		sugov_iowait_boost(j_sg_cpu, &j_util, *j_max);
> 
> Actually it should be:
> 
>  		sugov_iowait_boost(j_sg_cpu, &j_util, &j_max);

Yes, clearly I cannot type much ;-)

> and this is how it was in the commit I reviewed from your tree. But my query
> still stands, what difference will it make ?
> 
> > > >  		if (j_util * max > j_max * util) {
> > > >  			util = j_util;
> > > >  			max = j_max;
> > > >  		}
> > > > -
> > > > -		sugov_iowait_boost(j_sg_cpu, &util, &max);
> 

The difference is that we apply the per-cpu boost on the per-cpu util
value and _then_ find the overall maximum.

Instead of finding the overall maximum and then apply the per-cpu boost
to that.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-21 10:39           ` Peter Zijlstra
@ 2017-12-21 10:43             ` Viresh Kumar
  2017-12-22  8:30               ` Peter Zijlstra
  0 siblings, 1 reply; 80+ messages in thread
From: Viresh Kumar @ 2017-12-21 10:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Patrick Bellasi, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes

On 21-12-17, 11:39, Peter Zijlstra wrote:
> The difference is that we apply the per-cpu boost on the per-cpu util
> value and _then_ find the overall maximum.
> 
> Instead of finding the overall maximum and then apply the per-cpu boost
> to that.

Okay, so it is just about the right sequencing of these comparisons, but
the outcome will still be the same, i.e. the max of the 3 util/max
values. Thanks.

-- 
viresh

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-21 10:43             ` Viresh Kumar
@ 2017-12-22  8:30               ` Peter Zijlstra
  0 siblings, 0 replies; 80+ messages in thread
From: Peter Zijlstra @ 2017-12-22  8:30 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Patrick Bellasi, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes

On Thu, Dec 21, 2017 at 04:13:17PM +0530, Viresh Kumar wrote:
> On 21-12-17, 11:39, Peter Zijlstra wrote:
> > The difference is that we apply the per-cpu boost on the per-cpu util
> > value and _then_ find the overall maximum.
> > 
> > Instead of finding the overall maximum and then apply the per-cpu boost
> > to that.
> 
> Okay, so it is just about the right sequencing of these comparisons,
> but the outcome will still be the same, i.e. the max of the 3 util/max
> values. Thanks.

Aah, you're right. I was thinking we have relative boost, and in that
case the ordering matters much more.

^ permalink raw reply	[flat|nested] 80+ messages in thread
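
The distinction in this exchange can be illustrated with a standalone
sketch (toy C, not kernel code; the helper names, the percentage-based
boost, and all the numbers are invented for illustration): with
max-style boosting, folding each CPU's boost into the running aggregate
or boosting each CPU's own util first gives the same overall maximum,
while a hypothetical relative (multiplicative) boost would make the two
orderings diverge, because boosting the aggregate can scale a value
that came from a different CPU.

```c
#include <assert.h>

/* All values share one common max for simplicity. */
typedef unsigned long (*boost_fn)(unsigned long util, unsigned long arg);

/* max-style boost: second argument is an absolute boost value */
static unsigned long max_boost(unsigned long util, unsigned long boost)
{
	return util > boost ? util : boost;
}

/* hypothetical relative boost: second argument is a percentage */
static unsigned long rel_boost(unsigned long util, unsigned long pct)
{
	return util + util * pct / 100;
}

/* boost each CPU's own util first, then take the overall maximum */
static unsigned long percpu_first(const unsigned long *util,
				  const unsigned long *arg, int n,
				  boost_fn boost)
{
	unsigned long best = 0;
	int i;

	for (i = 0; i < n; i++) {
		unsigned long b = boost(util[i], arg[i]);

		if (b > best)
			best = b;
	}
	return best;
}

/* the old ordering: fold each CPU's boost into the running aggregate */
static unsigned long aggregate_first(const unsigned long *util,
				     const unsigned long *arg, int n,
				     boost_fn boost)
{
	unsigned long acc = 0;
	int i;

	for (i = 0; i < n; i++) {
		if (util[i] > acc)
			acc = util[i];
		acc = boost(acc, arg[i]);
	}
	return acc;
}
```

With max_boost the two orderings agree; with rel_boost, CPU1's 50%
boost applied to the aggregate inflates CPU0's larger utilization.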

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-20 17:38   ` Juri Lelli
  2017-12-20 18:16     ` Peter Zijlstra
@ 2017-12-22 10:06     ` Peter Zijlstra
  2017-12-22 11:02       ` Patrick Bellasi
  1 sibling, 1 reply; 80+ messages in thread
From: Peter Zijlstra @ 2017-12-22 10:06 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Patrick Bellasi, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On Wed, Dec 20, 2017 at 06:38:14PM +0100, Juri Lelli wrote:
> On 20/12/17 16:30, Peter Zijlstra wrote:
> 
> [...]
> 
> > @@ -327,12 +331,7 @@ static unsigned int sugov_next_freq_shar
> >  		if (delta_ns > TICK_NSEC) {
> >  			j_sg_cpu->iowait_boost = 0;
> >  			j_sg_cpu->iowait_boost_pending = false;
> > -			j_sg_cpu->util_cfs = 0;
> > -			if (j_sg_cpu->util_dl == 0)
> > -				continue;
> >  		}
> 
> This goes away because with Brendan/Vincent fix we don't need the
> workaround for stale CFS util contribution for idle CPUs anymore?

An easy fix would be something like the below I suppose (also folded a
change from Viresh).

This way it completely ignores the demand from idle CPUs. Which I
suppose is exactly what you want, no?

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index ab84d2261554..9736b537386a 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -315,8 +315,8 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 		unsigned long j_util, j_max;
 		s64 delta_ns;
 
-		if (j_sg_cpu != sg_cpu)
-			sugov_get_util(j_sg_cpu);
+		if (idle_cpu(j))
+			continue;
 
 		/*
 		 * If the CFS CPU utilization was last updated before the
@@ -354,7 +354,6 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 
 	raw_spin_lock(&sg_policy->update_lock);
 
-	sugov_get_util(sg_cpu);
 	sugov_set_iowait_boost(sg_cpu, time, flags);
 	sg_cpu->last_update = time;
 

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-22 10:06     ` Peter Zijlstra
@ 2017-12-22 11:02       ` Patrick Bellasi
  2017-12-22 11:46         ` Peter Zijlstra
  0 siblings, 1 reply; 80+ messages in thread
From: Patrick Bellasi @ 2017-12-22 11:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On 22-Dec 11:06, Peter Zijlstra wrote:
> On Wed, Dec 20, 2017 at 06:38:14PM +0100, Juri Lelli wrote:
> > On 20/12/17 16:30, Peter Zijlstra wrote:
> > 
> > [...]
> > 
> > > @@ -327,12 +331,7 @@ static unsigned int sugov_next_freq_shar
> > >  		if (delta_ns > TICK_NSEC) {
> > >  			j_sg_cpu->iowait_boost = 0;
> > >  			j_sg_cpu->iowait_boost_pending = false;
> > > -			j_sg_cpu->util_cfs = 0;
> > > -			if (j_sg_cpu->util_dl == 0)
> > > -				continue;
> > >  		}
> > 
> > This goes away because with Brendan/Vincent fix we don't need the
> > workaround for stale CFS util contribution for idle CPUs anymore?
> 
> An easy fix would be something like the below I suppose (also folded a
> change from Viresh).
> 
> This way it completely ignores the demand from idle CPUs. Which I
> suppose is exactly what you want, no?
> 
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index ab84d2261554..9736b537386a 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -315,8 +315,8 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
>  		unsigned long j_util, j_max;
>  		s64 delta_ns;
>  
> -		if (j_sg_cpu != sg_cpu)
> -			sugov_get_util(j_sg_cpu);
> +		if (idle_cpu(j))
> +			continue;

That should work to skip IDLE CPUs... however I'm missing where now we
get the sugov_get_util(j_sg_cpu) for active CPUs. It has been moved
somewhere else I guess...

Moreover, that way don't we completely disregard CFS blocked load for
IDLE CPUs... as well as DL reserved utilization, which should be
released only at the 0-lag time?

>  
>  		/*
>  		 * If the CFS CPU utilization was last updated before the
> @@ -354,7 +354,6 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>  
>  	raw_spin_lock(&sg_policy->update_lock);
>  
> -	sugov_get_util(sg_cpu);
>  	sugov_set_iowait_boost(sg_cpu, time, flags);
>  	sg_cpu->last_update = time;
>  

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-22 11:02       ` Patrick Bellasi
@ 2017-12-22 11:46         ` Peter Zijlstra
  2017-12-22 12:07           ` Juri Lelli
                             ` (2 more replies)
  0 siblings, 3 replies; 80+ messages in thread
From: Peter Zijlstra @ 2017-12-22 11:46 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Juri Lelli, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On Fri, Dec 22, 2017 at 11:02:06AM +0000, Patrick Bellasi wrote:
> > @@ -315,8 +315,8 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> >  		unsigned long j_util, j_max;
> >  		s64 delta_ns;
> >  
> > -		if (j_sg_cpu != sg_cpu)
> > -			sugov_get_util(j_sg_cpu);
> > +		if (idle_cpu(j))
> > +			continue;
> 
> That should work to skip IDLE CPUs... however I'm missing where now we
> get the sugov_get_util(j_sg_cpu) for active CPUs. It has been moved
> somewhere else I guess...

No, I'm just an idiot... lemme fix that.

> Moreover, that way don't we completely disregard CFS blocked load for
> IDLE CPUs... as well as DL reserved utilization, which should be
> released only at the 0-lag time?

I was thinking that since dl is a 'global' scheduler the reservation
would be too and thus the freq just needs a single CPU to be observed;
but I suppose there's nothing stopping anybody from splitting a clock
domain down the middle scheduling wise. So yes, good point.

Blergh that'd make a mess of things again.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-22 11:46         ` Peter Zijlstra
@ 2017-12-22 12:07           ` Juri Lelli
  2017-12-22 12:14             ` Patrick Bellasi
  2017-12-22 12:07           ` Patrick Bellasi
  2017-12-22 12:10           ` Peter Zijlstra
  2 siblings, 1 reply; 80+ messages in thread
From: Juri Lelli @ 2017-12-22 12:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Patrick Bellasi, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

Hi Peter,

On 22/12/17 12:46, Peter Zijlstra wrote:
> On Fri, Dec 22, 2017 at 11:02:06AM +0000, Patrick Bellasi wrote:
> > > @@ -315,8 +315,8 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> > >  		unsigned long j_util, j_max;
> > >  		s64 delta_ns;
> > >  
> > > -		if (j_sg_cpu != sg_cpu)
> > > -			sugov_get_util(j_sg_cpu);
> > > +		if (idle_cpu(j))
> > > +			continue;
> > 
> > That should work to skip IDLE CPUs... however I'm missing where now we
> > get the sugov_get_util(j_sg_cpu) for active CPUs. It has been moved
> > somewhere else I guess...
> 
> No, I'm just an idiot... lemme fix that.
> 
> > Moreover, that way don't we completely disregard CFS blocked load for
> > IDLE CPUs... as well as DL reserved utilization, which should be
> > released only at the 0-lag time?
> 
> I was thinking that since dl is a 'global' scheduler the reservation
> would be too and thus the freq just needs a single CPU to be observed;
> but I suppose there's nothing stopping anybody from splitting a clock
> domain down the middle scheduling wise. So yes, good point.

Also, for CFS the current behaviour is to start ignoring contributions
after TICK_NSEC. It seems that your change might introduce regressions?

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 80+ messages in thread
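
The behaviour Juri refers to can be sketched standalone (toy C, not
kernel code; the function name is invented and the 4 ms TICK_NSEC value
is an assumption for HZ=250): a CPU's contribution is dropped once its
last update is more than one tick old.

```c
#include <assert.h>
#include <stdint.h>

#define TICK_NSEC 4000000LL	/* assumed 4 ms tick for this sketch */

/* Drop a utilization contribution whose last update is older than
 * one tick, mirroring the delta_ns > TICK_NSEC check in the diff. */
static unsigned long filter_stale(unsigned long util, int64_t now_ns,
				  int64_t last_update_ns)
{
	int64_t delta_ns = now_ns - last_update_ns;

	return delta_ns > TICK_NSEC ? 0 : util;
}
```

Removing such a filter means old contributions keep counting until they
are explicitly recomputed, which is the regression being questioned.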

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-22 11:46         ` Peter Zijlstra
  2017-12-22 12:07           ` Juri Lelli
@ 2017-12-22 12:07           ` Patrick Bellasi
  2017-12-22 12:19             ` Peter Zijlstra
  2017-12-22 12:10           ` Peter Zijlstra
  2 siblings, 1 reply; 80+ messages in thread
From: Patrick Bellasi @ 2017-12-22 12:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On 22-Dec 12:46, Peter Zijlstra wrote:
> On Fri, Dec 22, 2017 at 11:02:06AM +0000, Patrick Bellasi wrote:
> > > @@ -315,8 +315,8 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> > >  		unsigned long j_util, j_max;
> > >  		s64 delta_ns;
> > >  
> > > -		if (j_sg_cpu != sg_cpu)
> > > -			sugov_get_util(j_sg_cpu);
> > > +		if (idle_cpu(j))
> > > +			continue;
> > 
> > That should work to skip IDLE CPUs... however I'm missing where now we
> > get the sugov_get_util(j_sg_cpu) for active CPUs. It has been moved
> > somewhere else I guess...
> 
> No, I'm just an idiot... lemme fix that.

Then you just missed a call to sugov_get_util(j_sg_cpu) after the
above if... right, actually that was Viresh's proposal...

> > Moreover, that way don't we completely disregard CFS blocked load for
> > IDLE CPUs... as well as DL reserved utilization, which should be
> > released only at the 0-lag time?
> 
> I was thinking that since dl is a 'global' scheduler the reservation
> would be too and thus the freq just needs a single CPU to be observed;

AFAIU only the admission control is global (which is something worth a
thread by itself...), while the dl_se->dl_bw values are aggregated into
dl_rq->running_bw, which ultimately represents the DL bandwidth
required on just one CPU.

> but I suppose there's nothing stopping anybody from splitting a clock
> domain down the middle scheduling wise. So yes, good point.

That makes sense... moreover, using the global utilization, we would
end up asking for capacities which cannot be provided by a single CPU.

> Blergh that'd make a mess of things again.

Actually, looking more closely at your patch: are we not just OK with that?

I mean, we don't need this check on idle_cpu() since, in
sugov_aggregate_util(), we already skip the util = sg_cpu->max case
when !rq->rt.rt_nr_running and aggregate just the CFS and DL requests.

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-22 11:46         ` Peter Zijlstra
  2017-12-22 12:07           ` Juri Lelli
  2017-12-22 12:07           ` Patrick Bellasi
@ 2017-12-22 12:10           ` Peter Zijlstra
  2017-12-22 12:25             ` Patrick Bellasi
  2 siblings, 1 reply; 80+ messages in thread
From: Peter Zijlstra @ 2017-12-22 12:10 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Juri Lelli, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On Fri, Dec 22, 2017 at 12:46:18PM +0100, Peter Zijlstra wrote:
> Blergh that'd make a mess of things again.

Something like so then..

--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -187,11 +187,16 @@ static void sugov_get_util(struct sugov_
 
 static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
 {
-	unsigned long util = sg_cpu->util_cfs + sg_cpu->util_dl;
 	struct rq *rq = cpu_rq(sg_cpu->cpu);
+	unsigned long util;
 
-	if (rq->rt.rt_nr_running)
+	if (rq->rt.rt_nr_running) {
 		util = sg_cpu->max;
+	} else {
+		util = sg_cpu->util_dl;
+		if (rq->cfs.h_nr_running)
+			util += sg_cpu->util_cfs;
+	}
 
 	/*
 	 * Ideally we would like to set util_dl as min/guaranteed freq and

^ permalink raw reply	[flat|nested] 80+ messages in thread
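
The aggregation in the hunk above can be modelled as a standalone
function (toy C, not kernel code; the struct and its field values are
invented stand-ins for the kernel's rq/sugov_cpu state): a runnable RT
task pins the request to the CPU's maximum capacity, otherwise the DL
reserved bandwidth always counts and the CFS utilization counts only
while CFS tasks are runnable.

```c
#include <assert.h>

struct cpu_state {
	unsigned long max;		/* CPU capacity */
	unsigned long util_cfs;
	unsigned long util_dl;
	unsigned int rt_nr_running;	/* runnable RT tasks */
	unsigned int cfs_nr_running;	/* runnable CFS tasks */
};

static unsigned long aggregate_util(const struct cpu_state *s)
{
	unsigned long util;

	if (s->rt_nr_running)
		return s->max;	/* RT runnable: request max capacity */

	util = s->util_dl;	/* DL bandwidth always counts */
	if (s->cfs_nr_running)
		util += s->util_cfs;

	return util;
}
```

This makes explicit the point debated downthread: with no runnable CFS
tasks, the CFS (blocked-load) contribution is disregarded entirely.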

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-22 12:07           ` Juri Lelli
@ 2017-12-22 12:14             ` Patrick Bellasi
  2017-12-22 12:22               ` Peter Zijlstra
  0 siblings, 1 reply; 80+ messages in thread
From: Patrick Bellasi @ 2017-12-22 12:14 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Peter Zijlstra, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On 22-Dec 13:07, Juri Lelli wrote:
> Hi Peter,
> 
> On 22/12/17 12:46, Peter Zijlstra wrote:
> > On Fri, Dec 22, 2017 at 11:02:06AM +0000, Patrick Bellasi wrote:
> > > > @@ -315,8 +315,8 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> > > >  		unsigned long j_util, j_max;
> > > >  		s64 delta_ns;
> > > >  
> > > > -		if (j_sg_cpu != sg_cpu)
> > > > -			sugov_get_util(j_sg_cpu);
> > > > +		if (idle_cpu(j))
> > > > +			continue;
> > > 
> > > That should work to skip IDLE CPUs... however I'm missing where now we
> > > get the sugov_get_util(j_sg_cpu) for active CPUs. It has been moved
> > > somewhere else I guess...
> > 
> > No, I'm just an idiot... lemme fix that.
> > 
> > > Moreover, that way don't we completely disregard CFS blocked load for
> > > IDLE CPUs... as well as DL reserved utilization, which should be
> > > released only at the 0-lag time?
> > 
> > I was thinking that since dl is a 'global' scheduler the reservation
> > would be too and thus the freq just needs a single CPU to be observed;
> > but I suppose there's nothing stopping anybody from splitting a clock
> > domain down the middle scheduling wise. So yes, good point.
> 
> Also, for CFS the current behaviour is to start ignoring contributions
> after TICK_NSEC. It seems that your change might introduce regressions?

Good point, an energy regression I guess you mean...

I think that check is already gone for CFS in the current PeterZ tree.
It seems we use TICK_NSEC just for the reset of iowait_boost, doesn't
it?

However, if the remote updates of CFS work as expected, wasn't the
removal of the TICK_NSEC check for CFS intentional?

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-22 12:07           ` Patrick Bellasi
@ 2017-12-22 12:19             ` Peter Zijlstra
  2017-12-22 12:27               ` Juri Lelli
  2017-12-22 12:38               ` Patrick Bellasi
  0 siblings, 2 replies; 80+ messages in thread
From: Peter Zijlstra @ 2017-12-22 12:19 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Juri Lelli, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On Fri, Dec 22, 2017 at 12:07:37PM +0000, Patrick Bellasi wrote:
> > I was thinking that since dl is a 'global' scheduler the reservation
> > would be too and thus the freq just needs a single CPU to be observed;
> 
> AFAIU global is only the admission control (which is something worth a
> thread by itself...) while the dl_se->dl_bw are aggregated into the
> dl_rq->running_bw, which ultimately represents the DL bandwidth
> required for just a CPU.

Oh urgh yes, forgot that.. then the dl freq stuff isn't strictly correct
I think. But yes, that's another thread.

> > but I suppose there's nothing stopping anybody from splitting a clock
> > domain down the middle scheduling wise. So yes, good point.
> 
> That makes sense... moreover, using the global utilization, we would
> end up asking for capacities which cannot be provided by a single CPU.

Yes, but that _should_ not be a problem if you clock them all high
enough. But this gets to be complicated real fast I think.

> > Blergh that'd make a mess of things again.
> 
> Actually, looking more closely at your patch: are we not just OK with that?
> 
> I mean, we don't need this check on idle_cpu() since, in
> sugov_aggregate_util(), we already skip the util = sg_cpu->max case
> when !rq->rt.rt_nr_running and aggregate just the CFS and DL requests.

Right, well, I don't actually have an environment to test this sanely,
so someone will have to go play with the various variations and see what
works.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-22 12:14             ` Patrick Bellasi
@ 2017-12-22 12:22               ` Peter Zijlstra
  0 siblings, 0 replies; 80+ messages in thread
From: Peter Zijlstra @ 2017-12-22 12:22 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Juri Lelli, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On Fri, Dec 22, 2017 at 12:14:45PM +0000, Patrick Bellasi wrote:
> I think that check is already gone for CFS in the current PeterZ tree.
> It seems we use TICK_NS just for the reset of iowait_boost, isn't it?

Easy enough to bring back though.

> However, if the remote updates of CFS works as expected,
> the removal of the TICK_NS for CFS is not intentional?

So I killed that because I now do get_util for all CPUs and figured
get_util provides up-to-date numbers. So we don't need no artificial
aging.

Esp. once we get that whole blocked load stuff sorted.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-22 12:10           ` Peter Zijlstra
@ 2017-12-22 12:25             ` Patrick Bellasi
  0 siblings, 0 replies; 80+ messages in thread
From: Patrick Bellasi @ 2017-12-22 12:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On 22-Dec 13:10, Peter Zijlstra wrote:
> On Fri, Dec 22, 2017 at 12:46:18PM +0100, Peter Zijlstra wrote:
> > Blergh that'd make a mess of things again.
> 
> Something like so then..
> 
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -187,11 +187,16 @@ static void sugov_get_util(struct sugov_
>  
>  static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
>  {
> -	unsigned long util = sg_cpu->util_cfs + sg_cpu->util_dl;
>  	struct rq *rq = cpu_rq(sg_cpu->cpu);
> +	unsigned long util;
>  
> -	if (rq->rt.rt_nr_running)
> +	if (rq->rt.rt_nr_running) {
>  		util = sg_cpu->max;
> +	} else {
> +		util = sg_cpu->util_dl;
> +		if (rq->cfs.h_nr_running)
> +			util += sg_cpu->util_cfs;

Since sugov_aggregate_util always follows sugov_get_util, maybe we
can move these checks into the latter and remove the first one?

That way, sg_cpu->util_{dl,rt,cfs} will always report exactly the
requests of each class, also considering which tasks are RUNNABLE at
sugov_get_util time.

Still, Juri's observation is valid: do we really want to disregard
all the CFS blocked load as soon as there are no runnable CFS tasks?

> +	}
>  
>  	/*
>  	 * Ideally we would like to set util_dl as min/guaranteed freq and

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-22 12:19             ` Peter Zijlstra
@ 2017-12-22 12:27               ` Juri Lelli
  2017-12-22 12:38               ` Patrick Bellasi
  1 sibling, 0 replies; 80+ messages in thread
From: Juri Lelli @ 2017-12-22 12:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Patrick Bellasi, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes,
	Claudio Scordino, Luca Abeni

On 22/12/17 13:19, Peter Zijlstra wrote:
> On Fri, Dec 22, 2017 at 12:07:37PM +0000, Patrick Bellasi wrote:
> > > I was thinking that since dl is a 'global' scheduler the reservation
> > > would be too and thus the freq just needs a single CPU to be observed;
> > 
> > AFAIU global is only the admission control (which is something worth a
> > thread by itself...) while the dl_se->dl_bw are aggregated into the
> > dl_rq->running_bw, which ultimately represents the DL bandwidth
> > required for just a CPU.
> 
> Oh urgh yes, forgot that.. then the dl freq stuff isn't strictly correct
> I think. But yes, that's another thread.
> 
> > > but I suppose there's nothing stopping anybody from splitting a clock
> > > domain down the middle scheduling wise. So yes, good point.
> > 
> > That makes sense... moreover, using the global utilization, we would
> > end up asking for capacities which cannot be provided by a single CPU.
> 
> Yes, but that _should_ not be a problem if you clock them all high
> enough. But this gets to be complicated real fast I think.
> 
> > > Blergh that'd make a mess of things again.
> > 
> > Actually, looking better at your patch: are we not just ok with that?
> > 
> > I mean, we don't need this check on idle_cpu since in
> > sugov_aggregate_util we already skip the util=sg_cpu->max in case of
> > !rq->rt.rt_nr_running, while we aggregate just CFS and DL requests.
> 
> Right, well, I don't actually have an environment to test this sanely,
> so someone will have to go play with the various variations and see what
> works.

Adding Claudio and Luca to the thread (as I don't have a testing
platform myself ATM). ;)

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-22 12:19             ` Peter Zijlstra
  2017-12-22 12:27               ` Juri Lelli
@ 2017-12-22 12:38               ` Patrick Bellasi
  2017-12-22 12:43                 ` Juri Lelli
  1 sibling, 1 reply; 80+ messages in thread
From: Patrick Bellasi @ 2017-12-22 12:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On 22-Dec 13:19, Peter Zijlstra wrote:
> On Fri, Dec 22, 2017 at 12:07:37PM +0000, Patrick Bellasi wrote:
> > > I was thinking that since dl is a 'global' scheduler the reservation
> > > would be too and thus the freq just needs a single CPU to be observed;
> > 
> > AFAIU global is only the admission control (which is something worth a
> > thread by itself...) while the dl_se->dl_bw are aggregated into the
> > dl_rq->running_bw, which ultimately represents the DL bandwidth
> > required for just a CPU.
> 
> Oh urgh yes, forgot that.. then the dl freq stuff isn't strictly correct
> I think. But yes, that's another thread.

Mmm... maybe I don't get your point... I was referring to the global
admission control of DL. If you have, for example, three 60% DL tasks on a
2-CPU system, AFAIU the CBS will admit the tasks into the system (since
the overall utilization is 180 < 200 * 0.95), although that workload is
not necessarily schedulable (for example, if the tasks wake up at the
same time, one of them will miss its deadline).

But, yeah... maybe I'm completely wrong or, in any case, it's for a
different thread...

> > > but I suppose there's nothing stopping anybody from splitting a clock
> > > domain down the middle scheduling wise. So yes, good point.
> > 
> > That makes sense... moreover, using the global utilization, we would
> > end up asking for capacities which cannot be provided by a single CPU.
> 
> Yes, but that _should_ not be a problem if you clock them all high
> enough. But this gets to be complicated real fast I think.

IMO the current solution with Juri's patches is working as expected:
we know how many DL tasks are runnable on a CPU and we properly
account for their utilization.

The only "issue/limitation" is (eventually) the case described above.
Dunno if we can enqueue two 60% DL tasks on the same CPU... in that case
will we ask for 120% utilization?

> > > Blergh that'd make a mess of things again.
> > 
> > Actually, looking better at your patch: are we not just ok with that?
> > 
> > I mean, we don't need this check on idle_cpu since in
> > sugov_aggregate_util we already skip the util=sg_cpu->max in case of
> > !rq->rt.rt_nr_running, while we aggregate just CFS and DL requests.
> 
> Right, well, I don't actually have an environment to test this sanely,
> so someone will have to go play with the various variations and see what
> works.

Definitely, we have some synthetics for mainline... and we can also
easily backport this series to v4.9 and test for power/perf using
a full Android stack. But, given today is the 22nd, I guess we can do
that after the holidays (in ~2 weeks).

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-22 12:38               ` Patrick Bellasi
@ 2017-12-22 12:43                 ` Juri Lelli
  2017-12-22 12:50                   ` Patrick Bellasi
  0 siblings, 1 reply; 80+ messages in thread
From: Juri Lelli @ 2017-12-22 12:43 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Peter Zijlstra, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On 22/12/17 12:38, Patrick Bellasi wrote:
> On 22-Dec 13:19, Peter Zijlstra wrote:
> > On Fri, Dec 22, 2017 at 12:07:37PM +0000, Patrick Bellasi wrote:
> > > > I was thinking that since dl is a 'global' scheduler the reservation
> > > > would be too and thus the freq just needs a single CPU to be observed;
> > > 
> > > AFAIU global is only the admission control (which is something worth a
> > > thread by itself...) while the dl_se->dl_bw are aggregated into the
> > > dl_rq->running_bw, which ultimately represents the DL bandwidth
> > > required for just a CPU.
> > 
> > Oh urgh yes, forgot that.. then the dl freq stuff isn't strictly correct
> > I think. But yes, that's another thread.
> 
> Mmm... maybe I don't get your point... I was referring to the global
> admission control of DL. If you have for example 3 60% DL tasks on a
> 2CPU system, AFAIU the CBS will allow the tasks in the system (since
> the overall utilization is 180 < 200 * 0.95) although that workload is
> not necessarily schedule (for example if the tasks wakeups at the
> same time one of them will miss its deadline).
> 
> But, yeah... maybe I'm completely wrong or, in any case, it's for a
> different thread...
> 
> > > > but I suppose there's nothing stopping anybody from splitting a clock
> > > > domain down the middle scheduling wise. So yes, good point.
> > > 
> > > That makes sense... moreover, using the global utilization, we would
> > > end up asking for capacities which cannot be provided by a single CPU.
> > 
> > Yes, but that _should_ not be a problem if you clock them all high
> > enough. But this gets to be complicated real fast I think.
> 
> IMO the current solution with Juri's patches is working as expected:
> we know how many DL tasks are runnable on a CPU and we properly
> account for their utilization.
> 
> The only "issue/limitation" is (eventually) the case described above.
> Dunno if we can enqueue 2 60% DL tasks on the same CPU... in that case
> we will ask for 120% Utilization?

In general it depends on the other parameters, deadline and period.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-22 12:43                 ` Juri Lelli
@ 2017-12-22 12:50                   ` Patrick Bellasi
  2017-12-22 13:01                     ` Juri Lelli
  0 siblings, 1 reply; 80+ messages in thread
From: Patrick Bellasi @ 2017-12-22 12:50 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Peter Zijlstra, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On 22-Dec 13:43, Juri Lelli wrote:
> On 22/12/17 12:38, Patrick Bellasi wrote:
> > On 22-Dec 13:19, Peter Zijlstra wrote:
> > > On Fri, Dec 22, 2017 at 12:07:37PM +0000, Patrick Bellasi wrote:
> > > > > I was thinking that since dl is a 'global' scheduler the reservation
> > > > > would be too and thus the freq just needs a single CPU to be observed;
> > > > 
> > > > AFAIU global is only the admission control (which is something worth a
> > > > thread by itself...) while the dl_se->dl_bw are aggregated into the
> > > > dl_rq->running_bw, which ultimately represents the DL bandwidth
> > > > required for just a CPU.
> > > 
> > > Oh urgh yes, forgot that.. then the dl freq stuff isn't strictly correct
> > > I think. But yes, that's another thread.
> > 
> > Mmm... maybe I don't get your point... I was referring to the global
> > admission control of DL. If you have, for example, three 60% DL tasks on a
> > 2-CPU system, AFAIU the CBS will admit the tasks into the system (since
> > the overall utilization is 180 < 200 * 0.95), although that workload is
> > not necessarily schedulable (for example, if the tasks wake up at the
> > same time, one of them will miss its deadline).
> > 
> > But, yeah... maybe I'm completely wrong or, in any case, it's for a
> > different thread...
> > 
> > > > > but I suppose there's nothing stopping anybody from splitting a clock
> > > > > domain down the middle scheduling wise. So yes, good point.
> > > > 
> > > > That makes sense... moreover, using the global utilization, we would
> > > > end up asking for capacities which cannot be provided by a single CPU.
> > > 
> > > Yes, but that _should_ not be a problem if you clock them all high
> > > enough. But this gets to be complicated real fast I think.
> > 
> > IMO the current solution with Juri's patches is working as expected:
> > we know how many DL tasks are runnable on a CPU and we properly
> > account for their utilization.
> > 
> > The only "issue/limitation" is (eventually) the case described above.
> > Dunno if we can enqueue 2 60% DL tasks on the same CPU... in that case
> > we will ask for 120% Utilization?
> 
> In general it depends on the other parameters, deadline and period.

Right, but what about the case deadline==period, with 60% utilization?
AFAIU, three DL tasks with the same parameters as above will be accepted on
a 2-CPU system, won't they?

And thus, in that case, we can end up with a 120% utilization request
from DL for a single CPU... but, considering it's lunch o'clock,
I'm likely missing something...

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-22 12:50                   ` Patrick Bellasi
@ 2017-12-22 13:01                     ` Juri Lelli
  0 siblings, 0 replies; 80+ messages in thread
From: Juri Lelli @ 2017-12-22 13:01 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Peter Zijlstra, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Todd Kjos, Joel Fernandes

On 22/12/17 12:50, Patrick Bellasi wrote:
> On 22-Dec 13:43, Juri Lelli wrote:
> > On 22/12/17 12:38, Patrick Bellasi wrote:
> > > On 22-Dec 13:19, Peter Zijlstra wrote:
> > > > On Fri, Dec 22, 2017 at 12:07:37PM +0000, Patrick Bellasi wrote:
> > > > > > I was thinking that since dl is a 'global' scheduler the reservation
> > > > > > would be too and thus the freq just needs a single CPU to be observed;
> > > > > 
> > > > > AFAIU global is only the admission control (which is something worth a
> > > > > thread by itself...) while the dl_se->dl_bw are aggregated into the
> > > > > dl_rq->running_bw, which ultimately represents the DL bandwidth
> > > > > required for just a CPU.
> > > > 
> > > > Oh urgh yes, forgot that.. then the dl freq stuff isn't strictly correct
> > > > I think. But yes, that's another thread.
> > > 
> > > Mmm... maybe I don't get your point... I was referring to the global
> > > admission control of DL. If you have, for example, three 60% DL tasks on a
> > > 2-CPU system, AFAIU the CBS will admit the tasks into the system (since
> > > the overall utilization is 180 < 200 * 0.95), although that workload is
> > > not necessarily schedulable (for example, if the tasks wake up at the
> > > same time, one of them will miss its deadline).
> > > 
> > > But, yeah... maybe I'm completely wrong or, in any case, it's for a
> > > different thread...
> > > 
> > > > > > but I suppose there's nothing stopping anybody from splitting a clock
> > > > > > domain down the middle scheduling wise. So yes, good point.
> > > > > 
> > > > > That makes sense... moreover, using the global utilization, we would
> > > > > end up asking for capacities which cannot be provided by a single CPU.
> > > > 
> > > > Yes, but that _should_ not be a problem if you clock them all high
> > > > enough. But this gets to be complicated real fast I think.
> > > 
> > > IMO the current solution with Juri's patches is working as expected:
> > > we know how many DL tasks are runnable on a CPU and we properly
> > > account for their utilization.
> > > 
> > > The only "issue/limitation" is (eventually) the case described above.
> > > Dunno if we can enqueue 2 60% DL tasks on the same CPU... in that case
> > > we will ask for 120% Utilization?
> > 
> > In general it depends on the other parameters, deadline and period.
> 
> Right, but what about the case deadline==period, with 60% utilization?
> AFAIU, three DL tasks with the same parameters as above will be accepted on
> a 2-CPU system, won't they?
> 
> And thus, in that case, we can end up with a 120% utilization request
> from DL for a single CPU... but, considering it's lunch o'clock,
> I'm likely missing something...

Nope. CBS on SMP only gives you bounded tardiness (at least with the AC
the kernel does). Some deadlines might be missed.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-20 15:56   ` Peter Zijlstra
@ 2017-12-31  9:43     ` Claudio Scordino
  2018-01-02 13:31       ` Claudio Scordino
  0 siblings, 1 reply; 80+ messages in thread
From: Claudio Scordino @ 2017-12-31  9:43 UTC (permalink / raw)
  To: Peter Zijlstra, Patrick Bellasi
  Cc: linux-kernel, linux-pm, Ingo Molnar, Rafael J . Wysocki,
	Viresh Kumar, Vincent Guittot, Dietmar Eggemann,
	Morten Rasmussen, Juri Lelli, Todd Kjos, Joel Fernandes

Hi all,

Il 20/12/2017 16:56, Peter Zijlstra ha scritto:
> On Wed, Dec 20, 2017 at 04:30:29PM +0100, Peter Zijlstra wrote:
>>
>> So I ended up with the below (on top of Juri's cpufreq-dl patches).
>>
>> It compiles, but that's about all the testing it had.
> 
> Should all be available at:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/core
> 
> compile tested only, I suppose the 0day bot will let me know soon enough
> :-)

I'm going to do some testing from Tuesday.

Happy new year,

            Claudio

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-31  9:43     ` Claudio Scordino
@ 2018-01-02 13:31       ` Claudio Scordino
  0 siblings, 0 replies; 80+ messages in thread
From: Claudio Scordino @ 2018-01-02 13:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Patrick Bellasi, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Juri Lelli, Todd Kjos,
	Joel Fernandes

Hi Peter,

Il 31/12/2017 10:43, Claudio Scordino ha scritto:
> Hi all,
> 
> Il 20/12/2017 16:56, Peter Zijlstra ha scritto:
>> On Wed, Dec 20, 2017 at 04:30:29PM +0100, Peter Zijlstra wrote:
>>>
>>> So I ended up with the below (on top of Juri's cpufreq-dl patches).
>>>
>>> It compiles, but that's about all the testing it had.
>>
>> Should all be available at:
>>
>>    git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/core
>>
>> compile tested only, I suppose the 0day bot will let me know soon enough
>> :-)
> 
> I'm going to do some testing from Tuesday.

The rebase looks good to me.

Tested on Odroid XU4 (ARM big.LITTLE), no regressions found.

Many thanks and best regards,

           Claudio

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2017-12-20 15:30 ` [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates Peter Zijlstra
                     ` (3 preceding siblings ...)
  2017-12-21  7:34   ` Viresh Kumar
@ 2018-02-06 10:55   ` Claudio Scordino
  2018-02-06 15:43     ` Patrick Bellasi
  4 siblings, 1 reply; 80+ messages in thread
From: Claudio Scordino @ 2018-02-06 10:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Patrick Bellasi, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Juri Lelli, Todd Kjos,
	Joel Fernandes

Hi Peter,

Il 20/12/2017 16:30, Peter Zijlstra ha scritto:
> 
> So I ended up with the below (on top of Juri's cpufreq-dl patches).
> 
> It compiles, but that's about all the testing it had.
> 
> --- a/include/linux/sched/cpufreq.h
> +++ b/include/linux/sched/cpufreq.h
> @@ -8,9 +8,7 @@
>    * Interface between cpufreq drivers and the scheduler:
>    */
>   
> -#define SCHED_CPUFREQ_RT	(1U << 0)
> -#define SCHED_CPUFREQ_DL	(1U << 1)
> -#define SCHED_CPUFREQ_IOWAIT	(1U << 2)
> +#define SCHED_CPUFREQ_IOWAIT	(1U << 0)
>   
>   #ifdef CONFIG_CPU_FREQ
>   struct update_util_data {
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -63,7 +63,6 @@ struct sugov_cpu {
>   	unsigned long util_cfs;
>   	unsigned long util_dl;
>   	unsigned long max;
> -	unsigned int flags;
>   
>   	/* The field below is for single-CPU policies only. */
>   #ifdef CONFIG_NO_HZ_COMMON
> @@ -188,17 +187,23 @@ static void sugov_get_util(struct sugov_
>   
>   static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
>   {
> +	unsigned long util = sg_cpu->util_cfs + sg_cpu->util_dl;
> +	struct rq *rq = cpu_rq(sg_cpu->cpu);
> +
> +	if (rq->rt.rt_nr_running)
> +		util = sg_cpu->max;
> +
>   	/*
>   	 * Ideally we would like to set util_dl as min/guaranteed freq and
>   	 * util_cfs + util_dl as requested freq. However, cpufreq is not yet
>   	 * ready for such an interface. So, we only do the latter for now.
>   	 */
> -	return min(sg_cpu->util_cfs + sg_cpu->util_dl, sg_cpu->max);
> +	return min(util, sg_cpu->max);
>   }
>   
> -static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time)
> +static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time, unsigned int flags)
>   {
> -	if (sg_cpu->flags & SCHED_CPUFREQ_IOWAIT) {
> +	if (flags & SCHED_CPUFREQ_IOWAIT) {
>   		if (sg_cpu->iowait_boost_pending)
>   			return;
>   
> @@ -267,12 +272,11 @@ static void sugov_update_single(struct u
>   {
>   	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
>   	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
> -	struct cpufreq_policy *policy = sg_policy->policy;
>   	unsigned long util, max;
>   	unsigned int next_f;
>   	bool busy;
>   
> -	sugov_set_iowait_boost(sg_cpu, time);
> +	sugov_set_iowait_boost(sg_cpu, time, flags);
>   	sg_cpu->last_update = time;
>   
>   	if (!sugov_should_update_freq(sg_policy, time))
> @@ -280,25 +284,22 @@ static void sugov_update_single(struct u
>   
>   	busy = sugov_cpu_is_busy(sg_cpu);
>   
> -	if (flags & SCHED_CPUFREQ_RT) {
> -		next_f = policy->cpuinfo.max_freq;
> -	} else {
> -		sugov_get_util(sg_cpu);
> -		max = sg_cpu->max;
> -		util = sugov_aggregate_util(sg_cpu);
> -		sugov_iowait_boost(sg_cpu, &util, &max);
> -		next_f = get_next_freq(sg_policy, util, max);
> -		/*
> -		 * Do not reduce the frequency if the CPU has not been idle
> -		 * recently, as the reduction is likely to be premature then.
> -		 */
> -		if (busy && next_f < sg_policy->next_freq) {
> -			next_f = sg_policy->next_freq;
> +	sugov_get_util(sg_cpu);
> +	max = sg_cpu->max;
> +	util = sugov_aggregate_util(sg_cpu);
> +	sugov_iowait_boost(sg_cpu, &util, &max);
> +	next_f = get_next_freq(sg_policy, util, max);
> +	/*
> +	 * Do not reduce the frequency if the CPU has not been idle
> +	 * recently, as the reduction is likely to be premature then.
> +	 */
> +	if (busy && next_f < sg_policy->next_freq) {
> +		next_f = sg_policy->next_freq;
>   
> -			/* Reset cached freq as next_freq has changed */
> -			sg_policy->cached_raw_freq = 0;
> -		}
> +		/* Reset cached freq as next_freq has changed */
> +		sg_policy->cached_raw_freq = 0;
>   	}
> +
>   	sugov_update_commit(sg_policy, time, next_f);
>   }
>   
> @@ -314,6 +315,9 @@ static unsigned int sugov_next_freq_shar
>   		unsigned long j_util, j_max;
>   		s64 delta_ns;
>   
> +		if (j_sg_cpu != sg_cpu)
> +			sugov_get_util(j_sg_cpu);
> +
>   		/*
>   		 * If the CFS CPU utilization was last updated before the
>   		 * previous frequency update and the time elapsed between the
> @@ -327,12 +331,7 @@ static unsigned int sugov_next_freq_shar
>   		if (delta_ns > TICK_NSEC) {
>   			j_sg_cpu->iowait_boost = 0;
>   			j_sg_cpu->iowait_boost_pending = false;
> -			j_sg_cpu->util_cfs = 0;
> -			if (j_sg_cpu->util_dl == 0)
> -				continue;
>   		}
> -		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT)
> -			return policy->cpuinfo.max_freq;
>   
>   		j_max = j_sg_cpu->max;
>   		j_util = sugov_aggregate_util(j_sg_cpu);
> @@ -357,17 +356,11 @@ static void sugov_update_shared(struct u
>   	raw_spin_lock(&sg_policy->update_lock);
>   
>   	sugov_get_util(sg_cpu);
> -	sg_cpu->flags = flags;
> -
> -	sugov_set_iowait_boost(sg_cpu, time);
> +	sugov_set_iowait_boost(sg_cpu, time, flags);
>   	sg_cpu->last_update = time;
>   
>   	if (sugov_should_update_freq(sg_policy, time)) {
> -		if (flags & SCHED_CPUFREQ_RT)
> -			next_f = sg_policy->policy->cpuinfo.max_freq;
> -		else
> -			next_f = sugov_next_freq_shared(sg_cpu, time);
> -
> +		next_f = sugov_next_freq_shared(sg_cpu, time);
>   		sugov_update_commit(sg_policy, time, next_f);
>   	}
>   
> @@ -678,7 +671,6 @@ static int sugov_start(struct cpufreq_po
>   		memset(sg_cpu, 0, sizeof(*sg_cpu));
>   		sg_cpu->cpu = cpu;
>   		sg_cpu->sg_policy = sg_policy;
> -		sg_cpu->flags = 0;
>   		sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq;
>   	}
>   
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -87,7 +87,7 @@ void __add_running_bw(u64 dl_bw, struct
>   	SCHED_WARN_ON(dl_rq->running_bw < old); /* overflow */
>   	SCHED_WARN_ON(dl_rq->running_bw > dl_rq->this_bw);
>   	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
> -	cpufreq_update_util(rq_of_dl_rq(dl_rq), SCHED_CPUFREQ_DL);
> +	cpufreq_update_util(rq_of_dl_rq(dl_rq), 0);
>   }
>   
>   static inline
> @@ -101,7 +101,7 @@ void __sub_running_bw(u64 dl_bw, struct
>   	if (dl_rq->running_bw > old)
>   		dl_rq->running_bw = 0;
>   	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
> -	cpufreq_update_util(rq_of_dl_rq(dl_rq), SCHED_CPUFREQ_DL);
> +	cpufreq_update_util(rq_of_dl_rq(dl_rq), 0);
>   }
>   
>   static inline
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -959,9 +959,6 @@ static void update_curr_rt(struct rq *rq
>   	if (unlikely((s64)delta_exec <= 0))
>   		return;
>   
> -	/* Kick cpufreq (see the comment in kernel/sched/sched.h). */
> -	cpufreq_update_util(rq, SCHED_CPUFREQ_RT);
> -
>   	schedstat_set(curr->se.statistics.exec_max,
>   		      max(curr->se.statistics.exec_max, delta_exec));
>   
> @@ -1003,6 +1000,9 @@ dequeue_top_rt_rq(struct rt_rq *rt_rq)
>   
>   	sub_nr_running(rq, rt_rq->rt_nr_running);
>   	rt_rq->rt_queued = 0;
> +
> +	/* Kick cpufreq (see the comment in kernel/sched/sched.h). */
> +	cpufreq_update_util(rq, 0);
>   }
>   
>   static void
> @@ -1019,6 +1019,9 @@ enqueue_top_rt_rq(struct rt_rq *rt_rq)
>   
>   	add_nr_running(rq, rt_rq->rt_nr_running);
>   	rt_rq->rt_queued = 1;
> +
> +	/* Kick cpufreq (see the comment in kernel/sched/sched.h). */
> +	cpufreq_update_util(rq, 0);
>   }
>   
>   #if defined CONFIG_SMP
> 

What is the status of this patch? I couldn't find it in the tip/queue repositories.

BTW, I wonder if we actually want to remove also the information about the scheduling class that triggered the frequency change.
This prevents us from adopting class-specific behaviors.
For example, we might want to skip the rate limits when deadline asks for an increase of frequency, as shown in the patch below.
In this case, we could just remove the flags from sugov_cpu, but leave the defines and the argument for sugov_update_*().

Best regards,

                 Claudio



 From ed13fa5a8f93a43f8ff8f7d354b18c0031df482c Mon Sep 17 00:00:00 2001
From: Claudio Scordino <claudio@evidence.eu.com>
Date: Wed, 27 Sep 2017 17:16:36 +0200
Subject: [PATCH RFC] cpufreq: schedutil: rate limits for SCHED_DEADLINE

When the SCHED_DEADLINE scheduling class asks to increase the CPU frequency,
we should not wait for the rate limit, otherwise we may miss some deadlines.
The patch just ignores the limit whenever SCHED_DEADLINE asks for a
higher CPU frequency.

Signed-off-by: Claudio Scordino <claudio@evidence.eu.com>
---
  kernel/sched/cpufreq_schedutil.c | 24 +++++++++++++-----------
  1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index dd062a1..5027ab1 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -75,7 +75,8 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
  
  /************************ Governor internals ***********************/
  
-static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
+static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time,
+				     unsigned int next_freq, unsigned int flags)
  {
  	s64 delta_ns;
  
@@ -112,6 +113,10 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
  		return true;
  	}
  
+	/* Ignore rate limit if DL asked to increase CPU frequency */
+	if ((flags & SCHED_CPUFREQ_DL) && (next_freq > sg_policy->next_freq))
+		return true;
+
  	delta_ns = time - sg_policy->last_freq_update_time;
  	return delta_ns >= sg_policy->freq_update_delay_ns;
  }
@@ -275,9 +280,6 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
  	sugov_set_iowait_boost(sg_cpu, time);
  	sg_cpu->last_update = time;
  
-	if (!sugov_should_update_freq(sg_policy, time))
-		return;
-
  	busy = sugov_cpu_is_busy(sg_cpu);
  
  	if (flags & SCHED_CPUFREQ_RT) {
@@ -299,7 +301,8 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
  			sg_policy->cached_raw_freq = 0;
  		}
  	}
-	sugov_update_commit(sg_policy, time, next_f);
+	if (sugov_should_update_freq(sg_policy, time, next_f, flags))
+		sugov_update_commit(sg_policy, time, next_f);
  }
  
  static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
@@ -362,14 +365,13 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
  	sugov_set_iowait_boost(sg_cpu, time);
  	sg_cpu->last_update = time;
  
-	if (sugov_should_update_freq(sg_policy, time)) {
-		if (flags & SCHED_CPUFREQ_RT)
-			next_f = sg_policy->policy->cpuinfo.max_freq;
-		else
-			next_f = sugov_next_freq_shared(sg_cpu, time);
+	if (flags & SCHED_CPUFREQ_RT)
+		next_f = sg_policy->policy->cpuinfo.max_freq;
+	else
+		next_f = sugov_next_freq_shared(sg_cpu, time);
  
+	if (sugov_should_update_freq(sg_policy, time, next_f, flags))
  		sugov_update_commit(sg_policy, time, next_f);
-	}
  
  	raw_spin_unlock(&sg_policy->update_lock);
  }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2018-02-06 10:55   ` Claudio Scordino
@ 2018-02-06 15:43     ` Patrick Bellasi
       [not found]       ` <68afaf84-893e-6770-b9f1-619cd2b6dc9c@evidence.eu.com>
  0 siblings, 1 reply; 80+ messages in thread
From: Patrick Bellasi @ 2018-02-06 15:43 UTC (permalink / raw)
  To: Claudio Scordino
  Cc: Peter Zijlstra, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Juri Lelli, Todd Kjos,
	Joel Fernandes

Hi Claudio,

On 06-Feb 11:55, Claudio Scordino wrote:
> Hi Peter,
> 
> Il 20/12/2017 16:30, Peter Zijlstra ha scritto:
> >
> >So I ended up with the below (on top of Juri's cpufreq-dl patches).
> >
> >It compiles, but that's about all the testing it had.
> >
> >--- a/include/linux/sched/cpufreq.h
> >+++ b/include/linux/sched/cpufreq.h

[..]

> >@@ -188,17 +187,23 @@ static void sugov_get_util(struct sugov_
> >  static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> >  {
> >+	unsigned long util = sg_cpu->util_cfs + sg_cpu->util_dl;
> >+	struct rq *rq = cpu_rq(sg_cpu->cpu);
> >+
> >+	if (rq->rt.rt_nr_running)
> >+		util = sg_cpu->max;
> >+
> >  	/*
> >  	 * Ideally we would like to set util_dl as min/guaranteed freq and
> >  	 * util_cfs + util_dl as requested freq. However, cpufreq is not yet
> >  	 * ready for such an interface. So, we only do the latter for now.
> >  	 */
> >-	return min(sg_cpu->util_cfs + sg_cpu->util_dl, sg_cpu->max);
> >+	return min(util, sg_cpu->max);
> >  }

[...]

> 
> What is the status of this patch ? I couldn't find it on the
> tip/queue repositories.
> 
> BTW, I wonder if we actually want to remove also the information
> about the scheduling class who triggered the frequency change.

Removing the flags was the main goal of the patch, since they mostly
represent duplicated information which the scheduling classes already know.

This was making flags updates error-prone and difficult to keep
aligned with the existing scheduling classes' info.

> This prevents us from adopting class-specific behaviors.

In Peter's proposal he replaces flags with checks like:

   if (rq->rt.rt_nr_running)

> For example, we might want to skip the rate limits when deadline
> asks for an increase of frequency, as shown in the patch below.
> In this case, we could just remove the flags from sugov_cpu, but
> leave the defines and the argument for sugov_update_*()

At first glance, your proposal below seems to make sense.

However, I'm wondering if we cannot get it working using
rq->dl's provided information instead of flags?

> Best regards,
> 
>                 Claudio
>
> 
> From ed13fa5a8f93a43f8ff8f7d354b18c0031df482c Mon Sep 17 00:00:00 2001
> From: Claudio Scordino <claudio@evidence.eu.com>
> Date: Wed, 27 Sep 2017 17:16:36 +0200
> Subject: [PATCH RFC] cpufreq: schedutil: rate limits for SCHED_DEADLINE
> 
> When the SCHED_DEADLINE scheduling class asks to increase the CPU
> frequency, we should not wait for the rate limit, otherwise we may
> miss a deadline. This patch simply ignores the limit whenever
> SCHED_DEADLINE asks for a higher CPU frequency.
> 
> Signed-off-by: Claudio Scordino <claudio@evidence.eu.com>
> ---
>  kernel/sched/cpufreq_schedutil.c | 24 +++++++++++++-----------
>  1 file changed, 13 insertions(+), 11 deletions(-)
> 
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index dd062a1..5027ab1 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -75,7 +75,8 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
>  /************************ Governor internals ***********************/
> -static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> +static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time,
> +				     unsigned int next_freq, unsigned int flags)
>  {
>  	s64 delta_ns;
> @@ -112,6 +113,10 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
>  		return true;
>  	}
> +	/* Ignore rate limit if DL asked to increase CPU frequency */
> +	if ((flags & SCHED_CPUFREQ_DL) && (next_freq > sg_policy->next_freq))
> +		return true;


Here, instead of checking the flags, sugov_aggregate_util() could look
at rq->dl directly:

static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
{
        unsigned long util = sg_cpu->util_cfs + sg_cpu->util_dl;
        struct rq *rq = cpu_rq(sg_cpu->cpu);

        if (rq->dl.dl_nr_running)


> +
>  	delta_ns = time - sg_policy->last_freq_update_time;
>  	return delta_ns >= sg_policy->freq_update_delay_ns;
>  }
> @@ -275,9 +280,6 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>  	sugov_set_iowait_boost(sg_cpu, time);
>  	sg_cpu->last_update = time;
> -	if (!sugov_should_update_freq(sg_policy, time))
> -		return;
> -
>  	busy = sugov_cpu_is_busy(sg_cpu);
>  	if (flags & SCHED_CPUFREQ_RT) {
> @@ -299,7 +301,8 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>  			sg_policy->cached_raw_freq = 0;
>  		}
>  	}
> -	sugov_update_commit(sg_policy, time, next_f);
> +	if (sugov_should_update_freq(sg_policy, time, next_f, flags))
> +		sugov_update_commit(sg_policy, time, next_f);
>  }
>  static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> @@ -362,14 +365,13 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>  	sugov_set_iowait_boost(sg_cpu, time);
>  	sg_cpu->last_update = time;
> -	if (sugov_should_update_freq(sg_policy, time)) {
> -		if (flags & SCHED_CPUFREQ_RT)
> -			next_f = sg_policy->policy->cpuinfo.max_freq;
> -		else
> -			next_f = sugov_next_freq_shared(sg_cpu, time);
> +	if (flags & SCHED_CPUFREQ_RT)
> +		next_f = sg_policy->policy->cpuinfo.max_freq;
> +	else
> +		next_f = sugov_next_freq_shared(sg_cpu, time);
> +	if (sugov_should_update_freq(sg_policy, time, next_f, flags))
>  		sugov_update_commit(sg_policy, time, next_f);
> -	}
>  	raw_spin_unlock(&sg_policy->update_lock);
>  }
> -- 
> 2.7.4
> 
> 

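The rate-limit bypass in Claudio's patch above can be sketched as a small userspace model (hypothetical, not the kernel code; the SCHED_CPUFREQ_DL value is illustrative): the frequency-update delay is skipped whenever the request carries the DL flag and asks for a higher frequency than the one currently set.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag value, standing in for the kernel's define */
#define SCHED_CPUFREQ_DL (1U << 1)

/* Minimal stand-in for struct sugov_policy */
struct sg_policy {
	unsigned long long last_freq_update_time;	/* ns */
	unsigned long long freq_update_delay_ns;
	unsigned int next_freq;				/* last requested freq */
};

/* Userspace model of the modified sugov_should_update_freq() */
static bool should_update_freq(struct sg_policy *p, unsigned long long time,
			       unsigned int next_freq, unsigned int flags)
{
	/* DL asked for a frequency increase: bypass the rate limit */
	if ((flags & SCHED_CPUFREQ_DL) && next_freq > p->next_freq)
		return true;

	/* Otherwise honor the usual rate limit */
	return time - p->last_freq_update_time >= p->freq_update_delay_ns;
}
```

With `last_freq_update_time = 1000` and `freq_update_delay_ns = 500`, a request at `time = 1200` is still inside the rate-limit window, so only a DL-flagged increase goes through; by `time = 1600` the delay has elapsed and any request is allowed.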
-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
       [not found]       ` <68afaf84-893e-6770-b9f1-619cd2b6dc9c@evidence.eu.com>
@ 2018-02-06 18:36         ` Patrick Bellasi
  2018-02-08 16:14           ` Claudio Scordino
  0 siblings, 1 reply; 80+ messages in thread
From: Patrick Bellasi @ 2018-02-06 18:36 UTC (permalink / raw)
  To: Claudio Scordino
  Cc: Peter Zijlstra, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Juri Lelli, Todd Kjos,
	Joel Fernandes

On 06-Feb 19:14, Claudio Scordino wrote:
> Hi Patrick,

> >At first glance, your proposal below seems to make sense.
> >
> >However, I'm wondering if we cannot get it working using
> >rq->dl's provided information instead of flags?
> 
> Yes, we can use the value of rq->dl to check if there has been an increase of the deadline utilization.
> Even if schedutil might have been triggered by a different scheduling class, the effect should be almost the same.
> 
> Below a potential patch. I've kept all frequency update decisions in a single point (i.e. sugov_should_update_freq).
> Not yet tested (waiting for further comments).

I have a todo list entry to backport/test Peter's series on Android...
will add this patch too. Thanks.

-- 
#include <best/regards.h>

Patrick Bellasi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates
  2018-02-06 18:36         ` Patrick Bellasi
@ 2018-02-08 16:14           ` Claudio Scordino
  0 siblings, 0 replies; 80+ messages in thread
From: Claudio Scordino @ 2018-02-08 16:14 UTC (permalink / raw)
  To: Patrick Bellasi
  Cc: Peter Zijlstra, linux-kernel, linux-pm, Ingo Molnar,
	Rafael J . Wysocki, Viresh Kumar, Vincent Guittot,
	Dietmar Eggemann, Morten Rasmussen, Juri Lelli, Todd Kjos,
	Joel Fernandes

Hi Patrick,

Il 06/02/2018 19:36, Patrick Bellasi ha scritto:
> On 06-Feb 19:14, Claudio Scordino wrote:
>> Hi Patrick,
> 
>>> At first glance, your proposal below seems to make sense.
>>>
>>> However, I'm wondering if we cannot get it working using
>>> rq->dl's provided information instead of flags?
>>
>> Yes, we can use the value of rq->dl to check if there has been an increase of the deadline utilization.
>> Even if schedutil might have been triggered by a different scheduling class, the effect should be almost the same.
>>
>> Below a potential patch. I've kept all frequency update decisions in a single point (i.e. sugov_should_update_freq).
>> Not yet tested (waiting for further comments).
> 
> I have a todo list entry to backport/test Peter's series on Android...
> will add this patch too. Thanks.

Please discard my latest patch, as the tests on Odroid have shown
performance regressions (likely due to sugov_next_freq_shared being
called more often).

Never mind. I have already tested another (even simpler) patch.
Sending soon as a new thread on LKML.

Many thanks and best regards,

                 Claudio

^ permalink raw reply	[flat|nested] 80+ messages in thread

end of thread, other threads:[~2018-02-08 16:15 UTC | newest]

Thread overview: 80+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-30 11:47 [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates Patrick Bellasi
2017-11-30 11:47 ` [PATCH v3 1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter Patrick Bellasi
2017-11-30 13:12   ` Juri Lelli
2017-11-30 15:41     ` Patrick Bellasi
2017-11-30 16:02       ` Juri Lelli
2017-11-30 16:19         ` Patrick Bellasi
2017-11-30 16:45           ` Juri Lelli
2017-12-07  5:01   ` Viresh Kumar
2017-12-07 12:45     ` Patrick Bellasi
2017-12-07 15:55       ` Dietmar Eggemann
2017-12-12 11:37       ` Viresh Kumar
2017-12-12 13:38         ` Juri Lelli
2017-12-12 14:40           ` Viresh Kumar
2017-12-12 14:56             ` Juri Lelli
2017-12-12 15:18               ` Patrick Bellasi
2017-12-12 15:16         ` Patrick Bellasi
2017-12-13  9:06           ` Viresh Kumar
2017-12-20 14:33   ` Peter Zijlstra
2017-12-20 14:51     ` Patrick Bellasi
2017-11-30 11:47 ` [PATCH v3 2/6] cpufreq: schedutil: ensure max frequency while running RT/DL tasks Patrick Bellasi
2017-11-30 13:17   ` Juri Lelli
2017-11-30 15:45     ` Patrick Bellasi
2017-11-30 16:03       ` Juri Lelli
2017-12-07  5:05   ` Viresh Kumar
2017-12-07 14:18     ` Patrick Bellasi
2017-11-30 11:47 ` [PATCH v3 3/6] cpufreq: schedutil: update CFS util only if used Patrick Bellasi
2017-11-30 13:22   ` Juri Lelli
2017-11-30 15:57     ` Patrick Bellasi
2017-12-07  5:15       ` Viresh Kumar
2017-12-07 14:19         ` Patrick Bellasi
2017-12-14  4:45           ` Viresh Kumar
2017-11-30 11:47 ` [PATCH v3 4/6] sched/rt: fast switch to maximum frequency when RT tasks are scheduled Patrick Bellasi
2017-11-30 13:28   ` Juri Lelli
2017-12-06  9:39   ` Vincent Guittot
2017-12-06 11:38     ` Patrick Bellasi
2017-12-06 12:36       ` Vincent Guittot
2017-11-30 11:47 ` [PATCH v3 5/6] cpufreq: schedutil: relax rate-limiting while running RT/DL tasks Patrick Bellasi
2017-11-30 13:36   ` Juri Lelli
2017-11-30 15:54     ` Patrick Bellasi
2017-11-30 16:06       ` Juri Lelli
2017-11-30 11:47 ` [PATCH v3 6/6] cpufreq: schedutil: ignore sugov kthreads Patrick Bellasi
2017-11-30 13:41   ` Juri Lelli
2017-11-30 16:02     ` Patrick Bellasi
2017-11-30 16:12       ` Juri Lelli
2017-11-30 16:42         ` Patrick Bellasi
2017-12-07  9:24           ` Viresh Kumar
2017-12-07 15:47             ` Patrick Bellasi
2017-12-20 15:30 ` [PATCH v3 0/6] cpufreq: schedutil: fixes for flags updates Peter Zijlstra
2017-12-20 15:43   ` Peter Zijlstra
2017-12-21  9:15     ` Viresh Kumar
2017-12-21 10:25       ` Peter Zijlstra
2017-12-21 10:30         ` Viresh Kumar
2017-12-21 10:39           ` Peter Zijlstra
2017-12-21 10:43             ` Viresh Kumar
2017-12-22  8:30               ` Peter Zijlstra
2017-12-20 15:56   ` Peter Zijlstra
2017-12-31  9:43     ` Claudio Scordino
2018-01-02 13:31       ` Claudio Scordino
2017-12-20 17:38   ` Juri Lelli
2017-12-20 18:16     ` Peter Zijlstra
2017-12-22 10:06     ` Peter Zijlstra
2017-12-22 11:02       ` Patrick Bellasi
2017-12-22 11:46         ` Peter Zijlstra
2017-12-22 12:07           ` Juri Lelli
2017-12-22 12:14             ` Patrick Bellasi
2017-12-22 12:22               ` Peter Zijlstra
2017-12-22 12:07           ` Patrick Bellasi
2017-12-22 12:19             ` Peter Zijlstra
2017-12-22 12:27               ` Juri Lelli
2017-12-22 12:38               ` Patrick Bellasi
2017-12-22 12:43                 ` Juri Lelli
2017-12-22 12:50                   ` Patrick Bellasi
2017-12-22 13:01                     ` Juri Lelli
2017-12-22 12:10           ` Peter Zijlstra
2017-12-22 12:25             ` Patrick Bellasi
2017-12-21  7:34   ` Viresh Kumar
2018-02-06 10:55   ` Claudio Scordino
2018-02-06 15:43     ` Patrick Bellasi
     [not found]       ` <68afaf84-893e-6770-b9f1-619cd2b6dc9c@evidence.eu.com>
2018-02-06 18:36         ` Patrick Bellasi
2018-02-08 16:14           ` Claudio Scordino
