linux-kernel.vger.kernel.org archive mirror
* [PATCH v6 00/11] track CPU utilization
@ 2018-06-08 12:09 Vincent Guittot
  2018-06-08 12:09 ` [PATCH v6 01/11] sched/pelt: Move pelt related code in a dedicated file Vincent Guittot
                   ` (10 more replies)
  0 siblings, 11 replies; 56+ messages in thread
From: Vincent Guittot @ 2018-06-08 12:09 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: rjw, juri.lelli, dietmar.eggemann, Morten.Rasmussen,
	viresh.kumar, valentin.schneider, patrick.bellasi, joel,
	daniel.lezcano, quentin.perret, Vincent Guittot

This patchset initially tracked only the utilization of the RT rq. During the
OSPM summit, the opportunity to extend it in order to get an estimate of the
utilization of the whole CPU was discussed.

- Patches 1,2 move the pelt code into a dedicated file and remove some blank lines
  
- Patches 3-4 add utilization tracking for rt_rq.

When both cfs and rt tasks compete to run on a CPU, we can see some frequency
drops with the schedutil governor. In such a case, the cfs_rq's utilization no
longer reflects the utilization of cfs tasks but only the remaining part that
is not used by rt tasks. We should monitor this stolen utilization and take it
into account when selecting the OPP. This patchset doesn't change the OPP
selection policy for RT tasks, only for CFS tasks.
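
As a standalone illustration (not kernel code; the 1.25 factor and the numbers
are only indicative of schedutil's behaviour), the sketch below shows how an
under-reported cfs utilization translates directly into a lower requested OPP,
since schedutil maps utilization to a frequency roughly as
freq = 1.25 * max_freq * util / max_capacity:

/*
 * Standalone sketch: if cfs only reports the capacity left over by rt
 * tasks, the requested frequency drops even though the CPU is fully busy.
 */
#include <stdio.h>

static unsigned long next_freq(unsigned long util, unsigned long max_cap,
                               unsigned long max_freq)
{
        /* roughly schedutil's mapping: 1.25 * max_freq * util / max_cap */
        return (max_freq + (max_freq >> 2)) * util / max_cap;
}

int main(void)
{
        unsigned long max_cap = 1024, max_freq = 1200000;       /* kHz */

        /* cfs thread alone, preempted ~50% of the time by rt: util ~512 */
        printf("cfs only : %lu kHz\n", next_freq(512, max_cap, max_freq));

        /* cfs + rt utilization, as aggregated by this patchset: ~1024 */
        printf("cfs + rt : %lu kHz\n", next_freq(1024, max_cap, max_freq));

        return 0;
}

(The "cfs + rt" case prints 1500000 kHz, which would be clamped to the policy
maximum in practice.)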

An rt-app use case which creates an always-running cfs thread and an rt thread
that wakes up periodically, with both threads pinned on the same CPU, shows a
lot of frequency switches of the CPU whereas the CPU never goes idle during
the test. I can share the json file that I used for the test if someone is
interested.

For a 15 second long test on a hikey 6220 (octa-core Cortex-A53 platform),
the cpufreq statistics output (stats are reset just before the test):
$ cat /sys/devices/system/cpu/cpufreq/policy0/stats/total_trans
without patchset : 1230
with patchset : 14

If we replace the cfs thread of rt-app with a sysbench cpu test, we can see
performance improvements:

- Without patchset :
Test execution summary:
    total time:                          15.0009s
    total number of events:              4903
    total time taken by event execution: 14.9972
    per-request statistics:
         min:                                  1.23ms
         avg:                                  3.06ms
         max:                                 13.16ms
         approx.  95 percentile:              12.73ms

Threads fairness:
    events (avg/stddev):           4903.0000/0.00
    execution time (avg/stddev):   14.9972/0.00

- With patchset:
Test execution summary:
    total time:                          15.0014s
    total number of events:              7694
    total time taken by event execution: 14.9979
    per-request statistics:
         min:                                  1.23ms
         avg:                                  1.95ms
         max:                                 10.49ms
         approx.  95 percentile:              10.39ms

Threads fairness:
    events (avg/stddev):           7694.0000/0.00
    execution time (avg/stddev):   14.9979/0.00

The performance improvement is 56% for this use case.

- Patches 5-6 add utilization tracking for dl_rq in order to solve a similar
  problem as with rt_rq. Nevertheless, we keep using the dl bandwidth as the
  default level of requirement for dl tasks. The dl utilization is used to
  check that the CPU is not overloaded, which is not always reflected when
  using the dl bandwidth.

- Patches 7-8 add utilization tracking for interrupt and use it to select the OPP
  A test with iperf on hikey 6220 gives: 
    w/o patchset	    w/ patchset
    Tx 276 Mbits/sec        304 Mbits/sec +10%
    Rx 299 Mbits/sec        328 Mbits/sec +09%
    
    8 iterations of iperf -c server_address -r -t 5
    stdev is lower than 1%
    Only the WFI idle state is enabled (shallowest arm idle state)

- Patch 9 uses the rt, dl and interrupt utilization in scale_rt_capacity()
  and removes the use of sched_rt_avg_update.

- Patch 10 removes the unused sched_avg_update code

- Patch 11 removes the unused sched_time_avg_ms

Changes since v4:
- add support of periodic update of blocked utilization
- rebase on latest tip/sched/core

Changes since v3:
- add support of periodic update of blocked utilization
- rebase on latest tip/sched/core

Changes since v2:
- move pelt code into a dedicated pelt.c file
- rebase on load tracking changes

Changes since v1:
- Only a rebase. I have addressed the comments on the previous version in
  patch 1/2


Vincent Guittot (11):
  sched/pelt: Move pelt related code in a dedicated file
  sched/pelt: remove blank line
  sched/rt: add rt_rq utilization tracking
  cpufreq/schedutil: use rt utilization tracking
  sched/dl: add dl_rq utilization tracking
  cpufreq/schedutil: use dl utilization tracking
  sched/irq: add irq utilization tracking
  cpufreq/schedutil: take into account interrupt
  sched: use pelt for scale_rt_capacity()
  sched: remove rt_avg code
  proc/sched: remove unused sched_time_avg_ms

 include/linux/sched/sysctl.h     |   1 -
 kernel/sched/Makefile            |   2 +-
 kernel/sched/core.c              |  38 +---
 kernel/sched/cpufreq_schedutil.c |  46 ++++-
 kernel/sched/deadline.c          |   8 +-
 kernel/sched/fair.c              | 403 +++++----------------------------------
 kernel/sched/pelt.c              | 393 ++++++++++++++++++++++++++++++++++++++
 kernel/sched/pelt.h              |  72 +++++++
 kernel/sched/rt.c                |  15 +-
 kernel/sched/sched.h             |  68 +++++--
 kernel/sysctl.c                  |   8 -
 11 files changed, 621 insertions(+), 433 deletions(-)
 create mode 100644 kernel/sched/pelt.c
 create mode 100644 kernel/sched/pelt.h

-- 
2.7.4

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v6 01/11] sched/pelt: Move pelt related code in a dedicated file
  2018-06-08 12:09 [PATCH v6 00/11] track CPU utilization Vincent Guittot
@ 2018-06-08 12:09 ` Vincent Guittot
  2018-06-08 12:09 ` [PATCH v6 02/11] sched/pelt: remove blank line Vincent Guittot
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 56+ messages in thread
From: Vincent Guittot @ 2018-06-08 12:09 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: rjw, juri.lelli, dietmar.eggemann, Morten.Rasmussen,
	viresh.kumar, valentin.schneider, patrick.bellasi, joel,
	daniel.lezcano, quentin.perret, Vincent Guittot, Ingo Molnar

We want to track rt_rq's utilization as a part of the estimation of the
whole rq's utilization. This is necessary because rt tasks can steal
utilization from cfs tasks and make them appear lighter than they are.
As we want to use the same load tracking mechanism for both and prevent
useless dependencies between cfs and rt code, the pelt code is moved into a
dedicated file.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/Makefile |   2 +-
 kernel/sched/fair.c   | 333 +-------------------------------------------------
 kernel/sched/pelt.c   | 311 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/pelt.h   |  43 +++++++
 kernel/sched/sched.h  |  19 +++
 5 files changed, 375 insertions(+), 333 deletions(-)
 create mode 100644 kernel/sched/pelt.c
 create mode 100644 kernel/sched/pelt.h

diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index d9a02b3..7fe1834 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -20,7 +20,7 @@ obj-y += core.o loadavg.o clock.o cputime.o
 obj-y += idle.o fair.o rt.o deadline.o
 obj-y += wait.o wait_bit.o swait.o completion.o
 
-obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o
+obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o
 obj-$(CONFIG_SCHED_AUTOGROUP) += autogroup.o
 obj-$(CONFIG_SCHEDSTATS) += stats.o
 obj-$(CONFIG_SCHED_DEBUG) += debug.o
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e497c05..6390c66 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -255,9 +255,6 @@ static inline struct rq *rq_of(struct cfs_rq *cfs_rq)
 	return cfs_rq->rq;
 }
 
-/* An entity is a task if it doesn't "own" a runqueue */
-#define entity_is_task(se)	(!se->my_q)
-
 static inline struct task_struct *task_of(struct sched_entity *se)
 {
 	SCHED_WARN_ON(!entity_is_task(se));
@@ -419,7 +416,6 @@ static inline struct rq *rq_of(struct cfs_rq *cfs_rq)
 	return container_of(cfs_rq, struct rq, cfs);
 }
 
-#define entity_is_task(se)	1
 
 #define for_each_sched_entity(se) \
 		for (; se; se = NULL)
@@ -692,7 +688,7 @@ static u64 sched_vslice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 }
 
 #ifdef CONFIG_SMP
-
+#include "pelt.h"
 #include "sched-pelt.h"
 
 static int select_idle_sibling(struct task_struct *p, int prev_cpu, int cpu);
@@ -2749,19 +2745,6 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se)
 } while (0)
 
 #ifdef CONFIG_SMP
-/*
- * XXX we want to get rid of these helpers and use the full load resolution.
- */
-static inline long se_weight(struct sched_entity *se)
-{
-	return scale_load_down(se->load.weight);
-}
-
-static inline long se_runnable(struct sched_entity *se)
-{
-	return scale_load_down(se->runnable_weight);
-}
-
 static inline void
 enqueue_runnable_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
@@ -3062,314 +3045,6 @@ static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq, int flags)
 }
 
 #ifdef CONFIG_SMP
-/*
- * Approximate:
- *   val * y^n,    where y^32 ~= 0.5 (~1 scheduling period)
- */
-static u64 decay_load(u64 val, u64 n)
-{
-	unsigned int local_n;
-
-	if (unlikely(n > LOAD_AVG_PERIOD * 63))
-		return 0;
-
-	/* after bounds checking we can collapse to 32-bit */
-	local_n = n;
-
-	/*
-	 * As y^PERIOD = 1/2, we can combine
-	 *    y^n = 1/2^(n/PERIOD) * y^(n%PERIOD)
-	 * With a look-up table which covers y^n (n<PERIOD)
-	 *
-	 * To achieve constant time decay_load.
-	 */
-	if (unlikely(local_n >= LOAD_AVG_PERIOD)) {
-		val >>= local_n / LOAD_AVG_PERIOD;
-		local_n %= LOAD_AVG_PERIOD;
-	}
-
-	val = mul_u64_u32_shr(val, runnable_avg_yN_inv[local_n], 32);
-	return val;
-}
-
-static u32 __accumulate_pelt_segments(u64 periods, u32 d1, u32 d3)
-{
-	u32 c1, c2, c3 = d3; /* y^0 == 1 */
-
-	/*
-	 * c1 = d1 y^p
-	 */
-	c1 = decay_load((u64)d1, periods);
-
-	/*
-	 *            p-1
-	 * c2 = 1024 \Sum y^n
-	 *            n=1
-	 *
-	 *              inf        inf
-	 *    = 1024 ( \Sum y^n - \Sum y^n - y^0 )
-	 *              n=0        n=p
-	 */
-	c2 = LOAD_AVG_MAX - decay_load(LOAD_AVG_MAX, periods) - 1024;
-
-	return c1 + c2 + c3;
-}
-
-/*
- * Accumulate the three separate parts of the sum; d1 the remainder
- * of the last (incomplete) period, d2 the span of full periods and d3
- * the remainder of the (incomplete) current period.
- *
- *           d1          d2           d3
- *           ^           ^            ^
- *           |           |            |
- *         |<->|<----------------->|<--->|
- * ... |---x---|------| ... |------|-----x (now)
- *
- *                           p-1
- * u' = (u + d1) y^p + 1024 \Sum y^n + d3 y^0
- *                           n=1
- *
- *    = u y^p +					(Step 1)
- *
- *                     p-1
- *      d1 y^p + 1024 \Sum y^n + d3 y^0		(Step 2)
- *                     n=1
- */
-static __always_inline u32
-accumulate_sum(u64 delta, int cpu, struct sched_avg *sa,
-	       unsigned long load, unsigned long runnable, int running)
-{
-	unsigned long scale_freq, scale_cpu;
-	u32 contrib = (u32)delta; /* p == 0 -> delta < 1024 */
-	u64 periods;
-
-	scale_freq = arch_scale_freq_capacity(cpu);
-	scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
-
-	delta += sa->period_contrib;
-	periods = delta / 1024; /* A period is 1024us (~1ms) */
-
-	/*
-	 * Step 1: decay old *_sum if we crossed period boundaries.
-	 */
-	if (periods) {
-		sa->load_sum = decay_load(sa->load_sum, periods);
-		sa->runnable_load_sum =
-			decay_load(sa->runnable_load_sum, periods);
-		sa->util_sum = decay_load((u64)(sa->util_sum), periods);
-
-		/*
-		 * Step 2
-		 */
-		delta %= 1024;
-		contrib = __accumulate_pelt_segments(periods,
-				1024 - sa->period_contrib, delta);
-	}
-	sa->period_contrib = delta;
-
-	contrib = cap_scale(contrib, scale_freq);
-	if (load)
-		sa->load_sum += load * contrib;
-	if (runnable)
-		sa->runnable_load_sum += runnable * contrib;
-	if (running)
-		sa->util_sum += contrib * scale_cpu;
-
-	return periods;
-}
-
-/*
- * We can represent the historical contribution to runnable average as the
- * coefficients of a geometric series.  To do this we sub-divide our runnable
- * history into segments of approximately 1ms (1024us); label the segment that
- * occurred N-ms ago p_N, with p_0 corresponding to the current period, e.g.
- *
- * [<- 1024us ->|<- 1024us ->|<- 1024us ->| ...
- *      p0            p1           p2
- *     (now)       (~1ms ago)  (~2ms ago)
- *
- * Let u_i denote the fraction of p_i that the entity was runnable.
- *
- * We then designate the fractions u_i as our co-efficients, yielding the
- * following representation of historical load:
- *   u_0 + u_1*y + u_2*y^2 + u_3*y^3 + ...
- *
- * We choose y based on the with of a reasonably scheduling period, fixing:
- *   y^32 = 0.5
- *
- * This means that the contribution to load ~32ms ago (u_32) will be weighted
- * approximately half as much as the contribution to load within the last ms
- * (u_0).
- *
- * When a period "rolls over" and we have new u_0`, multiplying the previous
- * sum again by y is sufficient to update:
- *   load_avg = u_0` + y*(u_0 + u_1*y + u_2*y^2 + ... )
- *            = u_0 + u_1*y + u_2*y^2 + ... [re-labeling u_i --> u_{i+1}]
- */
-static __always_inline int
-___update_load_sum(u64 now, int cpu, struct sched_avg *sa,
-		  unsigned long load, unsigned long runnable, int running)
-{
-	u64 delta;
-
-	delta = now - sa->last_update_time;
-	/*
-	 * This should only happen when time goes backwards, which it
-	 * unfortunately does during sched clock init when we swap over to TSC.
-	 */
-	if ((s64)delta < 0) {
-		sa->last_update_time = now;
-		return 0;
-	}
-
-	/*
-	 * Use 1024ns as the unit of measurement since it's a reasonable
-	 * approximation of 1us and fast to compute.
-	 */
-	delta >>= 10;
-	if (!delta)
-		return 0;
-
-	sa->last_update_time += delta << 10;
-
-	/*
-	 * running is a subset of runnable (weight) so running can't be set if
-	 * runnable is clear. But there are some corner cases where the current
-	 * se has been already dequeued but cfs_rq->curr still points to it.
-	 * This means that weight will be 0 but not running for a sched_entity
-	 * but also for a cfs_rq if the latter becomes idle. As an example,
-	 * this happens during idle_balance() which calls
-	 * update_blocked_averages()
-	 */
-	if (!load)
-		runnable = running = 0;
-
-	/*
-	 * Now we know we crossed measurement unit boundaries. The *_avg
-	 * accrues by two steps:
-	 *
-	 * Step 1: accumulate *_sum since last_update_time. If we haven't
-	 * crossed period boundaries, finish.
-	 */
-	if (!accumulate_sum(delta, cpu, sa, load, runnable, running))
-		return 0;
-
-	return 1;
-}
-
-static __always_inline void
-___update_load_avg(struct sched_avg *sa, unsigned long load, unsigned long runnable)
-{
-	u32 divider = LOAD_AVG_MAX - 1024 + sa->period_contrib;
-
-	/*
-	 * Step 2: update *_avg.
-	 */
-	sa->load_avg = div_u64(load * sa->load_sum, divider);
-	sa->runnable_load_avg =	div_u64(runnable * sa->runnable_load_sum, divider);
-	sa->util_avg = sa->util_sum / divider;
-}
-
-/*
- * When a task is dequeued, its estimated utilization should not be update if
- * its util_avg has not been updated at least once.
- * This flag is used to synchronize util_avg updates with util_est updates.
- * We map this information into the LSB bit of the utilization saved at
- * dequeue time (i.e. util_est.dequeued).
- */
-#define UTIL_AVG_UNCHANGED 0x1
-
-static inline void cfs_se_util_change(struct sched_avg *avg)
-{
-	unsigned int enqueued;
-
-	if (!sched_feat(UTIL_EST))
-		return;
-
-	/* Avoid store if the flag has been already set */
-	enqueued = avg->util_est.enqueued;
-	if (!(enqueued & UTIL_AVG_UNCHANGED))
-		return;
-
-	/* Reset flag to report util_avg has been updated */
-	enqueued &= ~UTIL_AVG_UNCHANGED;
-	WRITE_ONCE(avg->util_est.enqueued, enqueued);
-}
-
-/*
- * sched_entity:
- *
- *   task:
- *     se_runnable() == se_weight()
- *
- *   group: [ see update_cfs_group() ]
- *     se_weight()   = tg->weight * grq->load_avg / tg->load_avg
- *     se_runnable() = se_weight(se) * grq->runnable_load_avg / grq->load_avg
- *
- *   load_sum := runnable_sum
- *   load_avg = se_weight(se) * runnable_avg
- *
- *   runnable_load_sum := runnable_sum
- *   runnable_load_avg = se_runnable(se) * runnable_avg
- *
- * XXX collapse load_sum and runnable_load_sum
- *
- * cfq_rs:
- *
- *   load_sum = \Sum se_weight(se) * se->avg.load_sum
- *   load_avg = \Sum se->avg.load_avg
- *
- *   runnable_load_sum = \Sum se_runnable(se) * se->avg.runnable_load_sum
- *   runnable_load_avg = \Sum se->avg.runable_load_avg
- */
-
-static int
-__update_load_avg_blocked_se(u64 now, int cpu, struct sched_entity *se)
-{
-	if (entity_is_task(se))
-		se->runnable_weight = se->load.weight;
-
-	if (___update_load_sum(now, cpu, &se->avg, 0, 0, 0)) {
-		___update_load_avg(&se->avg, se_weight(se), se_runnable(se));
-		return 1;
-	}
-
-	return 0;
-}
-
-static int
-__update_load_avg_se(u64 now, int cpu, struct cfs_rq *cfs_rq, struct sched_entity *se)
-{
-	if (entity_is_task(se))
-		se->runnable_weight = se->load.weight;
-
-	if (___update_load_sum(now, cpu, &se->avg, !!se->on_rq, !!se->on_rq,
-				cfs_rq->curr == se)) {
-
-		___update_load_avg(&se->avg, se_weight(se), se_runnable(se));
-		cfs_se_util_change(&se->avg);
-		return 1;
-	}
-
-	return 0;
-}
-
-static int
-__update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq)
-{
-	if (___update_load_sum(now, cpu, &cfs_rq->avg,
-				scale_load_down(cfs_rq->load.weight),
-				scale_load_down(cfs_rq->runnable_weight),
-				cfs_rq->curr != NULL)) {
-
-		___update_load_avg(&cfs_rq->avg, 1, 1);
-		return 1;
-	}
-
-	return 0;
-}
-
 #ifdef CONFIG_FAIR_GROUP_SCHED
 /**
  * update_tg_load_avg - update the tg's load avg
@@ -4045,12 +3720,6 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
 
 #else /* CONFIG_SMP */
 
-static inline int
-update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
-{
-	return 0;
-}
-
 #define UPDATE_TG	0x0
 #define SKIP_AGE_LOAD	0x0
 #define DO_ATTACH	0x0
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
new file mode 100644
index 0000000..e6ecbb2
--- /dev/null
+++ b/kernel/sched/pelt.c
@@ -0,0 +1,311 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Per Entity Load Tracking
+ *
+ *  Copyright (C) 2007 Red Hat, Inc., Ingo Molnar <mingo@redhat.com>
+ *
+ *  Interactivity improvements by Mike Galbraith
+ *  (C) 2007 Mike Galbraith <efault@gmx.de>
+ *
+ *  Various enhancements by Dmitry Adamushko.
+ *  (C) 2007 Dmitry Adamushko <dmitry.adamushko@gmail.com>
+ *
+ *  Group scheduling enhancements by Srivatsa Vaddagiri
+ *  Copyright IBM Corporation, 2007
+ *  Author: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
+ *
+ *  Scaled math optimizations by Thomas Gleixner
+ *  Copyright (C) 2007, Thomas Gleixner <tglx@linutronix.de>
+ *
+ *  Adaptive scheduling granularity, math enhancements by Peter Zijlstra
+ *  Copyright (C) 2007 Red Hat, Inc., Peter Zijlstra
+ *
+ *  Move PELT related code from fair.c into this pelt.c file
+ *  Author: Vincent Guittot <vincent.guittot@linaro.org>
+ */
+
+#include <linux/sched.h>
+#include "sched.h"
+#include "sched-pelt.h"
+#include "pelt.h"
+
+/*
+ * Approximate:
+ *   val * y^n,    where y^32 ~= 0.5 (~1 scheduling period)
+ */
+static u64 decay_load(u64 val, u64 n)
+{
+	unsigned int local_n;
+
+	if (unlikely(n > LOAD_AVG_PERIOD * 63))
+		return 0;
+
+	/* after bounds checking we can collapse to 32-bit */
+	local_n = n;
+
+	/*
+	 * As y^PERIOD = 1/2, we can combine
+	 *    y^n = 1/2^(n/PERIOD) * y^(n%PERIOD)
+	 * With a look-up table which covers y^n (n<PERIOD)
+	 *
+	 * To achieve constant time decay_load.
+	 */
+	if (unlikely(local_n >= LOAD_AVG_PERIOD)) {
+		val >>= local_n / LOAD_AVG_PERIOD;
+		local_n %= LOAD_AVG_PERIOD;
+	}
+
+	val = mul_u64_u32_shr(val, runnable_avg_yN_inv[local_n], 32);
+	return val;
+}
+
+static u32 __accumulate_pelt_segments(u64 periods, u32 d1, u32 d3)
+{
+	u32 c1, c2, c3 = d3; /* y^0 == 1 */
+
+	/*
+	 * c1 = d1 y^p
+	 */
+	c1 = decay_load((u64)d1, periods);
+
+	/*
+	 *            p-1
+	 * c2 = 1024 \Sum y^n
+	 *            n=1
+	 *
+	 *              inf        inf
+	 *    = 1024 ( \Sum y^n - \Sum y^n - y^0 )
+	 *              n=0        n=p
+	 */
+	c2 = LOAD_AVG_MAX - decay_load(LOAD_AVG_MAX, periods) - 1024;
+
+	return c1 + c2 + c3;
+}
+
+#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
+
+/*
+ * Accumulate the three separate parts of the sum; d1 the remainder
+ * of the last (incomplete) period, d2 the span of full periods and d3
+ * the remainder of the (incomplete) current period.
+ *
+ *           d1          d2           d3
+ *           ^           ^            ^
+ *           |           |            |
+ *         |<->|<----------------->|<--->|
+ * ... |---x---|------| ... |------|-----x (now)
+ *
+ *                           p-1
+ * u' = (u + d1) y^p + 1024 \Sum y^n + d3 y^0
+ *                           n=1
+ *
+ *    = u y^p +					(Step 1)
+ *
+ *                     p-1
+ *      d1 y^p + 1024 \Sum y^n + d3 y^0		(Step 2)
+ *                     n=1
+ */
+static __always_inline u32
+accumulate_sum(u64 delta, int cpu, struct sched_avg *sa,
+	       unsigned long load, unsigned long runnable, int running)
+{
+	unsigned long scale_freq, scale_cpu;
+	u32 contrib = (u32)delta; /* p == 0 -> delta < 1024 */
+	u64 periods;
+
+	scale_freq = arch_scale_freq_capacity(cpu);
+	scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
+
+	delta += sa->period_contrib;
+	periods = delta / 1024; /* A period is 1024us (~1ms) */
+
+	/*
+	 * Step 1: decay old *_sum if we crossed period boundaries.
+	 */
+	if (periods) {
+		sa->load_sum = decay_load(sa->load_sum, periods);
+		sa->runnable_load_sum =
+			decay_load(sa->runnable_load_sum, periods);
+		sa->util_sum = decay_load((u64)(sa->util_sum), periods);
+
+		/*
+		 * Step 2
+		 */
+		delta %= 1024;
+		contrib = __accumulate_pelt_segments(periods,
+				1024 - sa->period_contrib, delta);
+	}
+	sa->period_contrib = delta;
+
+	contrib = cap_scale(contrib, scale_freq);
+	if (load)
+		sa->load_sum += load * contrib;
+	if (runnable)
+		sa->runnable_load_sum += runnable * contrib;
+	if (running)
+		sa->util_sum += contrib * scale_cpu;
+
+	return periods;
+}
+
+/*
+ * We can represent the historical contribution to runnable average as the
+ * coefficients of a geometric series.  To do this we sub-divide our runnable
+ * history into segments of approximately 1ms (1024us); label the segment that
+ * occurred N-ms ago p_N, with p_0 corresponding to the current period, e.g.
+ *
+ * [<- 1024us ->|<- 1024us ->|<- 1024us ->| ...
+ *      p0            p1           p2
+ *     (now)       (~1ms ago)  (~2ms ago)
+ *
+ * Let u_i denote the fraction of p_i that the entity was runnable.
+ *
+ * We then designate the fractions u_i as our co-efficients, yielding the
+ * following representation of historical load:
+ *   u_0 + u_1*y + u_2*y^2 + u_3*y^3 + ...
+ *
+ * We choose y based on the with of a reasonably scheduling period, fixing:
+ *   y^32 = 0.5
+ *
+ * This means that the contribution to load ~32ms ago (u_32) will be weighted
+ * approximately half as much as the contribution to load within the last ms
+ * (u_0).
+ *
+ * When a period "rolls over" and we have new u_0`, multiplying the previous
+ * sum again by y is sufficient to update:
+ *   load_avg = u_0` + y*(u_0 + u_1*y + u_2*y^2 + ... )
+ *            = u_0 + u_1*y + u_2*y^2 + ... [re-labeling u_i --> u_{i+1}]
+ */
+static __always_inline int
+___update_load_sum(u64 now, int cpu, struct sched_avg *sa,
+		  unsigned long load, unsigned long runnable, int running)
+{
+	u64 delta;
+
+	delta = now - sa->last_update_time;
+	/*
+	 * This should only happen when time goes backwards, which it
+	 * unfortunately does during sched clock init when we swap over to TSC.
+	 */
+	if ((s64)delta < 0) {
+		sa->last_update_time = now;
+		return 0;
+	}
+
+	/*
+	 * Use 1024ns as the unit of measurement since it's a reasonable
+	 * approximation of 1us and fast to compute.
+	 */
+	delta >>= 10;
+	if (!delta)
+		return 0;
+
+	sa->last_update_time += delta << 10;
+
+	/*
+	 * running is a subset of runnable (weight) so running can't be set if
+	 * runnable is clear. But there are some corner cases where the current
+	 * se has been already dequeued but cfs_rq->curr still points to it.
+	 * This means that weight will be 0 but not running for a sched_entity
+	 * but also for a cfs_rq if the latter becomes idle. As an example,
+	 * this happens during idle_balance() which calls
+	 * update_blocked_averages()
+	 */
+	if (!load)
+		runnable = running = 0;
+
+	/*
+	 * Now we know we crossed measurement unit boundaries. The *_avg
+	 * accrues by two steps:
+	 *
+	 * Step 1: accumulate *_sum since last_update_time. If we haven't
+	 * crossed period boundaries, finish.
+	 */
+	if (!accumulate_sum(delta, cpu, sa, load, runnable, running))
+		return 0;
+
+	return 1;
+}
+
+static __always_inline void
+___update_load_avg(struct sched_avg *sa, unsigned long load, unsigned long runnable)
+{
+	u32 divider = LOAD_AVG_MAX - 1024 + sa->period_contrib;
+
+	/*
+	 * Step 2: update *_avg.
+	 */
+	sa->load_avg = div_u64(load * sa->load_sum, divider);
+	sa->runnable_load_avg =	div_u64(runnable * sa->runnable_load_sum, divider);
+	sa->util_avg = sa->util_sum / divider;
+}
+
+/*
+ * sched_entity:
+ *
+ *   task:
+ *     se_runnable() == se_weight()
+ *
+ *   group: [ see update_cfs_group() ]
+ *     se_weight()   = tg->weight * grq->load_avg / tg->load_avg
+ *     se_runnable() = se_weight(se) * grq->runnable_load_avg / grq->load_avg
+ *
+ *   load_sum := runnable_sum
+ *   load_avg = se_weight(se) * runnable_avg
+ *
+ *   runnable_load_sum := runnable_sum
+ *   runnable_load_avg = se_runnable(se) * runnable_avg
+ *
+ * XXX collapse load_sum and runnable_load_sum
+ *
+ * cfq_rq:
+ *
+ *   load_sum = \Sum se_weight(se) * se->avg.load_sum
+ *   load_avg = \Sum se->avg.load_avg
+ *
+ *   runnable_load_sum = \Sum se_runnable(se) * se->avg.runnable_load_sum
+ *   runnable_load_avg = \Sum se->avg.runable_load_avg
+ */
+
+int __update_load_avg_blocked_se(u64 now, int cpu, struct sched_entity *se)
+{
+	if (entity_is_task(se))
+		se->runnable_weight = se->load.weight;
+
+	if (___update_load_sum(now, cpu, &se->avg, 0, 0, 0)) {
+		___update_load_avg(&se->avg, se_weight(se), se_runnable(se));
+		return 1;
+	}
+
+	return 0;
+}
+
+int __update_load_avg_se(u64 now, int cpu, struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	if (entity_is_task(se))
+		se->runnable_weight = se->load.weight;
+
+	if (___update_load_sum(now, cpu, &se->avg, !!se->on_rq, !!se->on_rq,
+				cfs_rq->curr == se)) {
+
+		___update_load_avg(&se->avg, se_weight(se), se_runnable(se));
+		cfs_se_util_change(&se->avg);
+		return 1;
+	}
+
+	return 0;
+}
+
+int __update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq)
+{
+	if (___update_load_sum(now, cpu, &cfs_rq->avg,
+				scale_load_down(cfs_rq->load.weight),
+				scale_load_down(cfs_rq->runnable_weight),
+				cfs_rq->curr != NULL)) {
+
+		___update_load_avg(&cfs_rq->avg, 1, 1);
+		return 1;
+	}
+
+	return 0;
+}
diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h
new file mode 100644
index 0000000..9cac73e
--- /dev/null
+++ b/kernel/sched/pelt.h
@@ -0,0 +1,43 @@
+#ifdef CONFIG_SMP
+
+int __update_load_avg_blocked_se(u64 now, int cpu, struct sched_entity *se);
+int __update_load_avg_se(u64 now, int cpu, struct cfs_rq *cfs_rq, struct sched_entity *se);
+int __update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq);
+
+/*
+ * When a task is dequeued, its estimated utilization should not be update if
+ * its util_avg has not been updated at least once.
+ * This flag is used to synchronize util_avg updates with util_est updates.
+ * We map this information into the LSB bit of the utilization saved at
+ * dequeue time (i.e. util_est.dequeued).
+ */
+#define UTIL_AVG_UNCHANGED 0x1
+
+static inline void cfs_se_util_change(struct sched_avg *avg)
+{
+	unsigned int enqueued;
+
+	if (!sched_feat(UTIL_EST))
+		return;
+
+	/* Avoid store if the flag has been already set */
+	enqueued = avg->util_est.enqueued;
+	if (!(enqueued & UTIL_AVG_UNCHANGED))
+		return;
+
+	/* Reset flag to report util_avg has been updated */
+	enqueued &= ~UTIL_AVG_UNCHANGED;
+	WRITE_ONCE(avg->util_est.enqueued, enqueued);
+}
+
+#else
+
+static inline int
+update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
+{
+	return 0;
+}
+
+#endif
+
+
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 67702b4..757a3ee 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -666,7 +666,26 @@ struct dl_rq {
 	u64			bw_ratio;
 };
 
+#ifdef CONFIG_FAIR_GROUP_SCHED
+/* An entity is a task if it doesn't "own" a runqueue */
+#define entity_is_task(se)	(!se->my_q)
+#else
+#define entity_is_task(se)	1
+#endif
+
 #ifdef CONFIG_SMP
+/*
+ * XXX we want to get rid of these helpers and use the full load resolution.
+ */
+static inline long se_weight(struct sched_entity *se)
+{
+	return scale_load_down(se->load.weight);
+}
+
+static inline long se_runnable(struct sched_entity *se)
+{
+	return scale_load_down(se->runnable_weight);
+}
 
 static inline bool sched_asym_prefer(int a, int b)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v6 02/11] sched/pelt: remove blank line
  2018-06-08 12:09 [PATCH v6 00/11] track CPU utilization Vincent Guittot
  2018-06-08 12:09 ` [PATCH v6 01/11] sched/pelt: Move pelt related code in a dedicated file Vincent Guittot
@ 2018-06-08 12:09 ` Vincent Guittot
  2018-06-21 14:33   ` Peter Zijlstra
  2018-06-08 12:09 ` [PATCH v6 03/11] sched/rt: add rt_rq utilization tracking Vincent Guittot
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 56+ messages in thread
From: Vincent Guittot @ 2018-06-08 12:09 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: rjw, juri.lelli, dietmar.eggemann, Morten.Rasmussen,
	viresh.kumar, valentin.schneider, patrick.bellasi, joel,
	daniel.lezcano, quentin.perret, Vincent Guittot, Ingo Molnar

Remove some blank lines in __update_load_avg_se() and __update_load_avg_cfs_rq().

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reported-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/pelt.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index e6ecbb2..4174582 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -287,7 +287,6 @@ int __update_load_avg_se(u64 now, int cpu, struct cfs_rq *cfs_rq, struct sched_e
 
 	if (___update_load_sum(now, cpu, &se->avg, !!se->on_rq, !!se->on_rq,
 				cfs_rq->curr == se)) {
-
 		___update_load_avg(&se->avg, se_weight(se), se_runnable(se));
 		cfs_se_util_change(&se->avg);
 		return 1;
@@ -302,7 +301,6 @@ int __update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq)
 				scale_load_down(cfs_rq->load.weight),
 				scale_load_down(cfs_rq->runnable_weight),
 				cfs_rq->curr != NULL)) {
-
 		___update_load_avg(&cfs_rq->avg, 1, 1);
 		return 1;
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v6 03/11] sched/rt: add rt_rq utilization tracking
  2018-06-08 12:09 [PATCH v6 00/11] track CPU utilization Vincent Guittot
  2018-06-08 12:09 ` [PATCH v6 01/11] sched/pelt: Move pelt related code in a dedicated file Vincent Guittot
  2018-06-08 12:09 ` [PATCH v6 02/11] sched/pelt: remove blank line Vincent Guittot
@ 2018-06-08 12:09 ` Vincent Guittot
  2018-06-15 11:52   ` Dietmar Eggemann
  2018-06-21 18:50   ` Peter Zijlstra
  2018-06-08 12:09 ` [PATCH v6 04/11] cpufreq/schedutil: use rt " Vincent Guittot
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 56+ messages in thread
From: Vincent Guittot @ 2018-06-08 12:09 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: rjw, juri.lelli, dietmar.eggemann, Morten.Rasmussen,
	viresh.kumar, valentin.schneider, patrick.bellasi, joel,
	daniel.lezcano, quentin.perret, Vincent Guittot, Ingo Molnar

The schedutil governor relies on the cfs_rq's util_avg to choose the OPP when
cfs tasks are running. When the CPU is overloaded by cfs and rt tasks, cfs
tasks are preempted by rt tasks and in this case util_avg reflects the
remaining capacity but not what cfs tasks want to use. In such a case,
schedutil can select a lower OPP whereas the CPU is overloaded. In order to
have a more accurate view of the utilization of the CPU, we track the
utilization of rt tasks.

rt_rq uses rq_clock_task and cfs_rq uses cfs_rq_clock_task but they are
the same at the root group level, so the PELT windows of the util_sum are
aligned.
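
As a rough standalone illustration of the problem (not part of the patch; the
recurrence below only approximates the real PELT accounting, and needs -lm to
build): a cfs task that is preempted every other PELT period by an rt task
settles around half of the CPU capacity, even though the CPU itself is 100%
busy:

/*
 * Simplified PELT-like average: util is decayed by y (y^32 = 0.5) each
 * 1024us period and gains 1024 * (1 - y) for a period spent running.
 */
#include <stdio.h>
#include <math.h>

int main(void)
{
        double y = pow(0.5, 1.0 / 32);  /* y^32 = 0.5 */
        double util = 0.0;
        int period;

        for (period = 0; period < 1000; period++) {
                int running = !(period & 1);    /* preempted every other period */

                util = util * y + (running ? 1024.0 * (1.0 - y) : 0.0);
        }

        /* oscillates around ~512 out of 1024 */
        printf("cfs util_avg ~ %.0f\n", util);

        return 0;
}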

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c  | 15 ++++++++++++++-
 kernel/sched/pelt.c  | 22 ++++++++++++++++++++++
 kernel/sched/pelt.h  |  7 +++++++
 kernel/sched/rt.c    | 13 +++++++++++++
 kernel/sched/sched.h |  7 +++++++
 5 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6390c66..e471fae 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7290,6 +7290,14 @@ static inline bool cfs_rq_has_blocked(struct cfs_rq *cfs_rq)
 	return false;
 }
 
+static inline bool rt_rq_has_blocked(struct rq *rq)
+{
+	if (READ_ONCE(rq->avg_rt.util_avg))
+		return true;
+
+	return false;
+}
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 
 static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
@@ -7349,6 +7357,10 @@ static void update_blocked_averages(int cpu)
 		if (cfs_rq_has_blocked(cfs_rq))
 			done = false;
 	}
+	update_rt_rq_load_avg(rq_clock_task(rq), rq, 0);
+	/* Don't need periodic decay once load/util_avg are null */
+	if (rt_rq_has_blocked(rq))
+		done = false;
 
 #ifdef CONFIG_NO_HZ_COMMON
 	rq->last_blocked_load_update_tick = jiffies;
@@ -7414,9 +7426,10 @@ static inline void update_blocked_averages(int cpu)
 	rq_lock_irqsave(rq, &rf);
 	update_rq_clock(rq);
 	update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq);
+	update_rt_rq_load_avg(rq_clock_task(rq), rq, 0);
 #ifdef CONFIG_NO_HZ_COMMON
 	rq->last_blocked_load_update_tick = jiffies;
-	if (!cfs_rq_has_blocked(cfs_rq))
+	if (!cfs_rq_has_blocked(cfs_rq) && !rt_rq_has_blocked(rq))
 		rq->has_blocked_load = 0;
 #endif
 	rq_unlock_irqrestore(rq, &rf);
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index 4174582..81c0d7e 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -307,3 +307,25 @@ int __update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq)
 
 	return 0;
 }
+
+/*
+ * rt_rq:
+ *
+ *   util_sum = \Sum se->avg.util_sum but se->avg.util_sum is not tracked
+ *   util_sum = cpu_scale * load_sum
+ *   runnable_load_sum = load_sum
+ *
+ */
+
+int update_rt_rq_load_avg(u64 now, struct rq *rq, int running)
+{
+	if (___update_load_sum(now, rq->cpu, &rq->avg_rt,
+				running,
+				running,
+				running)) {
+		___update_load_avg(&rq->avg_rt, 1, 1);
+		return 1;
+	}
+
+	return 0;
+}
diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h
index 9cac73e..b2983b7 100644
--- a/kernel/sched/pelt.h
+++ b/kernel/sched/pelt.h
@@ -3,6 +3,7 @@
 int __update_load_avg_blocked_se(u64 now, int cpu, struct sched_entity *se);
 int __update_load_avg_se(u64 now, int cpu, struct cfs_rq *cfs_rq, struct sched_entity *se);
 int __update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq);
+int update_rt_rq_load_avg(u64 now, struct rq *rq, int running);
 
 /*
  * When a task is dequeued, its estimated utilization should not be update if
@@ -38,6 +39,12 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 	return 0;
 }
 
+static inline int
+update_rt_rq_load_avg(u64 now, struct rq *rq, int running)
+{
+	return 0;
+}
+
 #endif
 
 
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index ef3c4e6..e8c08a8 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -5,6 +5,8 @@
  */
 #include "sched.h"
 
+#include "pelt.h"
+
 int sched_rr_timeslice = RR_TIMESLICE;
 int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE;
 
@@ -1572,6 +1574,14 @@ pick_next_task_rt(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 
 	rt_queue_push_tasks(rq);
 
+	/*
+	 * If prev task was rt, put_prev_task() has already updated the
+	 * utilization. We only care of the case where we start to schedule a
+	 * rt task
+	 */
+	if (rq->curr->sched_class != &rt_sched_class)
+		update_rt_rq_load_avg(rq_clock_task(rq), rq, 0);
+
 	return p;
 }
 
@@ -1579,6 +1589,8 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
 {
 	update_curr_rt(rq);
 
+	update_rt_rq_load_avg(rq_clock_task(rq), rq, 1);
+
 	/*
 	 * The previous task needs to be made eligible for pushing
 	 * if it is still active
@@ -2308,6 +2320,7 @@ static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued)
 	struct sched_rt_entity *rt_se = &p->rt;
 
 	update_curr_rt(rq);
+	update_rt_rq_load_avg(rq_clock_task(rq), rq, 1);
 
 	watchdog(rq, p);
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 757a3ee..7a16de9 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -592,6 +592,7 @@ struct rt_rq {
 	unsigned long		rt_nr_total;
 	int			overloaded;
 	struct plist_head	pushable_tasks;
+
 #endif /* CONFIG_SMP */
 	int			rt_queued;
 
@@ -847,6 +848,7 @@ struct rq {
 
 	u64			rt_avg;
 	u64			age_stamp;
+	struct sched_avg	avg_rt;
 	u64			idle_stamp;
 	u64			avg_idle;
 
@@ -2205,4 +2207,9 @@ static inline unsigned long cpu_util_cfs(struct rq *rq)
 
 	return util;
 }
+
+static inline unsigned long cpu_util_rt(struct rq *rq)
+{
+	return rq->avg_rt.util_avg;
+}
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-08 12:09 [PATCH v6 00/11] track CPU utilization Vincent Guittot
                   ` (2 preceding siblings ...)
  2018-06-08 12:09 ` [PATCH v6 03/11] sched/rt: add rt_rq utilization tracking Vincent Guittot
@ 2018-06-08 12:09 ` Vincent Guittot
  2018-06-18  9:00   ` Dietmar Eggemann
  2018-06-21 18:45   ` Peter Zijlstra
  2018-06-08 12:09 ` [PATCH v6 05/11] sched/dl: add dl_rq " Vincent Guittot
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 56+ messages in thread
From: Vincent Guittot @ 2018-06-08 12:09 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: rjw, juri.lelli, dietmar.eggemann, Morten.Rasmussen,
	viresh.kumar, valentin.schneider, patrick.bellasi, joel,
	daniel.lezcano, quentin.perret, Vincent Guittot, Ingo Molnar

Take into account the rt utilization when selecting an OPP for cfs tasks in
order to reflect the utilization of the CPU.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/cpufreq_schedutil.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 28592b6..32f97fb 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -56,6 +56,7 @@ struct sugov_cpu {
 	/* The fields below are only needed when sharing a policy: */
 	unsigned long		util_cfs;
 	unsigned long		util_dl;
+	unsigned long		util_rt;
 	unsigned long		max;
 
 	/* The field below is for single-CPU policies only: */
@@ -178,15 +179,21 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)
 	sg_cpu->max = arch_scale_cpu_capacity(NULL, sg_cpu->cpu);
 	sg_cpu->util_cfs = cpu_util_cfs(rq);
 	sg_cpu->util_dl  = cpu_util_dl(rq);
+	sg_cpu->util_rt  = cpu_util_rt(rq);
 }
 
 static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
 {
 	struct rq *rq = cpu_rq(sg_cpu->cpu);
+	unsigned long util;
 
 	if (rq->rt.rt_nr_running)
 		return sg_cpu->max;
 
+	util = sg_cpu->util_dl;
+	util += sg_cpu->util_cfs;
+	util += sg_cpu->util_rt;
+
 	/*
 	 * Utilization required by DEADLINE must always be granted while, for
 	 * FAIR, we use blocked utilization of IDLE CPUs as a mechanism to
@@ -197,7 +204,7 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
 	 * util_cfs + util_dl as requested freq. However, cpufreq is not yet
 	 * ready for such an interface. So, we only do the latter for now.
 	 */
-	return min(sg_cpu->max, (sg_cpu->util_dl + sg_cpu->util_cfs));
+	return min(sg_cpu->max, util);
 }
 
 static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time, unsigned int flags)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v6 05/11] sched/dl: add dl_rq utilization tracking
  2018-06-08 12:09 [PATCH v6 00/11] track CPU utilization Vincent Guittot
                   ` (3 preceding siblings ...)
  2018-06-08 12:09 ` [PATCH v6 04/11] cpufreq/schedutil: use rt " Vincent Guittot
@ 2018-06-08 12:09 ` Vincent Guittot
  2018-06-08 12:09 ` [PATCH v6 06/11] cpufreq/schedutil: use dl " Vincent Guittot
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 56+ messages in thread
From: Vincent Guittot @ 2018-06-08 12:09 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: rjw, juri.lelli, dietmar.eggemann, Morten.Rasmussen,
	viresh.kumar, valentin.schneider, patrick.bellasi, joel,
	daniel.lezcano, quentin.perret, Vincent Guittot, Ingo Molnar

Similarly to what happens with rt tasks, cfs tasks can be preempted by dl
tasks and the cfs utilization might no longer describe the real utilization
level.
The current dl bandwidth reflects the requirements to meet deadlines when
tasks are enqueued but not the current utilization of the dl sched class. We
track the dl class utilization to help estimate the system utilization.
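
As a standalone numeric illustration (assuming BW_SHIFT == 20 and
SCHED_CAPACITY_SCALE == 1024, and purely indicative numbers): the dl bandwidth
is a static reservation made at admission time, while the new avg_dl tracks
what the dl class actually consumes:

/*
 * A dl task with runtime 3ms every 10ms reserves ~30% of the CPU whatever
 * it really uses; avg_dl.util_avg instead follows the time actually spent
 * running in the dl class.
 */
#include <stdio.h>

#define BW_SHIFT                20
#define SCHED_CAPACITY_SCALE    1024

int main(void)
{
        unsigned long long runtime = 3000000ULL;        /* 3ms in ns */
        unsigned long long period = 10000000ULL;        /* 10ms in ns */
        unsigned long long running_bw = (runtime << BW_SHIFT) / period;

        /* what cpu_bw_dl() reports: the reservation, ~307 out of 1024 */
        printf("bw_dl = %llu\n",
               (running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT);

        return 0;
}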

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/deadline.c |  6 ++++++
 kernel/sched/fair.c     | 11 ++++++++---
 kernel/sched/pelt.c     | 22 ++++++++++++++++++++++
 kernel/sched/pelt.h     |  6 ++++++
 kernel/sched/sched.h    |  1 +
 5 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 1356afd..596097f 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -16,6 +16,7 @@
  *                    Fabio Checconi <fchecconi@gmail.com>
  */
 #include "sched.h"
+#include "pelt.h"
 
 struct dl_bandwidth def_dl_bandwidth;
 
@@ -1761,6 +1762,9 @@ pick_next_task_dl(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 
 	deadline_queue_push_tasks(rq);
 
+	if (rq->curr->sched_class != &dl_sched_class)
+		update_dl_rq_load_avg(rq_clock_task(rq), rq, 0);
+
 	return p;
 }
 
@@ -1768,6 +1772,7 @@ static void put_prev_task_dl(struct rq *rq, struct task_struct *p)
 {
 	update_curr_dl(rq);
 
+	update_dl_rq_load_avg(rq_clock_task(rq), rq, 1);
 	if (on_dl_rq(&p->dl) && p->nr_cpus_allowed > 1)
 		enqueue_pushable_dl_task(rq, p);
 }
@@ -1784,6 +1789,7 @@ static void task_tick_dl(struct rq *rq, struct task_struct *p, int queued)
 {
 	update_curr_dl(rq);
 
+	update_dl_rq_load_avg(rq_clock_task(rq), rq, 1);
 	/*
 	 * Even when we have runtime, update_curr_dl() might have resulted in us
 	 * not being the leftmost task anymore. In that case NEED_RESCHED will
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e471fae..71fe74a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7290,11 +7290,14 @@ static inline bool cfs_rq_has_blocked(struct cfs_rq *cfs_rq)
 	return false;
 }
 
-static inline bool rt_rq_has_blocked(struct rq *rq)
+static inline bool others_rqs_have_blocked(struct rq *rq)
 {
 	if (READ_ONCE(rq->avg_rt.util_avg))
 		return true;
 
+	if (READ_ONCE(rq->avg_dl.util_avg))
+		return true;
+
 	return false;
 }
 
@@ -7358,8 +7361,9 @@ static void update_blocked_averages(int cpu)
 			done = false;
 	}
 	update_rt_rq_load_avg(rq_clock_task(rq), rq, 0);
+	update_dl_rq_load_avg(rq_clock_task(rq), rq, 0);
 	/* Don't need periodic decay once load/util_avg are null */
-	if (rt_rq_has_blocked(rq))
+	if (others_rqs_have_blocked(rq))
 		done = false;
 
 #ifdef CONFIG_NO_HZ_COMMON
@@ -7427,9 +7431,10 @@ static inline void update_blocked_averages(int cpu)
 	update_rq_clock(rq);
 	update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq);
 	update_rt_rq_load_avg(rq_clock_task(rq), rq, 0);
+	update_dl_rq_load_avg(rq_clock_task(rq), rq, 0);
 #ifdef CONFIG_NO_HZ_COMMON
 	rq->last_blocked_load_update_tick = jiffies;
-	if (!cfs_rq_has_blocked(cfs_rq) && !rt_rq_has_blocked(rq))
+	if (!cfs_rq_has_blocked(cfs_rq) && !others_rqs_have_blocked(rq))
 		rq->has_blocked_load = 0;
 #endif
 	rq_unlock_irqrestore(rq, &rf);
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index 81c0d7e..b86405e 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -329,3 +329,25 @@ int update_rt_rq_load_avg(u64 now, struct rq *rq, int running)
 
 	return 0;
 }
+
+/*
+ * dl_rq:
+ *
+ *   util_sum = \Sum se->avg.util_sum but se->avg.util_sum is not tracked
+ *   util_sum = cpu_scale * load_sum
+ *   runnable_load_sum = load_sum
+ *
+ */
+
+int update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
+{
+	if (___update_load_sum(now, rq->cpu, &rq->avg_dl,
+				running,
+				running,
+				running)) {
+		___update_load_avg(&rq->avg_dl, 1, 1);
+		return 1;
+	}
+
+	return 0;
+}
diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h
index b2983b7..0e4f912 100644
--- a/kernel/sched/pelt.h
+++ b/kernel/sched/pelt.h
@@ -4,6 +4,7 @@ int __update_load_avg_blocked_se(u64 now, int cpu, struct sched_entity *se);
 int __update_load_avg_se(u64 now, int cpu, struct cfs_rq *cfs_rq, struct sched_entity *se);
 int __update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq);
 int update_rt_rq_load_avg(u64 now, struct rq *rq, int running);
+int update_dl_rq_load_avg(u64 now, struct rq *rq, int running);
 
 /*
  * When a task is dequeued, its estimated utilization should not be update if
@@ -45,6 +46,11 @@ update_rt_rq_load_avg(u64 now, struct rq *rq, int running)
 	return 0;
 }
 
+static inline int
+update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
+{
+	return 0;
+}
 #endif
 
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7a16de9..4526ba6 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -849,6 +849,7 @@ struct rq {
 	u64			rt_avg;
 	u64			age_stamp;
 	struct sched_avg	avg_rt;
+	struct sched_avg	avg_dl;
 	u64			idle_stamp;
 	u64			avg_idle;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v6 06/11] cpufreq/schedutil: use dl utilization tracking
  2018-06-08 12:09 [PATCH v6 00/11] track CPU utilization Vincent Guittot
                   ` (4 preceding siblings ...)
  2018-06-08 12:09 ` [PATCH v6 05/11] sched/dl: add dl_rq " Vincent Guittot
@ 2018-06-08 12:09 ` Vincent Guittot
  2018-06-08 12:39   ` Juri Lelli
  2018-06-22 15:24   ` Peter Zijlstra
  2018-06-08 12:09 ` [PATCH v6 07/11] sched/irq: add irq " Vincent Guittot
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 56+ messages in thread
From: Vincent Guittot @ 2018-06-08 12:09 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: rjw, juri.lelli, dietmar.eggemann, Morten.Rasmussen,
	viresh.kumar, valentin.schneider, patrick.bellasi, joel,
	daniel.lezcano, quentin.perret, Vincent Guittot, Ingo Molnar

Now that we have both the dl class bandwidth requirement and the dl class
utilization, we can detect when the CPU is fully used, in which case we should
run at max frequency. Otherwise, we keep using the dl bandwidth requirement to
define the utilization of the CPU.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/cpufreq_schedutil.c | 24 +++++++++++++++---------
 kernel/sched/sched.h             |  7 ++++++-
 2 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 32f97fb..25cee59 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -56,6 +56,7 @@ struct sugov_cpu {
 	/* The fields below are only needed when sharing a policy: */
 	unsigned long		util_cfs;
 	unsigned long		util_dl;
+	unsigned long		bw_dl;
 	unsigned long		util_rt;
 	unsigned long		max;
 
@@ -179,6 +180,7 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)
 	sg_cpu->max = arch_scale_cpu_capacity(NULL, sg_cpu->cpu);
 	sg_cpu->util_cfs = cpu_util_cfs(rq);
 	sg_cpu->util_dl  = cpu_util_dl(rq);
+	sg_cpu->bw_dl    = cpu_bw_dl(rq);
 	sg_cpu->util_rt  = cpu_util_rt(rq);
 }
 
@@ -190,20 +192,24 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
 	if (rq->rt.rt_nr_running)
 		return sg_cpu->max;
 
-	util = sg_cpu->util_dl;
-	util += sg_cpu->util_cfs;
+	util = sg_cpu->util_cfs;
 	util += sg_cpu->util_rt;
 
+	if ((util + sg_cpu->util_dl) >= sg_cpu->max)
+		return sg_cpu->max;
+
 	/*
-	 * Utilization required by DEADLINE must always be granted while, for
-	 * FAIR, we use blocked utilization of IDLE CPUs as a mechanism to
-	 * gracefully reduce the frequency when no tasks show up for longer
+	 * As there is still idle time on the CPU, we need to compute the
+	 * utilization level of the CPU.
+	 * Bandwidth required by DEADLINE must always be granted while, for
+	 * FAIR and RT, we use blocked utilization of IDLE CPUs as a mechanism
+	 * to gracefully reduce the frequency when no tasks show up for longer
 	 * periods of time.
-	 *
-	 * Ideally we would like to set util_dl as min/guaranteed freq and
-	 * util_cfs + util_dl as requested freq. However, cpufreq is not yet
-	 * ready for such an interface. So, we only do the latter for now.
 	 */
+
+	/* Add DL bandwidth requirement */
+	util += sg_cpu->bw_dl;
+
 	return min(sg_cpu->max, util);
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 4526ba6..bc4305f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2192,11 +2192,16 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
 #endif
 
 #ifdef CONFIG_CPU_FREQ_GOV_SCHEDUTIL
-static inline unsigned long cpu_util_dl(struct rq *rq)
+static inline unsigned long cpu_bw_dl(struct rq *rq)
 {
 	return (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
 }
 
+static inline unsigned long cpu_util_dl(struct rq *rq)
+{
+	return READ_ONCE(rq->avg_dl.util_avg);
+}
+
 static inline unsigned long cpu_util_cfs(struct rq *rq)
 {
 	unsigned long util = READ_ONCE(rq->cfs.avg.util_avg);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v6 07/11] sched/irq: add irq utilization tracking
  2018-06-08 12:09 [PATCH v6 00/11] track CPU utilization Vincent Guittot
                   ` (5 preceding siblings ...)
  2018-06-08 12:09 ` [PATCH v6 06/11] cpufreq/schedutil: use dl " Vincent Guittot
@ 2018-06-08 12:09 ` Vincent Guittot
  2018-06-08 12:09 ` [PATCH v6 08/11] cpufreq/schedutil: take into account interrupt Vincent Guittot
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 56+ messages in thread
From: Vincent Guittot @ 2018-06-08 12:09 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: rjw, juri.lelli, dietmar.eggemann, Morten.Rasmussen,
	viresh.kumar, valentin.schneider, patrick.bellasi, joel,
	daniel.lezcano, quentin.perret, Vincent Guittot, Ingo Molnar

Interrupt and steal time are the only remaining activities tracked by
rt_avg. Like for the sched classes, we can use PELT to track their average
utilization of the CPU. But unlike the sched classes, we don't track when
entering/leaving interrupt; instead, we take into account the time spent
under interrupt context when we update the rqs' clock (rq_clock_task).
This also means that we have to decay the normal context time and account
for the interrupt time during the update.

It is also important to note that because
  rq_clock == rq_clock_task + interrupt time
and rq_clock_task is used by a sched class to compute its utilization, the
util_avg of a sched class only reflects the utilization of the time spent
in normal context and not of the whole time of the CPU. Adding the utilization
of interrupt gives a more accurate estimate of the utilization of the CPU.
The CPU utilization is:
  avg_irq + (1 - avg_irq / max capacity) * \Sum avg_rq

Most of the time, avg_irq is small and negligible so using the
approximation CPU utilization = \Sum avg_rq was enough.
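
A standalone sketch of that formula (not part of the patch; numbers are only
illustrative), showing how the class utilizations, measured on rq_clock_task,
are first scaled back to the full timeline before adding avg_irq:

/*
 * The sched classes only see the non-interrupt part of the timeline
 * (rq_clock_task), so their utilization is scaled by the fraction of time
 * left once interrupts are removed, then avg_irq is added on top.
 */
#include <stdio.h>

static unsigned long cpu_total_util(unsigned long util_irq,
                                    unsigned long sum_util_rq,
                                    unsigned long max_cap)
{
        unsigned long util;

        util = sum_util_rq * (max_cap - util_irq) / max_cap;
        util += util_irq;

        return util < max_cap ? util : max_cap;
}

int main(void)
{
        /* e.g. avg_irq ~100 and cfs+rt+dl ~600 on a 1024-capacity CPU */
        printf("%lu\n", cpu_total_util(100, 600, 1024));        /* ~641 */

        return 0;
}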

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/core.c  |  4 +++-
 kernel/sched/fair.c  | 13 ++++++++++---
 kernel/sched/pelt.c  | 40 ++++++++++++++++++++++++++++++++++++++++
 kernel/sched/pelt.h  | 16 ++++++++++++++++
 kernel/sched/sched.h |  3 +++
 5 files changed, 72 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d155518..ab58288 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -16,6 +16,8 @@
 #include "../workqueue_internal.h"
 #include "../smpboot.h"
 
+#include "pelt.h"
+
 #define CREATE_TRACE_POINTS
 #include <trace/events/sched.h>
 
@@ -184,7 +186,7 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
 
 #if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
 	if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY))
-		sched_rt_avg_update(rq, irq_delta + steal);
+		update_irq_load_avg(rq, irq_delta + steal);
 #endif
 }
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 71fe74a..cc7a6e2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7290,7 +7290,7 @@ static inline bool cfs_rq_has_blocked(struct cfs_rq *cfs_rq)
 	return false;
 }
 
-static inline bool others_rqs_have_blocked(struct rq *rq)
+static inline bool others_have_blocked(struct rq *rq)
 {
 	if (READ_ONCE(rq->avg_rt.util_avg))
 		return true;
@@ -7298,6 +7298,11 @@ static inline bool others_rqs_have_blocked(struct rq *rq)
 	if (READ_ONCE(rq->avg_dl.util_avg))
 		return true;
 
+#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
+	if (READ_ONCE(rq->avg_irq.util_avg))
+		return true;
+#endif
+
 	return false;
 }
 
@@ -7362,8 +7367,9 @@ static void update_blocked_averages(int cpu)
 	}
 	update_rt_rq_load_avg(rq_clock_task(rq), rq, 0);
 	update_dl_rq_load_avg(rq_clock_task(rq), rq, 0);
+	update_irq_load_avg(rq, 0);
 	/* Don't need periodic decay once load/util_avg are null */
-	if (others_rqs_have_blocked(rq))
+	if (others_have_blocked(rq))
 		done = false;
 
 #ifdef CONFIG_NO_HZ_COMMON
@@ -7432,9 +7438,10 @@ static inline void update_blocked_averages(int cpu)
 	update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq);
 	update_rt_rq_load_avg(rq_clock_task(rq), rq, 0);
 	update_dl_rq_load_avg(rq_clock_task(rq), rq, 0);
+	update_irq_load_avg(rq, 0);
 #ifdef CONFIG_NO_HZ_COMMON
 	rq->last_blocked_load_update_tick = jiffies;
-	if (!cfs_rq_has_blocked(cfs_rq) && !others_rqs_have_blocked(rq))
+	if (!cfs_rq_has_blocked(cfs_rq) && !others_have_blocked(rq))
 		rq->has_blocked_load = 0;
 #endif
 	rq_unlock_irqrestore(rq, &rf);
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index b86405e..b43e2af 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -351,3 +351,43 @@ int update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
 
 	return 0;
 }
+
+#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
+/*
+ * irq:
+ *
+ *   util_sum = \Sum se->avg.util_sum but se->avg.util_sum is not tracked
+ *   util_sum = cpu_scale * load_sum
+ *   runnable_load_sum = load_sum
+ *
+ */
+
+int update_irq_load_avg(struct rq *rq, u64 running)
+{
+	int ret = 0;
+	/*
+	 * We know the time that has been used by interrupt since last update
+	 * but we don't when. Let be pessimistic and assume that interrupt has
+	 * happened just before the update. This is not so far from reality
+	 * because interrupt will most probably wake up task and trig an update
+	 * of rq clock during which the metric si updated.
+	 * We start to decay with normal context time and then we add the
+	 * interrupt context time.
+	 * We can safely remove running from rq->clock because
+	 * rq->clock += delta with delta >= running
+	 */
+	ret = ___update_load_sum(rq->clock - running, rq->cpu, &rq->avg_irq,
+				0,
+				0,
+				0);
+	ret += ___update_load_sum(rq->clock, rq->cpu, &rq->avg_irq,
+				1,
+				1,
+				1);
+
+	if (ret)
+		___update_load_avg(&rq->avg_irq, 1, 1);
+
+	return ret;
+}
+#endif
diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h
index 0e4f912..d2894db 100644
--- a/kernel/sched/pelt.h
+++ b/kernel/sched/pelt.h
@@ -6,6 +6,16 @@ int __update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq);
 int update_rt_rq_load_avg(u64 now, struct rq *rq, int running);
 int update_dl_rq_load_avg(u64 now, struct rq *rq, int running);
 
+#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
+int update_irq_load_avg(struct rq *rq, u64 running);
+#else
+static inline int
+update_irq_load_avg(struct rq *rq, u64 running)
+{
+	return 0;
+}
+#endif
+
 /*
  * When a task is dequeued, its estimated utilization should not be update if
  * its util_avg has not been updated at least once.
@@ -51,6 +61,12 @@ update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
 {
 	return 0;
 }
+
+static inline int
+update_irq_load_avg(struct rq *rq, u64 running)
+{
+	return 0;
+}
 #endif
 
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index bc4305f..b534a43 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -850,6 +850,9 @@ struct rq {
 	u64			age_stamp;
 	struct sched_avg	avg_rt;
 	struct sched_avg	avg_dl;
+#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
+	struct sched_avg	avg_irq;
+#endif
 	u64			idle_stamp;
 	u64			avg_idle;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 56+ messages in thread
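
As an aside, the two-step update in update_irq_load_avg() above (decay over
the normal-context part of the window, then account the interrupt part as
fully busy) can be illustrated with a small continuous-time sketch. This is
a floating-point analogue with made-up helper names, not the kernel's
fixed-point PELT code:

	#include <math.h>
	#include <stdio.h>

	/* decay a PELT-like average over 'ms' of non-busy time (32ms half-life) */
	static double decay(double avg, double ms)
	{
		return avg * pow(0.5, ms / 32.0);
	}

	/*
	 * delta_ms: rq clock delta since the last update
	 * irq_ms:   part of that delta spent in interrupt context
	 */
	static double irq_avg_update(double avg, double delta_ms, double irq_ms)
	{
		/* step 1: decay over the normal-context part of the window */
		avg = decay(avg, delta_ms - irq_ms);
		/* step 2: account the interrupt part as fully busy (max = 1024) */
		return decay(avg, irq_ms) + (1024.0 - decay(1024.0, irq_ms));
	}

	int main(void)
	{
		double avg = 0.0;
		int i;

		/* e.g. 1ms of interrupt time every 10ms of wall clock */
		for (i = 0; i < 200; i++)
			avg = irq_avg_update(avg, 10.0, 1.0);
		printf("irq util_avg settles around %.0f\n", avg);
		return 0;
	}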

* [PATCH v6 08/11] cpufreq/schedutil: take into account interrupt
  2018-06-08 12:09 [PATCH v6 00/11] track CPU utilization Vincent Guittot
                   ` (6 preceding siblings ...)
  2018-06-08 12:09 ` [PATCH v6 07/11] sched/irq: add irq " Vincent Guittot
@ 2018-06-08 12:09 ` Vincent Guittot
  2018-06-12  8:54   ` Dietmar Eggemann
  2018-06-08 12:09 ` [PATCH v6 09/11] sched: use pelt for scale_rt_capacity() Vincent Guittot
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 56+ messages in thread
From: Vincent Guittot @ 2018-06-08 12:09 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: rjw, juri.lelli, dietmar.eggemann, Morten.Rasmussen,
	viresh.kumar, valentin.schneider, patrick.bellasi, joel,
	daniel.lezcano, quentin.perret, Vincent Guittot, Ingo Molnar

The time spent under interrupt can be significant but it is not reflected
in the utilization of the CPU when deciding to choose an OPP. Now that we
have access to this metric, schedutil can take it into account when
selecting the OPP for a CPU.
The rqs' utilization doesn't see the time spent under interrupt context and
reports its value over the normal context time window. We need to compensate
for this when adding the interrupt utilization.

The CPU utilization is :
  irq util_avg + (1 - irq util_avg / max capacity ) * /Sum rq util_avg
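
For example (illustrative numbers only): with a max capacity of 1024, an irq
util_avg of 102 and a summed rq util_avg of 400, the requested capacity is
102 + (1024 - 102) * 400 / 1024 ~= 462, i.e. the rq contribution is first
rescaled from the normal context window to wall-clock time before the
interrupt part is added on top.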

A test with iperf on hikey (octo arm64) gives:
iperf -c server_address -r -t 5

w/o patch		w/ patch
Tx 276 Mbits/sec        304 Mbits/sec +10%
Rx 299 Mbits/sec        328 Mbits/sec +09%

8 iterations
stdev is lower than 1%
Only WFI idle state is enable (shallowest diel state)

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/cpufreq_schedutil.c | 25 +++++++++++++++++++++----
 kernel/sched/sched.h             | 13 +++++++++++++
 2 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 25cee59..092c310 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -58,6 +58,7 @@ struct sugov_cpu {
 	unsigned long		util_dl;
 	unsigned long		bw_dl;
 	unsigned long		util_rt;
+	unsigned long		util_irq;
 	unsigned long		max;
 
 	/* The field below is for single-CPU policies only: */
@@ -182,21 +183,30 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)
 	sg_cpu->util_dl  = cpu_util_dl(rq);
 	sg_cpu->bw_dl    = cpu_bw_dl(rq);
 	sg_cpu->util_rt  = cpu_util_rt(rq);
+	sg_cpu->util_irq = cpu_util_irq(rq);
 }
 
 static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
 {
 	struct rq *rq = cpu_rq(sg_cpu->cpu);
-	unsigned long util;
+	unsigned long util, max = sg_cpu->max;
 
 	if (rq->rt.rt_nr_running)
 		return sg_cpu->max;
 
+	if (unlikely(sg_cpu->util_irq >= max))
+		return max;
+
+	/* Sum rq utilization */
 	util = sg_cpu->util_cfs;
 	util += sg_cpu->util_rt;
 
-	if ((util + sg_cpu->util_dl) >= sg_cpu->max)
-		return sg_cpu->max;
:confirm b9
+	/*
+	 * Interrupt time is not seen by rqs utilization so we can compare
+	 * them with the CPU capacity
+	 */
+	if ((util + sg_cpu->util_dl) >= max)
+		return max;
 
 	/*
 	 * As there is still idle time on the CPU, we need to compute the
@@ -207,10 +217,17 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
 	 * periods of time.
 	 */
 
+	/* Weight rqs utilization to normal context window */
+	util *= (max - sg_cpu->util_irq);
+	util /= max;
+
+	/* Add interrupt utilization */
+	util += sg_cpu->util_irq;
+
 	/* Add DL bandwidth requirement */
 	util += sg_cpu->bw_dl;
 
-	return min(sg_cpu->max, util);
+	return min(max, util);
 }
 
 static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time, unsigned int flags)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b534a43..873b567 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2221,4 +2221,17 @@ static inline unsigned long cpu_util_rt(struct rq *rq)
 {
 	return rq->avg_rt.util_avg;
 }
+
+#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
+static inline unsigned long cpu_util_irq(struct rq *rq)
+{
+	return rq->avg_irq.util_avg;
+}
+#else
+static inline unsigned long cpu_util_irq(struct rq *rq)
+{
+	return 0;
+}
+
+#endif
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v6 09/11] sched: use pelt for scale_rt_capacity()
  2018-06-08 12:09 [PATCH v6 00/11] track CPU utilization Vincent Guittot
                   ` (7 preceding siblings ...)
  2018-06-08 12:09 ` [PATCH v6 08/11] cpufreq/schedutil: take into account interrupt Vincent Guittot
@ 2018-06-08 12:09 ` Vincent Guittot
  2018-06-08 12:09 ` [PATCH v6 10/11] sched: remove rt_avg code Vincent Guittot
  2018-06-08 12:09 ` [PATCH v6 11/11] proc/sched: remove unused sched_time_avg_ms Vincent Guittot
  10 siblings, 0 replies; 56+ messages in thread
From: Vincent Guittot @ 2018-06-08 12:09 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: rjw, juri.lelli, dietmar.eggemann, Morten.Rasmussen,
	viresh.kumar, valentin.schneider, patrick.bellasi, joel,
	daniel.lezcano, quentin.perret, Vincent Guittot, Ingo Molnar

The utilization of the CPU by rt, dl and interrupts is now tracked with
PELT, so we can use these metrics instead of rt_avg to evaluate the
remaining capacity available for the cfs class.

scale_rt_capacity() behavior has been changed and now returns the remaining
capacity available for cfs instead of a scaling factor, because rt, dl and
interrupt now provide absolute utilization values.

The same formula as schedutil is used:
  irq util_avg + (1 - irq util_avg / max capacity ) * /Sum rq util_avg
but the implementation is different because it doesn't return the same value
and doesn't benefit from the same optimization.
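
As an illustration with made-up values (max capacity 1024, rt+dl util_avg
256, irq util_avg 128): schedutil ends up requesting roughly
128 + (1024 - 128) * util / 1024, whereas scale_rt_capacity() returns the
capacity left for cfs, i.e. (1024 - 256) * (1024 - 128) / 1024 = 672.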

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/deadline.c |  2 --
 kernel/sched/fair.c     | 41 +++++++++++++++++++----------------------
 kernel/sched/pelt.c     |  2 +-
 kernel/sched/rt.c       |  2 --
 4 files changed, 20 insertions(+), 27 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 596097f..c882d156 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1180,8 +1180,6 @@ static void update_curr_dl(struct rq *rq)
 	curr->se.exec_start = now;
 	cgroup_account_cputime(curr, delta_exec);
 
-	sched_rt_avg_update(rq, delta_exec);
-
 	if (dl_entity_is_special(dl_se))
 		return;
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cc7a6e2..fefd71b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7551,39 +7551,36 @@ static inline int get_sd_load_idx(struct sched_domain *sd,
 static unsigned long scale_rt_capacity(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
-	u64 total, used, age_stamp, avg;
-	s64 delta;
-
-	/*
-	 * Since we're reading these variables without serialization make sure
-	 * we read them once before doing sanity checks on them.
-	 */
-	age_stamp = READ_ONCE(rq->age_stamp);
-	avg = READ_ONCE(rq->rt_avg);
-	delta = __rq_clock_broken(rq) - age_stamp;
+	unsigned long max = arch_scale_cpu_capacity(NULL, cpu);
+	unsigned long used, irq, free;
 
-	if (unlikely(delta < 0))
-		delta = 0;
+#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
+	irq = READ_ONCE(rq->avg_irq.util_avg);
 
-	total = sched_avg_period() + delta;
+	if (unlikely(irq >= max))
+		return 1;
+#endif
 
-	used = div_u64(avg, total);
+	used = READ_ONCE(rq->avg_rt.util_avg);
+	used += READ_ONCE(rq->avg_dl.util_avg);
 
-	if (likely(used < SCHED_CAPACITY_SCALE))
-		return SCHED_CAPACITY_SCALE - used;
+	if (unlikely(used >= max))
+		return 1;
 
-	return 1;
+	free = max - used;
+#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
+	free *= (max - irq);
+	free /= max;
+#endif
+	return free;
 }
 
 static void update_cpu_capacity(struct sched_domain *sd, int cpu)
 {
-	unsigned long capacity = arch_scale_cpu_capacity(sd, cpu);
+	unsigned long capacity = scale_rt_capacity(cpu);
 	struct sched_group *sdg = sd->groups;
 
-	cpu_rq(cpu)->cpu_capacity_orig = capacity;
-
-	capacity *= scale_rt_capacity(cpu);
-	capacity >>= SCHED_CAPACITY_SHIFT;
+	cpu_rq(cpu)->cpu_capacity_orig = arch_scale_cpu_capacity(sd, cpu);
 
 	if (!capacity)
 		capacity = 1;
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index b43e2af..e2d53a6 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -237,7 +237,7 @@ ___update_load_avg(struct sched_avg *sa, unsigned long load, unsigned long runna
 	 */
 	sa->load_avg = div_u64(load * sa->load_sum, divider);
 	sa->runnable_load_avg =	div_u64(runnable * sa->runnable_load_sum, divider);
-	sa->util_avg = sa->util_sum / divider;
+	WRITE_ONCE(sa->util_avg, sa->util_sum / divider);
 }
 
 /*
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index e8c08a8..4fd2b57 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -970,8 +970,6 @@ static void update_curr_rt(struct rq *rq)
 	curr->se.exec_start = now;
 	cgroup_account_cputime(curr, delta_exec);
 
-	sched_rt_avg_update(rq, delta_exec);
-
 	if (!rt_bandwidth_enabled())
 		return;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v6 10/11] sched: remove rt_avg code
  2018-06-08 12:09 [PATCH v6 00/11] track CPU utilization Vincent Guittot
                   ` (8 preceding siblings ...)
  2018-06-08 12:09 ` [PATCH v6 09/11] sched: use pelt for scale_rt_capacity() Vincent Guittot
@ 2018-06-08 12:09 ` Vincent Guittot
  2018-06-08 12:09 ` [PATCH v6 11/11] proc/sched: remove unused sched_time_avg_ms Vincent Guittot
  10 siblings, 0 replies; 56+ messages in thread
From: Vincent Guittot @ 2018-06-08 12:09 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: rjw, juri.lelli, dietmar.eggemann, Morten.Rasmussen,
	viresh.kumar, valentin.schneider, patrick.bellasi, joel,
	daniel.lezcano, quentin.perret, Vincent Guittot, Ingo Molnar

rt_avg is no longer used anywhere, so we can remove all the related code.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/core.c  | 26 --------------------------
 kernel/sched/fair.c  |  2 --
 kernel/sched/sched.h | 17 -----------------
 3 files changed, 45 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ab58288..213d277 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -650,23 +650,6 @@ bool sched_can_stop_tick(struct rq *rq)
 	return true;
 }
 #endif /* CONFIG_NO_HZ_FULL */
-
-void sched_avg_update(struct rq *rq)
-{
-	s64 period = sched_avg_period();
-
-	while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
-		/*
-		 * Inline assembly required to prevent the compiler
-		 * optimising this loop into a divmod call.
-		 * See __iter_div_u64_rem() for another example of this.
-		 */
-		asm("" : "+rm" (rq->age_stamp));
-		rq->age_stamp += period;
-		rq->rt_avg /= 2;
-	}
-}
-
 #endif /* CONFIG_SMP */
 
 #if defined(CONFIG_RT_GROUP_SCHED) || (defined(CONFIG_FAIR_GROUP_SCHED) && \
@@ -5710,13 +5693,6 @@ void set_rq_offline(struct rq *rq)
 	}
 }
 
-static void set_cpu_rq_start_time(unsigned int cpu)
-{
-	struct rq *rq = cpu_rq(cpu);
-
-	rq->age_stamp = sched_clock_cpu(cpu);
-}
-
 /*
  * used to mark begin/end of suspend/resume:
  */
@@ -5834,7 +5810,6 @@ static void sched_rq_cpu_starting(unsigned int cpu)
 
 int sched_cpu_starting(unsigned int cpu)
 {
-	set_cpu_rq_start_time(cpu);
 	sched_rq_cpu_starting(cpu);
 	sched_tick_start(cpu);
 	return 0;
@@ -6102,7 +6077,6 @@ void __init sched_init(void)
 
 #ifdef CONFIG_SMP
 	idle_thread_set_boot_cpu();
-	set_cpu_rq_start_time(smp_processor_id());
 #endif
 	init_sched_fair_class();
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fefd71b..3594692 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5323,8 +5323,6 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 
 		this_rq->cpu_load[i] = (old_load * (scale - 1) + new_load) >> i;
 	}
-
-	sched_avg_update(this_rq);
 }
 
 /* Used instead of source_load when we know the type == 0 */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 873b567..1faab06 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -846,8 +846,6 @@ struct rq {
 
 	struct list_head cfs_tasks;
 
-	u64			rt_avg;
-	u64			age_stamp;
 	struct sched_avg	avg_rt;
 	struct sched_avg	avg_dl;
 #if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
@@ -1712,11 +1710,6 @@ extern const_debug unsigned int sysctl_sched_time_avg;
 extern const_debug unsigned int sysctl_sched_nr_migrate;
 extern const_debug unsigned int sysctl_sched_migration_cost;
 
-static inline u64 sched_avg_period(void)
-{
-	return (u64)sysctl_sched_time_avg * NSEC_PER_MSEC / 2;
-}
-
 #ifdef CONFIG_SCHED_HRTICK
 
 /*
@@ -1753,8 +1746,6 @@ unsigned long arch_scale_freq_capacity(int cpu)
 #endif
 
 #ifdef CONFIG_SMP
-extern void sched_avg_update(struct rq *rq);
-
 #ifndef arch_scale_cpu_capacity
 static __always_inline
 unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
@@ -1765,12 +1756,6 @@ unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
 	return SCHED_CAPACITY_SCALE;
 }
 #endif
-
-static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta)
-{
-	rq->rt_avg += rt_delta * arch_scale_freq_capacity(cpu_of(rq));
-	sched_avg_update(rq);
-}
 #else
 #ifndef arch_scale_cpu_capacity
 static __always_inline
@@ -1779,8 +1764,6 @@ unsigned long arch_scale_cpu_capacity(void __always_unused *sd, int cpu)
 	return SCHED_CAPACITY_SCALE;
 }
 #endif
-static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta) { }
-static inline void sched_avg_update(struct rq *rq) { }
 #endif
 
 struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v6 11/11] proc/sched: remove unused sched_time_avg_ms
  2018-06-08 12:09 [PATCH v6 00/11] track CPU utilization Vincent Guittot
                   ` (9 preceding siblings ...)
  2018-06-08 12:09 ` [PATCH v6 10/11] sched: remove rt_avg code Vincent Guittot
@ 2018-06-08 12:09 ` Vincent Guittot
  10 siblings, 0 replies; 56+ messages in thread
From: Vincent Guittot @ 2018-06-08 12:09 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: rjw, juri.lelli, dietmar.eggemann, Morten.Rasmussen,
	viresh.kumar, valentin.schneider, patrick.bellasi, joel,
	daniel.lezcano, quentin.perret, Vincent Guittot, Ingo Molnar,
	Kees Cook, Luis R. Rodriguez

The /proc/sys/kernel/sched_time_avg_ms entry is not used anywhere.
Remove it.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 include/linux/sched/sysctl.h | 1 -
 kernel/sched/core.c          | 8 --------
 kernel/sched/sched.h         | 1 -
 kernel/sysctl.c              | 8 --------
 4 files changed, 18 deletions(-)

diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index 1c1a151..913488d 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -40,7 +40,6 @@ extern unsigned int sysctl_numa_balancing_scan_size;
 #ifdef CONFIG_SCHED_DEBUG
 extern __read_mostly unsigned int sysctl_sched_migration_cost;
 extern __read_mostly unsigned int sysctl_sched_nr_migrate;
-extern __read_mostly unsigned int sysctl_sched_time_avg;
 
 int sched_proc_update_handler(struct ctl_table *table, int write,
 		void __user *buffer, size_t *length,
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 213d277..9894bc7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -46,14 +46,6 @@ const_debug unsigned int sysctl_sched_features =
 const_debug unsigned int sysctl_sched_nr_migrate = 32;
 
 /*
- * period over which we average the RT time consumption, measured
- * in ms.
- *
- * default: 1s
- */
-const_debug unsigned int sysctl_sched_time_avg = MSEC_PER_SEC;
-
-/*
  * period over which we measure -rt task CPU usage in us.
  * default: 1s
  */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1faab06..6766a3c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1706,7 +1706,6 @@ extern void deactivate_task(struct rq *rq, struct task_struct *p, int flags);
 
 extern void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags);
 
-extern const_debug unsigned int sysctl_sched_time_avg;
 extern const_debug unsigned int sysctl_sched_nr_migrate;
 extern const_debug unsigned int sysctl_sched_migration_cost;
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 6a78cf7..d77a959 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -368,14 +368,6 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
-	{
-		.procname	= "sched_time_avg_ms",
-		.data		= &sysctl_sched_time_avg,
-		.maxlen		= sizeof(unsigned int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec_minmax,
-		.extra1		= &one,
-	},
 #ifdef CONFIG_SCHEDSTATS
 	{
 		.procname	= "sched_schedstats",
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 06/11] cpufreq/schedutil: use dl utilization tracking
  2018-06-08 12:09 ` [PATCH v6 06/11] cpufreq/schedutil: use dl " Vincent Guittot
@ 2018-06-08 12:39   ` Juri Lelli
  2018-06-08 12:48     ` Vincent Guittot
  2018-06-22 15:24   ` Peter Zijlstra
  1 sibling, 1 reply; 56+ messages in thread
From: Juri Lelli @ 2018-06-08 12:39 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: peterz, mingo, linux-kernel, rjw, dietmar.eggemann,
	Morten.Rasmussen, viresh.kumar, valentin.schneider,
	patrick.bellasi, joel, daniel.lezcano, quentin.perret,
	Ingo Molnar

Hi Vincent,

On 08/06/18 14:09, Vincent Guittot wrote:
> Now that we have both the dl class bandwidth requirement and the dl class
> utilization, we can detect when CPU is fully used so we should run at max.
> Otherwise, we keep using the dl bandwidth requirement to define the
> utilization of the CPU
> 
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---

[...]

> @@ -190,20 +192,24 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
>  	if (rq->rt.rt_nr_running)
>  		return sg_cpu->max;
>  
> -	util = sg_cpu->util_dl;
> -	util += sg_cpu->util_cfs;
> +	util = sg_cpu->util_cfs;
>  	util += sg_cpu->util_rt;
>  
> +	if ((util + sg_cpu->util_dl) >= sg_cpu->max)
> +		return sg_cpu->max;
> +

Mmm, won't we run at max (or reach max) with a, say, 100ms/500ms DL task
running alone?

Best,

- Juri

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 06/11] cpufreq/schedutil: use dl utilization tracking
  2018-06-08 12:39   ` Juri Lelli
@ 2018-06-08 12:48     ` Vincent Guittot
  2018-06-08 12:54       ` Juri Lelli
  0 siblings, 1 reply; 56+ messages in thread
From: Vincent Guittot @ 2018-06-08 12:48 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Rafael J. Wysocki,
	Dietmar Eggemann, Morten Rasmussen, viresh kumar,
	Valentin Schneider, Patrick Bellasi, Joel Fernandes,
	Daniel Lezcano, Quentin Perret, Ingo Molnar

On 8 June 2018 at 14:39, Juri Lelli <juri.lelli@redhat.com> wrote:
> Hi Vincent,
>
> On 08/06/18 14:09, Vincent Guittot wrote:
>> Now that we have both the dl class bandwidth requirement and the dl class
>> utilization, we can detect when CPU is fully used so we should run at max.
>> Otherwise, we keep using the dl bandwidth requirement to define the
>> utilization of the CPU
>>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> ---
>
> [...]
>
>> @@ -190,20 +192,24 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
>>       if (rq->rt.rt_nr_running)
>>               return sg_cpu->max;
>>
>> -     util = sg_cpu->util_dl;
>> -     util += sg_cpu->util_cfs;
>> +     util = sg_cpu->util_cfs;
>>       util += sg_cpu->util_rt;
>>
>> +     if ((util + sg_cpu->util_dl) >= sg_cpu->max)
>> +             return sg_cpu->max;
>> +
>
> Mmm, won't we run at max (or reach max) with a, say, 100ms/500ms DL task
> running alone?

Not for a 100ms running task. You have to run for more than 320ms to reach
the max value.

A 100ms/500ms task will vary between 0 and 907.
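
For illustration, a small standalone simulation of the PELT geometric series
(1ms steps, 32ms half-life; a rough approximation of the kernel's fixed-point
math) reproduces these bounds for a 100ms/500ms pattern:

	#include <math.h>
	#include <stdio.h>

	int main(void)
	{
		const double y = pow(0.5, 1.0 / 32.0);	/* decay per 1ms period */
		const double max_sum = 1.0 / (1.0 - y);	/* geometric series limit */
		double sum = 0.0;
		int t;

		for (t = 0; t < 5000; t++) {
			int running = (t % 500) < 100;	/* 100ms run, 400ms sleep */

			sum = sum * y + (running ? 1.0 : 0.0);
			if ((t % 500) == 99 || (t % 500) == 499)
				printf("t=%4dms util_avg ~ %4.0f\n",
				       t + 1, 1024.0 * sum / max_sum);
		}
		return 0;
	}

The peaks converge to ~907 and the troughs decay back to ~0.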

Vincent

>
> Best,
>
> - Juri

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 06/11] cpufreq/schedutil: use dl utilization tracking
  2018-06-08 12:48     ` Vincent Guittot
@ 2018-06-08 12:54       ` Juri Lelli
  2018-06-08 13:36         ` Juri Lelli
  0 siblings, 1 reply; 56+ messages in thread
From: Juri Lelli @ 2018-06-08 12:54 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Rafael J. Wysocki,
	Dietmar Eggemann, Morten Rasmussen, viresh kumar,
	Valentin Schneider, Patrick Bellasi, Joel Fernandes,
	Daniel Lezcano, Quentin Perret, Ingo Molnar

On 08/06/18 14:48, Vincent Guittot wrote:
> On 8 June 2018 at 14:39, Juri Lelli <juri.lelli@redhat.com> wrote:
> > Hi Vincent,
> >
> > On 08/06/18 14:09, Vincent Guittot wrote:
> >> Now that we have both the dl class bandwidth requirement and the dl class
> >> utilization, we can detect when CPU is fully used so we should run at max.
> >> Otherwise, we keep using the dl bandwidth requirement to define the
> >> utilization of the CPU
> >>
> >> Cc: Ingo Molnar <mingo@redhat.com>
> >> Cc: Peter Zijlstra <peterz@infradead.org>
> >> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> >> ---
> >
> > [...]
> >
> >> @@ -190,20 +192,24 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> >>       if (rq->rt.rt_nr_running)
> >>               return sg_cpu->max;
> >>
> >> -     util = sg_cpu->util_dl;
> >> -     util += sg_cpu->util_cfs;
> >> +     util = sg_cpu->util_cfs;
> >>       util += sg_cpu->util_rt;
> >>
> >> +     if ((util + sg_cpu->util_dl) >= sg_cpu->max)
> >> +             return sg_cpu->max;
> >> +
> >
> > Mmm, won't we run at max (or reach max) with a, say, 100ms/500ms DL task
> > running alone?
> 
> not for a 100ms running task. You have to run more than 320ms to reach max value
> 
> 100ms/500ms will vary between 0 and 907

OK, right, I guess my point is still that such a task will run fine at
~250 and it might save more energy by doing so?

Also, fewer freq switches (consider for example a few background CFS
tasks waking up from time to time).

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 06/11] cpufreq/schedutil: use dl utilization tracking
  2018-06-08 12:54       ` Juri Lelli
@ 2018-06-08 13:36         ` Juri Lelli
  2018-06-08 13:38           ` Vincent Guittot
  0 siblings, 1 reply; 56+ messages in thread
From: Juri Lelli @ 2018-06-08 13:36 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Rafael J. Wysocki,
	Dietmar Eggemann, Morten Rasmussen, viresh kumar,
	Valentin Schneider, Patrick Bellasi, Joel Fernandes,
	Daniel Lezcano, Quentin Perret, Ingo Molnar, Luca Abeni,
	Claudio Scordino

On 08/06/18 14:54, Juri Lelli wrote:
> On 08/06/18 14:48, Vincent Guittot wrote:
> > On 8 June 2018 at 14:39, Juri Lelli <juri.lelli@redhat.com> wrote:
> > > Hi Vincent,
> > >
> > > On 08/06/18 14:09, Vincent Guittot wrote:
> > >> Now that we have both the dl class bandwidth requirement and the dl class
> > >> utilization, we can detect when CPU is fully used so we should run at max.
> > >> Otherwise, we keep using the dl bandwidth requirement to define the
> > >> utilization of the CPU
> > >>
> > >> Cc: Ingo Molnar <mingo@redhat.com>
> > >> Cc: Peter Zijlstra <peterz@infradead.org>
> > >> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > >> ---
> > >
> > > [...]
> > >
> > >> @@ -190,20 +192,24 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> > >>       if (rq->rt.rt_nr_running)
> > >>               return sg_cpu->max;
> > >>
> > >> -     util = sg_cpu->util_dl;
> > >> -     util += sg_cpu->util_cfs;
> > >> +     util = sg_cpu->util_cfs;
> > >>       util += sg_cpu->util_rt;
> > >>
> > >> +     if ((util + sg_cpu->util_dl) >= sg_cpu->max)
> > >> +             return sg_cpu->max;
> > >> +
> > >
> > > Mmm, won't we run at max (or reach max) with a, say, 100ms/500ms DL task
> > > running alone?
> > 
> > not for a 100ms running task. You have to run more than 320ms to reach max value
> > 
> > 100ms/500ms will vary between 0 and 907
> 
> OK, right, my point I guess is still that such a task will run fine at
> ~250 and it might be save more energy by doing so?

As discussed on IRC, we still end up selecting 1/5 of the max freq because
util_dl is below max.

So, the turning point is at ~320ms/[something_bigger], which looks like a
pretty big runtime, but I'm not sure if having that is OK. Also, it becomes
smaller with CFS/RT background "perturbations". Mmm.
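
(For reference, starting from an idle CPU the PELT util_avg after running
continuously for t ms is roughly 1024 * (1 - 2^(-t/32)) in the continuous
approximation, i.e. ~907 after 100ms and ~1023 after 320ms, which is where
the turning point above comes from.)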

BTW, adding Luca and Claudio. :)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 06/11] cpufreq/schedutil: use dl utilization tracking
  2018-06-08 13:36         ` Juri Lelli
@ 2018-06-08 13:38           ` Vincent Guittot
  0 siblings, 0 replies; 56+ messages in thread
From: Vincent Guittot @ 2018-06-08 13:38 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Rafael J. Wysocki,
	Dietmar Eggemann, Morten Rasmussen, viresh kumar,
	Valentin Schneider, Patrick Bellasi, Joel Fernandes,
	Daniel Lezcano, Quentin Perret, Ingo Molnar, Luca Abeni,
	Claudio Scordino

On 8 June 2018 at 15:36, Juri Lelli <juri.lelli@redhat.com> wrote:
> On 08/06/18 14:54, Juri Lelli wrote:
>> On 08/06/18 14:48, Vincent Guittot wrote:
>> > On 8 June 2018 at 14:39, Juri Lelli <juri.lelli@redhat.com> wrote:
>> > > Hi Vincent,
>> > >
>> > > On 08/06/18 14:09, Vincent Guittot wrote:
>> > >> Now that we have both the dl class bandwidth requirement and the dl class
>> > >> utilization, we can detect when CPU is fully used so we should run at max.
>> > >> Otherwise, we keep using the dl bandwidth requirement to define the
>> > >> utilization of the CPU
>> > >>
>> > >> Cc: Ingo Molnar <mingo@redhat.com>
>> > >> Cc: Peter Zijlstra <peterz@infradead.org>
>> > >> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> > >> ---
>> > >
>> > > [...]
>> > >
>> > >> @@ -190,20 +192,24 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
>> > >>       if (rq->rt.rt_nr_running)
>> > >>               return sg_cpu->max;
>> > >>
>> > >> -     util = sg_cpu->util_dl;
>> > >> -     util += sg_cpu->util_cfs;
>> > >> +     util = sg_cpu->util_cfs;
>> > >>       util += sg_cpu->util_rt;
>> > >>
>> > >> +     if ((util + sg_cpu->util_dl) >= sg_cpu->max)
>> > >> +             return sg_cpu->max;
>> > >> +
>> > >
>> > > Mmm, won't we run at max (or reach max) with a, say, 100ms/500ms DL task
>> > > running alone?
>> >
>> > not for a 100ms running task. You have to run more than 320ms to reach max value
>> >
>> > 100ms/500ms will vary between 0 and 907
>>
>> OK, right, my point I guess is still that such a task will run fine at
>> ~250 and it might be save more energy by doing so?
>
> As discussed on IRC, we still endup selecting 1/5 of max freq because
> util_dl is below max.
>
> So, turning point is at ~320ms/[something_bigger], which looks a pretty
> big runtime, but I'm not sure if having that is OK. Also, it becomes
> smaller with CFS/RT background "perturbations". Mmm.
>
> BTW, adding Luca and Claudio. :)

Argh... I have added a few more but forgot Luca and Claudio. I'm very sorry

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 08/11] cpufreq/schedutil: take into account interrupt
  2018-06-08 12:09 ` [PATCH v6 08/11] cpufreq/schedutil: take into account interrupt Vincent Guittot
@ 2018-06-12  8:54   ` Dietmar Eggemann
  2018-06-12  9:10     ` Vincent Guittot
  0 siblings, 1 reply; 56+ messages in thread
From: Dietmar Eggemann @ 2018-06-12  8:54 UTC (permalink / raw)
  To: Vincent Guittot, peterz, mingo, linux-kernel
  Cc: rjw, juri.lelli, Morten.Rasmussen, viresh.kumar,
	valentin.schneider, patrick.bellasi, joel, daniel.lezcano,
	quentin.perret, Ingo Molnar

On 06/08/2018 02:09 PM, Vincent Guittot wrote:

[...]

> @@ -182,21 +183,30 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)
>   	sg_cpu->util_dl  = cpu_util_dl(rq);
>   	sg_cpu->bw_dl    = cpu_bw_dl(rq);
>   	sg_cpu->util_rt  = cpu_util_rt(rq);
> +	sg_cpu->util_irq = cpu_util_irq(rq);
>   }
>   
>   static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
>   {
>   	struct rq *rq = cpu_rq(sg_cpu->cpu);
> -	unsigned long util;
> +	unsigned long util, max = sg_cpu->max;
>   
>   	if (rq->rt.rt_nr_running)
>   		return sg_cpu->max;
>   
> +	if (unlikely(sg_cpu->util_irq >= max))
> +		return max;
> +
> +	/* Sum rq utilization */
>   	util = sg_cpu->util_cfs;
>   	util += sg_cpu->util_rt;
>   
> -	if ((util + sg_cpu->util_dl) >= sg_cpu->max)
> -		return sg_cpu->max;
> :confirm b9

This didn't let me apply the patch ;-) After removing this line it worked.

[...]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 08/11] cpufreq/schedutil: take into account interrupt
  2018-06-12  8:54   ` Dietmar Eggemann
@ 2018-06-12  9:10     ` Vincent Guittot
  2018-06-12  9:16       ` Vincent Guittot
  0 siblings, 1 reply; 56+ messages in thread
From: Vincent Guittot @ 2018-06-12  9:10 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Rafael J. Wysocki,
	Juri Lelli, Morten Rasmussen, viresh kumar, Valentin Schneider,
	Patrick Bellasi, Joel Fernandes, Daniel Lezcano, Quentin Perret,
	Ingo Molnar

On 12 June 2018 at 10:54, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> On 06/08/2018 02:09 PM, Vincent Guittot wrote:
>
> [...]
>
>> @@ -182,21 +183,30 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)
>>         sg_cpu->util_dl  = cpu_util_dl(rq);
>>         sg_cpu->bw_dl    = cpu_bw_dl(rq);
>>         sg_cpu->util_rt  = cpu_util_rt(rq);
>> +       sg_cpu->util_irq = cpu_util_irq(rq);
>>   }
>>     static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
>>   {
>>         struct rq *rq = cpu_rq(sg_cpu->cpu);
>> -       unsigned long util;
>> +       unsigned long util, max = sg_cpu->max;
>>         if (rq->rt.rt_nr_running)
>>                 return sg_cpu->max;
>>   +     if (unlikely(sg_cpu->util_irq >= max))
>> +               return max;
>> +
>> +       /* Sum rq utilization */
>>         util = sg_cpu->util_cfs;
>>         util += sg_cpu->util_rt;
>>   -     if ((util + sg_cpu->util_dl) >= sg_cpu->max)
>> -               return sg_cpu->max;
>> :confirm b9
>
>
> This didn't let me apply the patch ;-) After removing this line it worked.

Argh ... I have done something wrong.
I'm going to resend it
>
> [...]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH v6 08/11] cpufreq/schedutil: take into account interrupt
  2018-06-12  9:10     ` Vincent Guittot
@ 2018-06-12  9:16       ` Vincent Guittot
  2018-06-12  9:20         ` Quentin Perret
  0 siblings, 1 reply; 56+ messages in thread
From: Vincent Guittot @ 2018-06-12  9:16 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: rjw, juri.lelli, dietmar.eggemann, Morten.Rasmussen,
	viresh.kumar, valentin.schneider, patrick.bellasi, joel,
	daniel.lezcano, quentin.perret, luca.abeni, claudio,
	Vincent Guittot, Ingo Molnar

The time spent under interrupt can be significant but it is not reflected
in the utilization of the CPU when deciding to choose an OPP. Now that we
have access to this metric, schedutil can take it into account when
selecting the OPP for a CPU.
The rqs' utilization doesn't see the time spent under interrupt context and
reports its value over the normal context time window. We need to compensate
for this when adding the interrupt utilization.

The CPU utilization is :
  irq util_avg + (1 - irq util_avg / max capacity ) * /Sum rq util_avg

A test with iperf on hikey (octo arm64) gives:
iperf -c server_address -r -t 5

w/o patch		w/ patch
Tx 276 Mbits/sec        304 Mbits/sec +10%
Rx 299 Mbits/sec        328 Mbits/sec +09%

8 iterations
stdev is lower than 1%
Only WFI idle state is enable (shallowest diel state)

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/cpufreq_schedutil.c | 25 +++++++++++++++++++++----
 kernel/sched/sched.h             | 13 +++++++++++++
 2 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 25cee59..092c310 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -58,6 +58,7 @@ struct sugov_cpu {
 	unsigned long		util_dl;
 	unsigned long		bw_dl;
 	unsigned long		util_rt;
+	unsigned long		util_irq;
 	unsigned long		max;
 
 	/* The field below is for single-CPU policies only: */
@@ -182,21 +183,30 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)
 	sg_cpu->util_dl  = cpu_util_dl(rq);
 	sg_cpu->bw_dl    = cpu_bw_dl(rq);
 	sg_cpu->util_rt  = cpu_util_rt(rq);
+	sg_cpu->util_irq = cpu_util_irq(rq);
 }
 
 static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
 {
 	struct rq *rq = cpu_rq(sg_cpu->cpu);
-	unsigned long util;
+	unsigned long util, max = sg_cpu->max;
 
 	if (rq->rt.rt_nr_running)
 		return sg_cpu->max;
 
+	if (unlikely(sg_cpu->util_irq >= max))
+		return max;
+
+	/* Sum rq utilization */
 	util = sg_cpu->util_cfs;
 	util += sg_cpu->util_rt;
 
-	if ((util + sg_cpu->util_dl) >= sg_cpu->max)
-		return sg_cpu->max;
+	/*
+	 * Interrupt time is not seen by rqs utilization so we can compare
+	 * them with the CPU capacity
+	 */
+	if ((util + sg_cpu->util_dl) >= max)
+		return max;
 
 	/*
 	 * As there is still idle time on the CPU, we need to compute the
@@ -207,10 +217,17 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
 	 * periods of time.
 	 */
 
+	/* Weight rqs utilization to normal context window */
+	util *= (max - sg_cpu->util_irq);
+	util /= max;
+
+	/* Add interrupt utilization */
+	util += sg_cpu->util_irq;
+
 	/* Add DL bandwidth requirement */
 	util += sg_cpu->bw_dl;
 
-	return min(sg_cpu->max, util);
+	return min(max, util);
 }
 
 static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time, unsigned int flags)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b534a43..873b567 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2221,4 +2221,17 @@ static inline unsigned long cpu_util_rt(struct rq *rq)
 {
 	return rq->avg_rt.util_avg;
 }
+
+#if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
+static inline unsigned long cpu_util_irq(struct rq *rq)
+{
+	return rq->avg_irq.util_avg;
+}
+#else
+static inline unsigned long cpu_util_irq(struct rq *rq)
+{
+	return 0;
+}
+
+#endif
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 08/11] cpufreq/schedutil: take into account interrupt
  2018-06-12  9:16       ` Vincent Guittot
@ 2018-06-12  9:20         ` Quentin Perret
  2018-06-12  9:26           ` Vincent Guittot
  0 siblings, 1 reply; 56+ messages in thread
From: Quentin Perret @ 2018-06-12  9:20 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: peterz, mingo, linux-kernel, rjw, juri.lelli, dietmar.eggemann,
	Morten.Rasmussen, viresh.kumar, valentin.schneider,
	patrick.bellasi, joel, daniel.lezcano, luca.abeni, claudio,
	Ingo Molnar

On Tuesday 12 Jun 2018 at 11:16:56 (+0200), Vincent Guittot wrote:
> The time spent under interrupt can be significant but it is not reflected
> in the utilization of CPU when deciding to choose an OPP. Now that we have
> access to this metric, schedutil can take it into account when selecting
> the OPP for a CPU.
> rqs utilization don't see the time spend under interrupt context and report
> their value in the normal context time window. We need to compensate this when
> adding interrupt utilization
> 
> The CPU utilization is :
>   irq util_avg + (1 - irq util_avg / max capacity ) * /Sum rq util_avg
> 
> A test with iperf on hikey (octo arm64) gives:
> iperf -c server_address -r -t 5
> 
> w/o patch		w/ patch
> Tx 276 Mbits/sec        304 Mbits/sec +10%
> Rx 299 Mbits/sec        328 Mbits/sec +09%
> 
> 8 iterations
> stdev is lower than 1%
> Only WFI idle state is enable (shallowest diel state)
                                            ^^^^
nit: s/diel/idle

And, out of curiosity, what happens if you leave the idle states
untouched? Do you still see an improvement? Or is it lost in the
noise?

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 08/11] cpufreq/schedutil: take into account interrupt
  2018-06-12  9:20         ` Quentin Perret
@ 2018-06-12  9:26           ` Vincent Guittot
  0 siblings, 0 replies; 56+ messages in thread
From: Vincent Guittot @ 2018-06-12  9:26 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Rafael J. Wysocki,
	Juri Lelli, Dietmar Eggemann, Morten Rasmussen, viresh kumar,
	Valentin Schneider, Patrick Bellasi, Joel Fernandes,
	Daniel Lezcano, Luca Abeni, Claudio Scordino, Ingo Molnar

On 12 June 2018 at 11:20, Quentin Perret <quentin.perret@arm.com> wrote:
> On Tuesday 12 Jun 2018 at 11:16:56 (+0200), Vincent Guittot wrote:
>> The time spent under interrupt can be significant but it is not reflected
>> in the utilization of CPU when deciding to choose an OPP. Now that we have
>> access to this metric, schedutil can take it into account when selecting
>> the OPP for a CPU.
>> rqs utilization don't see the time spend under interrupt context and report
>> their value in the normal context time window. We need to compensate this when
>> adding interrupt utilization
>>
>> The CPU utilization is :
>>   irq util_avg + (1 - irq util_avg / max capacity ) * /Sum rq util_avg
>>
>> A test with iperf on hikey (octo arm64) gives:
>> iperf -c server_address -r -t 5
>>
>> w/o patch             w/ patch
>> Tx 276 Mbits/sec        304 Mbits/sec +10%
>> Rx 299 Mbits/sec        328 Mbits/sec +09%
>>
>> 8 iterations
>> stdev is lower than 1%
>> Only WFI idle state is enable (shallowest diel state)
>                                             ^^^^
> nit: s/diel/idle
>
> And, out of curiosity, what happens if you leave the idle states
> untouched ? Do you still see an improvement ? Or is it lost in the
> noise ?

The results are less stable because the c-state wake-up time impacts
performance and cpuidle is not good at selecting the right idle state in
such a case. Normally, an app should use the PM QoS dma latency interface,
or a driver the per-device resume latency.
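
For instance, a minimal sketch of holding a cpu_dma_latency PM QoS request
from userspace (latency value in microseconds; the request stays active for
as long as the file descriptor is kept open; the helper name is made up):

	#include <fcntl.h>
	#include <stdint.h>
	#include <unistd.h>

	/* returns the fd holding the request; close() it to release */
	static int acquire_cpu_dma_latency(int32_t usec)
	{
		int fd = open("/dev/cpu_dma_latency", O_WRONLY);

		if (fd >= 0 && write(fd, &usec, sizeof(usec)) != sizeof(usec)) {
			close(fd);
			return -1;
		}
		return fd;
	}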

>
> Thanks,
> Quentin

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 03/11] sched/rt: add rt_rq utilization tracking
  2018-06-08 12:09 ` [PATCH v6 03/11] sched/rt: add rt_rq utilization tracking Vincent Guittot
@ 2018-06-15 11:52   ` Dietmar Eggemann
  2018-06-15 12:18     ` Vincent Guittot
  2018-06-21 18:50   ` Peter Zijlstra
  1 sibling, 1 reply; 56+ messages in thread
From: Dietmar Eggemann @ 2018-06-15 11:52 UTC (permalink / raw)
  To: Vincent Guittot, peterz, mingo, linux-kernel
  Cc: rjw, juri.lelli, Morten.Rasmussen, viresh.kumar,
	valentin.schneider, patrick.bellasi, joel, daniel.lezcano,
	quentin.perret, Ingo Molnar

On 06/08/2018 02:09 PM, Vincent Guittot wrote:
> schedutil governor relies on cfs_rq's util_avg to choose the OPP when cfs
> tasks are running. When the CPU is overloaded by cfs and rt tasks, cfs tasks
> are preempted by rt tasks and in this case util_avg reflects the remaining
> capacity but not what cfs want to use. In such case, schedutil can select a
> lower OPP whereas the CPU is overloaded. In order to have a more accurate
> view of the utilization of the CPU, we track the utilization of rt tasks.
> 
> rt_rq uses rq_clock_task and cfs_rq uses cfs_rq_clock_task but they are
> the same at the root group level, so the PELT windows of the util_sum are
> aligned.
> 
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

[...]

;
> diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
> index 4174582..81c0d7e 100644
> --- a/kernel/sched/pelt.c
> +++ b/kernel/sched/pelt.c
> @@ -307,3 +307,25 @@ int __update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq)
>   
>   	return 0;
>   }
> +
> +/*
> + * rt_rq:
> + *
> + *   util_sum = \Sum se->avg.util_sum but se->avg.util_sum is not tracked
> + *   util_sum = cpu_scale * load_sum
> + *   runnable_load_sum = load_sum
> + *
> + */
> +
> +int update_rt_rq_load_avg(u64 now, struct rq *rq, int running)
> +{
> +	if (___update_load_sum(now, rq->cpu, &rq->avg_rt,
> +				running,
> +				running,
> +				running)) {

The patch clearly says that this is about utilization but what happens
to load and runnable load for the rt rq part when you call
___update_load_sum() with load=[0,1] and runnable=[0,1]?

It looks like the math would require 1024 instead of 1 for load and
runnable so that we would see a load_avg or runnable_load_avg != 0.

1594.075128: bprint: update_rt_rq_load_avg: now=1593937228087 cpu=4 running=1
1594.075129: bprint: update_rt_rq_load_avg: delta=3068 cpu=4 load=1 runnable=1 running=1 scale_freq=1024 scale_cpu=1024 periods=2
1594.075130: bprint: update_rt_rq_load_avg: load_sum=23927 +2879 runnable_load_sum=23927 +2879 util_sum=24506165 +2948096
1594.075130: bprint: update_rt_rq_load_avg: load_avg=0 runnable_load_avg=0 util_avg=513
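
(Reading these numbers with a divider close to LOAD_AVG_MAX ~ 47742:
util_avg = 24506165 / 47742 ~ 513 as printed, while load_avg = 1 * 23927 /
47742 truncates to 0; a contribution of 1024 instead of 1 would scale
load_sum up to roughly util_sum and give a comparable load_avg.)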

IMHO, the patch should say whether load and runnable load are supported as well or not. 

[...]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 03/11] sched/rt: add rt_rq utilization tracking
  2018-06-15 11:52   ` Dietmar Eggemann
@ 2018-06-15 12:18     ` Vincent Guittot
  2018-06-15 14:55       ` Dietmar Eggemann
  0 siblings, 1 reply; 56+ messages in thread
From: Vincent Guittot @ 2018-06-15 12:18 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Rafael J. Wysocki,
	Juri Lelli, Morten Rasmussen, viresh kumar, Valentin Schneider,
	Patrick Bellasi, Joel Fernandes, Daniel Lezcano, Quentin Perret,
	Ingo Molnar

Hi Dietmar,

On 15 June 2018 at 13:52, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> On 06/08/2018 02:09 PM, Vincent Guittot wrote:
>> schedutil governor relies on cfs_rq's util_avg to choose the OPP when cfs
>> tasks are running. When the CPU is overloaded by cfs and rt tasks, cfs tasks
>> are preempted by rt tasks and in this case util_avg reflects the remaining
>> capacity but not what cfs want to use. In such case, schedutil can select a
>> lower OPP whereas the CPU is overloaded. In order to have a more accurate
>> view of the utilization of the CPU, we track the utilization of rt tasks.
>>
>> rt_rq uses rq_clock_task and cfs_rq uses cfs_rq_clock_task but they are
>> the same at the root group level, so the PELT windows of the util_sum are
>> aligned.
>>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>
> [...]
>
> ;
>> diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
>> index 4174582..81c0d7e 100644
>> --- a/kernel/sched/pelt.c
>> +++ b/kernel/sched/pelt.c
>> @@ -307,3 +307,25 @@ int __update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq)
>>
>>       return 0;
>>   }
>> +
>> +/*
>> + * rt_rq:
>> + *
>> + *   util_sum = \Sum se->avg.util_sum but se->avg.util_sum is not tracked
>> + *   util_sum = cpu_scale * load_sum
>> + *   runnable_load_sum = load_sum
>> + *
>> + */
>> +
>> +int update_rt_rq_load_avg(u64 now, struct rq *rq, int running)
>> +{
>> +     if (___update_load_sum(now, rq->cpu, &rq->avg_rt,
>> +                             running,
>> +                             running,
>> +                             running)) {
>
> The patch clearly says that this is about utilization but what happens
> to load and runnable load for the rt rq part when you call
> ___update_load_sum() with load=[0,1] and runnable=[0,1]?

I would say the same as what happens for an se, which has
___update_load_sum(now, cpu, &se->avg, !!se->on_rq, !!se->on_rq,
cfs_rq->curr == se))

>
> It looks like that the math would require 1024 instead of 1 for load and
> runnable so that we would see a load_avg or runnable_load_avg != 0.

Why does it require 1024? The min weight of a task is 15 and the min
share of a sched group is 2. AFAICT, there is no requirement, mainly
because we are not using them as they will not give any additional
information compared to util_avg

>
> 1594.075128: bprint: update_rt_rq_load_avg: now=1593937228087 cpu=4 running=1
> 1594.075129: bprint: update_rt_rq_load_avg: delta=3068 cpu=4 load=1 runnable=1 running=1 scale_freq=1024 scale_cpu=1024 periods=2
> 1594.075130: bprint: update_rt_rq_load_avg: load_sum=23927 +2879 runnable_load_sum=23927 +2879 util_sum=24506165 +2948096
> 1594.075130: bprint: update_rt_rq_load_avg: load_avg=0 runnable_load_avg=0 util_avg=513
>
> IMHO, the patch should say whether load and runnable load are supported as well or not.

Although it is stated that we track only utilization, I can probably
mention clearly that load_avg and runnable_load_avg are useless

Vincent
>
> [...]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 03/11] sched/rt: add rt_rq utilization tracking
  2018-06-15 12:18     ` Vincent Guittot
@ 2018-06-15 14:55       ` Dietmar Eggemann
  0 siblings, 0 replies; 56+ messages in thread
From: Dietmar Eggemann @ 2018-06-15 14:55 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Rafael J. Wysocki,
	Juri Lelli, Morten Rasmussen, viresh kumar, Valentin Schneider,
	Patrick Bellasi, Joel Fernandes, Daniel Lezcano, Quentin Perret,
	Ingo Molnar

On 06/15/2018 02:18 PM, Vincent Guittot wrote:
> Hi Dietmar,
> 
> On 15 June 2018 at 13:52, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>> On 06/08/2018 02:09 PM, Vincent Guittot wrote:
>>> schedutil governor relies on cfs_rq's util_avg to choose the OPP when cfs
>>> tasks are running. When the CPU is overloaded by cfs and rt tasks, cfs tasks
>>> are preempted by rt tasks and in this case util_avg reflects the remaining
>>> capacity but not what cfs want to use. In such case, schedutil can select a
>>> lower OPP whereas the CPU is overloaded. In order to have a more accurate
>>> view of the utilization of the CPU, we track the utilization of rt tasks.
>>>
>>> rt_rq uses rq_clock_task and cfs_rq uses cfs_rq_clock_task but they are
>>> the same at the root group level, so the PELT windows of the util_sum are
>>> aligned.
>>>
>>> Cc: Ingo Molnar <mingo@redhat.com>
>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>>
>> [...]
>>
>> ;
>>> diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
>>> index 4174582..81c0d7e 100644
>>> --- a/kernel/sched/pelt.c
>>> +++ b/kernel/sched/pelt.c
>>> @@ -307,3 +307,25 @@ int __update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq)
>>>
>>>        return 0;
>>>    }
>>> +
>>> +/*
>>> + * rt_rq:
>>> + *
>>> + *   util_sum = \Sum se->avg.util_sum but se->avg.util_sum is not tracked
>>> + *   util_sum = cpu_scale * load_sum
>>> + *   runnable_load_sum = load_sum
>>> + *
>>> + */
>>> +
>>> +int update_rt_rq_load_avg(u64 now, struct rq *rq, int running)
>>> +{
>>> +     if (___update_load_sum(now, rq->cpu, &rq->avg_rt,
>>> +                             running,
>>> +                             running,
>>> +                             running)) {
>>
>> The patch clearly says that this is about utilization but what happens
>> to load and runnable load for the rt rq part when you call
>> ___update_load_sum() with load=[0,1] and runnable=[0,1]?
> 
> I would say the same than what happens for se which has
> ___update_load_sum(now, cpu, &se->avg, !!se->on_rq, !!se->on_rq,
> cfs_rq->curr == se))
>

That's correct, but I was referring to the results of the PELT math
operations for these cpu-related entities. With 1024 for load and runnable
you get the same avg for all three of them:

295.879574: bprint: update_rt_rq_load_avg: now=295694598492 cpu=4 running=1
295.879575: bprint: update_rt_rq_load_avg: delta=4448 cpu=4 load=1024 runnable=1024 running=1 scale_freq=391 scale_cpu=1024 periods=4
295.879577: bprint: update_rt_rq_load_avg: load_sum=18398068 +1451008 runnable_load_sum=18398068 +1451008 util_sum=18398068 +1451008
295.879578: bprint: update_rt_rq_load_avg: load_avg=390 runnable_load_avg=390 util_avg=390

Which is meaningless since for load and runnable load, you would have to have
different call points.

>> It looks like that the math would require 1024 instead of 1 for load and
>> runnable so that we would see a load_avg or runnable_load_avg != 0.
> 
> why does it require 1024 ? the min  weight of a task is 15 and the min
> share of a sched group is 2. AFAICT, there is no requirement mainly
> because we are not using them as they will not give any additional
> information compare to util_avg

Agreed.

>> 1594.075128: bprint: update_rt_rq_load_avg: now=1593937228087 cpu=4 running=1
>> 1594.075129: bprint: update_rt_rq_load_avg: delta=3068 cpu=4 load=1 runnable=1 running=1 scale_freq=1024 scale_cpu=1024 periods=2
>> 1594.075130: bprint: update_rt_rq_load_avg: load_sum=23927 +2879 runnable_load_sum=23927 +2879 util_sum=24506165 +2948096
>> 1594.075130: bprint: update_rt_rq_load_avg: load_avg=0 runnable_load_avg=0 util_avg=513
>>
>> IMHO, the patch should say whether load and runnable load are supported as well or not.
> 
> Although it is stated that we track only utilization , i can probably
> mentioned clearly that load_avg and runnable_load_avg are useless

That would be helpful. Reading Peter's answer https://lkml.org/lkml/2018/6/4/757
made me wonder ...
 
[...]


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-08 12:09 ` [PATCH v6 04/11] cpufreq/schedutil: use rt " Vincent Guittot
@ 2018-06-18  9:00   ` Dietmar Eggemann
  2018-06-18 12:58     ` Vincent Guittot
  2018-06-21 18:45   ` Peter Zijlstra
  1 sibling, 1 reply; 56+ messages in thread
From: Dietmar Eggemann @ 2018-06-18  9:00 UTC (permalink / raw)
  To: Vincent Guittot, peterz, mingo, linux-kernel
  Cc: rjw, juri.lelli, Morten.Rasmussen, viresh.kumar,
	valentin.schneider, patrick.bellasi, joel, daniel.lezcano,
	quentin.perret, Ingo Molnar

On 06/08/2018 02:09 PM, Vincent Guittot wrote:
> Take into account rt utilization when selecting an OPP for cfs tasks in order
> to reflect the utilization of the CPU.

The rt utilization signal is only tracked per-cpu, not per-entity. So it 
is not aware of PELT migrations (attach/detach).

IMHO, this patch deserves some explanation of why the temporary
inflation/deflation of the OPP-driving utilization signal when an
rt task migrates off/on (missing detach/attach for the rt signal) doesn't
harm performance or energy consumption.

There was some talk (mainly on #sched irc) about ... (1) preempted cfs
tasks (with reduced demand signals, as utilization is only tracked while
running) using the remaining rt utilization of an rt task which migrated
off and ... (2) going to max when an rt task runs ... but a summary of all
of that in this patch would really help to understand.

> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>   kernel/sched/cpufreq_schedutil.c | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 28592b6..32f97fb 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -56,6 +56,7 @@ struct sugov_cpu {
>   	/* The fields below are only needed when sharing a policy: */
>   	unsigned long		util_cfs;
>   	unsigned long		util_dl;
> +	unsigned long		util_rt;
>   	unsigned long		max;
>   
>   	/* The field below is for single-CPU policies only: */
> @@ -178,15 +179,21 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)
>   	sg_cpu->max = arch_scale_cpu_capacity(NULL, sg_cpu->cpu);
>   	sg_cpu->util_cfs = cpu_util_cfs(rq);
>   	sg_cpu->util_dl  = cpu_util_dl(rq);
> +	sg_cpu->util_rt  = cpu_util_rt(rq);
>   }
>   
>   static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
>   {
>   	struct rq *rq = cpu_rq(sg_cpu->cpu);
> +	unsigned long util;
>   
>   	if (rq->rt.rt_nr_running)
>   		return sg_cpu->max;
>   
> +	util = sg_cpu->util_dl;
> +	util += sg_cpu->util_cfs;
> +	util += sg_cpu->util_rt;
> +
>   	/*
>   	 * Utilization required by DEADLINE must always be granted while, for
>   	 * FAIR, we use blocked utilization of IDLE CPUs as a mechanism to
> @@ -197,7 +204,7 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
>   	 * util_cfs + util_dl as requested freq. However, cpufreq is not yet
>   	 * ready for such an interface. So, we only do the latter for now.
>   	 */
> -	return min(sg_cpu->max, (sg_cpu->util_dl + sg_cpu->util_cfs));
> +	return min(sg_cpu->max, util);
>   }
>   
>   static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time, unsigned int flags)
> 


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-18  9:00   ` Dietmar Eggemann
@ 2018-06-18 12:58     ` Vincent Guittot
  0 siblings, 0 replies; 56+ messages in thread
From: Vincent Guittot @ 2018-06-18 12:58 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Rafael J. Wysocki,
	Juri Lelli, Morten Rasmussen, viresh kumar, Valentin Schneider,
	Patrick Bellasi, Joel Fernandes, Daniel Lezcano, Quentin Perret,
	Ingo Molnar

On Mon, 18 Jun 2018 at 11:00, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>
> On 06/08/2018 02:09 PM, Vincent Guittot wrote:
> > Take into account rt utilization when selecting an OPP for cfs tasks in order
> > to reflect the utilization of the CPU.
>
> The rt utilization signal is only tracked per-cpu, not per-entity. So it
> is not aware of PELT migrations (attach/detach).
>
> IMHO, this patch deserves some explanation why the temporary
> inflation/deflation of the OPP driving utilization signal in case an
> rt-task migrates off/on (missing detach/attach for rt-signal) doesn't
> harm performance or energy consumption.
>
> There was some talk (mainly on #sched irc) about ... (1) preempted cfs
> tasks (with reduced demand (utilization id only running) signals) using
> this remaining rt utilization of an rt task which migrated off and ...
> (2) going to max when an rt tasks runs ... but a summary of all of that
> in this patch would really help to understand.

OK. I will add more comments in the next version. I will just wait a bit
to allow time for more feedback before sending a new release

>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > ---
> >   kernel/sched/cpufreq_schedutil.c | 9 ++++++++-
> >   1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 28592b6..32f97fb 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -56,6 +56,7 @@ struct sugov_cpu {
> >       /* The fields below are only needed when sharing a policy: */
> >       unsigned long           util_cfs;
> >       unsigned long           util_dl;
> > +     unsigned long           util_rt;
> >       unsigned long           max;
> >
> >       /* The field below is for single-CPU policies only: */
> > @@ -178,15 +179,21 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)
> >       sg_cpu->max = arch_scale_cpu_capacity(NULL, sg_cpu->cpu);
> >       sg_cpu->util_cfs = cpu_util_cfs(rq);
> >       sg_cpu->util_dl  = cpu_util_dl(rq);
> > +     sg_cpu->util_rt  = cpu_util_rt(rq);
> >   }
> >
> >   static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> >   {
> >       struct rq *rq = cpu_rq(sg_cpu->cpu);
> > +     unsigned long util;
> >
> >       if (rq->rt.rt_nr_running)
> >               return sg_cpu->max;
> >
> > +     util = sg_cpu->util_dl;
> > +     util += sg_cpu->util_cfs;
> > +     util += sg_cpu->util_rt;
> > +
> >       /*
> >        * Utilization required by DEADLINE must always be granted while, for
> >        * FAIR, we use blocked utilization of IDLE CPUs as a mechanism to
> > @@ -197,7 +204,7 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> >        * util_cfs + util_dl as requested freq. However, cpufreq is not yet
> >        * ready for such an interface. So, we only do the latter for now.
> >        */
> > -     return min(sg_cpu->max, (sg_cpu->util_dl + sg_cpu->util_cfs));
> > +     return min(sg_cpu->max, util);
> >   }
> >
> >   static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time, unsigned int flags)
> >
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 02/11] sched/pelt: remove blank line
  2018-06-08 12:09 ` [PATCH v6 02/11] sched/pelt: remove blank line Vincent Guittot
@ 2018-06-21 14:33   ` Peter Zijlstra
  2018-06-21 18:42     ` Vincent Guittot
  0 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2018-06-21 14:33 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, linux-kernel, rjw, juri.lelli, dietmar.eggemann,
	Morten.Rasmussen, viresh.kumar, valentin.schneider,
	patrick.bellasi, joel, daniel.lezcano, quentin.perret,
	Ingo Molnar

On Fri, Jun 08, 2018 at 02:09:45PM +0200, Vincent Guittot wrote:
> diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
> index e6ecbb2..4174582 100644
> --- a/kernel/sched/pelt.c
> +++ b/kernel/sched/pelt.c
> @@ -287,7 +287,6 @@ int __update_load_avg_se(u64 now, int cpu, struct cfs_rq *cfs_rq, struct sched_e
>  
>  	if (___update_load_sum(now, cpu, &se->avg, !!se->on_rq, !!se->on_rq,
>  				cfs_rq->curr == se)) {
> -
>  		___update_load_avg(&se->avg, se_weight(se), se_runnable(se));
>  		cfs_se_util_change(&se->avg);
>  		return 1;
> @@ -302,7 +301,6 @@ int __update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq)
>  				scale_load_down(cfs_rq->load.weight),
>  				scale_load_down(cfs_rq->runnable_weight),
>  				cfs_rq->curr != NULL)) {
> -
>  		___update_load_avg(&cfs_rq->avg, 1, 1);
>  		return 1;
>  	}

So I put them there on purpose, I find it easier to read when a
multi-line if statement and the body are separated. Makes it clearer
where the if ends and the block begins.

I mean, all whitespace in C is superfluous, and yet we keep adding it to
these files :-)

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 02/11] sched/pelt: remove blank line
  2018-06-21 14:33   ` Peter Zijlstra
@ 2018-06-21 18:42     ` Vincent Guittot
  0 siblings, 0 replies; 56+ messages in thread
From: Vincent Guittot @ 2018-06-21 18:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, Rafael J. Wysocki, Juri Lelli,
	Dietmar Eggemann, Morten Rasmussen, viresh kumar,
	Valentin Schneider, Patrick Bellasi, Joel Fernandes,
	Daniel Lezcano, Quentin Perret, Ingo Molnar

On Thu, 21 Jun 2018 at 16:33, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Jun 08, 2018 at 02:09:45PM +0200, Vincent Guittot wrote:
> > diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
> > index e6ecbb2..4174582 100644
> > --- a/kernel/sched/pelt.c
> > +++ b/kernel/sched/pelt.c
> > @@ -287,7 +287,6 @@ int __update_load_avg_se(u64 now, int cpu, struct cfs_rq *cfs_rq, struct sched_e
> >
> >       if (___update_load_sum(now, cpu, &se->avg, !!se->on_rq, !!se->on_rq,
> >                               cfs_rq->curr == se)) {
> > -
> >               ___update_load_avg(&se->avg, se_weight(se), se_runnable(se));
> >               cfs_se_util_change(&se->avg);
> >               return 1;
> > @@ -302,7 +301,6 @@ int __update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq)
> >                               scale_load_down(cfs_rq->load.weight),
> >                               scale_load_down(cfs_rq->runnable_weight),
> >                               cfs_rq->curr != NULL)) {
> > -
> >               ___update_load_avg(&cfs_rq->avg, 1, 1);
> >               return 1;
> >       }
>
> So I put them there on purpose, I find it easier to read when a
> multi-line if statement and the body are separated. Makes it clearer
> where the if ends and the block begins.
>
> I mean, all whitespace in C is superfluous, and yet we keep adding it to
> these files :-)

I'm fine with keeping them, and with the newly added ones as well;
Patrick raised the point about a similar empty line in the newly added
update_rt_rq_load_avg() function on a previous revision, and I just
wanted to keep them all aligned

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-08 12:09 ` [PATCH v6 04/11] cpufreq/schedutil: use rt " Vincent Guittot
  2018-06-18  9:00   ` Dietmar Eggemann
@ 2018-06-21 18:45   ` Peter Zijlstra
  2018-06-21 18:57     ` Peter Zijlstra
                       ` (2 more replies)
  1 sibling, 3 replies; 56+ messages in thread
From: Peter Zijlstra @ 2018-06-21 18:45 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, linux-kernel, rjw, juri.lelli, dietmar.eggemann,
	Morten.Rasmussen, viresh.kumar, valentin.schneider,
	patrick.bellasi, joel, daniel.lezcano, quentin.perret,
	Ingo Molnar

On Fri, Jun 08, 2018 at 02:09:47PM +0200, Vincent Guittot wrote:
>  static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
>  {
>  	struct rq *rq = cpu_rq(sg_cpu->cpu);
> +	unsigned long util;
>  
>  	if (rq->rt.rt_nr_running)
>  		return sg_cpu->max;
>  
> +	util = sg_cpu->util_dl;
> +	util += sg_cpu->util_cfs;
> +	util += sg_cpu->util_rt;
> +
>  	/*
>  	 * Utilization required by DEADLINE must always be granted while, for
>  	 * FAIR, we use blocked utilization of IDLE CPUs as a mechanism to
> @@ -197,7 +204,7 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
>  	 * util_cfs + util_dl as requested freq. However, cpufreq is not yet
>  	 * ready for such an interface. So, we only do the latter for now.
>  	 */
> -	return min(sg_cpu->max, (sg_cpu->util_dl + sg_cpu->util_cfs));
> +	return min(sg_cpu->max, util);
>  }

So this (and the dl etc. equivalents) result in exactly the problems
complained about last time, no?

What I proposed was something along the lines of:

	util = 1024 * sg_cpu->util_cfs;
	util /= (1024 - (sg_cpu->util_rt + sg_cpu->util_dl + ...));

	return min(sg_cpu->max, util + sg_cpu->bw_dl);

Where we, instead of directly adding the various util signals.

I now see an email from Quentin asking if these things are not in fact
the same, but no, they are not. The difference is that the above only
affects the CFS signal and will re-normalize the utilization of an
'always' running task back to 1 by compensating for the stolen capacity.

But it will not, like these here patches, affect the OPP selection of
other classes. If there is no CFS utilization (or very little), then the
renormalization will not matter, and the existing DL bandwidth
computation will be unaffected.
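
For illustration, a minimal self-contained sketch of that scaling in
plain userspace C (the values and the SCALE constant are made up for
the example; this is not the schedutil code itself):

	#include <stdio.h>

	#define SCALE	1024UL	/* stand-in for SCHED_CAPACITY_SCALE */

	int main(void)
	{
		unsigned long max = SCALE;
		unsigned long util_cfs = 256, util_rt = 256, bw_dl = 64;
		unsigned long util, req;

		/*
		 * Re-scale CFS by the capacity left over by the other
		 * classes; a real implementation would have to guard the
		 * no-idle-time case where the divisor goes to zero.
		 */
		util = SCALE * util_cfs;
		util /= (SCALE - util_rt);	/* (SCALE - (util_rt + util_dl + ...)) in full */

		req = util + bw_dl;
		if (req > max)
			req = max;

		printf("requested util: %lu / %lu\n", req, max);
		return 0;
	}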


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 03/11] sched/rt: add rt_rq utilization tracking
  2018-06-08 12:09 ` [PATCH v6 03/11] sched/rt: add rt_rq utilization tracking Vincent Guittot
  2018-06-15 11:52   ` Dietmar Eggemann
@ 2018-06-21 18:50   ` Peter Zijlstra
  1 sibling, 0 replies; 56+ messages in thread
From: Peter Zijlstra @ 2018-06-21 18:50 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, linux-kernel, rjw, juri.lelli, dietmar.eggemann,
	Morten.Rasmussen, viresh.kumar, valentin.schneider,
	patrick.bellasi, joel, daniel.lezcano, quentin.perret,
	Ingo Molnar

On Fri, Jun 08, 2018 at 02:09:46PM +0200, Vincent Guittot wrote:
> +int update_rt_rq_load_avg(u64 now, struct rq *rq, int running)
> +{
> +	if (___update_load_sum(now, rq->cpu, &rq->avg_rt,
> +				running,
> +				running,
> +				running)) {

For code like this I wish C had grown named arguments for calls, just
like it has named initializers.

Something like:

	___update_load_sum(now, rq->cpu, &rq->avg_rt,
			   .load = running, .runnable = running,
			   .running = running)

would be so much easier to read... ah well, maybe in another 30 years or
so.

> +		___update_load_avg(&rq->avg_rt, 1, 1);
> +		return 1;
> +	}
> +
> +	return 0;
> +}
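
Purely as an illustration of that wish (self-contained userspace C; all
names here are invented for the example, nothing is proposed for
pelt.c), the named-initializer syntax can be approximated today by
passing a small parameter struct:

	#include <stdio.h>

	struct load_args {
		int load;
		int runnable;
		int running;
	};

	/* stand-in for ___update_load_sum(); it only prints its arguments */
	static int update_load_sum(unsigned long long now, int cpu,
				   struct load_args a)
	{
		printf("now=%llu cpu=%d load=%d runnable=%d running=%d\n",
		       now, cpu, a.load, a.runnable, a.running);
		return 1;
	}

	int main(void)
	{
		int running = 1;

		/* the call site then reads almost like named arguments */
		update_load_sum(1000, 0, (struct load_args){
					.load     = running,
					.runnable = running,
					.running  = running });
		return 0;
	}

(Passing a struct by value has its own costs, of course; this is only
about readability at the call site.)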

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-21 18:45   ` Peter Zijlstra
@ 2018-06-21 18:57     ` Peter Zijlstra
  2018-06-22  8:10       ` Vincent Guittot
  2018-06-22  7:58     ` Juri Lelli
  2018-06-22  7:58     ` Quentin Perret
  2 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2018-06-21 18:57 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, linux-kernel, rjw, juri.lelli, dietmar.eggemann,
	Morten.Rasmussen, viresh.kumar, valentin.schneider,
	patrick.bellasi, joel, daniel.lezcano, quentin.perret,
	Ingo Molnar

On Thu, Jun 21, 2018 at 08:45:24PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 08, 2018 at 02:09:47PM +0200, Vincent Guittot wrote:
> >  static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> >  {
> >  	struct rq *rq = cpu_rq(sg_cpu->cpu);
> > +	unsigned long util;
> >  
> >  	if (rq->rt.rt_nr_running)
> >  		return sg_cpu->max;
> >  
> > +	util = sg_cpu->util_dl;
> > +	util += sg_cpu->util_cfs;
> > +	util += sg_cpu->util_rt;
> > +
> >  	/*
> >  	 * Utilization required by DEADLINE must always be granted while, for
> >  	 * FAIR, we use blocked utilization of IDLE CPUs as a mechanism to
> > @@ -197,7 +204,7 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> >  	 * util_cfs + util_dl as requested freq. However, cpufreq is not yet
> >  	 * ready for such an interface. So, we only do the latter for now.
> >  	 */
> > -	return min(sg_cpu->max, (sg_cpu->util_dl + sg_cpu->util_cfs));
> > +	return min(sg_cpu->max, util);
> >  }
> 
> So this (and the dl etc. equivalents) result in exactly the problems
> complained about last time, no?
> 
> What I proposed was something along the lines of:
> 
> 	util = 1024 * sg_cpu->util_cfs;
> 	util /= (1024 - (sg_cpu->util_rt + sg_cpu->util_dl + ...));
> 
> 	return min(sg_cpu->max, util + sg_cpu->bw_dl);
> 
> Where we, instead of directly adding the various util signals.

That looks unfinished; I think that wants to include: "we renormalize
the CFS signal".

> I now see an email from Quentin asking if these things are not in fact
> the same, but no, they are not. The difference is that the above only
> affects the CFS signal and will re-normalize the utilization of an
> 'always' running task back to 1 by compensating for the stolen capacity.
> 
> But it will not, like these here patches, affect the OPP selection of
> other classes. If there is no CFS utilization (or very little), then the
> renormalization will not matter, and the existing DL bandwidth
> compuation will be unaffected.
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-21 18:45   ` Peter Zijlstra
  2018-06-21 18:57     ` Peter Zijlstra
@ 2018-06-22  7:58     ` Juri Lelli
  2018-06-22  7:58     ` Quentin Perret
  2 siblings, 0 replies; 56+ messages in thread
From: Juri Lelli @ 2018-06-22  7:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vincent Guittot, mingo, linux-kernel, rjw, dietmar.eggemann,
	Morten.Rasmussen, viresh.kumar, valentin.schneider,
	patrick.bellasi, joel, daniel.lezcano, quentin.perret,
	Ingo Molnar

On 21/06/18 20:45, Peter Zijlstra wrote:
> On Fri, Jun 08, 2018 at 02:09:47PM +0200, Vincent Guittot wrote:
> >  static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> >  {
> >  	struct rq *rq = cpu_rq(sg_cpu->cpu);
> > +	unsigned long util;
> >  
> >  	if (rq->rt.rt_nr_running)
> >  		return sg_cpu->max;
> >  
> > +	util = sg_cpu->util_dl;
> > +	util += sg_cpu->util_cfs;
> > +	util += sg_cpu->util_rt;
> > +
> >  	/*
> >  	 * Utilization required by DEADLINE must always be granted while, for
> >  	 * FAIR, we use blocked utilization of IDLE CPUs as a mechanism to
> > @@ -197,7 +204,7 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> >  	 * util_cfs + util_dl as requested freq. However, cpufreq is not yet
> >  	 * ready for such an interface. So, we only do the latter for now.
> >  	 */
> > -	return min(sg_cpu->max, (sg_cpu->util_dl + sg_cpu->util_cfs));
> > +	return min(sg_cpu->max, util);
> >  }
> 
> So this (and the dl etc. equivalents) result in exactly the problems
> complained about last time, no?
> 
> What I proposed was something along the lines of:
> 
> 	util = 1024 * sg_cpu->util_cfs;
> 	util /= (1024 - (sg_cpu->util_rt + sg_cpu->util_dl + ...));
> 
> 	return min(sg_cpu->max, util + sg_cpu->bw_dl);
> 
> Where we, instead of directly adding the various util signals.
> 
> I now see an email from Quentin asking if these things are not in fact
> the same, but no, they are not. The difference is that the above only
> affects the CFS signal and will re-normalize the utilization of an
> 'always' running task back to 1 by compensating for the stolen capacity.
> 
> But it will not, like these here patches, affect the OPP selection of
> other classes. If there is no CFS utilization (or very little), then the
> renormalization will not matter, and the existing DL bandwidth
> compuation will be unaffected.

IIUC, even with very little CFS utilization, the final OPP selection
will still be "inflated" w.r.t. bw_dl in case util_dl is big (like for a
DL task with a big period and not so big runtime). But I guess that's
OK, since we agreed that such DL tasks should be the exception anyway.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-21 18:45   ` Peter Zijlstra
  2018-06-21 18:57     ` Peter Zijlstra
  2018-06-22  7:58     ` Juri Lelli
@ 2018-06-22  7:58     ` Quentin Perret
  2018-06-22 11:37       ` Peter Zijlstra
  2 siblings, 1 reply; 56+ messages in thread
From: Quentin Perret @ 2018-06-22  7:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vincent Guittot, mingo, linux-kernel, rjw, juri.lelli,
	dietmar.eggemann, Morten.Rasmussen, viresh.kumar,
	valentin.schneider, patrick.bellasi, joel, daniel.lezcano,
	Ingo Molnar

Hi Peter,

On Thursday 21 Jun 2018 at 20:45:24 (+0200), Peter Zijlstra wrote:
> On Fri, Jun 08, 2018 at 02:09:47PM +0200, Vincent Guittot wrote:
> >  static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> >  {
> >  	struct rq *rq = cpu_rq(sg_cpu->cpu);
> > +	unsigned long util;
> >  
> >  	if (rq->rt.rt_nr_running)
> >  		return sg_cpu->max;
> >  
> > +	util = sg_cpu->util_dl;
> > +	util += sg_cpu->util_cfs;
> > +	util += sg_cpu->util_rt;
> > +
> >  	/*
> >  	 * Utilization required by DEADLINE must always be granted while, for
> >  	 * FAIR, we use blocked utilization of IDLE CPUs as a mechanism to
> > @@ -197,7 +204,7 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> >  	 * util_cfs + util_dl as requested freq. However, cpufreq is not yet
> >  	 * ready for such an interface. So, we only do the latter for now.
> >  	 */
> > -	return min(sg_cpu->max, (sg_cpu->util_dl + sg_cpu->util_cfs));
> > +	return min(sg_cpu->max, util);
> >  }
> 
> So this (and the dl etc. equivalents) result in exactly the problems
> complained about last time, no?
> 
> What I proposed was something along the lines of:
> 
> 	util = 1024 * sg_cpu->util_cfs;
> 	util /= (1024 - (sg_cpu->util_rt + sg_cpu->util_dl + ...));
> 
> 	return min(sg_cpu->max, util + sg_cpu->bw_dl);
> 
> Where we, instead of directly adding the various util signals.
> 
> I now see an email from Quentin asking if these things are not in fact
> the same, but no, they are not. The difference is that the above only
> affects the CFS signal and will re-normalize the utilization of an
> 'always' running task back to 1 by compensating for the stolen capacity.
> 
> But it will not, like these here patches, affect the OPP selection of
> other classes. If there is no CFS utilization (or very little), then the
> renormalization will not matter, and the existing DL bandwidth
> compuation will be unaffected.

Right, thinking more carefully about this re-scaling, the two things are
indeed not the same, but I'm still not sure if this is what we want.

Say we have 50% of the capacity stolen by RT, and a 25% CFS task
running. If we re-scale, we'll end up with a 50% request for CFS
(util==512 for your code above). But if we want to see a little bit
of idle time in the system, we should really request an OPP for 75%+ of
capacity no ? Or am I missing something ?
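
To spell the same numbers out (assuming max = 1024, so the 50% stolen by
RT is util_rt = 512 and the 25% CFS task is util_cfs = 256):

	rescaled request:   1024 * 256 / (1024 - 512) = 512  -> 50% of capacity
	with idle headroom: roughly 512 + 256          = 768  -> ~75% of capacity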

And also, I think Juri had concerns when we use the util_dl (as a PELT
signal) for OPP selection since that kills the benefit of DL for long
running DL tasks. Or can we assume that DL tasks with very long
runtime/periods are a corner case we can ignore ?

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-21 18:57     ` Peter Zijlstra
@ 2018-06-22  8:10       ` Vincent Guittot
  2018-06-22 11:41         ` Peter Zijlstra
  0 siblings, 1 reply; 56+ messages in thread
From: Vincent Guittot @ 2018-06-22  8:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, Rafael J. Wysocki, Juri Lelli,
	Dietmar Eggemann, Morten Rasmussen, viresh kumar,
	Valentin Schneider, Patrick Bellasi, Joel Fernandes,
	Daniel Lezcano, Quentin Perret, Ingo Molnar

On Thu, 21 Jun 2018 at 20:57, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Jun 21, 2018 at 08:45:24PM +0200, Peter Zijlstra wrote:
> > On Fri, Jun 08, 2018 at 02:09:47PM +0200, Vincent Guittot wrote:
> > >  static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> > >  {
> > >     struct rq *rq = cpu_rq(sg_cpu->cpu);
> > > +   unsigned long util;
> > >
> > >     if (rq->rt.rt_nr_running)
> > >             return sg_cpu->max;
> > >
> > > +   util = sg_cpu->util_dl;
> > > +   util += sg_cpu->util_cfs;
> > > +   util += sg_cpu->util_rt;
> > > +
> > >     /*
> > >      * Utilization required by DEADLINE must always be granted while, for
> > >      * FAIR, we use blocked utilization of IDLE CPUs as a mechanism to
> > > @@ -197,7 +204,7 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> > >      * util_cfs + util_dl as requested freq. However, cpufreq is not yet
> > >      * ready for such an interface. So, we only do the latter for now.
> > >      */
> > > -   return min(sg_cpu->max, (sg_cpu->util_dl + sg_cpu->util_cfs));
> > > +   return min(sg_cpu->max, util);
> > >  }
> >
> > So this (and the dl etc. equivalents) result in exactly the problems
> > complained about last time, no?
> >
> > What I proposed was something along the lines of:
> >
> >       util = 1024 * sg_cpu->util_cfs;
> >       util /= (1024 - (sg_cpu->util_rt + sg_cpu->util_dl + ...));
> >
> >       return min(sg_cpu->max, util + sg_cpu->bw_dl);

I see that you use sg_cpu->util_dl and sg_cpu->bw_dl in your equation
above, but this patch 04 only adds the rt util_avg; the dl util_avg has
not been added yet (it is added in patch 6).
So for this patch, we are only using sg_cpu->bw_dl.

> >
> > Where we, instead of directly adding the various util signals.
>
> That looks unfinished; I think that wants to include: "we renormalize
> the CFS signal".
>
> > I now see an email from Quentin asking if these things are not in fact
> > the same, but no, they are not. The difference is that the above only
> > affects the CFS signal and will re-normalize the utilization of an
> > 'always' running task back to 1 by compensating for the stolen capacity.
> >
> > But it will not, like these here patches, affect the OPP selection of
> > other classes. If there is no CFS utilization (or very little), then the
> > renormalization will not matter, and the existing DL bandwidth
> > compuation will be unaffected.
> >

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-22  7:58     ` Quentin Perret
@ 2018-06-22 11:37       ` Peter Zijlstra
  2018-06-22 11:44         ` Peter Zijlstra
                           ` (3 more replies)
  0 siblings, 4 replies; 56+ messages in thread
From: Peter Zijlstra @ 2018-06-22 11:37 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Vincent Guittot, mingo, linux-kernel, rjw, juri.lelli,
	dietmar.eggemann, Morten.Rasmussen, viresh.kumar,
	valentin.schneider, patrick.bellasi, joel, daniel.lezcano,
	Ingo Molnar

On Fri, Jun 22, 2018 at 08:58:53AM +0100, Quentin Perret wrote:
> Say we have 50% of the capacity stolen by RT, and a 25% CFS task
> running. If we re-scale, we'll end up with a 50% request for CFS
> (util==512 for your code above). But if we want to see a little bit
> of idle time in the system, we should really request an OPP for 75%+ of
> capacity no ? Or am I missing something ?

That is true.. So we could limit the scaling to the case where there is
no idle time, something like:

	util = sg_cpu->util_cfs;

	cap_cfs = (1024 - (sg_cpu->util_rt + ...));
	if (util == cap_cfs)
		util = sg_cpu->max;

That specifically handles the '0% idle -> 100% freq' case, but I don't
really like edge behaviour like that. If for some reason it all doesn't
quite align, you're left with bits.

And the linear scaling is the next simplest thing that avoids the hard
boundary case.

I suppose we can make it more complicated, something like:

             u           u
  f := u + (--- - u) * (---)^n
            1-r         1-r

Where: u := cfs util
       r := \Sum !cfs util
       f := frequency request

That would still satisfy all criteria I think:

  r = 0      -> f := u
  u = (1-r)  -> f := 1

and in particular:

  u << (1-r) -> f ~= u

which causes less inflation than the linear thing where there is idle
time.

In your specific example that ends up with:

             .25           .25
  f = .25 + (--- - .25) * (---)^n = .25 + .0625 (for n=2)
             .5            .5     = .25 + .125  (for n=1)

But is that needed complexity?
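
A quick standalone check of that formula in C (doubles, illustrative
only, needs -lm):

	#include <math.h>
	#include <stdio.h>

	/* f := u + (u/(1-r) - u) * (u/(1-r))^n */
	static double f(double u, double r, int n)
	{
		double s = u / (1.0 - r);	/* u relative to the capacity left by !cfs */

		return u + (s - u) * pow(s, n);
	}

	int main(void)
	{
		printf("%f\n", f(0.25, 0.0, 2));	/* r = 0     -> f = u = 0.25 */
		printf("%f\n", f(0.50, 0.5, 2));	/* u = 1 - r -> f = 1        */
		printf("%f\n", f(0.25, 0.5, 2));	/* the example above, n = 2  */
		return 0;
	}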

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-22  8:10       ` Vincent Guittot
@ 2018-06-22 11:41         ` Peter Zijlstra
  2018-06-22 12:14           ` Vincent Guittot
  0 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2018-06-22 11:41 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Ingo Molnar, linux-kernel, Rafael J. Wysocki, Juri Lelli,
	Dietmar Eggemann, Morten Rasmussen, viresh kumar,
	Valentin Schneider, Patrick Bellasi, Joel Fernandes,
	Daniel Lezcano, Quentin Perret, Ingo Molnar

On Fri, Jun 22, 2018 at 10:10:32AM +0200, Vincent Guittot wrote:
> On Thu, 21 Jun 2018 at 20:57, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > > So this (and the dl etc. equivalents) result in exactly the problems
> > > complained about last time, no?
> > >
> > > What I proposed was something along the lines of:
> > >
> > >       util = 1024 * sg_cpu->util_cfs;
> > >       util /= (1024 - (sg_cpu->util_rt + sg_cpu->util_dl + ...));
> > >
> > >       return min(sg_cpu->max, util + sg_cpu->bw_dl);
> 
> I see that you use sg_cpu->util_dl and sg_cpu->bw_dl in your equation
> above but this patch 04 only adds rt util_avg and the dl util_avg has
> not been added yet.
>  dl util_avg is added in patch 6
> So for this patch, we are only using sg_cpu->bw_dl

Yeah, not the point really.

It is about how we're going to use the (rt,dl,irq etc..) util values,
more than which particular one was introduced here.

I'm just not a big fan of the whole: freq := cfs_util + rt_util thing
(as would be obvious by now). 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-22 11:37       ` Peter Zijlstra
@ 2018-06-22 11:44         ` Peter Zijlstra
  2018-06-22 12:23         ` Vincent Guittot
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 56+ messages in thread
From: Peter Zijlstra @ 2018-06-22 11:44 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Vincent Guittot, mingo, linux-kernel, rjw, juri.lelli,
	dietmar.eggemann, Morten.Rasmussen, viresh.kumar,
	valentin.schneider, patrick.bellasi, joel, daniel.lezcano,
	Ingo Molnar

On Fri, Jun 22, 2018 at 01:37:13PM +0200, Peter Zijlstra wrote:
> I suppose we can make it more complicated, something like:
> 
>              u           u
>   f := u + (--- - u) * (---)^n
>             1-r         1-r
> 
> Where: u := cfs util
>        r := \Sum !cfs util
>        f := frequency request
> 
> That would still satisfy all criteria I think:
> 
>   r = 0      -> f := u
>   u = (1-r)  -> f := 1
> 
> and in particular:
> 
>   u << (1-r) -> f ~= u
> 
> which casuses less inflation than the linear thing where there is idle
> time.

Note that for n=0 this last property is lost and we have the initial
linear case back.
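
Spelled out: with n = 0 the factor (u/(1-r))^0 = 1, so
f = u + (u/(1-r) - u) = u/(1-r), which is exactly the linear rescaling.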

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-22 11:41         ` Peter Zijlstra
@ 2018-06-22 12:14           ` Vincent Guittot
  0 siblings, 0 replies; 56+ messages in thread
From: Vincent Guittot @ 2018-06-22 12:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, Rafael J. Wysocki, Juri Lelli,
	Dietmar Eggemann, Morten Rasmussen, viresh kumar,
	Valentin Schneider, Patrick Bellasi, Joel Fernandes,
	Daniel Lezcano, Quentin Perret, Ingo Molnar

On Fri, 22 Jun 2018 at 13:41, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Jun 22, 2018 at 10:10:32AM +0200, Vincent Guittot wrote:
> > On Thu, 21 Jun 2018 at 20:57, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > > So this (and the dl etc. equivalents) result in exactly the problems
> > > > complained about last time, no?
> > > >
> > > > What I proposed was something along the lines of:
> > > >
> > > >       util = 1024 * sg_cpu->util_cfs;
> > > >       util /= (1024 - (sg_cpu->util_rt + sg_cpu->util_dl + ...));
> > > >
> > > >       return min(sg_cpu->max, util + sg_cpu->bw_dl);
> >
> > I see that you use sg_cpu->util_dl and sg_cpu->bw_dl in your equation
> > above but this patch 04 only adds rt util_avg and the dl util_avg has
> > not been added yet.
> >  dl util_avg is added in patch 6
> > So for this patch, we are only using sg_cpu->bw_dl
>
> Yeah, not the point really.
>
> It is about how we're going to use the (rt,dl,irq etc..) util values,
> more than which particular one was introduced here.

ok

>
> I'm just not a big fan of the whole: freq := cfs_util + rt_util thing
> (as would be obvious by now).

So I'm not sure I catch what you don't like about the sum? Is it the
special case for dl, and how we switch between dl_bw and dl util_avg,
which can generate a drop in frequency? Because the increase is linear
with regard to rt and cfs.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-22 11:37       ` Peter Zijlstra
  2018-06-22 11:44         ` Peter Zijlstra
@ 2018-06-22 12:23         ` Vincent Guittot
  2018-06-22 13:26           ` Peter Zijlstra
  2018-06-22 12:54         ` Quentin Perret
  2018-06-22 15:22         ` Peter Zijlstra
  3 siblings, 1 reply; 56+ messages in thread
From: Vincent Guittot @ 2018-06-22 12:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Quentin Perret, Ingo Molnar, linux-kernel, Rafael J. Wysocki,
	Juri Lelli, Dietmar Eggemann, Morten Rasmussen, viresh kumar,
	Valentin Schneider, Patrick Bellasi, Joel Fernandes,
	Daniel Lezcano, Ingo Molnar

On Fri, 22 Jun 2018 at 13:37, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Jun 22, 2018 at 08:58:53AM +0100, Quentin Perret wrote:
> > Say we have 50% of the capacity stolen by RT, and a 25% CFS task
> > running. If we re-scale, we'll end up with a 50% request for CFS
> > (util==512 for your code above). But if we want to see a little bit
> > of idle time in the system, we should really request an OPP for 75%+ of
> > capacity no ? Or am I missing something ?
>
> That is true.. So we could limit the scaling to the case where there is
> no idle time, something like:
>
>         util = sg_cpu->util_cfs;
>
>         cap_cfs = (1024 - (sg_cpu->util_rt + ...));
>         if (util == cap_cfs)
>                 util = sg_cpu->max;
>
> That specifically handles the '0% idle -> 100% freq' case, but I don't
> realy like edge behaviour like that. If for some reason it all doesn't
> quite align you're left with bits.
>
> And the linear scaling is the next simplest thing that avoids the hard
> boundary case.
>
> I suppose we can make it more complicated, something like:
>
>              u           u
>   f := u + (--- - u) * (---)^n
>             1-r         1-r
>
> Where: u := cfs util
>        r := \Sum !cfs util
>        f := frequency request
>
> That would still satisfy all criteria I think:
>
>   r = 0      -> f := u
>   u = (1-r)  -> f := 1
>
> and in particular:
>
>   u << (1-r) -> f ~= u
>
> which casuses less inflation than the linear thing where there is idle
> time.
>
> In your specific example that ends up with:
>
>              .25           .25
>   f = .25 + (--- - .25) * (---)^n = .25 + .0625 (for n=2)
>              .5            .5     = .25 + .125  (for n=1)
>
> But is that needed complexity?

And we are not yet at the right value for Quentin's example, as we need
something around 0.75 for his example.
The non-linearity only comes from dl, so if we want to use the equation
above, u should be (cfs + rt) and r = dl.
But this also means that we will start to inflate the utilization to
get a higher OPP even if there is idle time, and lose the benefit of
using dl bw.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-22 11:37       ` Peter Zijlstra
  2018-06-22 11:44         ` Peter Zijlstra
  2018-06-22 12:23         ` Vincent Guittot
@ 2018-06-22 12:54         ` Quentin Perret
  2018-06-22 13:29           ` Peter Zijlstra
  2018-06-22 15:22         ` Peter Zijlstra
  3 siblings, 1 reply; 56+ messages in thread
From: Quentin Perret @ 2018-06-22 12:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vincent Guittot, mingo, linux-kernel, rjw, juri.lelli,
	dietmar.eggemann, Morten.Rasmussen, viresh.kumar,
	valentin.schneider, patrick.bellasi, joel, daniel.lezcano,
	Ingo Molnar

On Friday 22 Jun 2018 at 13:37:13 (+0200), Peter Zijlstra wrote:
> That is true.. So we could limit the scaling to the case where there is
> no idle time, something like:
> 
> 	util = sg_cpu->util_cfs;
> 
> 	cap_cfs = (1024 - (sg_cpu->util_rt + ...));
> 	if (util == cap_cfs)
> 		util = sg_cpu->max;
> 
> That specifically handles the '0% idle -> 100% freq' case, but I don't
> realy like edge behaviour like that. If for some reason it all doesn't
> quite align you're left with bits.
> 
> And the linear scaling is the next simplest thing that avoids the hard
> boundary case.

Right, so maybe we'll get something smoother by just summing the signals
as Vincent is proposing ? You will still request max freq for the
(util == cap_cfs) case you described. By definition, you will have
(util_cfs + util_rt + ...) == 1024 in this case.

cap_cfs is the delta between RT+DL+... and 1024, and the only case where
util_cfs can be equal to cap_cfs is if util_cfs fills that delta
entirely.

I hope that makes sense

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-22 12:23         ` Vincent Guittot
@ 2018-06-22 13:26           ` Peter Zijlstra
  2018-06-22 13:52             ` Peter Zijlstra
  2018-06-22 13:54             ` Vincent Guittot
  0 siblings, 2 replies; 56+ messages in thread
From: Peter Zijlstra @ 2018-06-22 13:26 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Quentin Perret, Ingo Molnar, linux-kernel, Rafael J. Wysocki,
	Juri Lelli, Dietmar Eggemann, Morten Rasmussen, viresh kumar,
	Valentin Schneider, Patrick Bellasi, Joel Fernandes,
	Daniel Lezcano, Ingo Molnar

On Fri, Jun 22, 2018 at 02:23:22PM +0200, Vincent Guittot wrote:
> On Fri, 22 Jun 2018 at 13:37, Peter Zijlstra <peterz@infradead.org> wrote:
> > I suppose we can make it more complicated, something like:
> >
> >              u           u
> >   f := u + (--- - u) * (---)^n
> >             1-r         1-r
> >
> > Where: u := cfs util
> >        r := \Sum !cfs util
> >        f := frequency request
> >
> > That would still satisfy all criteria I think:
> >
> >   r = 0      -> f := u
> >   u = (1-r)  -> f := 1
> >
> > and in particular:
> >
> >   u << (1-r) -> f ~= u
> >
> > which casuses less inflation than the linear thing where there is idle
> > time.

> And we are not yet at the right value for quentin's example as we need
> something around 0.75 for is example

$ bc -l
define f (u,r,n) { return u + ((u/(1-r)) - u) * (u/(1-r))^n; }
f(.2,.7,0)
.66666666666666666666
f(.2,.7,2)
.40740740740740740739
f(.2,.7,4)
.29218106995884773661

So at 10% idle time, we've only inflated what should be 20% to 40%, that
is entirely reasonable I think. The linear case gave us 66%.  But feel
free to increase @n if you feel that helps, 4 is only one mult more than
2 and gets us down to 29%.

> The non linearity only comes from dl so if we want to use the equation
> above, u should be (cfs + rt) and r = dl

Right until we allow RT to run at anything other than f=1. Once we allow
rt util capping, either through Patrick's thing or CBS servers or
whatever, we get:

  f = min(1, f_rt + f_dl + f_cfs)

And then u_rt does want to be part of r. And while we do run RT at f=1,
it doesn't matter either way around I think.

> But this also means that we will start to inflate the utilization to
> get higher OPP even if there is idle time and lost the interest of
> using dl bw

You get _some_ inflation, but only if there is actual cfs utilization to
begin with.

And that is my objection to that straight sum thing; there the dl util
distorts the computed dl bandwidth thing even if there is no cfs
utilization.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-22 12:54         ` Quentin Perret
@ 2018-06-22 13:29           ` Peter Zijlstra
  0 siblings, 0 replies; 56+ messages in thread
From: Peter Zijlstra @ 2018-06-22 13:29 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Vincent Guittot, mingo, linux-kernel, rjw, juri.lelli,
	dietmar.eggemann, Morten.Rasmussen, viresh.kumar,
	valentin.schneider, patrick.bellasi, joel, daniel.lezcano,
	Ingo Molnar

On Fri, Jun 22, 2018 at 01:54:34PM +0100, Quentin Perret wrote:
> On Friday 22 Jun 2018 at 13:37:13 (+0200), Peter Zijlstra wrote:
> > That is true.. So we could limit the scaling to the case where there is
> > no idle time, something like:
> > 
> > 	util = sg_cpu->util_cfs;
> > 
> > 	cap_cfs = (1024 - (sg_cpu->util_rt + ...));
> > 	if (util == cap_cfs)
> > 		util = sg_cpu->max;
> > 
> > That specifically handles the '0% idle -> 100% freq' case, but I don't
> > realy like edge behaviour like that. If for some reason it all doesn't
> > quite align you're left with bits.
> > 
> > And the linear scaling is the next simplest thing that avoids the hard
> > boundary case.
> 
> Right, so maybe we'll get something smoother by just summing the signals
> as Vincent is proposing ? 

Sure, but see my previous mail just now, that has the problem of
u_{rt,dl} distorting f_{rt,dl} even when there is no u_cfs.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-22 13:26           ` Peter Zijlstra
@ 2018-06-22 13:52             ` Peter Zijlstra
  2018-06-22 13:54             ` Vincent Guittot
  1 sibling, 0 replies; 56+ messages in thread
From: Peter Zijlstra @ 2018-06-22 13:52 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Quentin Perret, Ingo Molnar, linux-kernel, Rafael J. Wysocki,
	Juri Lelli, Dietmar Eggemann, Morten Rasmussen, viresh kumar,
	Valentin Schneider, Patrick Bellasi, Joel Fernandes,
	Daniel Lezcano, Ingo Molnar

On Fri, Jun 22, 2018 at 03:26:24PM +0200, Peter Zijlstra wrote:
> > But this also means that we will start to inflate the utilization to
> > get higher OPP even if there is idle time and lost the interest of
> > using dl bw
> 
> You get _some_ inflation, but only if there is actual cfs utilization to
> begin with.

Note that at high idle time, the inflation really is minimal:

f(.2,.2,2)
.20312500000000000000


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-22 13:26           ` Peter Zijlstra
  2018-06-22 13:52             ` Peter Zijlstra
@ 2018-06-22 13:54             ` Vincent Guittot
  2018-06-22 13:57               ` Vincent Guittot
                                 ` (2 more replies)
  1 sibling, 3 replies; 56+ messages in thread
From: Vincent Guittot @ 2018-06-22 13:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Quentin Perret, Ingo Molnar, linux-kernel, Rafael J. Wysocki,
	Juri Lelli, Dietmar Eggemann, Morten Rasmussen, viresh kumar,
	Valentin Schneider, Patrick Bellasi, Joel Fernandes,
	Daniel Lezcano, Ingo Molnar

On Fri, 22 Jun 2018 at 15:26, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Jun 22, 2018 at 02:23:22PM +0200, Vincent Guittot wrote:
> > On Fri, 22 Jun 2018 at 13:37, Peter Zijlstra <peterz@infradead.org> wrote:
> > > I suppose we can make it more complicated, something like:
> > >
> > >              u           u
> > >   f := u + (--- - u) * (---)^n
> > >             1-r         1-r
> > >
> > > Where: u := cfs util
> > >        r := \Sum !cfs util
> > >        f := frequency request
> > >
> > > That would still satisfy all criteria I think:
> > >
> > >   r = 0      -> f := u
> > >   u = (1-r)  -> f := 1
> > >
> > > and in particular:
> > >
> > >   u << (1-r) -> f ~= u
> > >
> > > which casuses less inflation than the linear thing where there is idle
> > > time.
>
> > And we are not yet at the right value for quentin's example as we need
> > something around 0.75 for is example
>
> $ bc -l
> define f (u,r,n) { return u + ((u/(1-r)) - u) * (u/(1-r))^n; }
> f(.2,.7,0)
> .66666666666666666666
> f(.2,.7,2)
> .40740740740740740739
> f(.2,.7,4)
> .29218106995884773661
>
> So at 10% idle time, we've only inflated what should be 20% to 40%, that
> is entirely reasonable I think. The linear case gave us 66%.  But feel
> free to increase @n if you feel that helps, 4 is only one mult more than
> 2 and gets us down to 29%.

I'm a bit lost with your example.
u = 0.2 (for cfs) and r = 0.7 (let's say for rt) in your example, so idle is 0.1.

For the rt task, we run 0.7 of the time at f=1, and then we would select
f=0.4 to run the cfs task with u=0.2. But u is the utilization at f=1,
which means it will take 250% of the normal time to execute at f=0.4,
i.e. 0.5 of the time instead of 0.2 at f=1, so we run out of time. In
order to have enough time to run r and u, we must run at least at
f=0.666 for cfs = 0.2/(1-0.7). If the rt task doesn't run at f=1, we
would have to run at f=0.9.
>
> > The non linearity only comes from dl so if we want to use the equation
> > above, u should be (cfs + rt) and r = dl
>
> Right until we allow RT to run at anything other than f=1. Once we allow
> rt util capping, either through Patrick's thing or CBS servers or
> whatever, we get:
>
>   f = min(1, f_rt + f_dl + f_cfs)
>
> And then u_rt does want to be part of r. And while we do run RT at f=1,
> it doesn't matter either way around I think.
>
> > But this also means that we will start to inflate the utilization to
> > get higher OPP even if there is idle time and lost the interest of
> > using dl bw
>
> You get _some_ inflation, but only if there is actual cfs utilization to
> begin with.
>
> And that is my objection to that straight sum thing; there the dl util
> distorts the computed dl bandwidth thing even if there is no cfs
> utilization.

hmm,


>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-22 13:54             ` Vincent Guittot
@ 2018-06-22 13:57               ` Vincent Guittot
  2018-06-22 14:46                 ` Peter Zijlstra
  2018-06-22 14:11               ` Peter Zijlstra
  2018-06-22 14:12               ` Vincent Guittot
  2 siblings, 1 reply; 56+ messages in thread
From: Vincent Guittot @ 2018-06-22 13:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Quentin Perret, Ingo Molnar, linux-kernel, Rafael J. Wysocki,
	Juri Lelli, Dietmar Eggemann, Morten Rasmussen, viresh kumar,
	Valentin Schneider, Patrick Bellasi, Joel Fernandes,
	Daniel Lezcano, Ingo Molnar

On Fri, 22 Jun 2018 at 15:54, Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Fri, 22 Jun 2018 at 15:26, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Fri, Jun 22, 2018 at 02:23:22PM +0200, Vincent Guittot wrote:
> > > On Fri, 22 Jun 2018 at 13:37, Peter Zijlstra <peterz@infradead.org> wrote:
> > > > I suppose we can make it more complicated, something like:
> > > >
> > > >              u           u
> > > >   f := u + (--- - u) * (---)^n
> > > >             1-r         1-r
> > > >
> > > > Where: u := cfs util
> > > >        r := \Sum !cfs util
> > > >        f := frequency request
> > > >
> > > > That would still satisfy all criteria I think:
> > > >
> > > >   r = 0      -> f := u
> > > >   u = (1-r)  -> f := 1
> > > >
> > > > and in particular:
> > > >
> > > >   u << (1-r) -> f ~= u
> > > >
> > > > which casuses less inflation than the linear thing where there is idle
> > > > time.
> >
> > > And we are not yet at the right value for quentin's example as we need
> > > something around 0.75 for is example
> >
> > $ bc -l
> > define f (u,r,n) { return u + ((u/(1-r)) - u) * (u/(1-r))^n; }
> > f(.2,.7,0)
> > .66666666666666666666
> > f(.2,.7,2)
> > .40740740740740740739
> > f(.2,.7,4)
> > .29218106995884773661
> >
> > So at 10% idle time, we've only inflated what should be 20% to 40%, that
> > is entirely reasonable I think. The linear case gave us 66%.  But feel
> > free to increase @n if you feel that helps, 4 is only one mult more than
> > 2 and gets us down to 29%.
>
> I'm a bit lost with your example.
> u = 0.2 (for cfs) and r=0.7 (let say for rt) in your example and idle is 0.1
>
> For rt task, we run 0.7 of the time at f=1 then we will select f=0.4
> for run cfs task with u=0.2 but u is the utilization at f=1 which
> means that it will take 250% of normal time to execute at f=0.4 which
> means 0.5  time instead of 0.2 at f=1 so we are going out of time. In
> order to have enough time to run r and u we must run at least  f=0.666
> for cfs = 0.2/(1-0.7). If rt task doesn't run at f=1 we would have to
> run at f=0.9
>
> >
> > > The non linearity only comes from dl so if we want to use the equation
> > > above, u should be (cfs + rt) and r = dl
> >
> > Right until we allow RT to run at anything other than f=1. Once we allow
> > rt util capping, either through Patrick's thing or CBS servers or
> > whatever, we get:
> >
> >   f = min(1, f_rt + f_dl + f_cfs)
> >
> > And then u_rt does want to be part of r. And while we do run RT at f=1,
> > it doesn't matter either way around I think.
> >
> > > But this also means that we will start to inflate the utilization to
> > > get higher OPP even if there is idle time and lost the interest of
> > > using dl bw
> >
> > You get _some_ inflation, but only if there is actual cfs utilization to
> > begin with.
> >
> > And that is my objection to that straight sum thing; there the dl util
> > distorts the computed dl bandwidth thing even if there is no cfs
> > utilization.
>
> hmm,

forgot to finish this sentence

hmm, dl util_avg is only used to detect if there is idle time, so if
cfs util is nil we will not distort the dl bw (for a use case where
there is no rt task running)

>
>
> >

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-22 13:54             ` Vincent Guittot
  2018-06-22 13:57               ` Vincent Guittot
@ 2018-06-22 14:11               ` Peter Zijlstra
  2018-06-22 14:48                 ` Peter Zijlstra
  2018-06-22 14:12               ` Vincent Guittot
  2 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2018-06-22 14:11 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Quentin Perret, Ingo Molnar, linux-kernel, Rafael J. Wysocki,
	Juri Lelli, Dietmar Eggemann, Morten Rasmussen, viresh kumar,
	Valentin Schneider, Patrick Bellasi, Joel Fernandes,
	Daniel Lezcano, Ingo Molnar

On Fri, Jun 22, 2018 at 03:54:24PM +0200, Vincent Guittot wrote:
> On Fri, 22 Jun 2018 at 15:26, Peter Zijlstra <peterz@infradead.org> wrote:

> > $ bc -l
> > define f (u,r,n) { return u + ((u/(1-r)) - u) * (u/(1-r))^n; }
> > f(.2,.7,0)
> > .66666666666666666666
> > f(.2,.7,2)
> > .40740740740740740739
> > f(.2,.7,4)
> > .29218106995884773661
> >
> > So at 10% idle time, we've only inflated what should be 20% to 40%, that
> > is entirely reasonable I think. The linear case gave us 66%.  But feel
> > free to increase @n if you feel that helps, 4 is only one mult more than
> > 2 and gets us down to 29%.
> 
> I'm a bit lost with your example.
> u = 0.2 (for cfs) and r=0.7 (let say for rt) in your example and idle is 0.1
> 
> For rt task, we run 0.7 of the time at f=1 then we will select f=0.4
> for run cfs task with u=0.2 but u is the utilization at f=1 which
> means that it will take 250% of normal time to execute at f=0.4 which
> means 0.5  time instead of 0.2 at f=1 so we are going out of time. In
> order to have enough time to run r and u we must run at least  f=0.666
> for cfs = 0.2/(1-0.7). 

Argh.. that is n=0. So clearly I went off the rails somewhere.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-22 13:54             ` Vincent Guittot
  2018-06-22 13:57               ` Vincent Guittot
  2018-06-22 14:11               ` Peter Zijlstra
@ 2018-06-22 14:12               ` Vincent Guittot
  2 siblings, 0 replies; 56+ messages in thread
From: Vincent Guittot @ 2018-06-22 14:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Quentin Perret, Ingo Molnar, linux-kernel, Rafael J. Wysocki,
	Juri Lelli, Dietmar Eggemann, Morten Rasmussen, viresh kumar,
	Valentin Schneider, Patrick Bellasi, Joel Fernandes,
	Daniel Lezcano, Ingo Molnar

On Fri, 22 Jun 2018 at 15:54, Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Fri, 22 Jun 2018 at 15:26, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Fri, Jun 22, 2018 at 02:23:22PM +0200, Vincent Guittot wrote:
> > > On Fri, 22 Jun 2018 at 13:37, Peter Zijlstra <peterz@infradead.org> wrote:
> > > > I suppose we can make it more complicated, something like:
> > > >
> > > >              u           u
> > > >   f := u + (--- - u) * (---)^n
> > > >             1-r         1-r
> > > >
> > > > Where: u := cfs util
> > > >        r := \Sum !cfs util
> > > >        f := frequency request
> > > >
> > > > That would still satisfy all criteria I think:
> > > >
> > > >   r = 0      -> f := u
> > > >   u = (1-r)  -> f := 1
> > > >
> > > > and in particular:
> > > >
> > > >   u << (1-r) -> f ~= u
> > > >
> > > > which casuses less inflation than the linear thing where there is idle
> > > > time.
> >
> > > And we are not yet at the right value for quentin's example as we need
> > > something around 0.75 for is example
> >
> > $ bc -l
> > define f (u,r,n) { return u + ((u/(1-r)) - u) * (u/(1-r))^n; }
> > f(.2,.7,0)
> > .66666666666666666666
> > f(.2,.7,2)
> > .40740740740740740739
> > f(.2,.7,4)
> > .29218106995884773661
> >
> > So at 10% idle time, we've only inflated what should be 20% to 40%, that
> > is entirely reasonable I think. The linear case gave us 66%.  But feel
> > free to increase @n if you feel that helps, 4 is only one mult more than
> > 2 and gets us down to 29%.
>
> I'm a bit lost with your example.
> u = 0.2 (for cfs) and r=0.7 (let say for rt) in your example and idle is 0.1
>
> For rt task, we run 0.7 of the time at f=1 then we will select f=0.4
> for run cfs task with u=0.2 but u is the utilization at f=1 which
> means that it will take 250% of normal time to execute at f=0.4 which
> means 0.5  time instead of 0.2 at f=1 so we are going out of time. In
> order to have enough time to run r and u we must run at least  f=0.666
> for cfs = 0.2/(1-0.7). If rt task doesn't run at f=1 we would have to
> run at f=0.9

The current proposal keeps things simple and doesn't take into account
the fact that rt runs at max freq, which gives more margin when cfs is
running.
If we want to take into account the fact that rt tasks run at max freq
when computing the frequency for dl and cfs, we should use f = (cfs util +
dl bw) / (1 - rt util). But even this doesn't take into account the fact
that f=max as soon as rt is runnable, which means that dl can run at
max for part of its time.
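
With the numbers from the example above (and taking dl bw = 0 there),
that gives f = (0.2 + 0) / (1 - 0.7) = 0.666, the same value as the
hand calculation earlier in the thread.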

>
> >
> > > The non linearity only comes from dl so if we want to use the equation
> > > above, u should be (cfs + rt) and r = dl
> >
> > Right until we allow RT to run at anything other than f=1. Once we allow
> > rt util capping, either through Patrick's thing or CBS servers or
> > whatever, we get:
> >
> >   f = min(1, f_rt + f_dl + f_cfs)
> >
> > And then u_rt does want to be part of r. And while we do run RT at f=1,
> > it doesn't matter either way around I think.
> >
> > > But this also means that we will start to inflate the utilization to
> > > get higher OPP even if there is idle time and lost the interest of
> > > using dl bw
> >
> > You get _some_ inflation, but only if there is actual cfs utilization to
> > begin with.
> >
> > And that is my objection to that straight sum thing; there the dl util
> > distorts the computed dl bandwidth thing even if there is no cfs
> > utilization.
>
> hmm,
>
>
> >

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-22 13:57               ` Vincent Guittot
@ 2018-06-22 14:46                 ` Peter Zijlstra
  2018-06-22 14:49                   ` Vincent Guittot
  0 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2018-06-22 14:46 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Quentin Perret, Ingo Molnar, linux-kernel, Rafael J. Wysocki,
	Juri Lelli, Dietmar Eggemann, Morten Rasmussen, viresh kumar,
	Valentin Schneider, Patrick Bellasi, Joel Fernandes,
	Daniel Lezcano, Ingo Molnar

On Fri, Jun 22, 2018 at 03:57:24PM +0200, Vincent Guittot wrote:
> On Fri, 22 Jun 2018 at 15:54, Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> > On Fri, 22 Jun 2018 at 15:26, Peter Zijlstra <peterz@infradead.org> wrote:

> > > And that is my objection to that straight sum thing; there the dl util
> > > distorts the computed dl bandwidth thing even if there is no cfs
> > > utilization.
> >
> > hmm,
> 
> forgot to finish this sentence
> 
> hmm, dl util_avg is only used to detect is there is idle time so if
> cfs util is nul we will not distort the  dl bw (for a use case where
> there is no rt task running)

Hurm.. let me apply those patches and read the result, because I think I
missed something.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-22 14:11               ` Peter Zijlstra
@ 2018-06-22 14:48                 ` Peter Zijlstra
  0 siblings, 0 replies; 56+ messages in thread
From: Peter Zijlstra @ 2018-06-22 14:48 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Quentin Perret, Ingo Molnar, linux-kernel, Rafael J. Wysocki,
	Juri Lelli, Dietmar Eggemann, Morten Rasmussen, viresh kumar,
	Valentin Schneider, Patrick Bellasi, Joel Fernandes,
	Daniel Lezcano, Ingo Molnar

On Fri, Jun 22, 2018 at 04:11:59PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 22, 2018 at 03:54:24PM +0200, Vincent Guittot wrote:
> > On Fri, 22 Jun 2018 at 15:26, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > > define f (u,r,n) { return u + ((u/(1-r)) - u) * (u/(1-r))^n; }

> > I'm a bit lost with your example.
> > u = 0.2 (for cfs) and r=0.7 (let say for rt) in your example and idle is 0.1
> > 
> > For rt task, we run 0.7 of the time at f=1 then we will select f=0.4
> > for run cfs task with u=0.2 but u is the utilization at f=1 which
> > means that it will take 250% of normal time to execute at f=0.4 which
> > means 0.5  time instead of 0.2 at f=1 so we are going out of time. In
> > order to have enough time to run r and u we must run at least  f=0.666
> > for cfs = 0.2/(1-0.7). 
> 
> Argh.. that is n=0. So clearly I went off the rails somewhere.

Aah, I think the number I've been computing is a 'corrected' u. Not an
f. It made sure that 0 idle got u=1, but no more.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-22 14:46                 ` Peter Zijlstra
@ 2018-06-22 14:49                   ` Vincent Guittot
  0 siblings, 0 replies; 56+ messages in thread
From: Vincent Guittot @ 2018-06-22 14:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Quentin Perret, Ingo Molnar, linux-kernel, Rafael J. Wysocki,
	Juri Lelli, Dietmar Eggemann, Morten Rasmussen, viresh kumar,
	Valentin Schneider, Patrick Bellasi, Joel Fernandes,
	Daniel Lezcano, Ingo Molnar

On Fri, 22 Jun 2018 at 16:46, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Jun 22, 2018 at 03:57:24PM +0200, Vincent Guittot wrote:
> > On Fri, 22 Jun 2018 at 15:54, Vincent Guittot
> > <vincent.guittot@linaro.org> wrote:
> > > On Fri, 22 Jun 2018 at 15:26, Peter Zijlstra <peterz@infradead.org> wrote:
>
> > > > And that is my objection to that straight sum thing; there the dl util
> > > > distorts the computed dl bandwidth thing even if there is no cfs
> > > > utilization.
> > >
> > > hmm,
> >
> > forgot to finish this sentence
> >
> > hmm, dl util_avg is only used to detect is there is idle time so if
> > cfs util is nul we will not distort the  dl bw (for a use case where
> > there is no rt task running)
>
> Hurm.. let me apply those patches and read the result, because I think I
> missed something.

ok.

the patchset is available here if it can help :
https://git.linaro.org/people/vincent.guittot/kernel.git   branch:
sched-rt-utilization

>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-22 11:37       ` Peter Zijlstra
                           ` (2 preceding siblings ...)
  2018-06-22 12:54         ` Quentin Perret
@ 2018-06-22 15:22         ` Peter Zijlstra
  2018-06-22 15:30           ` Quentin Perret
  2018-06-22 17:24           ` Vincent Guittot
  3 siblings, 2 replies; 56+ messages in thread
From: Peter Zijlstra @ 2018-06-22 15:22 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Vincent Guittot, mingo, linux-kernel, rjw, juri.lelli,
	dietmar.eggemann, Morten.Rasmussen, viresh.kumar,
	valentin.schneider, patrick.bellasi, joel, daniel.lezcano,
	Ingo Molnar

On Fri, Jun 22, 2018 at 01:37:13PM +0200, Peter Zijlstra wrote:
> That is true.. So we could limit the scaling to the case where there is
> no idle time, something like:
> 
> 	util = sg_cpu->util_cfs;
> 
> 	cap_cfs = (1024 - (sg_cpu->util_rt + ...));
> 	if (util == cap_cfs)
> 		util = sg_cpu->max;
> 

OK, it appears this is more or less what the patches do. And I think
there's a small risk/hole with this where util ends up very close to
cap_cfs but not quite equal to it, due to some unaccounted time.

FWIW, when looking, I saw no reason why sugov_get_util() and
sugov_aggregate_util() were in fact separate functions.

--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -53,11 +53,7 @@ struct sugov_cpu {
 	unsigned int		iowait_boost_max;
 	u64			last_update;
 
-	/* The fields below are only needed when sharing a policy: */
-	unsigned long		util_cfs;
 	unsigned long		util_dl;
-	unsigned long		bw_dl;
-	unsigned long		util_rt;
 	unsigned long		max;
 
 	/* The field below is for single-CPU policies only: */
@@ -181,44 +177,38 @@ static unsigned int get_next_freq(struct
 	return cpufreq_driver_resolve_freq(policy, freq);
 }
 
-static void sugov_get_util(struct sugov_cpu *sg_cpu)
+static unsigned long sugov_get_util(struct sugov_cpu *sg_cpu)
 {
 	struct rq *rq = cpu_rq(sg_cpu->cpu);
+	unsigned long util, max;
 
-	sg_cpu->max = arch_scale_cpu_capacity(NULL, sg_cpu->cpu);
-	sg_cpu->util_cfs = cpu_util_cfs(rq);
-	sg_cpu->util_dl  = cpu_util_dl(rq);
-	sg_cpu->bw_dl    = cpu_bw_dl(rq);
-	sg_cpu->util_rt  = cpu_util_rt(rq);
-}
-
-static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
-{
-	struct rq *rq = cpu_rq(sg_cpu->cpu);
-	unsigned long util;
+	sg_cpu->max = max = arch_scale_cpu_capacity(NULL, sg_cpu->cpu);
+	sg_cpu->util_dl   = cpu_util_dl(rq);
 
 	if (rq->rt.rt_nr_running)
-		return sg_cpu->max;
+		return max;
 
-	util = sg_cpu->util_cfs;
-	util += sg_cpu->util_rt;
+	util  = cpu_util_cfs(rq);
+	util += cpu_util_rt(rq);
 
-	if ((util + sg_cpu->util_dl) >= sg_cpu->max)
-		return sg_cpu->max;
+	/*
+	 * If there is no idle time, we should run at max frequency.
+	 */
+	if ((util + cpu_util_dl(rq)) >= max)
+		return max;
 
 	/*
-	 * As there is still idle time on the CPU, we need to compute the
-	 * utilization level of the CPU.
 	 * Bandwidth required by DEADLINE must always be granted while, for
 	 * FAIR and RT, we use blocked utilization of IDLE CPUs as a mechanism
 	 * to gracefully reduce the frequency when no tasks show up for longer
 	 * periods of time.
+	 *
+	 * Ideally we would like to set bw_dl as min/guaranteed freq and bw_dl
+	 * + util as requested freq. However, cpufreq is not yet ready for such
+	 * an interface. So, we only do the latter for now.
 	 */
 
-	/* Add DL bandwidth requirement */
-	util += sg_cpu->bw_dl;
-
-	return min(sg_cpu->max, util);
+	return min(max, cpu_bw_dl(rq) + util);
 }
 
 /**
@@ -396,9 +386,8 @@ static void sugov_update_single(struct u
 
 	busy = sugov_cpu_is_busy(sg_cpu);
 
-	sugov_get_util(sg_cpu);
+	util = sugov_get_util(sg_cpu);
 	max = sg_cpu->max;
-	util = sugov_aggregate_util(sg_cpu);
 	sugov_iowait_apply(sg_cpu, time, &util, &max);
 	next_f = get_next_freq(sg_policy, util, max);
 	/*
@@ -437,9 +426,8 @@ static unsigned int sugov_next_freq_shar
 		struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
 		unsigned long j_util, j_max;
 
-		sugov_get_util(j_sg_cpu);
+		j_util = sugov_get_util(j_sg_cpu);
 		j_max = j_sg_cpu->max;
-		j_util = sugov_aggregate_util(j_sg_cpu);
 		sugov_iowait_apply(j_sg_cpu, time, &j_util, &j_max);
 
 		if (j_util * max > j_max * util) {



* Re: [PATCH v6 06/11] cpufreq/schedutil: use dl utilization tracking
  2018-06-08 12:09 ` [PATCH v6 06/11] cpufreq/schedutil: use dl " Vincent Guittot
  2018-06-08 12:39   ` Juri Lelli
@ 2018-06-22 15:24   ` Peter Zijlstra
  2018-06-22 17:22     ` Vincent Guittot
  1 sibling, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2018-06-22 15:24 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, linux-kernel, rjw, juri.lelli, dietmar.eggemann,
	Morten.Rasmussen, viresh.kumar, valentin.schneider,
	patrick.bellasi, joel, daniel.lezcano, quentin.perret,
	Ingo Molnar

On Fri, Jun 08, 2018 at 02:09:49PM +0200, Vincent Guittot wrote:
> -	 * Ideally we would like to set util_dl as min/guaranteed freq and
> -	 * util_cfs + util_dl as requested freq. However, cpufreq is not yet
> -	 * ready for such an interface. So, we only do the latter for now.

Please don't delete that comment. It is no less relevant.

> -static inline unsigned long cpu_util_dl(struct rq *rq)
> +static inline unsigned long cpu_bw_dl(struct rq *rq)

I think you forgot to fix-up ignore_dl_rate_limit().
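
For illustration, a hedged sketch of what that fix-up could look like,
assuming ignore_dl_rate_limit() keeps its current shape and simply
switches to the renamed helper and the cached bandwidth value; the field
and helper names are taken from the diffs in this thread and may not
match the final version:

/*
 * Sketch only, not from the posted series: with cpu_util_dl() renamed
 * to cpu_bw_dl(), the deadline rate-limit bypass presumably wants to
 * compare the rq's DL bandwidth against the value cached in sugov_cpu.
 */
static void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu,
				 struct sugov_policy *sg_policy)
{
	if (cpu_bw_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->bw_dl)
		sg_policy->need_freq_update = true;
}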


* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-22 15:22         ` Peter Zijlstra
@ 2018-06-22 15:30           ` Quentin Perret
  2018-06-22 17:24           ` Vincent Guittot
  1 sibling, 0 replies; 56+ messages in thread
From: Quentin Perret @ 2018-06-22 15:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vincent Guittot, mingo, linux-kernel, rjw, juri.lelli,
	dietmar.eggemann, Morten.Rasmussen, viresh.kumar,
	valentin.schneider, patrick.bellasi, joel, daniel.lezcano,
	Ingo Molnar

On Friday 22 Jun 2018 at 17:22:58 (+0200), Peter Zijlstra wrote:
> On Fri, Jun 22, 2018 at 01:37:13PM +0200, Peter Zijlstra wrote:
> > That is true.. So we could limit the scaling to the case where there is
> > no idle time, something like:
> > 
> > 	util = sg_cpu->util_cfs;
> > 
> > 	cap_cfs = (1024 - (sg_cpu->util_rt + ...));
> > 	if (util == cap_cfs)
> > 		util = sg_cpu->max;
> > 
> 
> OK, it appears this is more or less what the patches do. And I think
> there's a small risk/hole with this where util ends up very close to
> cap_cfs but not quite equal to it, due to some unaccounted time.

So IIRC Vincent suggested at some point adding a margin to avoid that
issue. FWIW, this is what the overutilized flag of EAS does. It basically
says: if there isn't enough idle time in the system (cfs_util is too close
to cap_cfs), don't bother looking at the util signals because they'll be
somewhat wrong.

So what about something like going to max freq if overutilized? Or
something similar on a per-cpufreq-policy basis?

Thanks,
Quentin
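
To make the margin idea concrete, a purely illustrative sketch reusing
the names from Peter's consolidation patch earlier in the thread; the
~25% margin (util + util/4) is an arbitrary example value, not something
proposed here:

/*
 * Sketch only: treat the CPU as overutilized once the summed
 * utilization gets within a margin of its capacity, so a bit of
 * unaccounted time no longer hides the "no idle time" case.
 */
static inline bool sugov_overutilized(unsigned long util, unsigned long max)
{
	return util + (util >> 2) >= max;
}

In sugov_get_util() this would replace the strict
(util + cpu_util_dl(rq)) >= max comparison, at the cost of requesting the
max frequency somewhat earlier.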


* Re: [PATCH v6 06/11] cpufreq/schedutil: use dl utilization tracking
  2018-06-22 15:24   ` Peter Zijlstra
@ 2018-06-22 17:22     ` Vincent Guittot
  0 siblings, 0 replies; 56+ messages in thread
From: Vincent Guittot @ 2018-06-22 17:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, Rafael J. Wysocki, Juri Lelli,
	Dietmar Eggemann, Morten Rasmussen, viresh kumar,
	Valentin Schneider, Patrick Bellasi, Joel Fernandes,
	Daniel Lezcano, Quentin Perret, Ingo Molnar

On Fri, 22 Jun 2018 at 17:24, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Jun 08, 2018 at 02:09:49PM +0200, Vincent Guittot wrote:
> > -      * Ideally we would like to set util_dl as min/guaranteed freq and
> > -      * util_cfs + util_dl as requested freq. However, cpufreq is not yet
> > -      * ready for such an interface. So, we only do the latter for now.
>
> Please don't delete that comment. It is no less relevant.

ok, I will keep it in the next version

>
> > -static inline unsigned long cpu_util_dl(struct rq *rq)
> > +static inline unsigned long cpu_bw_dl(struct rq *rq)
>
> I think you forgot to fix-up ignore_dl_rate_limit().

yes, you're right


* Re: [PATCH v6 04/11] cpufreq/schedutil: use rt utilization tracking
  2018-06-22 15:22         ` Peter Zijlstra
  2018-06-22 15:30           ` Quentin Perret
@ 2018-06-22 17:24           ` Vincent Guittot
  1 sibling, 0 replies; 56+ messages in thread
From: Vincent Guittot @ 2018-06-22 17:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Quentin Perret, Ingo Molnar, linux-kernel, Rafael J. Wysocki,
	Juri Lelli, Dietmar Eggemann, Morten Rasmussen, viresh kumar,
	Valentin Schneider, Patrick Bellasi, Joel Fernandes,
	Daniel Lezcano, Ingo Molnar

On Fri, 22 Jun 2018 at 17:23, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Jun 22, 2018 at 01:37:13PM +0200, Peter Zijlstra wrote:
> > That is true.. So we could limit the scaling to the case where there is
> > no idle time, something like:
> >
> >       util = sg_cpu->util_cfs;
> >
> >       cap_cfs = (1024 - (sg_cpu->util_rt + ...));
> >       if (util == cap_cfs)
> >               util = sg_cpu->max;
> >
>
> OK, it appears this is more or less what the patches do. And I think
> there's a small risk/hole with this where util ends up very close to
> cap_cfs but not quite equal to it, due to some unaccounted time.
>
> FWIW, when looking, I saw no reason why sugov_get_util() and
> sugov_aggregate_util() were in fact separate functions.

good point

>
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -53,11 +53,7 @@ struct sugov_cpu {
>         unsigned int            iowait_boost_max;
>         u64                     last_update;
>
> -       /* The fields below are only needed when sharing a policy: */
> -       unsigned long           util_cfs;
>         unsigned long           util_dl;
> -       unsigned long           bw_dl;
> -       unsigned long           util_rt;
>         unsigned long           max;
>
>         /* The field below is for single-CPU policies only: */
> @@ -181,44 +177,38 @@ static unsigned int get_next_freq(struct
>         return cpufreq_driver_resolve_freq(policy, freq);
>  }
>
> -static void sugov_get_util(struct sugov_cpu *sg_cpu)
> +static unsigned long sugov_get_util(struct sugov_cpu *sg_cpu)
>  {
>         struct rq *rq = cpu_rq(sg_cpu->cpu);
> +       unsigned long util, max;
>
> -       sg_cpu->max = arch_scale_cpu_capacity(NULL, sg_cpu->cpu);
> -       sg_cpu->util_cfs = cpu_util_cfs(rq);
> -       sg_cpu->util_dl  = cpu_util_dl(rq);
> -       sg_cpu->bw_dl    = cpu_bw_dl(rq);
> -       sg_cpu->util_rt  = cpu_util_rt(rq);
> -}
> -
> -static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> -{
> -       struct rq *rq = cpu_rq(sg_cpu->cpu);
> -       unsigned long util;
> +       sg_cpu->max = max = arch_scale_cpu_capacity(NULL, sg_cpu->cpu);
> +       sg_cpu->util_dl   = cpu_util_dl(rq);
>
>         if (rq->rt.rt_nr_running)
> -               return sg_cpu->max;
> +               return max;
>
> -       util = sg_cpu->util_cfs;
> -       util += sg_cpu->util_rt;
> +       util  = cpu_util_cfs(rq);
> +       util += cpu_util_rt(rq);
>
> -       if ((util + sg_cpu->util_dl) >= sg_cpu->max)
> -               return sg_cpu->max;
> +       /*
> +        * If there is no idle time, we should run at max frequency.
> +        */
> +       if ((util + cpu_util_dl(rq)) >= max)
> +               return max;
>
>         /*
> -        * As there is still idle time on the CPU, we need to compute the
> -        * utilization level of the CPU.
>          * Bandwidth required by DEADLINE must always be granted while, for
>          * FAIR and RT, we use blocked utilization of IDLE CPUs as a mechanism
>          * to gracefully reduce the frequency when no tasks show up for longer
>          * periods of time.
> +        *
> +        * Ideally we would like to set bw_dl as min/guaranteed freq and bw_dl
> +        * + util as requested freq. However, cpufreq is not yet ready for such
> +        * an interface. So, we only do the latter for now.
>          */
>
> -       /* Add DL bandwidth requirement */
> -       util += sg_cpu->bw_dl;
> -
> -       return min(sg_cpu->max, util);
> +       return min(max, cpu_bw_dl(rq) + util);
>  }
>
>  /**
> @@ -396,9 +386,8 @@ static void sugov_update_single(struct u
>
>         busy = sugov_cpu_is_busy(sg_cpu);
>
> -       sugov_get_util(sg_cpu);
> +       util = sugov_get_util(sg_cpu);
>         max = sg_cpu->max;
> -       util = sugov_aggregate_util(sg_cpu);
>         sugov_iowait_apply(sg_cpu, time, &util, &max);
>         next_f = get_next_freq(sg_policy, util, max);
>         /*
> @@ -437,9 +426,8 @@ static unsigned int sugov_next_freq_shar
>                 struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
>                 unsigned long j_util, j_max;
>
> -               sugov_get_util(j_sg_cpu);
> +               j_util = sugov_get_util(j_sg_cpu);
>                 j_max = j_sg_cpu->max;
> -               j_util = sugov_aggregate_util(j_sg_cpu);
>                 sugov_iowait_apply(j_sg_cpu, time, &j_util, &j_max);
>
>                 if (j_util * max > j_max * util) {
>


end of thread

Thread overview: 56+ messages
2018-06-08 12:09 [PATCH v6 00/11] track CPU utilization Vincent Guittot
2018-06-08 12:09 ` [PATCH v6 01/11] sched/pelt: Move pelt related code in a dedicated file Vincent Guittot
2018-06-08 12:09 ` [PATCH v6 02/11] sched/pelt: remove blank line Vincent Guittot
2018-06-21 14:33   ` Peter Zijlstra
2018-06-21 18:42     ` Vincent Guittot
2018-06-08 12:09 ` [PATCH v6 03/11] sched/rt: add rt_rq utilization tracking Vincent Guittot
2018-06-15 11:52   ` Dietmar Eggemann
2018-06-15 12:18     ` Vincent Guittot
2018-06-15 14:55       ` Dietmar Eggemann
2018-06-21 18:50   ` Peter Zijlstra
2018-06-08 12:09 ` [PATCH v6 04/11] cpufreq/schedutil: use rt " Vincent Guittot
2018-06-18  9:00   ` Dietmar Eggemann
2018-06-18 12:58     ` Vincent Guittot
2018-06-21 18:45   ` Peter Zijlstra
2018-06-21 18:57     ` Peter Zijlstra
2018-06-22  8:10       ` Vincent Guittot
2018-06-22 11:41         ` Peter Zijlstra
2018-06-22 12:14           ` Vincent Guittot
2018-06-22  7:58     ` Juri Lelli
2018-06-22  7:58     ` Quentin Perret
2018-06-22 11:37       ` Peter Zijlstra
2018-06-22 11:44         ` Peter Zijlstra
2018-06-22 12:23         ` Vincent Guittot
2018-06-22 13:26           ` Peter Zijlstra
2018-06-22 13:52             ` Peter Zijlstra
2018-06-22 13:54             ` Vincent Guittot
2018-06-22 13:57               ` Vincent Guittot
2018-06-22 14:46                 ` Peter Zijlstra
2018-06-22 14:49                   ` Vincent Guittot
2018-06-22 14:11               ` Peter Zijlstra
2018-06-22 14:48                 ` Peter Zijlstra
2018-06-22 14:12               ` Vincent Guittot
2018-06-22 12:54         ` Quentin Perret
2018-06-22 13:29           ` Peter Zijlstra
2018-06-22 15:22         ` Peter Zijlstra
2018-06-22 15:30           ` Quentin Perret
2018-06-22 17:24           ` Vincent Guittot
2018-06-08 12:09 ` [PATCH v6 05/11] sched/dl: add dl_rq " Vincent Guittot
2018-06-08 12:09 ` [PATCH v6 06/11] cpufreq/schedutil: use dl " Vincent Guittot
2018-06-08 12:39   ` Juri Lelli
2018-06-08 12:48     ` Vincent Guittot
2018-06-08 12:54       ` Juri Lelli
2018-06-08 13:36         ` Juri Lelli
2018-06-08 13:38           ` Vincent Guittot
2018-06-22 15:24   ` Peter Zijlstra
2018-06-22 17:22     ` Vincent Guittot
2018-06-08 12:09 ` [PATCH v6 07/11] sched/irq: add irq " Vincent Guittot
2018-06-08 12:09 ` [PATCH v6 08/11] cpufreq/schedutil: take into account interrupt Vincent Guittot
2018-06-12  8:54   ` Dietmar Eggemann
2018-06-12  9:10     ` Vincent Guittot
2018-06-12  9:16       ` Vincent Guittot
2018-06-12  9:20         ` Quentin Perret
2018-06-12  9:26           ` Vincent Guittot
2018-06-08 12:09 ` [PATCH v6 09/11] sched: use pelt for scale_rt_capacity() Vincent Guittot
2018-06-08 12:09 ` [PATCH v6 10/11] sched: remove rt_avg code Vincent Guittot
2018-06-08 12:09 ` [PATCH v6 11/11] proc/sched: remove unused sched_time_avg_ms Vincent Guittot
