* [RFC PATCH v1 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection
@ 2017-07-05  8:58 Juri Lelli
  2017-07-05  8:58 ` [RFC PATCH v1 1/8] sched/cpufreq_schedutil: make use of DEADLINE utilization signal Juri Lelli
                   ` (9 more replies)
  0 siblings, 10 replies; 45+ messages in thread
From: Juri Lelli @ 2017-07-05  8:58 UTC (permalink / raw)
  To: peterz, mingo, rjw, viresh.kumar
  Cc: linux-kernel, linux-pm, tglx, vincent.guittot, rostedt,
	luca.abeni, claudio, tommaso.cucinotta, bristot, mathieu.poirier,
	tkjos, joelaf, andresoportus, morten.rasmussen, dietmar.eggemann,
	patrick.bellasi, juri.lelli

Hi,

v1 of the RFC set implementing frequency/cpu invariance and OPP selection for
SCHED_DEADLINE [1]. The set is based on tip/sched/core as of today
(72298e5c92c5), which now already includes Luca's "CPU reclaiming for
SCHED_DEADLINE".

Thanks a lot for reviewing RFCv0!

Patches high level description:

 o [01-02]/08 add the necessary links to start accounting the DEADLINE
              contribution to OPP selection
 o 03/08      is a temporary solution to make it possible (on ARM) to change
              frequency for DEADLINE tasks (which would otherwise possibly
              delay the SCHED_FIFO worker kthread); the proper solution would
              be the ability to issue frequency transitions from an atomic ctx
 o [04-05]/08 are schedutil changes that cope with the fact that DEADLINE
              doesn't require a periodic OPP selection trigger point
 o [06-07]/08 make the arch_scale_{freq,cpu}_capacity() functions available on
              !CONFIG_SMP configurations too
 o 08/08      implements frequency/cpu invariance for tasks' reservation
              parameters, which basically means that we implement GRUB-PA [2]

Changes w.r.t. RFCv0:

 - rebase on tip/sched/core
 - make use of BW_SHIFT for calculations (Peter)
 - added a comment about guaranteed/requested frequency (Peter)
 - use a high bit for the sched_flags SCHED_FLAG_SPECIAL hack; don't expose it
   to userspace and add comments (Peter)
 - refactor aggregation of utilization from scheduling classes

Please have a look. Feedback and comments are, as usual, more than welcome.

In case you would like to test this out:

 git://linux-arm.org/linux-jl.git upstream/deadline/freq-rfc-v1

Best,

- Juri


Juri Lelli (8):
  sched/cpufreq_schedutil: make use of DEADLINE utilization signal
  sched/deadline: move cpu frequency selection triggering points
  sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE
  sched/cpufreq_schedutil: split utilization signals
  sched/cpufreq_schedutil: always consider all CPUs when deciding next
    freq
  sched/sched.h: remove sd arch_scale_freq_capacity parameter
  sched/sched.h: move arch_scale_{freq,cpu}_capacity outside CONFIG_SMP
  sched/deadline: make bandwidth enforcement scale-invariant

 include/linux/sched.h            |  1 +
 include/linux/sched/cpufreq.h    |  2 --
 include/linux/sched/topology.h   | 12 +++----
 kernel/sched/core.c              | 15 ++++++--
 kernel/sched/cpufreq_schedutil.c | 77 ++++++++++++++++++++++++++--------------
 kernel/sched/deadline.c          | 46 ++++++++++++++++++++----
 kernel/sched/fair.c              |  4 +--
 kernel/sched/sched.h             | 51 +++++++++++++++++++-------
 8 files changed, 149 insertions(+), 59 deletions(-)

-- 
2.11.0


* [RFC PATCH v1 1/8] sched/cpufreq_schedutil: make use of DEADLINE utilization signal
  2017-07-05  8:58 [RFC PATCH v1 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Juri Lelli
@ 2017-07-05  8:58 ` Juri Lelli
  2017-07-07  7:20   ` Viresh Kumar
  2017-07-05  8:58 ` [RFC PATCH v1 2/8] sched/deadline: move cpu frequency selection triggering points Juri Lelli
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Juri Lelli @ 2017-07-05  8:58 UTC (permalink / raw)
  To: peterz, mingo, rjw, viresh.kumar
  Cc: linux-kernel, linux-pm, tglx, vincent.guittot, rostedt,
	luca.abeni, claudio, tommaso.cucinotta, bristot, mathieu.poirier,
	tkjos, joelaf, andresoportus, morten.rasmussen, dietmar.eggemann,
	patrick.bellasi, juri.lelli, Ingo Molnar, Rafael J . Wysocki

SCHED_DEADLINE tracks the active utilization signal with a per-dl_rq
variable named running_bw.

Make use of that to drive cpu frequency selection: add up the FAIR and
DEADLINE contributions to get the CPU capacity required to handle both
requirements (while RT still selects max frequency).
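
For illustration (not part of the patch, and assuming the kernel's
BW_SHIFT of 20 and SCHED_CAPACITY_SCALE of 1024): a DEADLINE task with
runtime 10ms every 100ms contributes, while active,

	running_bw ~= (10 << BW_SHIFT) / 100 = 104857

so the governor computes

	dl_util = (running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT
	        = (104857 * 1024) >> 20 = 102	/* ~10% of capacity */

and adds that on top of the CFS util_avg before picking a frequency.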

Co-authored-by: Claudio Scordino <claudio@evidence.eu.com>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
---
Changes from RFCv0:

 - use BW_SHIFT (Peter)
 - add comment about guaranteed and requested freq (Peter)
 - modify comment about go to max behaviour (Claudio)
---
 include/linux/sched/cpufreq.h    |  2 --
 kernel/sched/cpufreq_schedutil.c | 25 +++++++++++++++----------
 2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
index d2be2ccbb372..39640bb3a8ee 100644
--- a/include/linux/sched/cpufreq.h
+++ b/include/linux/sched/cpufreq.h
@@ -11,8 +11,6 @@
 #define SCHED_CPUFREQ_DL	(1U << 1)
 #define SCHED_CPUFREQ_IOWAIT	(1U << 2)
 
-#define SCHED_CPUFREQ_RT_DL	(SCHED_CPUFREQ_RT | SCHED_CPUFREQ_DL)
-
 #ifdef CONFIG_CPU_FREQ
 struct update_util_data {
        void (*func)(struct update_util_data *data, u64 time, unsigned int flags);
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 076a2e31951c..f2494d1fc8ef 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -157,12 +157,17 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
 static void sugov_get_util(unsigned long *util, unsigned long *max)
 {
 	struct rq *rq = this_rq();
-	unsigned long cfs_max;
+	unsigned long dl_util = (rq->dl.running_bw * SCHED_CAPACITY_SCALE)
+				>> BW_SHIFT;
 
-	cfs_max = arch_scale_cpu_capacity(NULL, smp_processor_id());
+	*max = arch_scale_cpu_capacity(NULL, smp_processor_id());
 
-	*util = min(rq->cfs.avg.util_avg, cfs_max);
-	*max = cfs_max;
+	/*
+	 * Ideally we would like to set util_dl as min/guaranteed freq and
+	 * util_cfs + util_dl as requested freq. However, cpufreq is not yet
+	 * ready for such an interface. So, we only do the latter for now.
+	 */
+	*util = min(rq->cfs.avg.util_avg + dl_util, *max);
 }
 
 static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
@@ -226,7 +231,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 
 	busy = sugov_cpu_is_busy(sg_cpu);
 
-	if (flags & SCHED_CPUFREQ_RT_DL) {
+	if (flags & SCHED_CPUFREQ_RT) {
 		next_f = policy->cpuinfo.max_freq;
 	} else {
 		sugov_get_util(&util, &max);
@@ -266,7 +271,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 			j_sg_cpu->iowait_boost = 0;
 			continue;
 		}
-		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT_DL)
+		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT)
 			return policy->cpuinfo.max_freq;
 
 		j_util = j_sg_cpu->util;
@@ -302,7 +307,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 	sg_cpu->last_update = time;
 
 	if (sugov_should_update_freq(sg_policy, time)) {
-		if (flags & SCHED_CPUFREQ_RT_DL)
+		if (flags & SCHED_CPUFREQ_RT)
 			next_f = sg_policy->policy->cpuinfo.max_freq;
 		else
 			next_f = sugov_next_freq_shared(sg_cpu, time);
@@ -332,9 +337,9 @@ static void sugov_irq_work(struct irq_work *irq_work)
 	sg_policy = container_of(irq_work, struct sugov_policy, irq_work);
 
 	/*
-	 * For RT and deadline tasks, the schedutil governor shoots the
-	 * frequency to maximum. Special care must be taken to ensure that this
-	 * kthread doesn't result in the same behavior.
+	 * For RT tasks, the schedutil governor shoots the frequency to maximum.
+	 * Special care must be taken to ensure that this kthread doesn't result
+	 * in the same behavior.
 	 *
 	 * This is (mostly) guaranteed by the work_in_progress flag. The flag is
 	 * updated only at the end of the sugov_work() function and before that
-- 
2.11.0


* [RFC PATCH v1 2/8] sched/deadline: move cpu frequency selection triggering points
  2017-07-05  8:58 [RFC PATCH v1 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Juri Lelli
  2017-07-05  8:58 ` [RFC PATCH v1 1/8] sched/cpufreq_schedutil: make use of DEADLINE utilization signal Juri Lelli
@ 2017-07-05  8:58 ` Juri Lelli
  2017-07-07  7:21   ` Viresh Kumar
  2017-07-05  8:59 ` [RFC PATCH v1 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE Juri Lelli
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Juri Lelli @ 2017-07-05  8:58 UTC (permalink / raw)
  To: peterz, mingo, rjw, viresh.kumar
  Cc: linux-kernel, linux-pm, tglx, vincent.guittot, rostedt,
	luca.abeni, claudio, tommaso.cucinotta, bristot, mathieu.poirier,
	tkjos, joelaf, andresoportus, morten.rasmussen, dietmar.eggemann,
	patrick.bellasi, juri.lelli, Ingo Molnar, Rafael J . Wysocki

Since SCHED_DEADLINE doesn't track a utilization signal (but rather
reserves a fraction of CPU bandwidth for tasks admitted to the system),
there is no point in evaluating frequency changes at each tick event.

Move the frequency selection triggering points to where running_bw
changes.

Co-authored-by: Claudio Scordino <claudio@evidence.eu.com>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
---
Changes from RFCv0:

 - modify comment regarding periodic RT updates (Claudio)
---
 kernel/sched/deadline.c |  7 ++++---
 kernel/sched/sched.h    | 12 ++++++------
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index a84299f44b5d..6912f7f35f9b 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -85,6 +85,8 @@ void add_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
 	dl_rq->running_bw += dl_bw;
 	SCHED_WARN_ON(dl_rq->running_bw < old); /* overflow */
 	SCHED_WARN_ON(dl_rq->running_bw > dl_rq->this_bw);
+	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
+	cpufreq_update_this_cpu(rq_of_dl_rq(dl_rq), SCHED_CPUFREQ_DL);
 }
 
 static inline
@@ -97,6 +99,8 @@ void sub_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
 	SCHED_WARN_ON(dl_rq->running_bw > old); /* underflow */
 	if (dl_rq->running_bw > old)
 		dl_rq->running_bw = 0;
+	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
+	cpufreq_update_this_cpu(rq_of_dl_rq(dl_rq), SCHED_CPUFREQ_DL);
 }
 
 static inline
@@ -1135,9 +1139,6 @@ static void update_curr_dl(struct rq *rq)
 		return;
 	}
 
-	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
-	cpufreq_update_this_cpu(rq, SCHED_CPUFREQ_DL);
-
 	schedstat_set(curr->se.statistics.exec_max,
 		      max(curr->se.statistics.exec_max, delta_exec));
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index eeef1a3086d1..d8798bb54ace 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2057,14 +2057,14 @@ DECLARE_PER_CPU(struct update_util_data *, cpufreq_update_util_data);
  * The way cpufreq is currently arranged requires it to evaluate the CPU
  * performance state (frequency/voltage) on a regular basis to prevent it from
  * being stuck in a completely inadequate performance level for too long.
- * That is not guaranteed to happen if the updates are only triggered from CFS,
- * though, because they may not be coming in if RT or deadline tasks are active
- * all the time (or there are RT and DL tasks only).
+ * That is not guaranteed to happen if the updates are only triggered from CFS
+ * and DL, though, because they may not be coming in if only RT tasks are
+ * active all the time (or there are RT tasks only).
  *
- * As a workaround for that issue, this function is called by the RT and DL
- * sched classes to trigger extra cpufreq updates to prevent it from stalling,
+ * As a workaround for that issue, this function is called periodically by the
+ * RT sched class to trigger extra cpufreq updates to prevent it from stalling,
  * but that really is a band-aid.  Going forward it should be replaced with
- * solutions targeted more specifically at RT and DL tasks.
+ * solutions targeted more specifically at RT tasks.
  */
 static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
 {
-- 
2.11.0


* [RFC PATCH v1 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE
  2017-07-05  8:58 [RFC PATCH v1 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Juri Lelli
  2017-07-05  8:58 ` [RFC PATCH v1 1/8] sched/cpufreq_schedutil: make use of DEADLINE utilization signal Juri Lelli
  2017-07-05  8:58 ` [RFC PATCH v1 2/8] sched/deadline: move cpu frequency selection triggering points Juri Lelli
@ 2017-07-05  8:59 ` Juri Lelli
  2017-07-07  3:56   ` Joel Fernandes
                     ` (2 more replies)
  2017-07-05  8:59 ` [RFC PATCH v1 4/8] sched/cpufreq_schedutil: split utilization signals Juri Lelli
                   ` (6 subsequent siblings)
  9 siblings, 3 replies; 45+ messages in thread
From: Juri Lelli @ 2017-07-05  8:59 UTC (permalink / raw)
  To: peterz, mingo, rjw, viresh.kumar
  Cc: linux-kernel, linux-pm, tglx, vincent.guittot, rostedt,
	luca.abeni, claudio, tommaso.cucinotta, bristot, mathieu.poirier,
	tkjos, joelaf, andresoportus, morten.rasmussen, dietmar.eggemann,
	patrick.bellasi, juri.lelli, Ingo Molnar, Rafael J . Wysocki

The worker kthread needs to be able to change frequency for all other
threads.

Make it special, just below the STOP class.

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Claudio Scordino <claudio@evidence.eu.com>
---
Changes from RFCv0:

 - use a high bit of sched_flags and don't expose the new flag to
   userspace (also add derisive comments) (Peter)
---
 include/linux/sched.h            |  1 +
 kernel/sched/core.c              | 15 +++++++++++++--
 kernel/sched/cpufreq_schedutil.c | 15 ++++++++++++---
 kernel/sched/deadline.c          | 13 +++++++++++++
 kernel/sched/sched.h             | 20 +++++++++++++++++++-
 5 files changed, 58 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1f0f427e0292..0ba767fdedae 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1359,6 +1359,7 @@ extern int idle_cpu(int cpu);
 extern int sched_setscheduler(struct task_struct *, int, const struct sched_param *);
 extern int sched_setscheduler_nocheck(struct task_struct *, int, const struct sched_param *);
 extern int sched_setattr(struct task_struct *, const struct sched_attr *);
+extern int sched_setattr_nocheck(struct task_struct *, const struct sched_attr *);
 extern struct task_struct *idle_task(int cpu);
 
 /**
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5186797908dc..0e40c7ff2bf4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3998,7 +3998,9 @@ static int __sched_setscheduler(struct task_struct *p,
 	}
 
 	if (attr->sched_flags &
-		~(SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_RECLAIM))
+		~(SCHED_FLAG_RESET_ON_FORK |
+		  SCHED_FLAG_RECLAIM |
+		  SCHED_FLAG_SPECIAL))
 		return -EINVAL;
 
 	/*
@@ -4065,6 +4067,9 @@ static int __sched_setscheduler(struct task_struct *p,
 	}
 
 	if (user) {
+		if (attr->sched_flags & SCHED_FLAG_SPECIAL)
+			return -EPERM;
+
 		retval = security_task_setscheduler(p);
 		if (retval)
 			return retval;
@@ -4120,7 +4125,8 @@ static int __sched_setscheduler(struct task_struct *p,
 		}
 #endif
 #ifdef CONFIG_SMP
-		if (dl_bandwidth_enabled() && dl_policy(policy)) {
+		if (dl_bandwidth_enabled() && dl_policy(policy) &&
+				!(attr->sched_flags & SCHED_FLAG_SPECIAL)) {
 			cpumask_t *span = rq->rd->span;
 
 			/*
@@ -4250,6 +4256,11 @@ int sched_setattr(struct task_struct *p, const struct sched_attr *attr)
 }
 EXPORT_SYMBOL_GPL(sched_setattr);
 
+int sched_setattr_nocheck(struct task_struct *p, const struct sched_attr *attr)
+{
+	return __sched_setscheduler(p, attr, false, true);
+}
+
 /**
  * sched_setscheduler_nocheck - change the scheduling policy and/or RT priority of a thread from kernelspace.
  * @p: the task in question.
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index f2494d1fc8ef..ba6227625f24 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -424,7 +424,16 @@ static void sugov_policy_free(struct sugov_policy *sg_policy)
 static int sugov_kthread_create(struct sugov_policy *sg_policy)
 {
 	struct task_struct *thread;
-	struct sched_param param = { .sched_priority = MAX_USER_RT_PRIO / 2 };
+	struct sched_attr attr = {
+		.size = sizeof(struct sched_attr),
+		.sched_policy = SCHED_DEADLINE,
+		.sched_flags = SCHED_FLAG_SPECIAL,
+		.sched_nice = 0,
+		.sched_priority = 0,
+		.sched_runtime = 0,
+		.sched_deadline = 0,
+		.sched_period = 0,
+	};
 	struct cpufreq_policy *policy = sg_policy->policy;
 	int ret;
 
@@ -442,10 +451,10 @@ static int sugov_kthread_create(struct sugov_policy *sg_policy)
 		return PTR_ERR(thread);
 	}
 
-	ret = sched_setscheduler_nocheck(thread, SCHED_FIFO, &param);
+	ret = sched_setattr_nocheck(thread, &attr);
 	if (ret) {
 		kthread_stop(thread);
-		pr_warn("%s: failed to set SCHED_FIFO\n", __func__);
+		pr_warn("%s: failed to set SCHED_DEADLINE\n", __func__);
 		return ret;
 	}
 
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 6912f7f35f9b..4ec82fa56b04 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -220,6 +220,9 @@ static void task_non_contending(struct task_struct *p)
 	if (dl_se->dl_runtime == 0)
 		return;
 
+	if (unlikely(dl_entity_is_special(dl_se)))
+		return;
+
 	WARN_ON(hrtimer_active(&dl_se->inactive_timer));
 	WARN_ON(dl_se->dl_non_contending);
 
@@ -1150,6 +1153,9 @@ static void update_curr_dl(struct rq *rq)
 
 	sched_rt_avg_update(rq, delta_exec);
 
+	if (unlikely(dl_entity_is_special(dl_se)))
+		return;
+
 	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM))
 		delta_exec = grub_reclaim(delta_exec, rq, &curr->dl);
 	dl_se->runtime -= delta_exec;
@@ -2447,6 +2453,9 @@ int sched_dl_overflow(struct task_struct *p, int policy,
 	u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0;
 	int cpus, err = -1;
 
+	if (attr->sched_flags & SCHED_FLAG_SPECIAL)
+		return 0;
+
 	/* !deadline task may carry old deadline bandwidth */
 	if (new_bw == p->dl.dl_bw && task_has_dl_policy(p))
 		return 0;
@@ -2533,6 +2542,10 @@ void __getparam_dl(struct task_struct *p, struct sched_attr *attr)
  */
 bool __checkparam_dl(const struct sched_attr *attr)
 {
+	/* special dl tasks don't actually use any parameter */
+	if (attr->sched_flags & SCHED_FLAG_SPECIAL)
+		return true;
+
 	/* deadline != 0 */
 	if (attr->sched_deadline == 0)
 		return false;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d8798bb54ace..1be5dac728a4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -156,12 +156,30 @@ static inline int task_has_dl_policy(struct task_struct *p)
 }
 
 /*
+ * !! For sched_setattr_nocheck() (kernel) only !!
+ *
+ * This is actually gross. :(
+ *
+ * It is used to make schedutil kworker(s) higher priority than SCHED_DEADLINE
+ * tasks, but still be able to sleep. We need this on platforms that cannot
+ * atomically change clock frequency. Remove once fast switching will be
+ * available on such platforms.
+ */
+#define SCHED_FLAG_SPECIAL	0x10000000
+
+static inline int dl_entity_is_special(struct sched_dl_entity *dl_se)
+{
+	return dl_se->flags & SCHED_FLAG_SPECIAL;
+}
+
+/*
  * Tells if entity @a should preempt entity @b.
  */
 static inline bool
 dl_entity_preempt(struct sched_dl_entity *a, struct sched_dl_entity *b)
 {
-	return dl_time_before(a->deadline, b->deadline);
+	return dl_entity_is_special(a) ||
+	       dl_time_before(a->deadline, b->deadline);
 }
 
 /*
-- 
2.11.0


* [RFC PATCH v1 4/8] sched/cpufreq_schedutil: split utilization signals
  2017-07-05  8:58 [RFC PATCH v1 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Juri Lelli
                   ` (2 preceding siblings ...)
  2017-07-05  8:59 ` [RFC PATCH v1 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE Juri Lelli
@ 2017-07-05  8:59 ` Juri Lelli
  2017-07-07  3:26   ` Joel Fernandes
                     ` (2 more replies)
  2017-07-05  8:59 ` [RFC PATCH v1 5/8] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq Juri Lelli
                   ` (5 subsequent siblings)
  9 siblings, 3 replies; 45+ messages in thread
From: Juri Lelli @ 2017-07-05  8:59 UTC (permalink / raw)
  To: peterz, mingo, rjw, viresh.kumar
  Cc: linux-kernel, linux-pm, tglx, vincent.guittot, rostedt,
	luca.abeni, claudio, tommaso.cucinotta, bristot, mathieu.poirier,
	tkjos, joelaf, andresoportus, morten.rasmussen, dietmar.eggemann,
	patrick.bellasi, juri.lelli, Ingo Molnar, Rafael J . Wysocki

To be able to treat utilization signals of different scheduling classes
in different ways (e.g., the CFS signal might be stale while the
DEADLINE signal is never stale by design) we need to split the
sugov_cpu::util signal in two: util_cfs and util_dl.

This patch does that, also changing the sugov_get_util() parameter list.
After this change, aggregation of the different signals has to be
performed by sugov_get_util()'s callers (so that they can decide what to
do with each signal).

Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Claudio Scordino <claudio@evidence.eu.com>
---
Changes from RFCv0:

 - refactor aggregation of utilization in sugov_aggregate_util()
---
 kernel/sched/cpufreq_schedutil.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index ba6227625f24..e835fa886225 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -58,7 +58,8 @@ struct sugov_cpu {
 	u64 last_update;
 
 	/* The fields below are only needed when sharing a policy. */
-	unsigned long util;
+	unsigned long util_cfs;
+	unsigned long util_dl;
 	unsigned long max;
 	unsigned int flags;
 
@@ -154,20 +155,24 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
 	return cpufreq_driver_resolve_freq(policy, freq);
 }
 
-static void sugov_get_util(unsigned long *util, unsigned long *max)
+static void sugov_get_util(struct sugov_cpu *sg_cpu)
 {
 	struct rq *rq = this_rq();
-	unsigned long dl_util = (rq->dl.running_bw * SCHED_CAPACITY_SCALE)
-				>> BW_SHIFT;
 
-	*max = arch_scale_cpu_capacity(NULL, smp_processor_id());
+	sg_cpu->max = arch_scale_cpu_capacity(NULL, smp_processor_id());
+	sg_cpu->util_cfs = rq->cfs.avg.util_avg;
+	sg_cpu->util_dl = (rq->dl.running_bw * SCHED_CAPACITY_SCALE)
+			  >> BW_SHIFT;
+}
 
+static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
+{
 	/*
 	 * Ideally we would like to set util_dl as min/guaranteed freq and
 	 * util_cfs + util_dl as requested freq. However, cpufreq is not yet
 	 * ready for such an interface. So, we only do the latter for now.
 	 */
-	*util = min(rq->cfs.avg.util_avg + dl_util, *max);
+	return min(sg_cpu->util_cfs + sg_cpu->util_dl, sg_cpu->max);
 }
 
 static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
@@ -234,7 +239,9 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	if (flags & SCHED_CPUFREQ_RT) {
 		next_f = policy->cpuinfo.max_freq;
 	} else {
-		sugov_get_util(&util, &max);
+		sugov_get_util(sg_cpu);
+		max = sg_cpu->max;
+		util = sugov_aggregate_util(sg_cpu);
 		sugov_iowait_boost(sg_cpu, &util, &max);
 		next_f = get_next_freq(sg_policy, util, max);
 		/*
@@ -274,8 +281,8 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT)
 			return policy->cpuinfo.max_freq;
 
-		j_util = j_sg_cpu->util;
 		j_max = j_sg_cpu->max;
+		j_util = sugov_aggregate_util(j_sg_cpu);
 		if (j_util * max > j_max * util) {
 			util = j_util;
 			max = j_max;
@@ -292,15 +299,12 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 {
 	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
-	unsigned long util, max;
 	unsigned int next_f;
 
-	sugov_get_util(&util, &max);
 
 	raw_spin_lock(&sg_policy->update_lock);
 
-	sg_cpu->util = util;
-	sg_cpu->max = max;
+	sugov_get_util(sg_cpu);
 	sg_cpu->flags = flags;
 
 	sugov_set_iowait_boost(sg_cpu, time, flags);
-- 
2.11.0


* [RFC PATCH v1 5/8] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq
  2017-07-05  8:58 [RFC PATCH v1 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Juri Lelli
                   ` (3 preceding siblings ...)
  2017-07-05  8:59 ` [RFC PATCH v1 4/8] sched/cpufreq_schedutil: split utilization signals Juri Lelli
@ 2017-07-05  8:59 ` Juri Lelli
  2017-07-07  8:59   ` Viresh Kumar
  2017-07-11 16:17   ` Peter Zijlstra
  2017-07-05  8:59 ` [RFC PATCH v1 6/8] sched/sched.h: remove sd arch_scale_freq_capacity parameter Juri Lelli
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 45+ messages in thread
From: Juri Lelli @ 2017-07-05  8:59 UTC (permalink / raw)
  To: peterz, mingo, rjw, viresh.kumar
  Cc: linux-kernel, linux-pm, tglx, vincent.guittot, rostedt,
	luca.abeni, claudio, tommaso.cucinotta, bristot, mathieu.poirier,
	tkjos, joelaf, andresoportus, morten.rasmussen, dietmar.eggemann,
	patrick.bellasi, juri.lelli, Ingo Molnar, Rafael J . Wysocki

No assumption can be made about the rate at which frequency updates get
triggered, as there are scheduling policies (like SCHED_DEADLINE) which
don't trigger them that frequently.

Remove this assumption from the code by always considering the
SCHED_DEADLINE utilization signal as not stale.

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Claudio Scordino <claudio@evidence.eu.com>
---
 kernel/sched/cpufreq_schedutil.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index e835fa886225..066b876d81e7 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -267,17 +267,22 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 		s64 delta_ns;
 
 		/*
-		 * If the CPU utilization was last updated before the previous
-		 * frequency update and the time elapsed between the last update
-		 * of the CPU utilization and the last frequency update is long
-		 * enough, don't take the CPU into account as it probably is
-		 * idle now (and clear iowait_boost for it).
+		 * If the CFS CPU utilization was last updated before the
+		 * previous frequency update and the time elapsed between the
+		 * last update of the CPU utilization and the last frequency
+		 * update is long enough, reset iowait_boost and util_cfs, as
+		 * they are now probably stale. However, still consider the
+		 * CPU contribution if it has some DEADLINE utilization
+		 * (util_dl).
 		 */
 		delta_ns = time - j_sg_cpu->last_update;
 		if (delta_ns > TICK_NSEC) {
 			j_sg_cpu->iowait_boost = 0;
-			continue;
+			j_sg_cpu->util_cfs = 0;
+			if (j_sg_cpu->util_dl == 0)
+				continue;
 		}
+
 		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT)
 			return policy->cpuinfo.max_freq;
 
-- 
2.11.0


* [RFC PATCH v1 6/8] sched/sched.h: remove sd arch_scale_freq_capacity parameter
  2017-07-05  8:58 [RFC PATCH v1 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Juri Lelli
                   ` (4 preceding siblings ...)
  2017-07-05  8:59 ` [RFC PATCH v1 5/8] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq Juri Lelli
@ 2017-07-05  8:59 ` Juri Lelli
  2017-07-05  8:59 ` [RFC PATCH v1 7/8] sched/sched.h: move arch_scale_{freq,cpu}_capacity outside CONFIG_SMP Juri Lelli
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 45+ messages in thread
From: Juri Lelli @ 2017-07-05  8:59 UTC (permalink / raw)
  To: peterz, mingo, rjw, viresh.kumar
  Cc: linux-kernel, linux-pm, tglx, vincent.guittot, rostedt,
	luca.abeni, claudio, tommaso.cucinotta, bristot, mathieu.poirier,
	tkjos, joelaf, andresoportus, morten.rasmussen, dietmar.eggemann,
	patrick.bellasi, juri.lelli, Ingo Molnar

The sd parameter is never used in arch_scale_freq_capacity() (and it's
hard to see how information coming from scheduling domains could help
with frequency-invariance scaling).

Remove it, also in anticipation of moving arch_scale_freq_capacity()
outside CONFIG_SMP.

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c  | 2 +-
 kernel/sched/sched.h | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 008c514dc241..97bd5aae285d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2875,7 +2875,7 @@ accumulate_sum(u64 delta, int cpu, struct sched_avg *sa,
 	u32 contrib = (u32)delta; /* p == 0 -> delta < 1024 */
 	u64 periods;
 
-	scale_freq = arch_scale_freq_capacity(NULL, cpu);
+	scale_freq = arch_scale_freq_capacity(cpu);
 	scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
 
 	delta += sa->period_contrib;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1be5dac728a4..ec5769288927 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1673,7 +1673,7 @@ extern void sched_avg_update(struct rq *rq);
 
 #ifndef arch_scale_freq_capacity
 static __always_inline
-unsigned long arch_scale_freq_capacity(struct sched_domain *sd, int cpu)
+unsigned long arch_scale_freq_capacity(int cpu)
 {
 	return SCHED_CAPACITY_SCALE;
 }
@@ -1692,7 +1692,7 @@ unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
 
 static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta)
 {
-	rq->rt_avg += rt_delta * arch_scale_freq_capacity(NULL, cpu_of(rq));
+	rq->rt_avg += rt_delta * arch_scale_freq_capacity(cpu_of(rq));
 	sched_avg_update(rq);
 }
 #else
-- 
2.11.0


* [RFC PATCH v1 7/8] sched/sched.h: move arch_scale_{freq,cpu}_capacity outside CONFIG_SMP
  2017-07-05  8:58 [RFC PATCH v1 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Juri Lelli
                   ` (5 preceding siblings ...)
  2017-07-05  8:59 ` [RFC PATCH v1 6/8] sched/sched.h: remove sd arch_scale_freq_capacity parameter Juri Lelli
@ 2017-07-05  8:59 ` Juri Lelli
  2017-07-07 22:04   ` Steven Rostedt
  2017-07-05  8:59 ` [RFC PATCH v1 8/8] sched/deadline: make bandwidth enforcement scale-invariant Juri Lelli
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 45+ messages in thread
From: Juri Lelli @ 2017-07-05  8:59 UTC (permalink / raw)
  To: peterz, mingo, rjw, viresh.kumar
  Cc: linux-kernel, linux-pm, tglx, vincent.guittot, rostedt,
	luca.abeni, claudio, tommaso.cucinotta, bristot, mathieu.poirier,
	tkjos, joelaf, andresoportus, morten.rasmussen, dietmar.eggemann,
	patrick.bellasi, juri.lelli, Ingo Molnar

Currently, frequency and cpu capacity scaling is only performed on
CONFIG_SMP systems (as CFS PELT signals are only present on such
systems). However, other scheduling classes want to do freq/cpu scaling,
and for !CONFIG_SMP configurations as well.

arch_scale_freq_capacity is useful for implementing frequency scaling
even on !CONFIG_SMP platforms, so we simply move it outside the
CONFIG_SMP ifdeffery.

Even though arch_scale_cpu_capacity is not useful on !CONFIG_SMP
platforms, we make a default implementation available for such
configurations anyway, to simplify scheduler code doing CPU scale
invariance.

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
---
 include/linux/sched/topology.h | 12 ++++++------
 kernel/sched/sched.h           | 13 ++++++++++---
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 7d065abc7a47..953cf4f889ec 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -6,6 +6,12 @@
 #include <linux/sched/idle.h>
 
 /*
+ * Increase resolution of cpu_capacity calculations
+ */
+#define SCHED_CAPACITY_SHIFT	SCHED_FIXEDPOINT_SHIFT
+#define SCHED_CAPACITY_SCALE	(1L << SCHED_CAPACITY_SHIFT)
+
+/*
  * sched-domains (multiprocessor balancing) declarations:
  */
 #ifdef CONFIG_SMP
@@ -26,12 +32,6 @@
 #define SD_OVERLAP		0x2000	/* sched_domains of this level overlap */
 #define SD_NUMA			0x4000	/* cross-node balancing */
 
-/*
- * Increase resolution of cpu_capacity calculations
- */
-#define SCHED_CAPACITY_SHIFT	SCHED_FIXEDPOINT_SHIFT
-#define SCHED_CAPACITY_SCALE	(1L << SCHED_CAPACITY_SHIFT)
-
 #ifdef CONFIG_SCHED_SMT
 static inline int cpu_smt_flags(void)
 {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ec5769288927..178f4a5df2fa 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1668,9 +1668,6 @@ static inline int hrtick_enabled(struct rq *rq)
 
 #endif /* CONFIG_SCHED_HRTICK */
 
-#ifdef CONFIG_SMP
-extern void sched_avg_update(struct rq *rq);
-
 #ifndef arch_scale_freq_capacity
 static __always_inline
 unsigned long arch_scale_freq_capacity(int cpu)
@@ -1679,6 +1676,9 @@ unsigned long arch_scale_freq_capacity(int cpu)
 }
 #endif
 
+#ifdef CONFIG_SMP
+extern void sched_avg_update(struct rq *rq);
+
 #ifndef arch_scale_cpu_capacity
 static __always_inline
 unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
@@ -1696,6 +1696,13 @@ static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta)
 	sched_avg_update(rq);
 }
 #else
+#ifndef arch_scale_cpu_capacity
+static __always_inline
+unsigned long arch_scale_cpu_capacity(void __always_unused *sd, int cpu)
+{
+	return SCHED_CAPACITY_SCALE;
+}
+#endif
 static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta) { }
 static inline void sched_avg_update(struct rq *rq) { }
 #endif
-- 
2.11.0


* [RFC PATCH v1 8/8] sched/deadline: make bandwidth enforcement scale-invariant
  2017-07-05  8:58 [RFC PATCH v1 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Juri Lelli
                   ` (6 preceding siblings ...)
  2017-07-05  8:59 ` [RFC PATCH v1 7/8] sched/sched.h: move arch_scale_{freq,cpu}_capacity outside CONFIG_SMP Juri Lelli
@ 2017-07-05  8:59 ` Juri Lelli
  2017-07-19  7:21   ` Peter Zijlstra
  2017-07-06 15:57 ` [RFC PATCH v1 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Steven Rostedt
  2017-07-06 21:08 ` Rafael J. Wysocki
  9 siblings, 1 reply; 45+ messages in thread
From: Juri Lelli @ 2017-07-05  8:59 UTC (permalink / raw)
  To: peterz, mingo, rjw, viresh.kumar
  Cc: linux-kernel, linux-pm, tglx, vincent.guittot, rostedt,
	luca.abeni, claudio, tommaso.cucinotta, bristot, mathieu.poirier,
	tkjos, joelaf, andresoportus, morten.rasmussen, dietmar.eggemann,
	patrick.bellasi, juri.lelli, Ingo Molnar, Rafael J . Wysocki

Apply the frequency and cpu scale-invariance correction factors to
bandwidth enforcement (similar to what we already do for fair
utilization tracking).

Each delta_exec gets scaled according to the current frequency and the
maximum cpu capacity, which means that the reservation runtime parameter
(which needs to be specified by profiling the task's execution at max
frequency on the biggest-capacity core) gets scaled accordingly.
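
For illustration (not part of the patch): on a CPU running at half of
its maximum frequency (scale_freq = 512) with a capacity of 614 out of
1024 (scale_cpu = 614), 1ms of observed execution gets charged to the
reservation as

	scaled_delta_exec = cap_scale(cap_scale(1000000, 512), 614)
	                  = (((1000000 * 512) >> 10) * 614) >> 10
	                  ~= 300000	/* ns, i.e. ~0.3ms */

so a runtime parameter profiled at max frequency on the biggest core is
consumed proportionally more slowly on slower/smaller CPUs.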

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Claudio Scordino <claudio@evidence.eu.com>
---
 kernel/sched/deadline.c | 26 ++++++++++++++++++++++----
 kernel/sched/fair.c     |  2 --
 kernel/sched/sched.h    |  2 ++
 3 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 4ec82fa56b04..5bed1a2fae06 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1122,7 +1122,8 @@ static void update_curr_dl(struct rq *rq)
 {
 	struct task_struct *curr = rq->curr;
 	struct sched_dl_entity *dl_se = &curr->dl;
-	u64 delta_exec;
+	u64 delta_exec, scaled_delta_exec;
+	int cpu = cpu_of(rq);
 
 	if (!dl_task(curr) || !on_dl_rq(dl_se))
 		return;
@@ -1156,9 +1157,26 @@ static void update_curr_dl(struct rq *rq)
 	if (unlikely(dl_entity_is_special(dl_se)))
 		return;
 
-	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM))
-		delta_exec = grub_reclaim(delta_exec, rq, &curr->dl);
-	dl_se->runtime -= delta_exec;
+	/*
+	 * For tasks that participate in GRUB, we implement GRUB-PA: the
+	 * spare reclaimed bandwidth is used to clock down frequency.
+	 *
+	 * For the others, we still need to scale reservation parameters
+	 * according to current frequency and CPU maximum capacity.
+	 */
+	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) {
+		scaled_delta_exec = grub_reclaim(delta_exec,
+						 rq,
+						 &curr->dl);
+	} else {
+		unsigned long scale_freq = arch_scale_freq_capacity(cpu);
+		unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
+
+		scaled_delta_exec = cap_scale(delta_exec, scale_freq);
+		scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu);
+	}
+
+	dl_se->runtime -= scaled_delta_exec;
 
 throttle:
 	if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 97bd5aae285d..a6e8a4f5786f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2844,8 +2844,6 @@ static u32 __accumulate_pelt_segments(u64 periods, u32 d1, u32 d3)
 	return c1 + c2 + c3;
 }
 
-#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
-
 /*
  * Accumulate the three separate parts of the sum; d1 the remainder
  * of the last (incomplete) period, d2 the span of full periods and d3
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 178f4a5df2fa..688185761eaf 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -155,6 +155,8 @@ static inline int task_has_dl_policy(struct task_struct *p)
 	return dl_policy(p->policy);
 }
 
+#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
+
 /*
  * !! For sched_setattr_nocheck() (kernel) only !!
  *
-- 
2.11.0


* Re: [RFC PATCH v1 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection
  2017-07-05  8:58 [RFC PATCH v1 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Juri Lelli
                   ` (7 preceding siblings ...)
  2017-07-05  8:59 ` [RFC PATCH v1 8/8] sched/deadline: make bandwidth enforcement scale-invariant Juri Lelli
@ 2017-07-06 15:57 ` Steven Rostedt
  2017-07-06 16:08   ` Juri Lelli
  2017-07-06 16:15   ` Peter Zijlstra
  2017-07-06 21:08 ` Rafael J. Wysocki
  9 siblings, 2 replies; 45+ messages in thread
From: Steven Rostedt @ 2017-07-06 15:57 UTC (permalink / raw)
  To: Juri Lelli
  Cc: peterz, mingo, rjw, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, luca.abeni, claudio, tommaso.cucinotta, bristot,
	mathieu.poirier, tkjos, joelaf, andresoportus, morten.rasmussen,
	dietmar.eggemann, patrick.bellasi

On Wed,  5 Jul 2017 09:58:57 +0100
Juri Lelli <juri.lelli@arm.com> wrote:

> Hi,
> 
> v1 of the RFC set implementing frequency/cpu invariance and OPP selection for

It would be nice if you specify what "OPP" stands for. A quick google
search shows "Other Peoples Privates", which isn't the type of
selection I would be looking for.

-- Steve


> SCHED_DEADLINE [1]. The set is based on tip/sched/core as of today
> (72298e5c92c5), which now already includes Luca's "CPU reclaiming for
> SCHED_DEADLINE".
>


* Re: [RFC PATCH v1 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection
  2017-07-06 15:57 ` [RFC PATCH v1 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Steven Rostedt
@ 2017-07-06 16:08   ` Juri Lelli
  2017-07-06 16:15   ` Peter Zijlstra
  1 sibling, 0 replies; 45+ messages in thread
From: Juri Lelli @ 2017-07-06 16:08 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: peterz, mingo, rjw, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, luca.abeni, claudio, tommaso.cucinotta, bristot,
	mathieu.poirier, tkjos, joelaf, andresoportus, morten.rasmussen,
	dietmar.eggemann, patrick.bellasi

On 06/07/17 11:57, Steven Rostedt wrote:
> On Wed,  5 Jul 2017 09:58:57 +0100
> Juri Lelli <juri.lelli@arm.com> wrote:
> 
> > Hi,
> > 
> > v1 of the RFC set implementing frequency/cpu invariance and OPP selection for
> 
> It would be nice if you specify what "OPP" stands for. A quick google
> search shows "Other Peoples Privates", which isn't the type of
> selection I would be looking for.
> 

Not something we could influence from the scheduler anyway. :)

Just to clarify, OPP in this context stands for "Operating Performance
Point": a (frequency, voltage) tuple [1], used here as a synonym for
clock frequency.

I'll add that in the next version.

Thanks,

- Juri

[1] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0024a/CHDGICAJ.html


* Re: [RFC PATCH v1 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection
  2017-07-06 15:57 ` [RFC PATCH v1 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Steven Rostedt
  2017-07-06 16:08   ` Juri Lelli
@ 2017-07-06 16:15   ` Peter Zijlstra
  1 sibling, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2017-07-06 16:15 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Juri Lelli, mingo, rjw, viresh.kumar, linux-kernel, linux-pm,
	tglx, vincent.guittot, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi

On Thu, Jul 06, 2017 at 11:57:15AM -0400, Steven Rostedt wrote:
> On Wed,  5 Jul 2017 09:58:57 +0100
> Juri Lelli <juri.lelli@arm.com> wrote:
> 
> > Hi,
> > 
> > v1 of the RFC set implementing frequency/cpu invariance and OPP selection for
> 
> It would be nice if you specify what "OPP" stands for. A quick google
> search shows "Other Peoples Privates", which isn't the type of
> selection I would be looking for.

It's ARM speak for P-state. It stands for OPerating Point or something
like that. But yes, I too always play the Naughty by Nature song in my
head when I read that.


* Re: [RFC PATCH v1 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection
  2017-07-05  8:58 [RFC PATCH v1 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Juri Lelli
                   ` (8 preceding siblings ...)
  2017-07-06 15:57 ` [RFC PATCH v1 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Steven Rostedt
@ 2017-07-06 21:08 ` Rafael J. Wysocki
  9 siblings, 0 replies; 45+ messages in thread
From: Rafael J. Wysocki @ 2017-07-06 21:08 UTC (permalink / raw)
  To: Juri Lelli
  Cc: peterz, mingo, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi

On Wednesday, July 05, 2017 09:58:57 AM Juri Lelli wrote:
> Hi,
> 
> v1 of the RFC set implementing frequency/cpu invariance and OPP selection for
> SCHED_DEADLINE [1]. The set is based on tip/sched/core as of today
> (72298e5c92c5), which now already includes Luca's "CPU reclaiming for
> SCHED_DEADLINE".
> 
> Thanks a lot for reviewing RFCv0!
> 
> Patches high level description:
> 
>  o [01-02]/08 add the necessary links to start accounting the DEADLINE
>               contribution to OPP selection
>  o 03/08      is a temporary solution to make it possible (on ARM) to change
>               frequency for DEADLINE tasks (which would otherwise possibly
>               delay the SCHED_FIFO worker kthread); the proper solution would
>               be the ability to issue frequency transitions from an atomic ctx
>  o [04-05]/08 are schedutil changes that cope with the fact that DEADLINE
>               doesn't require a periodic OPP selection trigger point
>  o [06-07]/08 make the arch_scale_{freq,cpu}_capacity() functions available on
>               !CONFIG_SMP configurations too
>  o 08/08      implements frequency/cpu invariance for tasks' reservation
>               parameters, which basically means that we implement GRUB-PA [2]
> 
> Changes w.r.t. RFCv0:
> 
>  - rebase on tip/sched/core
>  - make use of BW_SHIFT for calculations (Peter)
>  - added a comment about guaranteed/requested frequency (Peter)
>  - use a high bit for the sched_flags SCHED_FLAG_SPECIAL hack; don't expose
>    it to userspace and add comments (Peter)
>  - refactor aggregation of utilization from scheduling classes
> 
> Please have a look. Feedback and comments are, as usual, more than welcome.
> 
> In case you would like to test this out:
> 
>  git://linux-arm.org/linux-jl.git upstream/deadline/freq-rfc-v1
> 
> Best,
> 
> - Juri
> 
> 
> Juri Lelli (8):
>   sched/cpufreq_schedutil: make use of DEADLINE utilization signal
>   sched/deadline: move cpu frequency selection triggering points
>   sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE
>   sched/cpufreq_schedutil: split utilization signals
>   sched/cpufreq_schedutil: always consider all CPUs when deciding next
>     freq
>   sched/sched.h: remove sd arch_scale_freq_capacity parameter
>   sched/sched.h: move arch_scale_{freq,cpu}_capacity outside CONFIG_SMP
>   sched/deadline: make bandwidth enforcement scale-invariant
> 
>  include/linux/sched.h            |  1 +
>  include/linux/sched/cpufreq.h    |  2 --
>  include/linux/sched/topology.h   | 12 +++----
>  kernel/sched/core.c              | 15 ++++++--
>  kernel/sched/cpufreq_schedutil.c | 77 ++++++++++++++++++++++++++--------------
>  kernel/sched/deadline.c          | 46 ++++++++++++++++++++----
>  kernel/sched/fair.c              |  4 +--
>  kernel/sched/sched.h             | 51 +++++++++++++++++++-------
>  8 files changed, 149 insertions(+), 59 deletions(-)

The schedutil changes in this series look OK to me, so please feel free to add

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

to these patches.

Thanks,
Rafael


* Re: [RFC PATCH v1 4/8] sched/cpufreq_schedutil: split utilization signals
  2017-07-05  8:59 ` [RFC PATCH v1 4/8] sched/cpufreq_schedutil: split utilization signals Juri Lelli
@ 2017-07-07  3:26   ` Joel Fernandes
  2017-07-07  8:58   ` Viresh Kumar
  2017-07-10  7:05   ` Viresh Kumar
  2 siblings, 0 replies; 45+ messages in thread
From: Joel Fernandes @ 2017-07-07  3:26 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, Viresh Kumar,
	LKML, Linux PM, Thomas Gleixner, Vincent Guittot, Steven Rostedt,
	luca.abeni, claudio, Tommaso Cucinotta, bristot, mathieu.poirier,
	Todd Kjos, Andres Oportus, Morten Rasmussen, Dietmar Eggemann,
	Patrick Bellasi, Ingo Molnar, Rafael J . Wysocki

Hi Juri,

On Wed, Jul 5, 2017 at 1:59 AM, Juri Lelli <juri.lelli@arm.com> wrote:
> To be able to treat utilization signals of different scheduling classes
> in different ways (e.g., the CFS signal might be stale while the
> DEADLINE signal is never stale by design) we need to split the
> sugov_cpu::util signal in two: util_cfs and util_dl.
>
> This patch does that, also changing the sugov_get_util() parameter list.
> After this change, aggregation of the different signals has to be
> performed by sugov_get_util()'s callers (so that they can decide what to
> do with each signal).
>
> Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Signed-off-by: Juri Lelli <juri.lelli@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: Luca Abeni <luca.abeni@santannapisa.it>
> Cc: Claudio Scordino <claudio@evidence.eu.com>
> ---
> Changes from RFCv0:
>
>  - refactor aggregation of utilization in sugov_aggregate_util()
> ---
>  kernel/sched/cpufreq_schedutil.c | 28 ++++++++++++++++------------
>  1 file changed, 16 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index ba6227625f24..e835fa886225 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -58,7 +58,8 @@ struct sugov_cpu {
>         u64 last_update;
>
>         /* The fields below are only needed when sharing a policy. */
> -       unsigned long util;
> +       unsigned long util_cfs;
> +       unsigned long util_dl;
>         unsigned long max;
>         unsigned int flags;
>
> @@ -154,20 +155,24 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
>         return cpufreq_driver_resolve_freq(policy, freq);
>  }
>
> -static void sugov_get_util(unsigned long *util, unsigned long *max)
> +static void sugov_get_util(struct sugov_cpu *sg_cpu)
>  {
>         struct rq *rq = this_rq();
> -       unsigned long dl_util = (rq->dl.running_bw * SCHED_CAPACITY_SCALE)
> -                               >> BW_SHIFT;
>
> -       *max = arch_scale_cpu_capacity(NULL, smp_processor_id());
> +       sg_cpu->max = arch_scale_cpu_capacity(NULL, smp_processor_id());
> +       sg_cpu->util_cfs = rq->cfs.avg.util_avg;
> +       sg_cpu->util_dl = (rq->dl.running_bw * SCHED_CAPACITY_SCALE)
> +                         >> BW_SHIFT;
> +}
>
> +static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> +{
>         /*
>          * Ideally we would like to set util_dl as min/guaranteed freq and
>          * util_cfs + util_dl as requested freq. However, cpufreq is not yet
>          * ready for such an interface. So, we only do the latter for now.
>          */
> -       *util = min(rq->cfs.avg.util_avg + dl_util, *max);
> +       return min(sg_cpu->util_cfs + sg_cpu->util_dl, sg_cpu->max);
>  }

I am wondering why we need a separate aggregation API. To me, it looks
like using sugov_get_util to set the sg_cpu util elements and then doing
the aggregation at the same time would have the same effect (without
changing the existing parameter list). Is this to handle a future
usecase where aggregation may need to be done differently? For all the
users of sugov_get_util, aggregation is done in the same way. Anyway,
if I missed something, sorry for the noise.

thanks,

-Joel


* Re: [RFC PATCH v1 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE
  2017-07-05  8:59 ` [RFC PATCH v1 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE Juri Lelli
@ 2017-07-07  3:56   ` Joel Fernandes
  2017-07-07 10:43     ` Juri Lelli
  2017-07-11 12:37     ` Peter Zijlstra
  2017-07-07  7:21   ` Viresh Kumar
  2017-07-11 16:18   ` Peter Zijlstra
  2 siblings, 2 replies; 45+ messages in thread
From: Joel Fernandes @ 2017-07-07  3:56 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, Viresh Kumar,
	LKML, Linux PM, Thomas Gleixner, Vincent Guittot, Steven Rostedt,
	luca.abeni, claudio, Tommaso Cucinotta, bristot, mathieu.poirier,
	Todd Kjos, Andres Oportus, Morten Rasmussen, Dietmar Eggemann,
	Patrick Bellasi, Ingo Molnar, Rafael J . Wysocki

Hi Juri,

On Wed, Jul 5, 2017 at 1:59 AM, Juri Lelli <juri.lelli@arm.com> wrote:
> The worker kthread needs to be able to change frequency for all other
> threads.
>
> Make it special, just below the STOP class.
>
> Signed-off-by: Juri Lelli <juri.lelli@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: Luca Abeni <luca.abeni@santannapisa.it>
> Cc: Claudio Scordino <claudio@evidence.eu.com>
> ---
> Changes from RFCv0:
>
>  - use a high bit of sched_flags and don't expose the new flag to
>    userspace (also add derisive comments) (Peter)
> ---
>  include/linux/sched.h            |  1 +
>  kernel/sched/core.c              | 15 +++++++++++++--
>  kernel/sched/cpufreq_schedutil.c | 15 ++++++++++++---
>  kernel/sched/deadline.c          | 13 +++++++++++++
>  kernel/sched/sched.h             | 20 +++++++++++++++++++-
>  5 files changed, 58 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 1f0f427e0292..0ba767fdedae 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1359,6 +1359,7 @@ extern int idle_cpu(int cpu);
>  extern int sched_setscheduler(struct task_struct *, int, const struct sched_param *);
>  extern int sched_setscheduler_nocheck(struct task_struct *, int, const struct sched_param *);
>  extern int sched_setattr(struct task_struct *, const struct sched_attr *);
> +extern int sched_setattr_nocheck(struct task_struct *, const struct sched_attr *);
>  extern struct task_struct *idle_task(int cpu);
>
>  /**
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 5186797908dc..0e40c7ff2bf4 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3998,7 +3998,9 @@ static int __sched_setscheduler(struct task_struct *p,
>         }
>
>         if (attr->sched_flags &
> -               ~(SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_RECLAIM))
> +               ~(SCHED_FLAG_RESET_ON_FORK |
> +                 SCHED_FLAG_RECLAIM |
> +                 SCHED_FLAG_SPECIAL))
>                 return -EINVAL;

I was wondering if it's better to name SCHED_FLAG_SPECIAL something more
descriptive than "special", since it will not be clear, looking at
other parts of the code, that this is used for the frequency scaling
thread or that it's a DL task. Since this flag is specifically set up
for the sugov thread's frequency scaling purpose, a better flag name
than SPECIAL might be SCHED_FLAG_DL_FREQ_SCALE or
SCHED_FLAG_FREQ_SCALE. But I am OK with whatever makes sense to do
here. Thanks,

Regards,

-Joel


* Re: [RFC PATCH v1 1/8] sched/cpufreq_schedutil: make use of DEADLINE utilization signal
  2017-07-05  8:58 ` [RFC PATCH v1 1/8] sched/cpufreq_schedutil: make use of DEADLINE utilization signal Juri Lelli
@ 2017-07-07  7:20   ` Viresh Kumar
  0 siblings, 0 replies; 45+ messages in thread
From: Viresh Kumar @ 2017-07-07  7:20 UTC (permalink / raw)
  To: Juri Lelli
  Cc: peterz, mingo, rjw, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

On 05-07-17, 09:58, Juri Lelli wrote:
> SCHED_DEADLINE tracks the active utilization signal with a per-dl_rq
> variable named running_bw.
> 
> Make use of that to drive cpu frequency selection: add up the FAIR and
> DEADLINE contributions to get the CPU capacity required to handle both
> requirements (while RT still selects max frequency).
> 
> Co-authored-by: Claudio Scordino <claudio@evidence.eu.com>
> Signed-off-by: Juri Lelli <juri.lelli@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: Luca Abeni <luca.abeni@santannapisa.it>
> ---
> Changes from RFCv0:
> 
>  - use BW_SHIFT (Peter)
>  - add comment about guaranteed and requested freq (Peter)
>  - modify comment about go to max behaviour (Claudio)
> ---
>  include/linux/sched/cpufreq.h    |  2 --
>  kernel/sched/cpufreq_schedutil.c | 25 +++++++++++++++----------
>  2 files changed, 15 insertions(+), 12 deletions(-)
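
For reference, the aggregation this patch describes reduces to
something like the following sketch (the expressions are taken from the
hunks quoted later in this thread; this is a sketch, not the patch
itself):

	/*
	 * DEADLINE's running_bw is a fixed-point fraction (BW_SHIFT
	 * bits); rescale it to capacity units and add the CFS average,
	 * capping the result at the CPU's capacity. RT is unchanged and
	 * still requests policy->cpuinfo.max_freq.
	 */
	sg_cpu->max      = arch_scale_cpu_capacity(NULL, smp_processor_id());
	sg_cpu->util_dl  = (rq->dl.running_bw * SCHED_CAPACITY_SCALE)
			   >> BW_SHIFT;
	sg_cpu->util_cfs = rq->cfs.avg.util_avg;
	/* required capacity, capped at what the CPU can actually give */
	util = min(sg_cpu->util_cfs + sg_cpu->util_dl, sg_cpu->max);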

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 2/8] sched/deadline: move cpu frequency selection triggering points
  2017-07-05  8:58 ` [RFC PATCH v1 2/8] sched/deadline: move cpu frequency selection triggering points Juri Lelli
@ 2017-07-07  7:21   ` Viresh Kumar
  0 siblings, 0 replies; 45+ messages in thread
From: Viresh Kumar @ 2017-07-07  7:21 UTC (permalink / raw)
  To: Juri Lelli
  Cc: peterz, mingo, rjw, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

On 05-07-17, 09:58, Juri Lelli wrote:
> Since SCHED_DEADLINE doesn't track utilization signal (but reserves a
> fraction of CPU bandwidth to tasks admitted to the system), there is no
> point in evaluating frequency changes during each tick event.
> 
> Move frequency selection triggering points to where running_bw changes.
> 
> Co-authored-by: Claudio Scordino <claudio@evidence.eu.com>
> Signed-off-by: Juri Lelli <juri.lelli@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: Luca Abeni <luca.abeni@santannapisa.it>
> ---
> Changes from RFCv0:
> 
>  - modify comment regarding periodic RT updates (Claudio)
> ---
>  kernel/sched/deadline.c |  7 ++++---
>  kernel/sched/sched.h    | 12 ++++++------
>  2 files changed, 10 insertions(+), 9 deletions(-)
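
What "move the triggering points to where running_bw changes" amounts
to is roughly the sketch below (add_running_bw() already exists in
kernel/sched/deadline.c; the exact hook placement and the flags
argument here are assumptions, not the patch):

	static inline void add_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
	{
		dl_rq->running_bw += dl_bw;
		/* DL utilization just changed: reevaluate the OPP now,
		 * instead of waiting for the next tick. */
		cpufreq_update_util(rq_of_dl_rq(dl_rq), 0);
	}

(and symmetrically in sub_running_bw()).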

Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE
  2017-07-05  8:59 ` [RFC PATCH v1 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE Juri Lelli
  2017-07-07  3:56   ` Joel Fernandes
@ 2017-07-07  7:21   ` Viresh Kumar
  2017-07-11 16:18   ` Peter Zijlstra
  2 siblings, 0 replies; 45+ messages in thread
From: Viresh Kumar @ 2017-07-07  7:21 UTC (permalink / raw)
  To: Juri Lelli
  Cc: peterz, mingo, rjw, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

On 05-07-17, 09:59, Juri Lelli wrote:
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index f2494d1fc8ef..ba6227625f24 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -424,7 +424,16 @@ static void sugov_policy_free(struct sugov_policy *sg_policy)
>  static int sugov_kthread_create(struct sugov_policy *sg_policy)
>  {
>  	struct task_struct *thread;
> -	struct sched_param param = { .sched_priority = MAX_USER_RT_PRIO / 2 };
> +	struct sched_attr attr = {
> +		.size = sizeof(struct sched_attr),
> +		.sched_policy = SCHED_DEADLINE,
> +		.sched_flags = SCHED_FLAG_SPECIAL,
> +		.sched_nice = 0,
> +		.sched_priority = 0,
> +		.sched_runtime = 0,
> +		.sched_deadline = 0,
> +		.sched_period = 0,
> +	};
>  	struct cpufreq_policy *policy = sg_policy->policy;
>  	int ret;
>  
> @@ -442,10 +451,10 @@ static int sugov_kthread_create(struct sugov_policy *sg_policy)
>  		return PTR_ERR(thread);
>  	}
>  
> -	ret = sched_setscheduler_nocheck(thread, SCHED_FIFO, &param);
> +	ret = sched_setattr_nocheck(thread, &attr);
>  	if (ret) {
>  		kthread_stop(thread);
> -		pr_warn("%s: failed to set SCHED_FIFO\n", __func__);
> +		pr_warn("%s: failed to set SCHED_DEADLINE\n", __func__);
>  		return ret;
>  	}

Acked-by: Viresh Kumar <viresh.kumar@linaro.org> (schedutil)

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 4/8] sched/cpufreq_schedutil: split utilization signals
  2017-07-05  8:59 ` [RFC PATCH v1 4/8] sched/cpufreq_schedutil: split utilization signals Juri Lelli
  2017-07-07  3:26   ` Joel Fernandes
@ 2017-07-07  8:58   ` Viresh Kumar
  2017-07-07 10:59     ` Juri Lelli
  2017-07-10  7:05   ` Viresh Kumar
  2 siblings, 1 reply; 45+ messages in thread
From: Viresh Kumar @ 2017-07-07  8:58 UTC (permalink / raw)
  To: Juri Lelli
  Cc: peterz, mingo, rjw, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

On 05-07-17, 09:59, Juri Lelli wrote:
> To be able to treat utilization signals of different scheduling classes
> in different ways (e.g., CFS signal might be stale while DEADLINE signal
> is never stale by design) we need to split sugov_cpu::util signal in two:
> util_cfs and util_dl.
> 
> This patch does that by also changing sugov_get_util() parameter list.
> After this change, aggregation of the different signals has to be performed
> by sugov_get_util() users (so that they can decide what to do with the
> different signals).
> 
> Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Are you referring to this response here ?

https://marc.info/?l=linux-kernel&m=149095102600847&w=2

If yes, then I don't think it was about having separate APIs, but just storing
util_cfs/dl separately.

> -static void sugov_get_util(unsigned long *util, unsigned long *max)
> +static void sugov_get_util(struct sugov_cpu *sg_cpu)
>  {
>  	struct rq *rq = this_rq();
> -	unsigned long dl_util = (rq->dl.running_bw * SCHED_CAPACITY_SCALE)
> -				>> BW_SHIFT;
>  
> -	*max = arch_scale_cpu_capacity(NULL, smp_processor_id());
> +	sg_cpu->max = arch_scale_cpu_capacity(NULL, smp_processor_id());
> +	sg_cpu->util_cfs = rq->cfs.avg.util_avg;
> +	sg_cpu->util_dl = (rq->dl.running_bw * SCHED_CAPACITY_SCALE)
> +			  >> BW_SHIFT;
> +}
>  
> +static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)

As Joel already mentioned, I don't think we should create two separate routines
here.
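
(The quote above cuts off right at sugov_aggregate_util()'s
declaration; from the descriptions in this thread its body is
presumably something like the sketch below, give or take the exact RT
handling:)

	static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
	{
		/*
		 * RT still wants the maximum frequency; otherwise the
		 * required capacity is the DL bandwidth plus the CFS
		 * average, never above the CPU's capacity.
		 */
		if (sg_cpu->flags & SCHED_CPUFREQ_RT)
			return sg_cpu->max;

		return min(sg_cpu->util_cfs + sg_cpu->util_dl, sg_cpu->max);
	}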

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 5/8] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq
  2017-07-05  8:59 ` [RFC PATCH v1 5/8] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq Juri Lelli
@ 2017-07-07  8:59   ` Viresh Kumar
  2017-07-11 16:17   ` Peter Zijlstra
  1 sibling, 0 replies; 45+ messages in thread
From: Viresh Kumar @ 2017-07-07  8:59 UTC (permalink / raw)
  To: Juri Lelli
  Cc: peterz, mingo, rjw, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

On 05-07-17, 09:59, Juri Lelli wrote:
> No assumption can be made upon the rate at which frequency updates get
> triggered, as there are scheduling policies (like SCHED_DEADLINE) which
> don't trigger them so frequently.
> 
> Remove such assumption from the code, by always considering
> SCHED_DEADLINE utilization signal as not stale.
> 
> Signed-off-by: Juri Lelli <juri.lelli@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: Luca Abeni <luca.abeni@santannapisa.it>
> Cc: Claudio Scordino <claudio@evidence.eu.com>
> ---
>  kernel/sched/cpufreq_schedutil.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index e835fa886225..066b876d81e7 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -267,17 +267,22 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
>  		s64 delta_ns;
>  
>  		/*
> -		 * If the CPU utilization was last updated before the previous
> -		 * frequency update and the time elapsed between the last update
> -		 * of the CPU utilization and the last frequency update is long
> -		 * enough, don't take the CPU into account as it probably is
> -		 * idle now (and clear iowait_boost for it).
> +		 * If the CFS CPU utilization was last updated before the
> +		 * previous frequency update and the time elapsed between the
> +		 * last update of the CPU utilization and the last frequency
> +		 * update is long enough, reset iowait_boost and util_cfs, as
> +		 * they are now probably stale. However, still consider the
> +		 * CPU contribution if it has some DEADLINE utilization
> +		 * (util_dl).
>  		 */
>  		delta_ns = time - j_sg_cpu->last_update;
>  		if (delta_ns > TICK_NSEC) {
>  			j_sg_cpu->iowait_boost = 0;
> -			continue;
> +			j_sg_cpu->util_cfs = 0;
> +			if (j_sg_cpu->util_dl == 0)
> +				continue;
>  		}
> +
>  		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT)
>  			return policy->cpuinfo.max_freq;
>  

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE
  2017-07-07  3:56   ` Joel Fernandes
@ 2017-07-07 10:43     ` Juri Lelli
  2017-07-07 10:46       ` Thomas Gleixner
  2017-07-11 12:37     ` Peter Zijlstra
  1 sibling, 1 reply; 45+ messages in thread
From: Juri Lelli @ 2017-07-07 10:43 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki, Viresh Kumar,
	LKML, Linux PM, Thomas Gleixner, Vincent Guittot, Steven Rostedt,
	luca.abeni, claudio, Tommaso Cucinotta, bristot, mathieu.poirier,
	Todd Kjos, Andres Oportus, Morten Rasmussen, Dietmar Eggemann,
	Patrick Bellasi, Ingo Molnar, Rafael J . Wysocki

Hi,

On 06/07/17 20:56, Joel Fernandes wrote:
> Hi Juri,
> 
> On Wed, Jul 5, 2017 at 1:59 AM, Juri Lelli <juri.lelli@arm.com> wrote:

[...]

> >         if (attr->sched_flags &
> > -               ~(SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_RECLAIM))
> > +               ~(SCHED_FLAG_RESET_ON_FORK |
> > +                 SCHED_FLAG_RECLAIM |
> > +                 SCHED_FLAG_SPECIAL))
> >                 return -EINVAL;
> 
> I was wondering if it's better to name SCHED_FLAG_SPECIAL something more
> descriptive than "special", since it will not be clear from looking at
> other parts of the code that this is used for the frequency scaling
> thread or that it's a DL task. Since this flag is specifically set up
> for the sugov thread's frequency scaling purpose, a better flag name
> than SPECIAL might be SCHED_FLAG_DL_FREQ_SCALE or
> SCHED_FLAG_FREQ_SCALE. But I am OK with whatever makes sense to do
> here. Thanks,
> 

Sure.

How about SCHED_FLAG_SUGOV then?

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE
  2017-07-07 10:43     ` Juri Lelli
@ 2017-07-07 10:46       ` Thomas Gleixner
  2017-07-07 10:53         ` Juri Lelli
  0 siblings, 1 reply; 45+ messages in thread
From: Thomas Gleixner @ 2017-07-07 10:46 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Joel Fernandes, Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki,
	Viresh Kumar, LKML, Linux PM, Vincent Guittot, Steven Rostedt,
	luca.abeni, claudio, Tommaso Cucinotta, bristot, mathieu.poirier,
	Todd Kjos, Andres Oportus, Morten Rasmussen, Dietmar Eggemann,
	Patrick Bellasi, Ingo Molnar, Rafael J . Wysocki

On Fri, 7 Jul 2017, Juri Lelli wrote:
> How about SCHED_FLAG_SUGOV then?

I know sugo della carne, but what's sugo V?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE
  2017-07-07 10:46       ` Thomas Gleixner
@ 2017-07-07 10:53         ` Juri Lelli
  2017-07-07 13:11           ` Rafael J. Wysocki
  0 siblings, 1 reply; 45+ messages in thread
From: Juri Lelli @ 2017-07-07 10:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Joel Fernandes, Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki,
	Viresh Kumar, LKML, Linux PM, Vincent Guittot, Steven Rostedt,
	luca.abeni, claudio, Tommaso Cucinotta, bristot, mathieu.poirier,
	Todd Kjos, Andres Oportus, Morten Rasmussen, Dietmar Eggemann,
	Patrick Bellasi, Ingo Molnar, Rafael J . Wysocki

On 07/07/17 12:46, Thomas Gleixner wrote:
> On Fri, 7 Jul 2017, Juri Lelli wrote:
> > How about SCHED_FLAG_SUGOV then?
> 
> I know sugo della carne, but what's sugo V?
> 

Right.. can't really help thinking about the same thing (especially
close to lunch) :)

But the abbreviation stands for SchedUtil GOVernor (AFAIK). We already
have a SUGOV_KTHREAD_PRIORITY (that I just noticed I need to remove BTW)
in sched/cpufreq_schedutil.c.

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 4/8] sched/cpufreq_schedutil: split utilization signals
  2017-07-07  8:58   ` Viresh Kumar
@ 2017-07-07 10:59     ` Juri Lelli
  2017-07-10  7:46       ` Joel Fernandes
  0 siblings, 1 reply; 45+ messages in thread
From: Juri Lelli @ 2017-07-07 10:59 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: peterz, mingo, rjw, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

Hi,

On 07/07/17 14:28, Viresh Kumar wrote:
> On 05-07-17, 09:59, Juri Lelli wrote:
> > To be able to treat utilization signals of different scheduling classes
> > in different ways (e.g., CFS signal might be stale while DEADLINE signal
> > is never stale by design) we need to split sugov_cpu::util signal in two:
> > util_cfs and util_dl.
> > 
> > This patch does that by also changing sugov_get_util() parameter list.
> > After this change, aggregation of the different signals has to be performed
> > by sugov_get_util() users (so that they can decide what to do with the
> > different signals).
> > 
> > Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Are you referring to this response here ?
> 
> https://marc.info/?l=linux-kernel&m=149095102600847&w=2
> 

Yep.

> If yes, then I don't think it was about having separate APIs, but just storing
> util_cfs/dl separately.
> 
> > -static void sugov_get_util(unsigned long *util, unsigned long *max)
> > +static void sugov_get_util(struct sugov_cpu *sg_cpu)
> >  {
> >  	struct rq *rq = this_rq();
> > -	unsigned long dl_util = (rq->dl.running_bw * SCHED_CAPACITY_SCALE)
> > -				>> BW_SHIFT;
> >  
> > -	*max = arch_scale_cpu_capacity(NULL, smp_processor_id());
> > +	sg_cpu->max = arch_scale_cpu_capacity(NULL, smp_processor_id());
> > +	sg_cpu->util_cfs = rq->cfs.avg.util_avg;
> > +	sg_cpu->util_dl = (rq->dl.running_bw * SCHED_CAPACITY_SCALE)
> > +			  >> BW_SHIFT;
> > +}
> >  
> > +static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> 
> As Joel already mentioned, I don't think we should create two separate routines
> here.
> 

Mmm, it makes retrieving utilization in sugov_update_shared and
aggregating values for the domain in sugov_next_freq_shared cleaner,
IMHO.

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE
  2017-07-07 10:53         ` Juri Lelli
@ 2017-07-07 13:11           ` Rafael J. Wysocki
  2017-07-07 21:58             ` Steven Rostedt
  0 siblings, 1 reply; 45+ messages in thread
From: Rafael J. Wysocki @ 2017-07-07 13:11 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Thomas Gleixner, Joel Fernandes, Peter Zijlstra, Ingo Molnar,
	Viresh Kumar, LKML, Linux PM, Vincent Guittot, Steven Rostedt,
	luca.abeni, claudio, Tommaso Cucinotta, bristot, mathieu.poirier,
	Todd Kjos, Andres Oportus, Morten Rasmussen, Dietmar Eggemann,
	Patrick Bellasi, Ingo Molnar, Rafael J . Wysocki

On Friday, July 07, 2017 11:53:16 AM Juri Lelli wrote:
> On 07/07/17 12:46, Thomas Gleixner wrote:
> > On Fri, 7 Jul 2017, Juri Lelli wrote:
> > > How about SCHED_FLAG_SUGOV then?
> > 
> > I know sugo della carne, but what's sugo V?
> > 
> 
> Right.. can't really help thinking about the same thing (especially
> close to lunch) :)
> 
> But the abbreviation stands for SchedUtil GOVernor (AFAIK). We already
> have a SUGOV_KTHREAD_PRIORITY (that I just noticed I need to remove BTW)
> in sched/cpufreq_schedutil.c.

SCHED_FLAG_CPUFREQ_WORKER comes to mind, but it's somewhat long.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE
  2017-07-07 13:11           ` Rafael J. Wysocki
@ 2017-07-07 21:58             ` Steven Rostedt
  2017-07-07 22:07               ` Joel Fernandes
  0 siblings, 1 reply; 45+ messages in thread
From: Steven Rostedt @ 2017-07-07 21:58 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Juri Lelli, Thomas Gleixner, Joel Fernandes, Peter Zijlstra,
	Ingo Molnar, Viresh Kumar, LKML, Linux PM, Vincent Guittot,
	luca.abeni, claudio, Tommaso Cucinotta, bristot, mathieu.poirier,
	Todd Kjos, Andres Oportus, Morten Rasmussen, Dietmar Eggemann,
	Patrick Bellasi, Ingo Molnar, Rafael J . Wysocki

On Fri, 07 Jul 2017 15:11:45 +0200
"Rafael J. Wysocki" <rjw@rjwysocki.net> wrote:

> On Friday, July 07, 2017 11:53:16 AM Juri Lelli wrote:
> > On 07/07/17 12:46, Thomas Gleixner wrote:  
> > > On Fri, 7 Jul 2017, Juri Lelli wrote:  
> > > > How about SCHED_FLAG_SUGOV then?  
> > > 
> > > I know sugo della carne, but what's sugo V?
> > >   
> > 
> > Right.. can't really help thinking about the same thing (especially
> > close to lunch) :)
> > 
> > But the abbreviation stands for SchedUtil GOVernor (AFAIK). We already
> > have a SUGOV_KTHREAD_PRIORITY (that I just noticed I need to remove BTW)
> > in sched/cpufreq_schedutil.c.  
> 
> SCHED_FLAG_CPUFREQ_WORKER comes to mind, but it's somewhat long.
> 

It is rather long. Although I actually hate the SUGOV, it is easily
grepable. Just comment what it stands for. We can always change it
later.

-- Steve

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 7/8] sched/sched.h: move arch_scale_{freq,cpu}_capacity outside CONFIG_SMP
  2017-07-05  8:59 ` [RFC PATCH v1 7/8] sched/sched.h: move arch_scale_{freq,cpu}_capacity outside CONFIG_SMP Juri Lelli
@ 2017-07-07 22:04   ` Steven Rostedt
  0 siblings, 0 replies; 45+ messages in thread
From: Steven Rostedt @ 2017-07-07 22:04 UTC (permalink / raw)
  To: Juri Lelli
  Cc: peterz, mingo, rjw, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, luca.abeni, claudio, tommaso.cucinotta, bristot,
	mathieu.poirier, tkjos, joelaf, andresoportus, morten.rasmussen,
	dietmar.eggemann, patrick.bellasi, Ingo Molnar

On Wed,  5 Jul 2017 09:59:04 +0100
Juri Lelli <juri.lelli@arm.com> wrote:

> Currently, frequency and cpu capacity scaling is only performed on
> CONFIG_SMP systems (as CFS PELT signals are only present for such
> systems). However, other scheduling classes want to do freq/cpu scaling,
> and for !CONFIG_SMP configurations as well.
> 
> arch_scale_freq_capacity is useful to implement frequency scaling even
> on !CONFIG_SMP platforms, so we simply move it outside CONFIG_SMP
> ifdeffery.
> 
> Even if arch_scale_cpu_capacity is not useful on !CONFIG_SMP platforms,
> we make a default implementation available for such configurations anyway
> to simplify scheduler code doing CPU scale invariance.
> 

Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve

> Signed-off-by: Juri Lelli <juri.lelli@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> ---
>  include/linux/sched/topology.h | 12 ++++++------
>  kernel/sched/sched.h           | 13 ++++++++++---
>  2 files changed, 16 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> index 7d065abc7a47..953cf4f889ec 100644
> --- a/include/linux/sched/topology.h
> +++ b/include/linux/sched/topology.h
> @@ -6,6 +6,12 @@
>  #include <linux/sched/idle.h>
>  
>  /*
> + * Increase resolution of cpu_capacity calculations
> + */
> +#define SCHED_CAPACITY_SHIFT	SCHED_FIXEDPOINT_SHIFT
> +#define SCHED_CAPACITY_SCALE	(1L << SCHED_CAPACITY_SHIFT)
> +
> +/*
>   * sched-domains (multiprocessor balancing) declarations:
>   */
>  #ifdef CONFIG_SMP
> @@ -26,12 +32,6 @@
>  #define SD_OVERLAP		0x2000	/* sched_domains of this level overlap */
>  #define SD_NUMA			0x4000	/* cross-node balancing */
>  
> -/*
> - * Increase resolution of cpu_capacity calculations
> - */
> -#define SCHED_CAPACITY_SHIFT	SCHED_FIXEDPOINT_SHIFT
> -#define SCHED_CAPACITY_SCALE	(1L << SCHED_CAPACITY_SHIFT)
> -
>  #ifdef CONFIG_SCHED_SMT
>  static inline int cpu_smt_flags(void)
>  {
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index ec5769288927..178f4a5df2fa 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1668,9 +1668,6 @@ static inline int hrtick_enabled(struct rq *rq)
>  
>  #endif /* CONFIG_SCHED_HRTICK */
>  
> -#ifdef CONFIG_SMP
> -extern void sched_avg_update(struct rq *rq);
> -
>  #ifndef arch_scale_freq_capacity
>  static __always_inline
>  unsigned long arch_scale_freq_capacity(int cpu)
> @@ -1679,6 +1676,9 @@ unsigned long arch_scale_freq_capacity(int cpu)
>  }
>  #endif
>  
> +#ifdef CONFIG_SMP
> +extern void sched_avg_update(struct rq *rq);
> +
>  #ifndef arch_scale_cpu_capacity
>  static __always_inline
>  unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
> @@ -1696,6 +1696,13 @@ static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta)
>  	sched_avg_update(rq);
>  }
>  #else
> +#ifndef arch_scale_cpu_capacity
> +static __always_inline
> +unsigned long arch_scale_cpu_capacity(void __always_unused *sd, int cpu)
> +{
> +	return SCHED_CAPACITY_SCALE;
> +}
> +#endif
>  static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta) { }
>  static inline void sched_avg_update(struct rq *rq) { }
>  #endif

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE
  2017-07-07 21:58             ` Steven Rostedt
@ 2017-07-07 22:07               ` Joel Fernandes
  2017-07-07 22:15                 ` Steven Rostedt
  0 siblings, 1 reply; 45+ messages in thread
From: Joel Fernandes @ 2017-07-07 22:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Rafael J. Wysocki, Juri Lelli, Thomas Gleixner, Peter Zijlstra,
	Ingo Molnar, Viresh Kumar, LKML, Linux PM, Vincent Guittot,
	Luca Abeni, claudio, Tommaso Cucinotta, bristot, Mathieu Poirier,
	Todd Kjos, Andres Oportus, Morten Rasmussen, Dietmar Eggemann,
	Patrick Bellasi, Ingo Molnar, Rafael J . Wysocki

Hi,

On Fri, Jul 7, 2017 at 2:58 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Fri, 07 Jul 2017 15:11:45 +0200
> "Rafael J. Wysocki" <rjw@rjwysocki.net> wrote:
>
>> On Friday, July 07, 2017 11:53:16 AM Juri Lelli wrote:
>> > On 07/07/17 12:46, Thomas Gleixner wrote:
>> > > On Fri, 7 Jul 2017, Juri Lelli wrote:
>> > > > How about SCHED_FLAG_SUGOV then?
>> > >
>> > > I know sugo della carne, but what's sugo V?
>> > >
>> >
>> > Right.. can't really help not thinking about the same (especially close
>> > to lunch) :)
>> >
>> > But the abbreviation stands for SchedUtil GOVernor (AFAIK). We already
>> > have a SUGOV_KTHREAD_PRIORITY (that I just noticed I need to remove BTW)
>> > in sched/cpufreq_schedutil.c.
>>
>> SCHED_FLAG_CPUFREQ_WORKER comes to mind, but it's somewhat long.
>>
>
> It is rather long. Although I actually hate the SUGOV, it is easily
> grepable. Just comment what it stands for. We can always change it
> later.

I was thinking why not just SCHED_FLAG_CPUFREQ. That says it's for
cpufreq purposes, and is a bit self-documenting. "WORKER" is a bit
redundant and can be dropped in my opinion.

thanks,

-Joel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE
  2017-07-07 22:07               ` Joel Fernandes
@ 2017-07-07 22:15                 ` Steven Rostedt
  2017-07-07 22:57                   ` Joel Fernandes
  0 siblings, 1 reply; 45+ messages in thread
From: Steven Rostedt @ 2017-07-07 22:15 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Rafael J. Wysocki, Juri Lelli, Thomas Gleixner, Peter Zijlstra,
	Ingo Molnar, Viresh Kumar, LKML, Linux PM, Vincent Guittot,
	Luca Abeni, claudio, Tommaso Cucinotta, bristot, Mathieu Poirier,
	Todd Kjos, Andres Oportus, Morten Rasmussen, Dietmar Eggemann,
	Patrick Bellasi, Ingo Molnar, Rafael J . Wysocki

On Fri, 7 Jul 2017 15:07:09 -0700
Joel Fernandes <joelaf@google.com> wrote:


> > It is rather long. Although I actually hate the SUGOV, it is easily
> > grepable. Just comment what it stands for. We can always change it
> > later.  
> 
> I was thinking why not just SCHED_FLAG_CPUFREQ. That says it's for
> cpufreq purposes, and is a bit self-documenting. "WORKER" is a bit
> redundant and can be dropped in my opinion.

I was thinking that too, but was wondering how tightly coupled this is
with SCHED_DEADLINE. I like the searchability of SUGOV, whereas
CPUFREQ is still quite broad.

-- Steve

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE
  2017-07-07 22:15                 ` Steven Rostedt
@ 2017-07-07 22:57                   ` Joel Fernandes
  0 siblings, 0 replies; 45+ messages in thread
From: Joel Fernandes @ 2017-07-07 22:57 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Rafael J. Wysocki, Juri Lelli, Thomas Gleixner, Peter Zijlstra,
	Ingo Molnar, Viresh Kumar, LKML, Linux PM, Vincent Guittot,
	Luca Abeni, claudio, Tommaso Cucinotta, bristot, Mathieu Poirier,
	Todd Kjos, Andres Oportus, Morten Rasmussen, Dietmar Eggemann,
	Patrick Bellasi, Ingo Molnar, Rafael J . Wysocki

On Fri, Jul 7, 2017 at 3:15 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Fri, 7 Jul 2017 15:07:09 -0700
> Joel Fernandes <joelaf@google.com> wrote:
>
>
>> > It is rather long. Although I actually hate the SUGOV, it is easily
>> > grepable. Just comment what it stands for. We can always change it
>> > later.
>>
>> I was thinking why not just SCHED_FLAG_CPUFREQ. That says it's for
>> cpufreq purposes, and is a bit self-documenting. "WORKER" is a bit
>> redundant and can be dropped in my opinion.
>
> I was thinking that too, but was wondering how tightly coupled this is
> with SCHED_DEADLINE. I like the searchability of SUGOV, whereas
> CPUFREQ is still quite broad.

Yes, this is tightly coupled with DL, so in that case a more specific
name is probably better, as you mentioned.

thanks,

-Joel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 4/8] sched/cpufreq_schedutil: split utilization signals
  2017-07-05  8:59 ` [RFC PATCH v1 4/8] sched/cpufreq_schedutil: split utilization signals Juri Lelli
  2017-07-07  3:26   ` Joel Fernandes
  2017-07-07  8:58   ` Viresh Kumar
@ 2017-07-10  7:05   ` Viresh Kumar
  2 siblings, 0 replies; 45+ messages in thread
From: Viresh Kumar @ 2017-07-10  7:05 UTC (permalink / raw)
  To: Juri Lelli
  Cc: peterz, mingo, rjw, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

On 05-07-17, 09:59, Juri Lelli wrote:
> To be able to treat utilization signals of different scheduling classes
> in different ways (e.g., CFS signal might be stale while DEADLINE signal
> is never stale by design) we need to split sugov_cpu::util signal in two:
> util_cfs and util_dl.
> 
> This patch does that by also changing sugov_get_util() parameter list.
> After this change, aggregation of the different signals has to be performed
> by sugov_get_util() users (so that they can decide what to do with the
> different signals).
> 
> Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Signed-off-by: Juri Lelli <juri.lelli@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: Luca Abeni <luca.abeni@santannapisa.it>
> Cc: Claudio Scordino <claudio@evidence.eu.com>
> ---
> Changes from RFCv0:
> 
>  - refactor aggregation of utilization in sugov_aggregate_util()
> ---
>  kernel/sched/cpufreq_schedutil.c | 28 ++++++++++++++++------------
>  1 file changed, 16 insertions(+), 12 deletions(-)

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 4/8] sched/cpufreq_schedutil: split utilization signals
  2017-07-07 10:59     ` Juri Lelli
@ 2017-07-10  7:46       ` Joel Fernandes
  0 siblings, 0 replies; 45+ messages in thread
From: Joel Fernandes @ 2017-07-10  7:46 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Viresh Kumar, Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki,
	LKML, Linux PM, Thomas Gleixner, Vincent Guittot, Steven Rostedt,
	Luca Abeni, claudio, Tommaso Cucinotta, bristot, Mathieu Poirier,
	Todd Kjos, Andres Oportus, Morten Rasmussen, Dietmar Eggemann,
	Patrick Bellasi, Ingo Molnar, Rafael J . Wysocki

On Fri, Jul 7, 2017 at 3:59 AM, Juri Lelli <juri.lelli@arm.com> wrote:
[..]
>
>> If yes, then I don't think it was about having separate APIs, but just storing
>> util_cfs/dl separately.
>>
>> > -static void sugov_get_util(unsigned long *util, unsigned long *max)
>> > +static void sugov_get_util(struct sugov_cpu *sg_cpu)
>> >  {
>> >     struct rq *rq = this_rq();
>> > -   unsigned long dl_util = (rq->dl.running_bw * SCHED_CAPACITY_SCALE)
>> > -                           >> BW_SHIFT;
>> >
>> > -   *max = arch_scale_cpu_capacity(NULL, smp_processor_id());
>> > +   sg_cpu->max = arch_scale_cpu_capacity(NULL, smp_processor_id());
>> > +   sg_cpu->util_cfs = rq->cfs.avg.util_avg;
>> > +   sg_cpu->util_dl = (rq->dl.running_bw * SCHED_CAPACITY_SCALE)
>> > +                     >> BW_SHIFT;
>> > +}
>> >
>> > +static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
>>
>> As Joel already mentioned, I don't think we should create two separate routines
>> here.
>>
>
> Mmm, it makes retrieving utilization in sugov_update_shared and
> aggregating values for the domain in sugov_next_freq_shared cleaner,
> IMHO.
>

I agree, thanks.

-Joel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE
  2017-07-07  3:56   ` Joel Fernandes
  2017-07-07 10:43     ` Juri Lelli
@ 2017-07-11 12:37     ` Peter Zijlstra
  1 sibling, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2017-07-11 12:37 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Juri Lelli, Ingo Molnar, Rafael J. Wysocki, Viresh Kumar, LKML,
	Linux PM, Thomas Gleixner, Vincent Guittot, Steven Rostedt,
	luca.abeni, claudio, Tommaso Cucinotta, bristot, mathieu.poirier,
	Todd Kjos, Andres Oportus, Morten Rasmussen, Dietmar Eggemann,
	Patrick Bellasi, Ingo Molnar, Rafael J . Wysocki

On Thu, Jul 06, 2017 at 08:56:32PM -0700, Joel Fernandes wrote:

> > @@ -3998,7 +3998,9 @@ static int __sched_setscheduler(struct task_struct *p,
> >         }
> >
> >         if (attr->sched_flags &
> > -               ~(SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_RECLAIM))
> > +               ~(SCHED_FLAG_RESET_ON_FORK |
> > +                 SCHED_FLAG_RECLAIM |
> > +                 SCHED_FLAG_SPECIAL))
> >                 return -EINVAL;
> 
> I was wondering if it's better to name SCHED_FLAG_SPECIAL something more
> descriptive than "special", since it will not be clear from looking at
> other parts of the code that this is used for the frequency scaling
> thread or that it's a DL task. Since this flag is specifically set up
> for the sugov thread's frequency scaling purpose, a better flag name
> than SPECIAL might be SCHED_FLAG_DL_FREQ_SCALE or
> SCHED_FLAG_FREQ_SCALE. But I am OK with whatever makes sense to do
> here. Thanks,

SCHED_FLAG_IM_DOING_IT_WRONG ;-)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 5/8] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq
  2017-07-05  8:59 ` [RFC PATCH v1 5/8] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq Juri Lelli
  2017-07-07  8:59   ` Viresh Kumar
@ 2017-07-11 16:17   ` Peter Zijlstra
  2017-07-11 17:18     ` Juri Lelli
  1 sibling, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2017-07-11 16:17 UTC (permalink / raw)
  To: Juri Lelli
  Cc: mingo, rjw, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

On Wed, Jul 05, 2017 at 09:59:02AM +0100, Juri Lelli wrote:
>  		delta_ns = time - j_sg_cpu->last_update;
>  		if (delta_ns > TICK_NSEC) {
>  			j_sg_cpu->iowait_boost = 0;
> -			continue;
> +			j_sg_cpu->util_cfs = 0;

this is slightly confusing. Is this because we might not 'continue' with
the new code?

> +			if (j_sg_cpu->util_dl == 0)
> +				continue;
>  		}
> +
>  		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT)
>  			return policy->cpuinfo.max_freq;
>  
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE
  2017-07-05  8:59 ` [RFC PATCH v1 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE Juri Lelli
  2017-07-07  3:56   ` Joel Fernandes
  2017-07-07  7:21   ` Viresh Kumar
@ 2017-07-11 16:18   ` Peter Zijlstra
  2017-07-11 17:02     ` Juri Lelli
  2 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2017-07-11 16:18 UTC (permalink / raw)
  To: Juri Lelli
  Cc: mingo, rjw, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

On Wed, Jul 05, 2017 at 09:59:00AM +0100, Juri Lelli wrote:
> @@ -4065,6 +4067,9 @@ static int __sched_setscheduler(struct task_struct *p,
>  	}
>  
>  	if (user) {
> +		if (attr->sched_flags & SCHED_FLAG_SPECIAL)
> +			return -EPERM;

Should be -EINVAL I think, as if the bit didn't exist at all (it
doesn't, from a userspace perspective).

> +
>  		retval = security_task_setscheduler(p);
>  		if (retval)
>  			return retval;

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE
  2017-07-11 16:18   ` Peter Zijlstra
@ 2017-07-11 17:02     ` Juri Lelli
  0 siblings, 0 replies; 45+ messages in thread
From: Juri Lelli @ 2017-07-11 17:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, rjw, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

On 11/07/17 18:18, Peter Zijlstra wrote:
> On Wed, Jul 05, 2017 at 09:59:00AM +0100, Juri Lelli wrote:
> > @@ -4065,6 +4067,9 @@ static int __sched_setscheduler(struct task_struct *p,
> >  	}
> >  
> >  	if (user) {
> > +		if (attr->sched_flags & SCHED_FLAG_SPECIAL)
> > +			return -EPERM;
> 
> Should be -EINVAL I think, as if the bit didn't exist at all (it
> doesn't, from a userspace perspective).
> 

Makes sense. I'll change it.
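
The check would then presumably become:

	if (user) {
		/*
		 * The flag is kernel-internal: reject it with -EINVAL,
		 * as if the bit didn't exist at all from userspace's
		 * point of view.
		 */
		if (attr->sched_flags & SCHED_FLAG_SPECIAL)
			return -EINVAL;

		retval = security_task_setscheduler(p);
		if (retval)
			return retval;
	}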

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 5/8] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq
  2017-07-11 16:17   ` Peter Zijlstra
@ 2017-07-11 17:18     ` Juri Lelli
  0 siblings, 0 replies; 45+ messages in thread
From: Juri Lelli @ 2017-07-11 17:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, rjw, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

On 11/07/17 18:17, Peter Zijlstra wrote:
> On Wed, Jul 05, 2017 at 09:59:02AM +0100, Juri Lelli wrote:
> >  		delta_ns = time - j_sg_cpu->last_update;
> >  		if (delta_ns > TICK_NSEC) {
> >  			j_sg_cpu->iowait_boost = 0;
> > -			continue;
> > +			j_sg_cpu->util_cfs = 0;
> 
> this is slightly confusing. Is this because we might not 'continue' with
> the new code?
> 

This is because, after TICK_NSEC, we only want to discard the CFS
contribution, and (yes) only continue (i.e., skip j_sg_cpu's
contribution entirely) if the DEADLINE contribution is zero as well.

> > +			if (j_sg_cpu->util_dl == 0)
> > +				continue;
> >  		}
> > +

With this change we might not continue if some DEADLINE utilization is
present for j_sg_cpu.
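
I.e., the hunk above annotated (same logic, with the two cases spelled
out in comments):

	delta_ns = time - j_sg_cpu->last_update;
	if (delta_ns > TICK_NSEC) {
		j_sg_cpu->iowait_boost = 0;
		j_sg_cpu->util_cfs = 0;		/* CFS signal is stale */
		if (j_sg_cpu->util_dl == 0)
			continue;		/* nothing left to count */
		/* else fall through: DL bandwidth still counts */
	}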

> >  		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT)
> >  			return policy->cpuinfo.max_freq;
> >  
> > -- 
> > 2.11.0
> > 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 8/8] sched/deadline: make bandwidth enforcement scale-invariant
  2017-07-05  8:59 ` [RFC PATCH v1 8/8] sched/deadline: make bandwidth enforcement scale-invariant Juri Lelli
@ 2017-07-19  7:21   ` Peter Zijlstra
  2017-07-19  9:20     ` Juri Lelli
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2017-07-19  7:21 UTC (permalink / raw)
  To: Juri Lelli
  Cc: mingo, rjw, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

On Wed, Jul 05, 2017 at 09:59:05AM +0100, Juri Lelli wrote:
> @@ -1156,9 +1157,26 @@ static void update_curr_dl(struct rq *rq)
>  	if (unlikely(dl_entity_is_special(dl_se)))
>  		return;
>  
> -	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM))
> -		delta_exec = grub_reclaim(delta_exec, rq, &curr->dl);
> -	dl_se->runtime -= delta_exec;
> +	/*
> +	 * For tasks that participate in GRUB, we implement GRUB-PA: the
> +	 * spare reclaimed bandwidth is used to clock down frequency.
> +	 *
> +	 * For the others, we still need to scale reservation parameters
> +	 * according to current frequency and CPU maximum capacity.
> +	 */
> +	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) {
> +		scaled_delta_exec = grub_reclaim(delta_exec,
> +						 rq,
> +						 &curr->dl);
> +	} else {
> +		unsigned long scale_freq = arch_scale_freq_capacity(cpu);
> +		unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
> +
> +		scaled_delta_exec = cap_scale(delta_exec, scale_freq);
> +		scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu);
> +	}
> +
> +	dl_se->runtime -= scaled_delta_exec;
>  

This I don't get... 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 8/8] sched/deadline: make bandwidth enforcement scale-invariant
  2017-07-19  7:21   ` Peter Zijlstra
@ 2017-07-19  9:20     ` Juri Lelli
  2017-07-19 11:00       ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Juri Lelli @ 2017-07-19  9:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, rjw, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

On 19/07/17 09:21, Peter Zijlstra wrote:
> On Wed, Jul 05, 2017 at 09:59:05AM +0100, Juri Lelli wrote:
> > @@ -1156,9 +1157,26 @@ static void update_curr_dl(struct rq *rq)
> >  	if (unlikely(dl_entity_is_special(dl_se)))
> >  		return;
> >  
> > -	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM))
> > -		delta_exec = grub_reclaim(delta_exec, rq, &curr->dl);
> > -	dl_se->runtime -= delta_exec;
> > +	/*
> > +	 * For tasks that participate in GRUB, we implement GRUB-PA: the
> > +	 * spare reclaimed bandwidth is used to clock down frequency.
> > +	 *
> > +	 * For the others, we still need to scale reservation parameters
> > +	 * according to current frequency and CPU maximum capacity.
> > +	 */
> > +	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) {
> > +		scaled_delta_exec = grub_reclaim(delta_exec,
> > +						 rq,
> > +						 &curr->dl);
> > +	} else {
> > +		unsigned long scale_freq = arch_scale_freq_capacity(cpu);
> > +		unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
> > +
> > +		scaled_delta_exec = cap_scale(delta_exec, scale_freq);
> > +		scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu);
> > +	}
> > +
> > +	dl_se->runtime -= scaled_delta_exec;
> >  
> 
> This I don't get... 


Considering that we use GRUB's active utilization to drive clock
frequency selection, the rationale is that GRUB tasks don't need any special
scaling, as their delta_exec is already scaled according to GRUB rules.
OTOH, normal tasks need to have their runtime (delta_exec) explicitly
scaled considering current frequency (and CPU max capacity), otherwise
they are going to receive less runtime than granted at AC, when
frequency is reduced.
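
As a sketch of the !GRUB branch, with cap_scale() being the same helper
fair.c already uses:

	#define cap_scale(v, s)	((v)*(s) >> SCHED_CAPACITY_SHIFT)

	/*
	 * E.g. at half frequency on a full-size CPU, scale_freq = 512,
	 * so only half of the elapsed wall-clock time is charged to the
	 * budget: the task keeps the amount of *work* granted at AC.
	 */
	scaled_delta_exec = cap_scale(delta_exec, scale_freq);
	scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu);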

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 8/8] sched/deadline: make bandwidth enforcement scale-invariant
  2017-07-19  9:20     ` Juri Lelli
@ 2017-07-19 11:00       ` Peter Zijlstra
  2017-07-19 11:16         ` Juri Lelli
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2017-07-19 11:00 UTC (permalink / raw)
  To: Juri Lelli
  Cc: mingo, rjw, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

On Wed, Jul 19, 2017 at 10:20:29AM +0100, Juri Lelli wrote:
> On 19/07/17 09:21, Peter Zijlstra wrote:
> > On Wed, Jul 05, 2017 at 09:59:05AM +0100, Juri Lelli wrote:
> > > @@ -1156,9 +1157,26 @@ static void update_curr_dl(struct rq *rq)
> > >  	if (unlikely(dl_entity_is_special(dl_se)))
> > >  		return;
> > >  
> > > -	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM))
> > > -		delta_exec = grub_reclaim(delta_exec, rq, &curr->dl);
> > > -	dl_se->runtime -= delta_exec;
> > > +	/*
> > > +	 * For tasks that participate in GRUB, we implement GRUB-PA: the
> > > +	 * spare reclaimed bandwidth is used to clock down frequency.
> > > +	 *
> > > +	 * For the others, we still need to scale reservation parameters
> > > +	 * according to current frequency and CPU maximum capacity.
> > > +	 */
> > > +	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) {
> > > +		scaled_delta_exec = grub_reclaim(delta_exec,
> > > +						 rq,
> > > +						 &curr->dl);
> > > +	} else {
> > > +		unsigned long scale_freq = arch_scale_freq_capacity(cpu);
> > > +		unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
> > > +
> > > +		scaled_delta_exec = cap_scale(delta_exec, scale_freq);
> > > +		scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu);
> > > +	}
> > > +
> > > +	dl_se->runtime -= scaled_delta_exec;
> > >  
> > 
> > This I don't get... 
> 
> 
> Considering that we use GRUB's active utilization to drive clock
> > frequency selection, the rationale is that GRUB tasks don't need any special
> scaling, as their delta_exec is already scaled according to GRUB rules.
> OTOH, normal tasks need to have their runtime (delta_exec) explicitly
> scaled considering current frequency (and CPU max capacity), otherwise
> they are going to receive less runtime than granted at AC, when
> frequency is reduced.

I don't think that quite works out. Given that the frequency selection
will never quite end up at exactly the same fraction (if the hardware
listens to your requests at all).

Also, by not scaling the GRUB stuff, don't you run the risk of
attempting to hand out more idle time than there actually is?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 8/8] sched/deadline: make bandwidth enforcement scale-invariant
  2017-07-19 11:00       ` Peter Zijlstra
@ 2017-07-19 11:16         ` Juri Lelli
  2017-07-24 16:43           ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Juri Lelli @ 2017-07-19 11:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, rjw, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

On 19/07/17 13:00, Peter Zijlstra wrote:
> On Wed, Jul 19, 2017 at 10:20:29AM +0100, Juri Lelli wrote:
> > On 19/07/17 09:21, Peter Zijlstra wrote:
> > > On Wed, Jul 05, 2017 at 09:59:05AM +0100, Juri Lelli wrote:
> > > > @@ -1156,9 +1157,26 @@ static void update_curr_dl(struct rq *rq)
> > > >  	if (unlikely(dl_entity_is_special(dl_se)))
> > > >  		return;
> > > >  
> > > > -	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM))
> > > > -		delta_exec = grub_reclaim(delta_exec, rq, &curr->dl);
> > > > -	dl_se->runtime -= delta_exec;
> > > > +	/*
> > > > +	 * For tasks that participate in GRUB, we implement GRUB-PA: the
> > > > +	 * spare reclaimed bandwidth is used to clock down frequency.
> > > > +	 *
> > > > +	 * For the others, we still need to scale reservation parameters
> > > > +	 * according to current frequency and CPU maximum capacity.
> > > > +	 */
> > > > +	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) {
> > > > +		scaled_delta_exec = grub_reclaim(delta_exec,
> > > > +						 rq,
> > > > +						 &curr->dl);
> > > > +	} else {
> > > > +		unsigned long scale_freq = arch_scale_freq_capacity(cpu);
> > > > +		unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
> > > > +
> > > > +		scaled_delta_exec = cap_scale(delta_exec, scale_freq);
> > > > +		scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu);
> > > > +	}
> > > > +
> > > > +	dl_se->runtime -= scaled_delta_exec;
> > > >  
> > > 
> > > This I don't get... 
> > 
> > 
> > Considering that we use GRUB's active utilization to drive clock
> > > frequency selection, the rationale is that GRUB tasks don't need any special
> > scaling, as their delta_exec is already scaled according to GRUB rules.
> > OTOH, normal tasks need to have their runtime (delta_exec) explicitly
> > scaled considering current frequency (and CPU max capacity), otherwise
> > they are going to receive less runtime than granted at AC, when
> > frequency is reduced.
> 
> I don't think that quite works out. Given that the frequency selection
> will never quite end up at exactly the same fraction (if the hardware
> listens to your requests at all).
> 

It's an approximation, yes (how big depends on the granularity of the
available frequencies). But, for the !GRUB tasks it should be OK, as we
always select a frequency (among the available ones) bigger than the
current active utilization.
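
(The selection being schedutil's usual get_next_freq() formula,
roughly:

	/* ~25% headroom over the raw utilization; cpufreq then
	 * resolves this to the next available OPP >= freq */
	freq = (freq + (freq >> 2)) * util / max;

so the frequency eventually granted is never below what the current
active utilization asks for.)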

Also, for platforms/archs that don't redefine arch_scale_* this is not
used. If they are defined, the assumption is instead that either the hw
listens to requests or the scaling factors can be derived in some other
way (avgs?).

> Also, by not scaling the GRUB stuff, don't you run the risk of
> attempting to hand out more idle time than there actually is?

The way I understand it is that for GRUB tasks we always scale
considering the "correct" factor. Then frequency could be higher, but
this spare idle time will be reclaimed by other GRUB tasks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 8/8] sched/deadline: make bandwidth enforcement scale-invariant
  2017-07-19 11:16         ` Juri Lelli
@ 2017-07-24 16:43           ` Peter Zijlstra
  2017-07-25  7:03             ` Luca Abeni
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2017-07-24 16:43 UTC (permalink / raw)
  To: Juri Lelli
  Cc: mingo, rjw, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

On Wed, Jul 19, 2017 at 12:16:24PM +0100, Juri Lelli wrote:
> On 19/07/17 13:00, Peter Zijlstra wrote:
> > On Wed, Jul 19, 2017 at 10:20:29AM +0100, Juri Lelli wrote:
> > > On 19/07/17 09:21, Peter Zijlstra wrote:
> > > > On Wed, Jul 05, 2017 at 09:59:05AM +0100, Juri Lelli wrote:
> > > > > @@ -1156,9 +1157,26 @@ static void update_curr_dl(struct rq *rq)
> > > > >  	if (unlikely(dl_entity_is_special(dl_se)))
> > > > >  		return;
> > > > >  
> > > > > -	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM))
> > > > > -		delta_exec = grub_reclaim(delta_exec, rq, &curr->dl);
> > > > > -	dl_se->runtime -= delta_exec;
> > > > > +	/*
> > > > > +	 * For tasks that participate in GRUB, we implement GRUB-PA: the
> > > > > +	 * spare reclaimed bandwidth is used to clock down frequency.
> > > > > +	 *
> > > > > +	 * For the others, we still need to scale reservation parameters
> > > > > +	 * according to current frequency and CPU maximum capacity.
> > > > > +	 */
> > > > > +	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) {
> > > > > +		scaled_delta_exec = grub_reclaim(delta_exec,
> > > > > +						 rq,
> > > > > +						 &curr->dl);
> > > > > +	} else {
> > > > > +		unsigned long scale_freq = arch_scale_freq_capacity(cpu);
> > > > > +		unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
> > > > > +
> > > > > +		scaled_delta_exec = cap_scale(delta_exec, scale_freq);
> > > > > +		scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu);
> > > > > +	}
> > > > > +
> > > > > +	dl_se->runtime -= scaled_delta_exec;
> > > > >  
> > > > 
> > > > This I don't get... 
> > > 
> > > 
> > > Considering that we use GRUB's active utilization to drive clock
> > > frequency selection, the rationale is that GRUB tasks don't need any special
> > > scaling, as their delta_exec is already scaled according to GRUB rules.
> > > OTOH, normal tasks need to have their runtime (delta_exec) explicitly
> > > scaled considering current frequency (and CPU max capacity), otherwise
> > > they are going to receive less runtime than granted at AC, when
> > > frequency is reduced.
> > 
> > I don't think that quite works out. Given that the frequency selection
> > will never quite end up at exactly the same fraction (if the hardware
> > listens to your requests at all).
> > 
> 
> It's an approximation, yes (how big depends on the granularity of the
> available frequencies). But, for the !GRUB tasks it should be OK, as we
> always select a frequency (among the available ones) bigger than the
> current active utilization.
> 
> Also, for platforms/archs that don't redefine arch_scale_* this is not
> used. If they are defined, the assumption is instead that either the hw
> listens to requests or the scaling factors can be derived in some other
> way (avgs?).
> 
> > Also, by not scaling the GRUB stuff, don't you run the risk of
> > attempting to hand out more idle time than there actually is?
> 
> The way I understand it is that for GRUB tasks we always scale
> considering the "correct" factor. Then frequency could be higher, but
> this spare idle time will be reclaimed by other GRUB tasks.

I'm still confused..

So GRUB does:

	dq = -Uact dt

right? (yeah, I know, the code does something a little more complicated,
but it should still be more or less the same if you take out the 'extra'
bits).

Now, you do DVFS using that same Uact. If we lower the clock, we need
more time, so would we then not end up with something like:

	dq = -(1/Uact) dt

After all; our budget assignment is such that we're able to complete
our work at max freq. Therefore, when we lower the frequency, we'll have
to increase budget pro rata, otherwise we'll not complete our work and
badness happens.

Say we have a 1 GHz part and Uact=0.5: we'd select 500 MHz and need
double the time to complete.

Now, if we fold these two together, you'd get:

	dq = -(Uact/Uact) dt = -dt

Because, after all, if we lowered the clock to consume all idle time,
there's no further idle time to reclaim.

Now, of course, our DVFS assignment isn't as precise nor as
deterministic as this, so we'll get a slightly different ratio, let's
call that Udvfs.

So would then not GRUB change into something like:

	dq = -(Uact/Udvfs) dt

Allowing it to only reclaim that idle time that exists because our DVFS
level is strictly higher than required?

This way, on our 1 GHz part, with Uact=.5 but Udvfs=.6, we'll allow it
to reclaim just the additional 100 MHz of idle time.


Or am I completely off the rails now?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 8/8] sched/deadline: make bandwidth enforcement scale-invariant
  2017-07-24 16:43           ` Peter Zijlstra
@ 2017-07-25  7:03             ` Luca Abeni
  2017-07-25 13:51               ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Luca Abeni @ 2017-07-25  7:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, mingo, rjw, viresh.kumar, linux-kernel, linux-pm,
	tglx, vincent.guittot, rostedt, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

Hi Peter,

On Mon, 24 Jul 2017 18:43:49 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Jul 19, 2017 at 12:16:24PM +0100, Juri Lelli wrote:
> > On 19/07/17 13:00, Peter Zijlstra wrote:  
> > > On Wed, Jul 19, 2017 at 10:20:29AM +0100, Juri Lelli wrote:  
> > > > On 19/07/17 09:21, Peter Zijlstra wrote:  
> > > > > On Wed, Jul 05, 2017 at 09:59:05AM +0100, Juri Lelli wrote:  
> > > > > > @@ -1156,9 +1157,26 @@ static void update_curr_dl(struct rq *rq)
> > > > > >  	if (unlikely(dl_entity_is_special(dl_se)))
> > > > > >  		return;
> > > > > >  
> > > > > > -	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM))
> > > > > > -		delta_exec = grub_reclaim(delta_exec, rq, &curr->dl);
> > > > > > -	dl_se->runtime -= delta_exec;
> > > > > > +	/*
> > > > > > +	 * For tasks that participate in GRUB, we implement GRUB-PA: the
> > > > > > +	 * spare reclaimed bandwidth is used to clock down frequency.
> > > > > > +	 *
> > > > > > +	 * For the others, we still need to scale reservation parameters
> > > > > > +	 * according to current frequency and CPU maximum capacity.
> > > > > > +	 */
> > > > > > +	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) {
> > > > > > +		scaled_delta_exec = grub_reclaim(delta_exec,
> > > > > > +						 rq,
> > > > > > +						 &curr->dl);
> > > > > > +	} else {
> > > > > > +		unsigned long scale_freq = arch_scale_freq_capacity(cpu);
> > > > > > +		unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
> > > > > > +
> > > > > > +		scaled_delta_exec = cap_scale(delta_exec, scale_freq);
> > > > > > +		scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu);
> > > > > > +	}
> > > > > > +
> > > > > > +	dl_se->runtime -= scaled_delta_exec;
> > > > > >    
> > > > > 
> > > > > This I don't get...   
> > > > 
> > > > 
> > > > Considering that we use GRUB's active utilization to drive clock
> > > > frequency selection, the rationale is that GRUB tasks don't need any special
> > > > scaling, as their delta_exec is already scaled according to GRUB rules.
> > > > OTOH, normal tasks need to have their runtime (delta_exec) explicitly
> > > > scaled considering current frequency (and CPU max capacity), otherwise
> > > > they are going to receive less runtime than granted at AC, when
> > > > frequency is reduced.  
> > > 
> > > I don't think that quite works out. Given that the frequency selection
> > > will never quite end up at exactly the same fraction (if the hardware
> > > listens to your requests at all).
> > >   
> > 
> > It's an approximation, yes (how big depends on the granularity of the
> > available frequencies). But, for the !GRUB tasks it should be OK, as we
> > always select a frequency (among the available ones) bigger than the
> > current active utilization.
> > 
> > Also, for platforms/archs that don't redefine arch_scale_* this is not
> > used. If they are defined, the assumption is instead that either the hw
> > listens to requests or the scaling factors can be derived in some other
> > way (avgs?).
> >   
> > > Also, by not scaling the GRUB stuff, don't you run the risk of
> > > attempting to hand out more idle time than there actually is?  
> > 
> > The way I understand it is that for GRUB tasks we always scale
> > considering the "correct" factor. Then frequency could be higher, but
> > this spare idle time will be reclaimed by other GRUB tasks.  
> 
> I'm still confused..
> 
> So GRUB does:
> 
> 	dq = -Uact dt
> 
> right?

Right. This is what the original (single processor) GRUB did. And this
was used by the "GRUB-PA" algorithm:
https://www.researchgate.net/profile/Giuseppe_Lipari/publication/220800940_Using_resource_reservation_techniques_for_power-aware_scheduling/links/09e41513639b2703fc000000.pdf

(basically, GRUB-PA uses GRUB for reclaiming, and scales the CPU
frequency based on Uact)
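
In kernel terms this is the link the patch set wires up: DEADLINE's
active utilization (running_bw, i.e. Uact in BW_SHIFT fixed point) is
exported to schedutil. A minimal sketch of the conversion, in the
spirit of patch 1/8 rather than its literal diff (helper name is mine):

	/*
	 * Sketch: expose DEADLINE's Uact, tracked in BW_SHIFT fixed
	 * point, on the SCHED_CAPACITY_SCALE scale schedutil works with.
	 */
	static unsigned long cpu_util_dl(struct rq *rq)
	{
		return (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
	}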


> Now, you do DVFS using that same Uact. If we lower the clock, we need
> more time, so would we then not end up with something like:
> 
> 	dq = -1/Uact dt

Well, in the GRUB-PA algorithm GRUB reclaiming is the mechanism used to
give more runtime to the task... Since Uact is < 1, doing
	dq = - Uact * dt
means that we decrease the current runtime by a smaller amount of time.
And so we end up giving more runtime to the task: instead of giving 
dl_runtime every dl_period, we give "dl_runtime / Uact" every
dl_period... And since the CPU is slower (by a ratio Uact), this is
equivalent to giving dl_runtime at the maximum CPU speed / frequency
(at least, in theory :).
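
To make the arithmetic concrete (numbers picked for illustration):
with dl_runtime = 10ms and dl_period = 20ms we have Uact = 0.5, so
the CPU is clocked to half speed. A self-contained sketch of the
accounting rule, using the kernel's BW_SHIFT fixed point (function
name is mine):

	#include <stdio.h>
	#include <stdint.h>

	#define BW_SHIFT 20
	#define BW_UNIT  (1ULL << BW_SHIFT)	/* 1.0 in fixed point */

	/* GRUB(-PA) accounting: dq = -Uact * dt */
	static uint64_t grub_charge(uint64_t delta_exec_ns, uint64_t u_act)
	{
		return (delta_exec_ns * u_act) >> BW_SHIFT;
	}

	int main(void)
	{
		uint64_t u_act = BW_UNIT / 2;	/* Uact = 0.5 */
		uint64_t wall_ns = 20000000ULL;	/* 20ms of wall-clock time */

		/* prints 10000000: 20ms at half speed drains 10ms of budget */
		printf("%llu\n", (unsigned long long)grub_charge(wall_ns, u_act));
		return 0;
	}

So the task executes for dl_runtime / Uact = 20ms per period while
being charged exactly its 10ms reservation, which is the
"2 * dl_runtime" figure below.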


> After all, our budget assignment is such that we're able to complete
> our work at max freq. Therefore, when we lower the frequency, we'll have
> to increase budget pro rata, otherwise we'll not complete our work and
> badness happens.

Right. But instead of increasing dl_runtime, GRUB-PA decreases the
amount of time accounted to the current runtime.


> Say we have a 1 GHz part and Uact=0.5, we'd select 500 MHz and need
> double the time to complete.
> 
> Now, if we fold these two together, you'd get:
> 
> 	dq = -Uact/Uact dt = -dt

Not sure why " / Uact"... According to the GRUB-PA algorithm, you just
do
	dq = - Uact * dt = -0.5dt
and you end up giving the CPU to the task for 2 * dl_runtime every
dl_period (as expected)

> Because, after all, if we lowered the clock to consume all idle time,
> there's no further idle time to reclaim.

The idea of GRUB-PA is that you do not change dl_runtime... So, the
task is still "nominally reserved" dl_runtime every dl_period (in
this example, 1/2*dl_period every dl_period)... It is the reclaiming
mechanism that allows the task to execute for dl_runtime/Uact every
dl_period (in this example, for dl_period every dl_period, so it
> reclaims 1/2 dl_period every dl_period).


> Now, of course, our DVFS assignment isn't as precise nor as
> deterministic as this, so we'll get a slightly different ratio, let's
> call that Udvfs.

This is where GRUB-PA starts to have issues... :)
The implicit assumption is (I think) that if the DVFS mechanism cannot
assign exactly the requested frequency, then it makes a "conservative"
assignment (assigning a frequency higher than the requested one).
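
That rule is easy to state in code; a sketch assuming a sorted
frequency table (illustration only; in cpufreq terms it is roughly
the CPUFREQ_RELATION_L relation):

	#include <stdint.h>

	/* Lowest available OPP at or above the request; freqs[] is
	 * sorted ascending and n > 0. Clamps to the maximum OPP. */
	static uint64_t conservative_opp(const uint64_t *freqs, int n,
					 uint64_t requested)
	{
		for (int i = 0; i < n; i++)
			if (freqs[i] >= requested)
				return freqs[i];
		return freqs[n - 1];
	}

With OPPs {500, 800, 1000} MHz and a 600 MHz request this grants
800 MHz, so the granted ratio (Peter's Udvfs) is always >= Uact.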


> So would then not GRUB change into something like:
> 
> 	dq = -Uact/Udvfs dt
> 
> Allowing it to only reclaim that idle time that exists because our DVFS
> level is strictly higher than required?

I think GRUB should still do "dq = -Uact * dt", trying to reclaim all
the idle CPU time (up to the configured limit, of course).
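
The "configured limit" just clamps the accounting factor from below.
A simplified sketch (the kernel's actual grub_reclaim() additionally
accounts inactive and extra bandwidth; names and simplification are
mine):

	#include <stdint.h>

	#define BW_SHIFT 20
	#define BW_UNIT  (1ULL << BW_SHIFT)

	/*
	 * A task with bandwidth dl_bw charged at factor u_act consumes
	 * dl_bw / u_act of the CPU; clamping u_act at dl_bw / u_max
	 * keeps that consumption within the limit u_max. All values are
	 * in BW_SHIFT fixed point.
	 */
	static uint64_t grub_charge_capped(uint64_t delta, uint64_t u_act,
					   uint64_t dl_bw, uint64_t u_max)
	{
		uint64_t u_min = (dl_bw << BW_SHIFT) / u_max;

		if (u_act < u_min)
			u_act = u_min;

		return (delta * u_act) >> BW_SHIFT;
	}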



				Luca

> 
> This way, on our 1 GHz part, with Uact=.5 but Udvfs=.6, we'll allow it
> to reclaim just the additional 100 MHz of idle time.
> 
> 
> Or am I completely off the rails now?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 8/8] sched/deadline: make bandwidth enforcement scale-invariant
  2017-07-25  7:03             ` Luca Abeni
@ 2017-07-25 13:51               ` Peter Zijlstra
  2017-07-26 13:50                 ` luca abeni
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2017-07-25 13:51 UTC (permalink / raw)
  To: Luca Abeni
  Cc: Juri Lelli, mingo, rjw, viresh.kumar, linux-kernel, linux-pm,
	tglx, vincent.guittot, rostedt, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

On Tue, Jul 25, 2017 at 09:03:08AM +0200, Luca Abeni wrote:

> > I'm still confused..
> > 
> > So GRUB does:
> > 
> > 	dq = -Uact dt
> > 
> > right?
> 
> Right. This is what the original (single processor) GRUB did. And this
> was used by the "GRUB-PA" algorithm:
> https://www.researchgate.net/profile/Giuseppe_Lipari/publication/220800940_Using_resource_reservation_techniques_for_power-aware_scheduling/links/09e41513639b2703fc000000.pdf
> 
> (basically, GRUB-PA uses GRUB for reclaiming, and scales the CPU
> frequency based on Uact)
> 
> 
> > Now, you do DVFS using that same Uact. If we lower the clock, we need
> > more time, so would we then not end up with something like:
> > 
> > 	dq = -1/Uact dt
> 
> Well, in the GRUB-PA algorithm GRUB reclaiming is the mechanism used to
> give more runtime to the task... Since Uact is < 1, doing
> 	dq = - Uact * dt
> means that we decrease the current runtime by a smaller amount of time.
> And so we end up giving more runtime to the task: instead of giving 
> dl_runtime every dl_period, we give "dl_runtime / Uact" every
> dl_period... And since the CPU is slower (by a ratio Uact), this is
> equivalent to giving dl_runtime at the maximum CPU speed / frequency
> (at least, in theory :).
> 
> 
> > After all, our budget assignment is such that we're able to complete
> > our work at max freq. Therefore, when we lower the frequency, we'll have
> > to increase budget pro rata, otherwise we'll not complete our work and
> > badness happens.
> 
> Right. But instead of increasing dl_runtime, GRUB-PA decreases the
> amount of time accounted to the current runtime.
> 
> 
> > Say we have a 1 GHz part and Uact=0.5, we'd select 500 MHz and need
> > double the time to complete.
> > 
> > Now, if we fold these two together, you'd get:
> > 
> > 	dq = -Uact/Uact dt = -dt
> 
> Not sure why " / Uact"... According to the GRUB-PA algorithm, you just
> do
> 	dq = - Uact * dt = -0.5dt
> and you end up giving the CPU to the task for 2 * dl_runtime every
> dl_period (as expected)

Yeah, I seem to have gone off the rails there... Bah I'm terminally
confused now. Let me try and get my brain the right way up.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC PATCH v1 8/8] sched/deadline: make bandwidth enforcement scale-invariant
  2017-07-25 13:51               ` Peter Zijlstra
@ 2017-07-26 13:50                 ` luca abeni
  0 siblings, 0 replies; 45+ messages in thread
From: luca abeni @ 2017-07-26 13:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, mingo, rjw, viresh.kumar, linux-kernel, linux-pm,
	tglx, vincent.guittot, rostedt, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

On Tue, 25 Jul 2017 15:51:05 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Jul 25, 2017 at 09:03:08AM +0200, Luca Abeni wrote:
> 
> > > I'm still confused..
> > > 
> > > So GRUB does:
> > > 
> > > 	dq = -Uact dt
> > > 
> > > right?  
> > 
> > Right. This is what the original (single processor) GRUB did. And
> > this was used by the "GRUB-PA" algorithm:
> > https://www.researchgate.net/profile/Giuseppe_Lipari/publication/220800940_Using_resource_reservation_techniques_for_power-aware_scheduling/links/09e41513639b2703fc000000.pdf
> > 
> > (basically, GRUB-PA uses GRUB for reclaiming, and scales the CPU
> > frequency based on Uact)
> > 
> >   
> > > Now, you do DVFS using that same Uact. If we lower the clock, we
> > > need more time, so would we then not end up with something like:
> > > 
> > > 	dq = -1/Uact dt
> > 
> > Well, in the GRUB-PA algorithm GRUB reclaiming is the mechanism
> > used to give more runtime to the task... Since Uact is < 1, doing
> > 	dq = - Uact * dt
> > means that we decrease the current runtime by a smaller amount of
> > time. And so we end up giving more runtime to the task: instead of
> > giving dl_runtime every dl_period, we give "dl_runtime / Uact" every
> > dl_period... And since the CPU is slower (by a ratio Uact), this is
> > equivalent to giving dl_runtime at the maximum CPU speed / frequency
> > (at least, in theory :).
> > 
> >   
> > > After all, our budget assignment is such that we're able to
> > > complete our work at max freq. Therefore, when we lower the
> > > frequency, we'll have to increase budget pro rata, otherwise
> > > we'll not complete our work and badness happens.  
> > 
> > Right. But instead of increasing dl_runtime, GRUB-PA decreases the
> > amount of time accounted to the current runtime.
> > 
> >   
> > > Say we have a 1 GHz part and Uact=0.5, we'd select 500 MHz and need
> > > double the time to complete.
> > > 
> > > Now, if we fold these two together, you'd get:
> > > 
> > > 	dq = -Uact/Uact dt = -dt
> > 
> > Not sure why " / Uact"... According to the GRUB-PA algorithm, you
> > just do
> > 	dq = - Uact * dt = -0.5dt
> > and you end up giving the CPU to the task for 2 * dl_runtime every
> > dl_period (as expected)  
> 
> Yeah, I seem to have gone off the rails there... Bah I'm terminally
> confused now. Let me try and get my brain the right way up.

This stuff always confuses me too... :)

The part that gives me the most headaches is how to combine GRUB-PA
with non-reclaiming tasks, and how to cope with "real world issues"
(such as an actual DVFS frequency different from the theoretical one,
GRUB reclaiming less than 100% of the CPU time, etc...)
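
On the non-reclaiming side, patch 8/8 handles this by scaling
delta_exec explicitly via cap_scale(), which the kernel defines as
((v) * (s)) >> SCHED_CAPACITY_SHIFT; a worked example, with numbers
of my own choosing:

	#include <stdint.h>

	#define SCHED_CAPACITY_SHIFT	10
	/* cap_scale() as defined in kernel/sched/fair.c */
	#define cap_scale(v, s)		(((v) * (s)) >> SCHED_CAPACITY_SHIFT)

	/* scale_freq = 512 (half speed), scale_cpu = 1024 (full capacity):
	 * 10ms of wall-clock execution is accounted as 5ms of runtime, so
	 * a !RECLAIM task still receives its full dl_runtime of work. */
	static const uint64_t scaled =
		cap_scale(cap_scale(10000000ULL, 512), 1024);	/* 5000000 */

So the two paths agree on the basic frequency compensation, and differ
only in whether the spare bandwidth beyond that is reclaimed.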

Anyway, Claudio is running some experiments with this patchset,
measuring power saving and missed deadlines for various sets of
periodic real-time tasks... We hope to present the results at RTLWS.


			Luca

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2017-07-26 13:50 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-05  8:58 [RFC PATCH v1 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Juri Lelli
2017-07-05  8:58 ` [RFC PATCH v1 1/8] sched/cpufreq_schedutil: make use of DEADLINE utilization signal Juri Lelli
2017-07-07  7:20   ` Viresh Kumar
2017-07-05  8:58 ` [RFC PATCH v1 2/8] sched/deadline: move cpu frequency selection triggering points Juri Lelli
2017-07-07  7:21   ` Viresh Kumar
2017-07-05  8:59 ` [RFC PATCH v1 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE Juri Lelli
2017-07-07  3:56   ` Joel Fernandes
2017-07-07 10:43     ` Juri Lelli
2017-07-07 10:46       ` Thomas Gleixner
2017-07-07 10:53         ` Juri Lelli
2017-07-07 13:11           ` Rafael J. Wysocki
2017-07-07 21:58             ` Steven Rostedt
2017-07-07 22:07               ` Joel Fernandes
2017-07-07 22:15                 ` Steven Rostedt
2017-07-07 22:57                   ` Joel Fernandes
2017-07-11 12:37     ` Peter Zijlstra
2017-07-07  7:21   ` Viresh Kumar
2017-07-11 16:18   ` Peter Zijlstra
2017-07-11 17:02     ` Juri Lelli
2017-07-05  8:59 ` [RFC PATCH v1 4/8] sched/cpufreq_schedutil: split utilization signals Juri Lelli
2017-07-07  3:26   ` Joel Fernandes
2017-07-07  8:58   ` Viresh Kumar
2017-07-07 10:59     ` Juri Lelli
2017-07-10  7:46       ` Joel Fernandes
2017-07-10  7:05   ` Viresh Kumar
2017-07-05  8:59 ` [RFC PATCH v1 5/8] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq Juri Lelli
2017-07-07  8:59   ` Viresh Kumar
2017-07-11 16:17   ` Peter Zijlstra
2017-07-11 17:18     ` Juri Lelli
2017-07-05  8:59 ` [RFC PATCH v1 6/8] sched/sched.h: remove sd arch_scale_freq_capacity parameter Juri Lelli
2017-07-05  8:59 ` [RFC PATCH v1 7/8] sched/sched.h: move arch_scale_{freq,cpu}_capacity outside CONFIG_SMP Juri Lelli
2017-07-07 22:04   ` Steven Rostedt
2017-07-05  8:59 ` [RFC PATCH v1 8/8] sched/deadline: make bandwidth enforcement scale-invariant Juri Lelli
2017-07-19  7:21   ` Peter Zijlstra
2017-07-19  9:20     ` Juri Lelli
2017-07-19 11:00       ` Peter Zijlstra
2017-07-19 11:16         ` Juri Lelli
2017-07-24 16:43           ` Peter Zijlstra
2017-07-25  7:03             ` Luca Abeni
2017-07-25 13:51               ` Peter Zijlstra
2017-07-26 13:50                 ` luca abeni
2017-07-06 15:57 ` [RFC PATCH v1 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Steven Rostedt
2017-07-06 16:08   ` Juri Lelli
2017-07-06 16:15   ` Peter Zijlstra
2017-07-06 21:08 ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).