* [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection
@ 2017-05-23  8:53 Juri Lelli
  2017-05-23  8:53 ` [PATCH RFC 1/8] sched/cpufreq_schedutil: make use of DEADLINE utilization signal Juri Lelli
                   ` (8 more replies)
  0 siblings, 9 replies; 25+ messages in thread
From: Juri Lelli @ 2017-05-23  8:53 UTC (permalink / raw)
  To: peterz, mingo, rjw, viresh.kumar
  Cc: linux-kernel, linux-pm, tglx, vincent.guittot, rostedt,
	luca.abeni, claudio, tommaso.cucinotta, bristot, mathieu.poirier,
	tkjos, joelaf, andresoportus, morten.rasmussen, dietmar.eggemann,
	patrick.bellasi, juri.lelli

Hi,

this RFC set implements frequency/cpu invariance and OPP selection for
SCHED_DEADLINE. The set has been lightly tested on a Juno platform. The
current incarnation of the patches stems from both previous RFD[1] review
comments and discussion at the OSPM summit[2], during which we seemed to
agree that:

 - we probably want to use running_bw (instead of this_bw), as it is less
   pessimistic (we should save more energy)
 - the special kworker hack seems acceptable as a mid-term solution to foster
   further SCHED_DEADLINE/schedutil development/adoption

A point that is still very much up for discussion (more than the others :) is
how we implement frequency/cpu scaling. SCHED_FLAG_RECLAIM tasks only need
grub_reclaim(), as that function already scales their reservation runtime
considering other reservations and the maximum bandwidth a CPU has to offer.
However, for normal !RECLAIM tasks multiple things can be implemented, each
of which seems to make sense:

 - don't scale at all: normal tasks will only get a % of CPU _time_ as granted
   by AC
 - go to max as soon as a normal task is enqueued: this is because dimensioning
   of parameters is usually done at max OPP/biggest CPU, and normal tasks
   assume that this is always the condition when they run
 - scale runtime according to current frequency and max CPU capacity: this is
   what this set currently implements (see the sketch below)
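
A minimal sketch of what the third option boils down to (it mirrors what
patch 8/8 does in update_curr_dl(); cap_scale(v, s) is
(v)*(s) >> SCHED_CAPACITY_SHIFT):

	int cpu = cpu_of(rq);
	unsigned long scale_freq = arch_scale_freq_capacity(cpu);
	unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
	u64 scaled_delta_exec;

	/*
	 * Both factors are SCHED_CAPACITY_SCALE (1024) when running at
	 * max frequency on the biggest CPU, in which case runtime
	 * depletes at wall-clock rate; at lower OPPs/on smaller CPUs it
	 * depletes proportionally more slowly.
	 */
	scaled_delta_exec = cap_scale(delta_exec, scale_freq);
	scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu);
	dl_se->runtime -= scaled_delta_exec;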

Opinions?

The set is based on tip/sched/core as of today (a9e7f6544b9c) plus some
schedutil fixes coming from linux-pm/linux-next and Luca's "CPU reclaiming for
SCHED_DEADLINE" [3].

High-level description of the patches:

 o [01-02]/08 add the necessary links to start accounting the DEADLINE
              contribution to OPP selection
 o 03/08      a temporary solution to make it possible (on ARM) to change
              frequency for DEADLINE tasks (which could otherwise delay the
              SCHED_FIFO worker kthread); the proper solution would be the
              ability to issue frequency transitions from an atomic ctx
 o [04-05]/08 schedutil changes that cope with the fact that DEADLINE doesn't
              require periodic OPP selection triggering points
 o [06-07]/08 make the arch_scale_{freq,cpu}_capacity() functions available on
              !CONFIG_SMP configurations too
 o 08/08      implements frequency/cpu invariance for tasks' reservation
              parameters, which basically means that we implement GRUB-PA [4]

Changes w.r.t. RFD:

 - use grub_reclaim for RECLAIM and scale freq/cpu for !RECLAIM
 - discard only the CFS contribution (as stale) after TICK_NSEC
 - added patches 06 and 07 to fix !CONFIG_SMP builds

Please have a look. Feedback and comments are, as usual, more than welcome.

In case you would like to test this out:

 git://linux-arm.org/linux-jl.git upstream/deadline/freq-rfc

Best,

- Juri

[1] http://marc.info/?l=linux-kernel&m=149036457909119&w=2
[2] http://retis.sssup.it/ospm-summit/program.html
    https://lwn.net/Articles/721573/
[3] http://marc.info/?l=linux-kernel&m=149513848804404
[4] C. Scordino, G. Lipari, A Resource Reservation Algorithm for Power-Aware
    Scheduling of Periodic and Aperiodic Real-Time Tasks, IEEE Transactions
    on Computers, December 2006.

Juri Lelli (8):
  sched/cpufreq_schedutil: make use of DEADLINE utilization signal
  sched/deadline: move cpu frequency selection triggering points
  sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE
  sched/cpufreq_schedutil: split utilization signals
  sched/cpufreq_schedutil: always consider all CPUs when deciding next
    freq
  sched/sched.h: remove sd arch_scale_freq_capacity parameter
  sched/sched.h: move arch_scale_{freq,cpu}_capacity outside CONFIG_SMP
  sched/deadline: make bandwidth enforcement scale-invariant

 include/linux/sched.h            |  1 +
 include/linux/sched/cpufreq.h    |  2 --
 include/linux/sched/topology.h   | 12 ++++----
 include/uapi/linux/sched.h       |  1 +
 kernel/sched/core.c              | 19 ++++++++++--
 kernel/sched/cpufreq_schedutil.c | 62 ++++++++++++++++++++++++----------------
 kernel/sched/deadline.c          | 39 ++++++++++++++++++++-----
 kernel/sched/fair.c              |  4 +--
 kernel/sched/sched.h             | 27 +++++++++++++----
 9 files changed, 116 insertions(+), 51 deletions(-)

-- 
2.11.0

* [PATCH RFC 1/8] sched/cpufreq_schedutil: make use of DEADLINE utilization signal
  2017-05-23  8:53 [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Juri Lelli
@ 2017-05-23  8:53 ` Juri Lelli
  2017-05-23  8:53 ` [PATCH RFC 2/8] sched/deadline: move cpu frequency selection triggering points Juri Lelli
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Juri Lelli @ 2017-05-23  8:53 UTC (permalink / raw)
  To: peterz, mingo, rjw, viresh.kumar
  Cc: linux-kernel, linux-pm, tglx, vincent.guittot, rostedt,
	luca.abeni, claudio, tommaso.cucinotta, bristot, mathieu.poirier,
	tkjos, joelaf, andresoportus, morten.rasmussen, dietmar.eggemann,
	patrick.bellasi, juri.lelli, Ingo Molnar, Rafael J . Wysocki

SCHED_DEADLINE tracks its active utilization signal with a per dl_rq
variable named running_bw.

Make use of that to drive cpu frequency selection: add up the FAIR and
DEADLINE contributions to get the required CPU capacity to handle both
requirements (while RT still selects max frequency).
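
As a back-of-the-envelope example (figures are illustrative, assuming the
usual 20-bit fixed point used for deadline bandwidth): a reservation of
10ms every 100ms gives

	running_bw = (10 << 20) / 100      = 104857  (~0.1 in Q20)
	dl_util    = (104857 * 1024) >> 20 = 102     (out of SCHED_CAPACITY_SCALE)

which then gets added to the CFS utilization before a frequency is picked.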

Co-authored-by: Claudio Scordino <claudio@evidence.eu.com>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
---
 include/linux/sched/cpufreq.h    |  2 --
 kernel/sched/cpufreq_schedutil.c | 13 ++++++-------
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
index d2be2ccbb372..39640bb3a8ee 100644
--- a/include/linux/sched/cpufreq.h
+++ b/include/linux/sched/cpufreq.h
@@ -11,8 +11,6 @@
 #define SCHED_CPUFREQ_DL	(1U << 1)
 #define SCHED_CPUFREQ_IOWAIT	(1U << 2)
 
-#define SCHED_CPUFREQ_RT_DL	(SCHED_CPUFREQ_RT | SCHED_CPUFREQ_DL)
-
 #ifdef CONFIG_CPU_FREQ
 struct update_util_data {
        void (*func)(struct update_util_data *data, u64 time, unsigned int flags);
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 622eed1b7658..7f1913e265d1 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -160,12 +160,11 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
 static void sugov_get_util(unsigned long *util, unsigned long *max)
 {
 	struct rq *rq = this_rq();
-	unsigned long cfs_max;
+	unsigned long dl_util = (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> 20;
 
-	cfs_max = arch_scale_cpu_capacity(NULL, smp_processor_id());
+	*max = arch_scale_cpu_capacity(NULL, smp_processor_id());
 
-	*util = min(rq->cfs.avg.util_avg, cfs_max);
-	*max = cfs_max;
+	*util = min(rq->cfs.avg.util_avg + dl_util, *max);
 }
 
 static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
@@ -229,7 +228,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 
 	busy = sugov_cpu_is_busy(sg_cpu);
 
-	if (flags & SCHED_CPUFREQ_RT_DL) {
+	if (flags & SCHED_CPUFREQ_RT) {
 		next_f = policy->cpuinfo.max_freq;
 	} else {
 		sugov_get_util(&util, &max);
@@ -269,7 +268,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 			j_sg_cpu->iowait_boost = 0;
 			continue;
 		}
-		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT_DL)
+		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT)
 			return policy->cpuinfo.max_freq;
 
 		j_util = j_sg_cpu->util;
@@ -305,7 +304,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 	sg_cpu->last_update = time;
 
 	if (sugov_should_update_freq(sg_policy, time)) {
-		if (flags & SCHED_CPUFREQ_RT_DL)
+		if (flags & SCHED_CPUFREQ_RT)
 			next_f = sg_policy->policy->cpuinfo.max_freq;
 		else
 			next_f = sugov_next_freq_shared(sg_cpu, time);
-- 
2.11.0

* [PATCH RFC 2/8] sched/deadline: move cpu frequency selection triggering points
  2017-05-23  8:53 [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Juri Lelli
  2017-05-23  8:53 ` [PATCH RFC 1/8] sched/cpufreq_schedutil: make use of DEADLINE utilization signal Juri Lelli
@ 2017-05-23  8:53 ` Juri Lelli
  2017-05-23  8:53 ` [PATCH RFC 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE Juri Lelli
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Juri Lelli @ 2017-05-23  8:53 UTC (permalink / raw)
  To: peterz, mingo, rjw, viresh.kumar
  Cc: linux-kernel, linux-pm, tglx, vincent.guittot, rostedt,
	luca.abeni, claudio, tommaso.cucinotta, bristot, mathieu.poirier,
	tkjos, joelaf, andresoportus, morten.rasmussen, dietmar.eggemann,
	patrick.bellasi, juri.lelli, Ingo Molnar, Rafael J . Wysocki

Since SCHED_DEADLINE doesn't track a utilization signal (but rather
reserves a fraction of CPU bandwidth for tasks admitted to the system),
there is no point in evaluating frequency changes at each tick event.

Move the frequency selection triggering points to where running_bw changes.

Co-authored-by: Claudio Scordino <claudio@evidence.eu.com>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
---
 kernel/sched/deadline.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 1da44b36fae0..fed54b078240 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -52,6 +52,8 @@ void add_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
 	dl_rq->running_bw += dl_bw;
 	SCHED_WARN_ON(dl_rq->running_bw < old); /* overflow */
 	SCHED_WARN_ON(dl_rq->running_bw > dl_rq->this_bw);
+	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
+	cpufreq_update_this_cpu(rq_of_dl_rq(dl_rq), SCHED_CPUFREQ_DL);
 }
 
 static inline
@@ -64,6 +66,8 @@ void sub_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
 	SCHED_WARN_ON(dl_rq->running_bw > old); /* underflow */
 	if (dl_rq->running_bw > old)
 		dl_rq->running_bw = 0;
+	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
+	cpufreq_update_this_cpu(rq_of_dl_rq(dl_rq), SCHED_CPUFREQ_DL);
 }
 
 static inline
@@ -1021,9 +1025,6 @@ static void update_curr_dl(struct rq *rq)
 		return;
 	}
 
-	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
-	cpufreq_update_this_cpu(rq, SCHED_CPUFREQ_DL);
-
 	schedstat_set(curr->se.statistics.exec_max,
 		      max(curr->se.statistics.exec_max, delta_exec));
 
-- 
2.11.0

* [PATCH RFC 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE
  2017-05-23  8:53 [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Juri Lelli
  2017-05-23  8:53 ` [PATCH RFC 1/8] sched/cpufreq_schedutil: make use of DEADLINE utilization signal Juri Lelli
  2017-05-23  8:53 ` [PATCH RFC 2/8] sched/deadline: move cpu frequency selection triggering points Juri Lelli
@ 2017-05-23  8:53 ` Juri Lelli
  2017-05-23 18:52   ` Peter Zijlstra
  2017-05-23  8:53 ` [PATCH RFC 4/8] sched/cpufreq_schedutil: split utilization signals Juri Lelli
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Juri Lelli @ 2017-05-23  8:53 UTC (permalink / raw)
  To: peterz, mingo, rjw, viresh.kumar
  Cc: linux-kernel, linux-pm, tglx, vincent.guittot, rostedt,
	luca.abeni, claudio, tommaso.cucinotta, bristot, mathieu.poirier,
	tkjos, joelaf, andresoportus, morten.rasmussen, dietmar.eggemann,
	patrick.bellasi, juri.lelli, Ingo Molnar, Rafael J . Wysocki

The worker kthread needs to be able to change frequency on behalf of all
other threads.

Make it special, just below the STOP class.

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Claudio Scordino <claudio@evidence.eu.com>
---
 include/linux/sched.h            |  1 +
 include/uapi/linux/sched.h       |  1 +
 kernel/sched/core.c              | 19 +++++++++++++++++--
 kernel/sched/cpufreq_schedutil.c | 15 ++++++++++++---
 kernel/sched/deadline.c          |  6 ++++++
 kernel/sched/sched.h             |  8 +++++++-
 6 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8127f8bbef56..fd30bec5e199 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1360,6 +1360,7 @@ extern int idle_cpu(int cpu);
 extern int sched_setscheduler(struct task_struct *, int, const struct sched_param *);
 extern int sched_setscheduler_nocheck(struct task_struct *, int, const struct sched_param *);
 extern int sched_setattr(struct task_struct *, const struct sched_attr *);
+extern int sched_setattr_nocheck(struct task_struct *, const struct sched_attr *);
 extern struct task_struct *idle_task(int cpu);
 
 /**
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index e2a6c7b3510b..72723859ef74 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -48,5 +48,6 @@
  */
 #define SCHED_FLAG_RESET_ON_FORK	0x01
 #define SCHED_FLAG_RECLAIM		0x02
+#define SCHED_FLAG_SPECIAL		0x04
 
 #endif /* _UAPI_LINUX_SCHED_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7fc2011c3ce7..ba57e2ef9aef 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2495,6 +2495,9 @@ static int dl_overflow(struct task_struct *p, int policy,
 	u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0;
 	int cpus, err = -1;
 
+	if (attr->sched_flags & SCHED_FLAG_SPECIAL)
+		return 0;
+
 	/* !deadline task may carry old deadline bandwidth */
 	if (new_bw == p->dl.dl_bw && task_has_dl_policy(p))
 		return 0;
@@ -4119,6 +4122,10 @@ __getparam_dl(struct task_struct *p, struct sched_attr *attr)
 static bool
 __checkparam_dl(const struct sched_attr *attr)
 {
+	/* special dl tasks don't actually use any parameter */
+	if (attr->sched_flags & SCHED_FLAG_SPECIAL)
+		return true;
+
 	/* deadline != 0 */
 	if (attr->sched_deadline == 0)
 		return false;
@@ -4205,7 +4212,9 @@ static int __sched_setscheduler(struct task_struct *p,
 	}
 
 	if (attr->sched_flags &
-		~(SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_RECLAIM))
+		~(SCHED_FLAG_RESET_ON_FORK |
+		  SCHED_FLAG_RECLAIM |
+		  SCHED_FLAG_SPECIAL))
 		return -EINVAL;
 
 	/*
@@ -4327,7 +4336,8 @@ static int __sched_setscheduler(struct task_struct *p,
 		}
 #endif
 #ifdef CONFIG_SMP
-		if (dl_bandwidth_enabled() && dl_policy(policy)) {
+		if (dl_bandwidth_enabled() && dl_policy(policy) &&
+				!(attr->sched_flags & SCHED_FLAG_SPECIAL)) {
 			cpumask_t *span = rq->rd->span;
 
 			/*
@@ -4457,6 +4467,11 @@ int sched_setattr(struct task_struct *p, const struct sched_attr *attr)
 }
 EXPORT_SYMBOL_GPL(sched_setattr);
 
+int sched_setattr_nocheck(struct task_struct *p, const struct sched_attr *attr)
+{
+	return __sched_setscheduler(p, attr, false, true);
+}
+
 /**
  * sched_setscheduler_nocheck - change the scheduling policy and/or RT priority of a thread from kernelspace.
  * @p: the task in question.
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 7f1913e265d1..1508109c7f19 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -421,7 +421,16 @@ static void sugov_policy_free(struct sugov_policy *sg_policy)
 static int sugov_kthread_create(struct sugov_policy *sg_policy)
 {
 	struct task_struct *thread;
-	struct sched_param param = { .sched_priority = MAX_USER_RT_PRIO / 2 };
+	struct sched_attr attr = {
+		.size = sizeof(struct sched_attr),
+		.sched_policy = SCHED_DEADLINE,
+		.sched_flags = SCHED_FLAG_SPECIAL,
+		.sched_nice = 0,
+		.sched_priority = 0,
+		.sched_runtime = 0,
+		.sched_deadline = 0,
+		.sched_period = 0,
+	};
 	struct cpufreq_policy *policy = sg_policy->policy;
 	int ret;
 
@@ -439,10 +448,10 @@ static int sugov_kthread_create(struct sugov_policy *sg_policy)
 		return PTR_ERR(thread);
 	}
 
-	ret = sched_setscheduler_nocheck(thread, SCHED_FIFO, &param);
+	ret = sched_setattr_nocheck(thread, &attr);
 	if (ret) {
 		kthread_stop(thread);
-		pr_warn("%s: failed to set SCHED_FIFO\n", __func__);
+		pr_warn("%s: failed to set SCHED_DEADLINE\n", __func__);
 		return ret;
 	}
 
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index fed54b078240..5ee4fd9b1c7f 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -187,6 +187,9 @@ static void task_non_contending(struct task_struct *p)
 	if (dl_se->dl_runtime == 0)
 		return;
 
+	if (unlikely(dl_entity_is_special(dl_se)))
+		return;
+
 	WARN_ON(hrtimer_active(&dl_se->inactive_timer));
 	WARN_ON(dl_se->dl_non_contending);
 
@@ -1036,6 +1039,9 @@ static void update_curr_dl(struct rq *rq)
 
 	sched_rt_avg_update(rq, delta_exec);
 
+	if (unlikely(dl_entity_is_special(dl_se)))
+		return;
+
 	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM))
 		delta_exec = grub_reclaim(delta_exec, rq, &curr->dl);
 	dl_se->runtime -= delta_exec;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f1e400c6403c..e48cc8762e04 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -155,13 +155,19 @@ static inline int task_has_dl_policy(struct task_struct *p)
 	return dl_policy(p->policy);
 }
 
+static inline int dl_entity_is_special(struct sched_dl_entity *dl_se)
+{
+	return dl_se->flags & SCHED_FLAG_SPECIAL;
+}
+
 /*
  * Tells if entity @a should preempt entity @b.
  */
 static inline bool
 dl_entity_preempt(struct sched_dl_entity *a, struct sched_dl_entity *b)
 {
-	return dl_time_before(a->deadline, b->deadline);
+	return dl_entity_is_special(a) ||
+	       dl_time_before(a->deadline, b->deadline);
 }
 
 /*
-- 
2.11.0

* [PATCH RFC 4/8] sched/cpufreq_schedutil: split utilization signals
  2017-05-23  8:53 [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Juri Lelli
                   ` (2 preceding siblings ...)
  2017-05-23  8:53 ` [PATCH RFC 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE Juri Lelli
@ 2017-05-23  8:53 ` Juri Lelli
  2017-05-23 19:04   ` Peter Zijlstra
  2017-05-23 19:29   ` Peter Zijlstra
  2017-05-23  8:53 ` [PATCH RFC 5/8] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq Juri Lelli
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 25+ messages in thread
From: Juri Lelli @ 2017-05-23  8:53 UTC (permalink / raw)
  To: peterz, mingo, rjw, viresh.kumar
  Cc: linux-kernel, linux-pm, tglx, vincent.guittot, rostedt,
	luca.abeni, claudio, tommaso.cucinotta, bristot, mathieu.poirier,
	tkjos, joelaf, andresoportus, morten.rasmussen, dietmar.eggemann,
	patrick.bellasi, juri.lelli, Ingo Molnar, Rafael J . Wysocki

To be able to treat utilization signals of different scheduling classes
in different ways (e.g., the CFS signal might be stale while the DEADLINE
signal is never stale by design) we need to split the sugov_cpu::util
signal in two: util_cfs and util_dl.

This patch does that, also changing sugov_get_util()'s parameter list.
After this change, aggregation of the different signals has to be
performed by sugov_get_util()'s users (so that they can decide what to do
with the different signals).
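
With the split, a typical aggregation on the users' side (as in the
sugov_update_single() hunk below) looks like:

	max = sg_cpu->max;
	util = min(sg_cpu->util_cfs + sg_cpu->util_dl, max);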

Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Claudio Scordino <claudio@evidence.eu.com>
---
 kernel/sched/cpufreq_schedutil.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 1508109c7f19..f930cec4c3d4 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -58,7 +58,8 @@ struct sugov_cpu {
 	u64 last_update;
 
 	/* The fields below are only needed when sharing a policy. */
-	unsigned long util;
+	unsigned long util_cfs;
+	unsigned long util_dl;
 	unsigned long max;
 	unsigned int flags;
 
@@ -157,14 +158,13 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
 	return cpufreq_driver_resolve_freq(policy, freq);
 }
 
-static void sugov_get_util(unsigned long *util, unsigned long *max)
+static void sugov_get_util(struct sugov_cpu *sg_cpu)
 {
 	struct rq *rq = this_rq();
-	unsigned long dl_util = (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> 20;
 
-	*max = arch_scale_cpu_capacity(NULL, smp_processor_id());
-
-	*util = min(rq->cfs.avg.util_avg + dl_util, *max);
+	sg_cpu->max = arch_scale_cpu_capacity(NULL, smp_processor_id());
+	sg_cpu->util_cfs = rq->cfs.avg.util_avg;
+	sg_cpu->util_dl = (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> 20;
 }
 
 static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
@@ -231,7 +231,9 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	if (flags & SCHED_CPUFREQ_RT) {
 		next_f = policy->cpuinfo.max_freq;
 	} else {
-		sugov_get_util(&util, &max);
+		sugov_get_util(sg_cpu);
+		max = sg_cpu->max;
+		util = min(sg_cpu->util_cfs + sg_cpu->util_dl, max);
 		sugov_iowait_boost(sg_cpu, &util, &max);
 		next_f = get_next_freq(sg_policy, util, max);
 		/*
@@ -271,8 +273,8 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT)
 			return policy->cpuinfo.max_freq;
 
-		j_util = j_sg_cpu->util;
 		j_max = j_sg_cpu->max;
+		j_util = min(j_sg_cpu->util_cfs + j_sg_cpu->util_dl, j_max);
 		if (j_util * max > j_max * util) {
 			util = j_util;
 			max = j_max;
@@ -289,15 +291,12 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 {
 	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
-	unsigned long util, max;
 	unsigned int next_f;
 
-	sugov_get_util(&util, &max);
 
 	raw_spin_lock(&sg_policy->update_lock);
 
-	sg_cpu->util = util;
-	sg_cpu->max = max;
+	sugov_get_util(sg_cpu);
 	sg_cpu->flags = flags;
 
 	sugov_set_iowait_boost(sg_cpu, time, flags);
-- 
2.11.0

* [PATCH RFC 5/8] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq
  2017-05-23  8:53 [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Juri Lelli
                   ` (3 preceding siblings ...)
  2017-05-23  8:53 ` [PATCH RFC 4/8] sched/cpufreq_schedutil: split utilization signals Juri Lelli
@ 2017-05-23  8:53 ` Juri Lelli
  2017-05-23  8:53 ` [PATCH RFC 6/8] sched/sched.h: remove sd arch_scale_freq_capacity parameter Juri Lelli
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Juri Lelli @ 2017-05-23  8:53 UTC (permalink / raw)
  To: peterz, mingo, rjw, viresh.kumar
  Cc: linux-kernel, linux-pm, tglx, vincent.guittot, rostedt,
	luca.abeni, claudio, tommaso.cucinotta, bristot, mathieu.poirier,
	tkjos, joelaf, andresoportus, morten.rasmussen, dietmar.eggemann,
	patrick.bellasi, juri.lelli, Ingo Molnar, Rafael J . Wysocki

No assumption can be made about the rate at which frequency updates get
triggered, as there are scheduling policies (like SCHED_DEADLINE) which
don't trigger them that frequently.

Remove this assumption from the code by always considering the
SCHED_DEADLINE utilization signal as not stale.

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Claudio Scordino <claudio@evidence.eu.com>
---
Changes from RFD
 - discard CFS contribution only as stale (as suggested by Rafael)
---
 kernel/sched/cpufreq_schedutil.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index f930cec4c3d4..688bd11c2641 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -259,17 +259,22 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 		s64 delta_ns;
 
 		/*
-		 * If the CPU utilization was last updated before the previous
-		 * frequency update and the time elapsed between the last update
-		 * of the CPU utilization and the last frequency update is long
-		 * enough, don't take the CPU into account as it probably is
-		 * idle now (and clear iowait_boost for it).
+		 * If the CFS CPU utilization was last updated before the
+		 * previous frequency update and the time elapsed between the
+		 * last update of the CPU utilization and the last frequency
+		 * update is long enough, reset iowait_boost and util_cfs, as
+		 * they are now probably stale. However, still consider the
+		 * CPU contribution if it has some DEADLINE utilization
+		 * (util_dl).
 		 */
 		delta_ns = time - j_sg_cpu->last_update;
 		if (delta_ns > TICK_NSEC) {
 			j_sg_cpu->iowait_boost = 0;
-			continue;
+			j_sg_cpu->util_cfs = 0;
+			if (j_sg_cpu->util_dl == 0)
+				continue;
 		}
+
 		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT)
 			return policy->cpuinfo.max_freq;
 
-- 
2.11.0

* [PATCH RFC 6/8] sched/sched.h: remove sd arch_scale_freq_capacity parameter
  2017-05-23  8:53 [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Juri Lelli
                   ` (4 preceding siblings ...)
  2017-05-23  8:53 ` [PATCH RFC 5/8] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq Juri Lelli
@ 2017-05-23  8:53 ` Juri Lelli
  2017-05-23  8:53 ` [PATCH RFC 7/8] sched/sched.h: move arch_scale_{freq,cpu}_capacity outside CONFIG_SMP Juri Lelli
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Juri Lelli @ 2017-05-23  8:53 UTC (permalink / raw)
  To: peterz, mingo, rjw, viresh.kumar
  Cc: linux-kernel, linux-pm, tglx, vincent.guittot, rostedt,
	luca.abeni, claudio, tommaso.cucinotta, bristot, mathieu.poirier,
	tkjos, joelaf, andresoportus, morten.rasmussen, dietmar.eggemann,
	patrick.bellasi, juri.lelli, Ingo Molnar

The sd parameter is never used in arch_scale_freq_capacity() (and it's
hard to see how information coming from scheduling domains might help
with frequency-invariance scaling).

Remove it, also in anticipation of moving arch_scale_freq_capacity()
outside CONFIG_SMP.

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c  | 2 +-
 kernel/sched/sched.h | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 219fe58e3023..b0f31064bbbd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2812,7 +2812,7 @@ accumulate_sum(u64 delta, int cpu, struct sched_avg *sa,
 	u32 contrib = (u32)delta; /* p == 0 -> delta < 1024 */
 	u64 periods;
 
-	scale_freq = arch_scale_freq_capacity(NULL, cpu);
+	scale_freq = arch_scale_freq_capacity(cpu);
 	scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
 
 	delta += sa->period_contrib;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e48cc8762e04..063fd8c47e75 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1641,7 +1641,7 @@ extern void sched_avg_update(struct rq *rq);
 
 #ifndef arch_scale_freq_capacity
 static __always_inline
-unsigned long arch_scale_freq_capacity(struct sched_domain *sd, int cpu)
+unsigned long arch_scale_freq_capacity(int cpu)
 {
 	return SCHED_CAPACITY_SCALE;
 }
@@ -1660,7 +1660,7 @@ unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
 
 static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta)
 {
-	rq->rt_avg += rt_delta * arch_scale_freq_capacity(NULL, cpu_of(rq));
+	rq->rt_avg += rt_delta * arch_scale_freq_capacity(cpu_of(rq));
 	sched_avg_update(rq);
 }
 #else
-- 
2.11.0

* [PATCH RFC 7/8] sched/sched.h: move arch_scale_{freq,cpu}_capacity outside CONFIG_SMP
  2017-05-23  8:53 [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Juri Lelli
                   ` (5 preceding siblings ...)
  2017-05-23  8:53 ` [PATCH RFC 6/8] sched/sched.h: remove sd arch_scale_freq_capacity parameter Juri Lelli
@ 2017-05-23  8:53 ` Juri Lelli
  2017-05-23  8:53 ` [PATCH RFC 8/8] sched/deadline: make bandwidth enforcement scale-invariant Juri Lelli
  2017-05-23 20:23 ` [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Peter Zijlstra
  8 siblings, 0 replies; 25+ messages in thread
From: Juri Lelli @ 2017-05-23  8:53 UTC (permalink / raw)
  To: peterz, mingo, rjw, viresh.kumar
  Cc: linux-kernel, linux-pm, tglx, vincent.guittot, rostedt,
	luca.abeni, claudio, tommaso.cucinotta, bristot, mathieu.poirier,
	tkjos, joelaf, andresoportus, morten.rasmussen, dietmar.eggemann,
	patrick.bellasi, juri.lelli, Ingo Molnar

Currently, frequency and cpu capacity scaling is only performed on
CONFIG_SMP systems (as CFS PELT signals are only present for such systems
as well). However, other scheduling classes want to do freq/cpu scaling
too, and for !CONFIG_SMP configurations as well.

arch_scale_freq_capacity() is useful for implementing frequency scaling
even on !CONFIG_SMP platforms, so simply move it outside the CONFIG_SMP
ifdeffery.

Even though arch_scale_cpu_capacity() is not useful on !CONFIG_SMP
platforms, make a default implementation available for such
configurations anyway, to simplify scheduler code doing CPU scale
invariance.

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
---
 include/linux/sched/topology.h | 12 ++++++------
 kernel/sched/sched.h           | 13 ++++++++++---
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 7d065abc7a47..953cf4f889ec 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -6,6 +6,12 @@
 #include <linux/sched/idle.h>
 
 /*
+ * Increase resolution of cpu_capacity calculations
+ */
+#define SCHED_CAPACITY_SHIFT	SCHED_FIXEDPOINT_SHIFT
+#define SCHED_CAPACITY_SCALE	(1L << SCHED_CAPACITY_SHIFT)
+
+/*
  * sched-domains (multiprocessor balancing) declarations:
  */
 #ifdef CONFIG_SMP
@@ -26,12 +32,6 @@
 #define SD_OVERLAP		0x2000	/* sched_domains of this level overlap */
 #define SD_NUMA			0x4000	/* cross-node balancing */
 
-/*
- * Increase resolution of cpu_capacity calculations
- */
-#define SCHED_CAPACITY_SHIFT	SCHED_FIXEDPOINT_SHIFT
-#define SCHED_CAPACITY_SCALE	(1L << SCHED_CAPACITY_SHIFT)
-
 #ifdef CONFIG_SCHED_SMT
 static inline int cpu_smt_flags(void)
 {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 063fd8c47e75..cc474c62cd18 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1636,9 +1636,6 @@ static inline int hrtick_enabled(struct rq *rq)
 
 #endif /* CONFIG_SCHED_HRTICK */
 
-#ifdef CONFIG_SMP
-extern void sched_avg_update(struct rq *rq);
-
 #ifndef arch_scale_freq_capacity
 static __always_inline
 unsigned long arch_scale_freq_capacity(int cpu)
@@ -1647,6 +1644,9 @@ unsigned long arch_scale_freq_capacity(int cpu)
 }
 #endif
 
+#ifdef CONFIG_SMP
+extern void sched_avg_update(struct rq *rq);
+
 #ifndef arch_scale_cpu_capacity
 static __always_inline
 unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
@@ -1664,6 +1664,13 @@ static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta)
 	sched_avg_update(rq);
 }
 #else
+#ifndef arch_scale_cpu_capacity
+static __always_inline
+unsigned long arch_scale_cpu_capacity(void __always_unused *sd, int cpu)
+{
+	return SCHED_CAPACITY_SCALE;
+}
+#endif
 static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta) { }
 static inline void sched_avg_update(struct rq *rq) { }
 #endif
-- 
2.11.0

* [PATCH RFC 8/8] sched/deadline: make bandwidth enforcement scale-invariant
  2017-05-23  8:53 [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Juri Lelli
                   ` (6 preceding siblings ...)
  2017-05-23  8:53 ` [PATCH RFC 7/8] sched/sched.h: move arch_scale_{freq,cpu}_capacity outside CONFIG_SMP Juri Lelli
@ 2017-05-23  8:53 ` Juri Lelli
  2017-05-23 20:23 ` [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Peter Zijlstra
  8 siblings, 0 replies; 25+ messages in thread
From: Juri Lelli @ 2017-05-23  8:53 UTC (permalink / raw)
  To: peterz, mingo, rjw, viresh.kumar
  Cc: linux-kernel, linux-pm, tglx, vincent.guittot, rostedt,
	luca.abeni, claudio, tommaso.cucinotta, bristot, mathieu.poirier,
	tkjos, joelaf, andresoportus, morten.rasmussen, dietmar.eggemann,
	patrick.bellasi, juri.lelli, Ingo Molnar, Rafael J . Wysocki

Apply frequency and cpu scale-invariance correction factors to bandwidth
enforcement (similar to what we already do for fair utilization tracking).

Each delta_exec gets scaled considering current frequency and maximum cpu
capacity, which means that the reservation runtime parameter (which needs
to be specified by profiling the task's execution at max frequency on the
biggest-capacity core) gets scaled accordingly.
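
As a quick sanity check of the scaling below (figures are illustrative):
on a CPU running at half its maximum frequency (scale_freq = 512) with
half the capacity of the biggest core (scale_cpu = 512),

	scaled_delta_exec = cap_scale(cap_scale(delta_exec, 512), 512)
	                  = delta_exec * (512/1024) * (512/1024)
	                  = delta_exec / 4

so the reserved runtime depletes at a quarter of the wall-clock rate,
compensating for the task getting roughly a quarter of the compute it was
dimensioned against.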

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Claudio Scordino <claudio@evidence.eu.com>
---
Changes from RFD
 - either apply grub_reclaim or perform freq/cpu scaling;
   what's the correct thing to do is actually very much up
   for discussion
---
 kernel/sched/deadline.c | 26 ++++++++++++++++++++++----
 kernel/sched/fair.c     |  2 --
 kernel/sched/sched.h    |  2 ++
 3 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 5ee4fd9b1c7f..b6c3886478c3 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1008,7 +1008,8 @@ static void update_curr_dl(struct rq *rq)
 {
 	struct task_struct *curr = rq->curr;
 	struct sched_dl_entity *dl_se = &curr->dl;
-	u64 delta_exec;
+	u64 delta_exec, scaled_delta_exec;
+	int cpu = cpu_of(rq);
 
 	if (!dl_task(curr) || !on_dl_rq(dl_se))
 		return;
@@ -1042,9 +1043,26 @@ static void update_curr_dl(struct rq *rq)
 	if (unlikely(dl_entity_is_special(dl_se)))
 		return;
 
-	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM))
-		delta_exec = grub_reclaim(delta_exec, rq, &curr->dl);
-	dl_se->runtime -= delta_exec;
+	/*
+	 * For tasks that participate in GRUB, we implement GRUB-PA: the
+	 * spare reclaimed bandwidth is used to clock down frequency.
+	 *
+	 * For the others, we still need to scale reservation parameters
+	 * according to current frequency and CPU maximum capacity.
+	 */
+	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM)) {
+		scaled_delta_exec = grub_reclaim(delta_exec,
+						 rq,
+						 &curr->dl);
+	} else {
+		unsigned long scale_freq = arch_scale_freq_capacity(cpu);
+		unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
+
+		scaled_delta_exec = cap_scale(delta_exec, scale_freq);
+		scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu);
+	}
+
+	dl_se->runtime -= scaled_delta_exec;
 
 throttle:
 	if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b0f31064bbbd..39224813e038 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2781,8 +2781,6 @@ static u32 __accumulate_pelt_segments(u64 periods, u32 d1, u32 d3)
 	return c1 + c2 + c3;
 }
 
-#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
-
 /*
  * Accumulate the three separate parts of the sum; d1 the remainder
  * of the last (incomplete) period, d2 the span of full periods and d3
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index cc474c62cd18..019c46768ecb 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -155,6 +155,8 @@ static inline int task_has_dl_policy(struct task_struct *p)
 	return dl_policy(p->policy);
 }
 
+#define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
+
 static inline int dl_entity_is_special(struct sched_dl_entity *dl_se)
 {
 	return dl_se->flags & SCHED_FLAG_SPECIAL;
-- 
2.11.0

* Re: [PATCH RFC 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE
  2017-05-23  8:53 ` [PATCH RFC 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE Juri Lelli
@ 2017-05-23 18:52   ` Peter Zijlstra
  2017-05-24  9:31     ` Juri Lelli
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2017-05-23 18:52 UTC (permalink / raw)
  To: Juri Lelli
  Cc: mingo, rjw, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

On Tue, May 23, 2017 at 09:53:46AM +0100, Juri Lelli wrote:
> diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
> index e2a6c7b3510b..72723859ef74 100644
> --- a/include/uapi/linux/sched.h
> +++ b/include/uapi/linux/sched.h
> @@ -48,5 +48,6 @@
>   */
>  #define SCHED_FLAG_RESET_ON_FORK	0x01
>  #define SCHED_FLAG_RECLAIM		0x02
> +#define SCHED_FLAG_SPECIAL		0x04
>  
>  #endif /* _UAPI_LINUX_SCHED_H */
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7fc2011c3ce7..ba57e2ef9aef 100644

> @@ -4205,7 +4212,9 @@ static int __sched_setscheduler(struct task_struct *p,
>  	}
>  
>  	if (attr->sched_flags &
> -		~(SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_RECLAIM))
> +		~(SCHED_FLAG_RESET_ON_FORK |
> +		  SCHED_FLAG_RECLAIM |
> +		  SCHED_FLAG_SPECIAL))
>  		return -EINVAL;
>  
>  	/*

Could we pretty please not expose this gruesome hack to userspace?

So if you stick it in attr->sched_flags, use a high bit and don't put it
in a uapi header. Also make the flags check explicitly fail on it when
@user. Such that only _nocheck() (and thus kernel) callers have access
to it.

Also, there's not nearly enough warnings and other derisory comments
near it.

* Re: [PATCH RFC 4/8] sched/cpufreq_schedutil: split utilization signals
  2017-05-23  8:53 ` [PATCH RFC 4/8] sched/cpufreq_schedutil: split utilization signals Juri Lelli
@ 2017-05-23 19:04   ` Peter Zijlstra
  2017-05-24  9:01     ` Juri Lelli
  2017-05-23 19:29   ` Peter Zijlstra
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2017-05-23 19:04 UTC (permalink / raw)
  To: Juri Lelli
  Cc: mingo, rjw, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

On Tue, May 23, 2017 at 09:53:47AM +0100, Juri Lelli wrote:
> @@ -157,14 +158,13 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
>  	return cpufreq_driver_resolve_freq(policy, freq);
>  }
>  
> -static void sugov_get_util(unsigned long *util, unsigned long *max)
> +static void sugov_get_util(struct sugov_cpu *sg_cpu)
>  {
>  	struct rq *rq = this_rq();
> -	unsigned long dl_util = (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> 20;
>  
> -	*max = arch_scale_cpu_capacity(NULL, smp_processor_id());
> -
> -	*util = min(rq->cfs.avg.util_avg + dl_util, *max);
> +	sg_cpu->max = arch_scale_cpu_capacity(NULL, smp_processor_id());
> +	sg_cpu->util_cfs = rq->cfs.avg.util_avg;
> +	sg_cpu->util_dl = (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> 20;
>  }

Luca just introduced a nice BW_SHIFT for that '20' thing.

* Re: [PATCH RFC 4/8] sched/cpufreq_schedutil: split utilization signals
  2017-05-23  8:53 ` [PATCH RFC 4/8] sched/cpufreq_schedutil: split utilization signals Juri Lelli
  2017-05-23 19:04   ` Peter Zijlstra
@ 2017-05-23 19:29   ` Peter Zijlstra
  2017-05-23 23:30     ` Rafael J. Wysocki
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2017-05-23 19:29 UTC (permalink / raw)
  To: Juri Lelli
  Cc: mingo, rjw, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

On Tue, May 23, 2017 at 09:53:47AM +0100, Juri Lelli wrote:
> To be able to treat utilization signals of different scheduling classes
> in different ways (e.g., CFS signal might be stale while DEADLINE signal
> is never stale by design) we need to split sugov_cpu::util signal in two:
> util_cfs and util_dl.
> 
> This patch does that by also changing sugov_get_util() parameter list.
> After this change aggregation of the different signals has to be performed
> by sugov_get_util() users (so that they can decide what to do with the
> different signals).

So what I don't see this patch doing; and I don't remember if cpufreq is
ready for this at all, is set the util_dl as min/guaranteed freq and
util_cfs+util_dl as requested freq.

* Re: [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection
  2017-05-23  8:53 [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Juri Lelli
                   ` (7 preceding siblings ...)
  2017-05-23  8:53 ` [PATCH RFC 8/8] sched/deadline: make bandwidth enforcement scale-invariant Juri Lelli
@ 2017-05-23 20:23 ` Peter Zijlstra
  2017-05-23 20:29   ` Peter Zijlstra
  2017-05-24  9:25   ` Juri Lelli
  8 siblings, 2 replies; 25+ messages in thread
From: Peter Zijlstra @ 2017-05-23 20:23 UTC (permalink / raw)
  To: Juri Lelli
  Cc: mingo, rjw, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi

On Tue, May 23, 2017 at 09:53:43AM +0100, Juri Lelli wrote:

> A point that is still very much up for discussion (more that the others :) is
> how we implement frequency/cpu scaling. SCHED_FLAG_RECLAIM tasks only need
> grub_reclaim(), as the function already scales their reservation runtime
> considering other reservations and maximum bandwidth a CPU has to offer.
> However, for normal !RECLAIM tasks multiple things can be implemented which
> seem to make sense:
> 
>  - don't scale at all: normal tasks will only get a % of CPU _time_ as granted
>    by AC
>  - go to max as soon as a normal task in enqueued: this because dimensioning of
>    parameters is usually done at max OPP/biggest CPU and normal task assume
>    that this is always the condition when they run
>  - scale runtime acconding to current frequency and max CPU capacity: this is
>    what this set is currently implementing
> 
> Opinions?


So I'm terribly confused...

By using the active bandwidth to select frequency we effectively
reduce idle time (to 0 if we had infinite granular frequency steps and
no margins).

So !RECLAIM works as expected. They get the time they reserved, since
that was taken into account by active bandwidth.

And RECLAIM works, since that only promises to (re)distribute idle time,
and if there is none that is an easy task.


Where is the problem?

* Re: [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection
  2017-05-23 20:23 ` [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Peter Zijlstra
@ 2017-05-23 20:29   ` Peter Zijlstra
  2017-05-24  9:25   ` Juri Lelli
  1 sibling, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2017-05-23 20:29 UTC (permalink / raw)
  To: Juri Lelli
  Cc: mingo, rjw, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi

On Tue, May 23, 2017 at 10:23:21PM +0200, Peter Zijlstra wrote:
> On Tue, May 23, 2017 at 09:53:43AM +0100, Juri Lelli wrote:
> 
> > A point that is still very much up for discussion (more that the others :) is
> > how we implement frequency/cpu scaling. SCHED_FLAG_RECLAIM tasks only need
> > grub_reclaim(), as the function already scales their reservation runtime
> > considering other reservations and maximum bandwidth a CPU has to offer.
> > However, for normal !RECLAIM tasks multiple things can be implemented which
> > seem to make sense:
> > 
> >  - don't scale at all: normal tasks will only get a % of CPU _time_ as granted
> >    by AC
> >  - go to max as soon as a normal task in enqueued: this because dimensioning of
> >    parameters is usually done at max OPP/biggest CPU and normal task assume
> >    that this is always the condition when they run
> >  - scale runtime acconding to current frequency and max CPU capacity: this is
> >    what this set is currently implementing
> > 
> > Opinions?
> 
> 
> So I'm terribly confused...
> 
> By using the active bandwidth to select frequency we effectively
> reduce idle time (to 0 if we had infinite granular frequency steps and
> no margins).

When all DL tasks consume their full reservation.

> So !RECLAIM works as expected. They get the time they reserved, since
> that was taken into account by active bandwidth.
> 
> And RECLAIM works, since that only promises to (re)distribute idle time,
> and if there is none that is an easy task.

And if they don't, there will thus be some idle time to redistribute and
that, again, still works as expected.

* Re: [PATCH RFC 4/8] sched/cpufreq_schedutil: split utilization signals
  2017-05-23 19:29   ` Peter Zijlstra
@ 2017-05-23 23:30     ` Rafael J. Wysocki
  2017-05-24  7:01       ` Peter Zijlstra
  0 siblings, 1 reply; 25+ messages in thread
From: Rafael J. Wysocki @ 2017-05-23 23:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Juri Lelli, mingo, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar

On Tuesday, May 23, 2017 09:29:27 PM Peter Zijlstra wrote:
> On Tue, May 23, 2017 at 09:53:47AM +0100, Juri Lelli wrote:
> > To be able to treat utilization signals of different scheduling classes
> > in different ways (e.g., CFS signal might be stale while DEADLINE signal
> > is never stale by design) we need to split sugov_cpu::util signal in two:
> > util_cfs and util_dl.
> > 
> > This patch does that by also changing sugov_get_util() parameter list.
> > After this change aggregation of the different signals has to be performed
> > by sugov_get_util() users (so that they can decide what to do with the
> > different signals).
> 
> So what I don't see this patch doing; and I don't remember if cpufreq is
> ready for this at all, is set the util_dl as min/guaranteed freq and
> util_cfs+util_dl as requested freq.

I'm totally unsure what you mean here.

cpufreq doesn't have a "guaranteed frequency" concept of any sort right now.

* Re: [PATCH RFC 4/8] sched/cpufreq_schedutil: split utilization signals
  2017-05-23 23:30     ` Rafael J. Wysocki
@ 2017-05-24  7:01       ` Peter Zijlstra
  2017-05-24  9:01         ` Juri Lelli
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2017-05-24  7:01 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Juri Lelli, mingo, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar

On Wed, May 24, 2017 at 01:30:36AM +0200, Rafael J. Wysocki wrote:
> On Tuesday, May 23, 2017 09:29:27 PM Peter Zijlstra wrote:
> > On Tue, May 23, 2017 at 09:53:47AM +0100, Juri Lelli wrote:
> > > To be able to treat utilization signals of different scheduling classes
> > > in different ways (e.g., CFS signal might be stale while DEADLINE signal
> > > is never stale by design) we need to split sugov_cpu::util signal in two:
> > > util_cfs and util_dl.
> > > 
> > > This patch does that by also changing sugov_get_util() parameter list.
> > > After this change aggregation of the different signals has to be performed
> > > by sugov_get_util() users (so that they can decide what to do with the
> > > different signals).
> > 
> > So what I don't see this patch doing; and I don't remember if cpufreq is
> > ready for this at all, is set the util_dl as min/guaranteed freq and
> > util_cfs+util_dl as requested freq.
> 
> I'm totally unsure what you mean here.

I was thinking of the CPPC/HWP stuff, where you can set different
frequencies with different levels of guarantees.

We'd want to set util_dl as the minimum (guaranteed) performance, and
util_dl + util_cfs as the desired performance level.
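
Something like this, say (just a sketch; as noted below, no such cpufreq
interface exists yet):

	min_perf     = util_dl;			/* guaranteed */
	desired_perf = util_dl + util_cfs;	/* requested */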

> cpufreq doesn't have a "guaranteed frequency" concept of any sort right now.

I was afraid of that ;-) I think we want a comment in the code stating
that this is the desired goal though. Then once cpufreq is ready to deal
with it we can change it..

* Re: [PATCH RFC 4/8] sched/cpufreq_schedutil: split utilization signals
  2017-05-24  7:01       ` Peter Zijlstra
@ 2017-05-24  9:01         ` Juri Lelli
  0 siblings, 0 replies; 25+ messages in thread
From: Juri Lelli @ 2017-05-24  9:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rafael J. Wysocki, mingo, viresh.kumar, linux-kernel, linux-pm,
	tglx, vincent.guittot, rostedt, luca.abeni, claudio,
	tommaso.cucinotta, bristot, mathieu.poirier, tkjos, joelaf,
	andresoportus, morten.rasmussen, dietmar.eggemann,
	patrick.bellasi, Ingo Molnar

Hi,

On 24/05/17 09:01, Peter Zijlstra wrote:
> On Wed, May 24, 2017 at 01:30:36AM +0200, Rafael J. Wysocki wrote:
> > On Tuesday, May 23, 2017 09:29:27 PM Peter Zijlstra wrote:
> > > On Tue, May 23, 2017 at 09:53:47AM +0100, Juri Lelli wrote:
> > > > To be able to treat utilization signals of different scheduling classes
> > > > in different ways (e.g., CFS signal might be stale while DEADLINE signal
> > > > is never stale by design) we need to split sugov_cpu::util signal in two:
> > > > util_cfs and util_dl.
> > > > 
> > > > This patch does that by also changing sugov_get_util() parameter list.
> > > > After this change aggregation of the different signals has to be performed
> > > > by sugov_get_util() users (so that they can decide what to do with the
> > > > different signals).
> > > 
> > > So what I don't see this patch doing; and I don't remember if cpufreq is
> > > ready for this at all, is set the util_dl as min/guaranteed freq and
> > > util_cfs+util_dl as requested freq.
> > 
> > I'm totally unsure what you mean here.
> 
> I was thinking of the CPPC/HWP stuff, where you can set different
> frequencies with different levels of guarantees.
> 
> We'd want to set util_dl as the minimum (guaranteed) performance, and
> util_dl + util_cfs as the desired performance level.
> 
> > cpufreq doesn't have a "guaranteed frequency" concept of any sort right now.
> 
> I was afraid of that ;-) I think we want a comment in the code stating
> that this is the desired goal though. Then once cpufreq is ready to deal
> with it we can change it..

Sure, I can add that in next version.

Thanks,

- Juri

* Re: [PATCH RFC 4/8] sched/cpufreq_schedutil: split utilization signals
  2017-05-23 19:04   ` Peter Zijlstra
@ 2017-05-24  9:01     ` Juri Lelli
  0 siblings, 0 replies; 25+ messages in thread
From: Juri Lelli @ 2017-05-24  9:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, rjw, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

Hi,

On 23/05/17 21:04, Peter Zijlstra wrote:
> On Tue, May 23, 2017 at 09:53:47AM +0100, Juri Lelli wrote:
> > @@ -157,14 +158,13 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
> >  	return cpufreq_driver_resolve_freq(policy, freq);
> >  }
> >  
> > -static void sugov_get_util(unsigned long *util, unsigned long *max)
> > +static void sugov_get_util(struct sugov_cpu *sg_cpu)
> >  {
> >  	struct rq *rq = this_rq();
> > -	unsigned long dl_util = (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> 20;
> >  
> > -	*max = arch_scale_cpu_capacity(NULL, smp_processor_id());
> > -
> > -	*util = min(rq->cfs.avg.util_avg + dl_util, *max);
> > +	sg_cpu->max = arch_scale_cpu_capacity(NULL, smp_processor_id());
> > +	sg_cpu->util_cfs = rq->cfs.avg.util_avg;
> > +	sg_cpu->util_dl = (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> 20;
> >  }
> 
> Luca just introduced a nice BW_SHIFT for that '20' thing.

Right, will use that.

Thanks,

- Juri

* Re: [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection
  2017-05-23 20:23 ` [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Peter Zijlstra
  2017-05-23 20:29   ` Peter Zijlstra
@ 2017-05-24  9:25   ` Juri Lelli
  2017-05-24  9:41     ` Peter Zijlstra
  2017-05-24 10:01     ` Luca Abeni
  1 sibling, 2 replies; 25+ messages in thread
From: Juri Lelli @ 2017-05-24  9:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, rjw, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi

Hi,

On 23/05/17 22:23, Peter Zijlstra wrote:
> On Tue, May 23, 2017 at 09:53:43AM +0100, Juri Lelli wrote:
> 
> > A point that is still very much up for discussion (more that the others :) is
> > how we implement frequency/cpu scaling. SCHED_FLAG_RECLAIM tasks only need
> > grub_reclaim(), as the function already scales their reservation runtime
> > considering other reservations and maximum bandwidth a CPU has to offer.
> > However, for normal !RECLAIM tasks multiple things can be implemented which
> > seem to make sense:
> > 
> >  - don't scale at all: normal tasks will only get a % of CPU _time_ as granted
> >    by AC
> >  - go to max as soon as a normal task in enqueued: this because dimensioning of
> >    parameters is usually done at max OPP/biggest CPU and normal task assume
> >    that this is always the condition when they run
> >  - scale runtime acconding to current frequency and max CPU capacity: this is
> >    what this set is currently implementing
> > 
> > Opinions?
> 
> 
> So I'm terribly confused...
> 
> By using the active bandwidth to select frequency we effectively
> reduce idle time (to 0 if we had infinite granular frequency steps and
> no margins).
> 
> So !RECLAIM works as expected. They get the time they reserved, since
> that was taken into account by active bandwidth.
> 

This was my impression as well, but Luca (and please Luca correct me if
I misunderstood your point) argued (in an off-line discussion ahead of
this posting) that !reclaim tasks might not be interested in reclaiming
*at all*. Since scaling frequency down is another way of effectively
reclaiming unused bandwidth (the other being sharing unused bandwidth
among reservations while keeping frequency at max), !reclaim tasks might
not be interested in frequency scaling (my first point above), or might
require frequency to always be at max (second point above).

Does this help clarify things a bit? :)

This said however, I'd personally be inclined to go with option 3 above,
which is what this set is currently implementing.
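
FWIW, option 3 boils down to something like the following in the
runtime accounting path (a rough sketch of what patch 8 implements;
the helper name is made up, and the arch_scale_*() calls assume the
signature/!CONFIG_SMP changes from patches 6-7):

  /*
   * Charge a !RECLAIM task for the work actually done: scale the
   * consumed wall-clock time by the current frequency and by the max
   * CPU capacity, both expressed relative to SCHED_CAPACITY_SCALE.
   */
  static u64 dl_scaled_delta_exec(struct rq *rq, u64 delta_exec)
  {
          int cpu = cpu_of(rq);
          unsigned long scale_freq = arch_scale_freq_capacity(NULL, cpu);
          unsigned long scale_cpu = arch_scale_cpu_capacity(NULL, cpu);

          delta_exec = (delta_exec * scale_freq) >> SCHED_CAPACITY_SHIFT;
          delta_exec = (delta_exec * scale_cpu) >> SCHED_CAPACITY_SHIFT;

          return delta_exec;
  }

so a reservation dimensioned at max OPP on the biggest CPU is charged
full-speed time only when actually running at full speed.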

> And RECLAIM works, since that only promises to (re)distribute idle time,
> and if there is none that is an easy task.
> 

Right.

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE
  2017-05-23 18:52   ` Peter Zijlstra
@ 2017-05-24  9:31     ` Juri Lelli
  0 siblings, 0 replies; 25+ messages in thread
From: Juri Lelli @ 2017-05-24  9:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, rjw, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi, Ingo Molnar,
	Rafael J . Wysocki

Hi,

On 23/05/17 20:52, Peter Zijlstra wrote:
> On Tue, May 23, 2017 at 09:53:46AM +0100, Juri Lelli wrote:
> > diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
> > index e2a6c7b3510b..72723859ef74 100644
> > --- a/include/uapi/linux/sched.h
> > +++ b/include/uapi/linux/sched.h
> > @@ -48,5 +48,6 @@
> >   */
> >  #define SCHED_FLAG_RESET_ON_FORK	0x01
> >  #define SCHED_FLAG_RECLAIM		0x02
> > +#define SCHED_FLAG_SPECIAL		0x04
> >  
> >  #endif /* _UAPI_LINUX_SCHED_H */
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 7fc2011c3ce7..ba57e2ef9aef 100644
> 
> > @@ -4205,7 +4212,9 @@ static int __sched_setscheduler(struct task_struct *p,
> >  	}
> >  
> >  	if (attr->sched_flags &
> > -		~(SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_RECLAIM))
> > +		~(SCHED_FLAG_RESET_ON_FORK |
> > +		  SCHED_FLAG_RECLAIM |
> > +		  SCHED_FLAG_SPECIAL))
> >  		return -EINVAL;
> >  
> >  	/*
> 
> Could we pretty please not expose this gruesome hack to userspace?
> 
> So if you stick it in attr->sched_flags, use a high bit and don't put it
> in a uapi header. Also make the flags check explicitly fail on it when
> @user. Such that only _nocheck() (and thus kernel) callers have access
> to it.
>

Sure, I should have done it in the first place.
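
Something along these lines, then (just a sketch of the v2 direction,
not the actual patch; the bit value is illustrative):

  /* kernel/sched/sched.h -- deliberately NOT in a uapi header */
  #define SCHED_FLAG_SPECIAL	0x10000000

  /* in __sched_setscheduler(), refuse the flag from userspace */
  if (user && (attr->sched_flags & SCHED_FLAG_SPECIAL))
          return -EINVAL;

so only in-kernel _nocheck() callers (which pass user == false) can
ever set it.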

> Also, there's not nearly enough warnings and other derisory comments
> near it.

Eheh, I'll add all the derisory remarks I'm capable of. :/

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection
  2017-05-24  9:25   ` Juri Lelli
@ 2017-05-24  9:41     ` Peter Zijlstra
  2017-05-24  9:50       ` Juri Lelli
  2017-05-24 10:01     ` Luca Abeni
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2017-05-24  9:41 UTC (permalink / raw)
  To: Juri Lelli
  Cc: mingo, rjw, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi

On Wed, May 24, 2017 at 10:25:05AM +0100, Juri Lelli wrote:
> Hi,
> 
> On 23/05/17 22:23, Peter Zijlstra wrote:
> > On Tue, May 23, 2017 at 09:53:43AM +0100, Juri Lelli wrote:
> > 
> > > A point that is still very much up for discussion (more than the others :) is
> > > how we implement frequency/cpu scaling. SCHED_FLAG_RECLAIM tasks only need
> > > grub_reclaim(), as the function already scales their reservation runtime
> > > considering other reservations and maximum bandwidth a CPU has to offer.
> > > However, for normal !RECLAIM tasks multiple things can be implemented which
> > > seem to make sense:
> > > 
> > >  - don't scale at all: normal tasks will only get a % of CPU _time_ as granted
> > >    by AC
> > >  - go to max as soon as a normal task is enqueued: this is because dimensioning
> > >    of parameters is usually done at max OPP/biggest CPU and normal tasks assume
> > >    that this is always the condition when they run
> > >  - scale runtime according to current frequency and max CPU capacity: this is
> > >    what this set is currently implementing
> > > 
> > > Opinions?
> > 
> > 
> > So I'm terribly confused...
> > 
> > By using the active bandwidth to select frequency we effectively
> > reduce idle time (to 0 if we had infinitely granular frequency steps and
> > no margins).
> > 
> > So !RECLAIM works as expected. They get the time they reserved, since
> > that was taken into account by active bandwidth.
> > 
> 
> This was my impression as well, but Luca (and please Luca correct me if
> I misunderstood your point) argued (in an off-line discussion ahead of
> this posting) that !reclaim tasks might not be interested in reclaiming
> *at all*. Since scaling frequency down is another way of effectively
> reclaiming unused bandwidth (the other being sharing unused bandwidth
> among reservations while keeping frequency at max), !reclaim tasks might
> not be interested in frequency scaling (my first point above), or might
> require frequency to always be at max (second point above).
> 
> Does this help clarify things a bit? :)

No ;-) As you said, confusion++.

A !RECLAIM task doesn't care (cannot care, should not care etc..) about
any bandwidth not allocated to itself. Therefore it should/must/etc..
not have any opinion on what we do with 'spare' bandwidth.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection
  2017-05-24  9:41     ` Peter Zijlstra
@ 2017-05-24  9:50       ` Juri Lelli
  2017-05-24 11:31         ` Peter Zijlstra
  0 siblings, 1 reply; 25+ messages in thread
From: Juri Lelli @ 2017-05-24  9:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, rjw, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi

On 24/05/17 11:41, Peter Zijlstra wrote:
> On Wed, May 24, 2017 at 10:25:05AM +0100, Juri Lelli wrote:
> > Hi,
> > 
> > On 23/05/17 22:23, Peter Zijlstra wrote:
> > > On Tue, May 23, 2017 at 09:53:43AM +0100, Juri Lelli wrote:
> > > 
> > > > A point that is still very much up for discussion (more than the others :) is
> > > > how we implement frequency/cpu scaling. SCHED_FLAG_RECLAIM tasks only need
> > > > grub_reclaim(), as the function already scales their reservation runtime
> > > > considering other reservations and maximum bandwidth a CPU has to offer.
> > > > However, for normal !RECLAIM tasks multiple things can be implemented which
> > > > seem to make sense:
> > > > 
> > > >  - don't scale at all: normal tasks will only get a % of CPU _time_ as granted
> > > >    by AC
> > > >  - go to max as soon as a normal task is enqueued: this is because dimensioning
> > > >    of parameters is usually done at max OPP/biggest CPU and normal tasks assume
> > > >    that this is always the condition when they run
> > > >  - scale runtime according to current frequency and max CPU capacity: this is
> > > >    what this set is currently implementing
> > > > 
> > > > Opinions?
> > > 
> > > 
> > > So I'm terribly confused...
> > > 
> > > By using the active bandwidth to select frequency we effectively
> > > reduce idle time (to 0 if we had infinitely granular frequency steps and
> > > no margins).
> > > 
> > > So !RECLAIM works as expected. They get the time they reserved, since
> > > that was taken into account by active bandwidth.
> > > 
> > 
> > This was my impression as well, but Luca (and please Luca correct me if
> > I misunderstood your point) argued (in an off-line discussion ahead of
> > this posting) that !reclaim tasks might not be interested in reclaiming
> > *at all*. Since scaling frequency down is another way of effectively
> > reclaiming unused bandwidth (the other being sharing unused bandwidth
> > among reservations while keeping frequency at max), !reclaim tasks might
> > not be interested in frequency scaling (my first point above), or might
> > require frequency to always be at max (second point above).
> > 
> > Does this help clarify things a bit? :)
> 
> No ;-) As you said, confusion++.
> 
> A !RECLAIM task doesn't care (cannot care, should not care etc..) about
> any bandwidth not allocated to itself. Therefore it should/must/etc..
> not have any opinion on what we do with 'spare' bandwidth.
> 

Agreed. However, the problem seems to be that:

 - in my opinion (current implementation) this translates into scaling
   runtime considering current freq and cpu-max-capacity; and this is
   required when frequency scaling is enabled and we still want to meet
   a task's guaranteed bandwidth

 - Luca seemed instead to be inclined to say that, if we scale runtime
   for !reclaim tasks, such tasks are basically allowed to run for more
   time (when frequency is lower than max) by using some of the
   bandwidth not allocated to themselves
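
To make the disagreement concrete (numbers purely illustrative): take a
runtime=10ms/period=100ms reservation dimensioned at max OPP, with the
CPU running at half the max frequency:

  charged runtime = wall-clock * (f_cur / f_max) = wall-clock * 0.5
  => up to 20ms of wall-clock time per 100ms period

In my reading the task still only gets its guaranteed 10ms worth of
max-OPP work; in Luca's reading it now occupies 20% of wall-clock time
despite having been admitted for 10%.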

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection
  2017-05-24  9:25   ` Juri Lelli
  2017-05-24  9:41     ` Peter Zijlstra
@ 2017-05-24 10:01     ` Luca Abeni
  2017-05-24 11:29       ` Peter Zijlstra
  1 sibling, 1 reply; 25+ messages in thread
From: Luca Abeni @ 2017-05-24 10:01 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Peter Zijlstra, mingo, rjw, viresh.kumar, linux-kernel, linux-pm,
	tglx, vincent.guittot, rostedt, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi

On Wed, 24 May 2017 10:25:05 +0100
Juri Lelli <juri.lelli@arm.com> wrote:

> Hi,
> 
> On 23/05/17 22:23, Peter Zijlstra wrote:
> > On Tue, May 23, 2017 at 09:53:43AM +0100, Juri Lelli wrote:
> >   
> > > A point that is still very much up for discussion (more than the
> > > others :) is how we implement frequency/cpu scaling.
> > > SCHED_FLAG_RECLAIM tasks only need grub_reclaim(), as the
> > > function already scales their reservation runtime considering
> > > other reservations and maximum bandwidth a CPU has to offer.
> > > However, for normal !RECLAIM tasks multiple things can be
> > > implemented which seem to make sense:
> > > 
> > >  - don't scale at all: normal tasks will only get a % of CPU
> > > _time_ as granted by AC
> > >  - go to max as soon as a normal task is enqueued: this is because
> > > dimensioning of parameters is usually done at max OPP/biggest CPU
> > > and normal tasks assume that this is always the condition when
> > > they run
> > >  - scale runtime according to current frequency and max CPU
> > > capacity: this is what this set is currently implementing
> > > 
> > > Opinions?  
> > 
> > 
> > So I'm terribly confused...
> > 
> > By using the active bandwidth to select frequency we effectively
> > reduce idle time (to 0 if we had infinitely granular frequency steps
> > and no margins).
> > 
> > So !RECLAIM works as expected. They get the time they reserved,
> > since that was taken into account by active bandwidth.
> >   
> 
> This was my impression as well, but Luca (and please Luca correct me
> if I misunderstood your point) argued (in an off-line discussion
> ahead of this posting) that !reclaim tasks might not be interested in
> reclaiming *at all*.

Well, I also admitted that I am almost completely ignorant about many
people's requirements...

What I know is that there are some people using SCHED_DEADLINE to make
sure that a task can make progress (executing with a "high priority")
without consuming more than a specified fraction of CPU time... So,
they for example schedule a CPU-hungry task with runtime=10ms and
period=100ms to make sure that the task can execute every 100ms (giving
the impression of "fluid progress") without stealing more than 10% of
CPU time from other tasks.

In this case, if the CPU frequency changes, the goal is still to
"reserve" 10% of CPU time for the task (not more, even if the CPU is
slower). So, no runtime rescaling (or reclaiming) is required in this
case.
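
(For concreteness, such a reservation is set up roughly as below; this
is a sketch -- userspace has to define struct sched_attr and use the
raw syscall itself, since there is no glibc wrapper:

  struct sched_attr attr = {
          .size		= sizeof(attr),
          .sched_policy	= SCHED_DEADLINE,
          .sched_runtime	= 10 * 1000 * 1000,	/* 10ms, in ns */
          .sched_deadline	= 100 * 1000 * 1000,	/* 100ms, in ns */
          .sched_period	= 100 * 1000 * 1000,	/* 100ms, in ns */
  };

  syscall(__NR_sched_setattr, 0 /* self */, &attr, 0 /* flags */);

and note that SCHED_FLAG_RECLAIM is not set.)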


My proposal was that if a task is not interested in a fixed
runtime / fraction of CPU time but wants to adapt the runtime when the
CPU frequency scales, then it can select the RECLAIMING flag.

But of course there might be different requirements or other use-cases.



			Luca

> Since scaling frequency down is another way of
> effectively reclaiming unused bandwidth (the other being sharing
> unused bandwidth among reservations while keeping frequency at
> max), !reclaim tasks might not be interested in frequency scaling (my
> first point above), or might require frequency to always be at max
> (second point above).
> 
> Does this help clarify things a bit? :)
> 
> This said however, I'd personally be inclined to go with option 3
> above, which is what this set is currently implementing.
> 
> > And RECLAIM works, since that only promises to (re)distribute idle
> > time, and if there is none that is an easy task.
> >   
> 
> Right.
> 
> Thanks,
> 
> - Juri

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection
  2017-05-24 10:01     ` Luca Abeni
@ 2017-05-24 11:29       ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2017-05-24 11:29 UTC (permalink / raw)
  To: Luca Abeni
  Cc: Juri Lelli, mingo, rjw, viresh.kumar, linux-kernel, linux-pm,
	tglx, vincent.guittot, rostedt, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi

On Wed, May 24, 2017 at 12:01:51PM +0200, Luca Abeni wrote:
> > > So I'm terribly confused...
> > > 
> > > By using the active bandwidth to select frequency we effectively
> > > reduce idle time (to 0 if we had infinitely granular frequency steps
> > > and no margins).
> > > 
> > > So !RECLAIM works as expected. They get the time they reserved,
> > > since that was taken into account by active bandwidth.


> Well, I also admitted that I am almost completely ignorant about many
> people's requirements...
> 
> What I know is that there are some people using SCHED_DEADLINE to make
> sure that a task can make progress (executing with a "high priority")
> without consuming more than a specified fraction of CPU time... So,
> they for example schedule a CPU-hungry task with runtime=10ms and
> period=100ms to make sure that the task can execute every 100ms (giving
> the impression of "fluid progress") without stealing more than 10% of
> CPU time from other tasks.
> 
> In this case, if the CPU frequency changes, the goal is still to
> "reserve" 10% of CPU time for the task (not more, even if the CPU is
> slower). So, no runtime rescaling (or reclaiming) is required in this
> case.
> 
> 
> My proposal was that if a task is not interested in a fixed
> runtime / fraction of CPU time but wants to adapt the runtime when the
> CPU frequency scales, then it can select the RECLAIMING flag.

I think these people are doing it wrong :-)

Firstly, the runtime budget is a WCET. This very much means it is
subject to CPU frequency; after all, when the CPU runs slower, that same
amount of work takes longer. So being subject to cpufreq is the natural
state and should not require a special marker.
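
(Assuming execution time scales roughly linearly with frequency, that
is:

  wcet(f) ~= wcet(f_max) * f_max / f

so a budget dimensioned at f_max simply doesn't fit anymore when the
CPU runs slower, unless the accounting scales it.)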

Secondly, if you want steady progress of 10%, I don't see the problem
with giving them more time at a slower frequency; they get the 'same'
amount of 'work' done without bothering other people.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection
  2017-05-24  9:50       ` Juri Lelli
@ 2017-05-24 11:31         ` Peter Zijlstra
  0 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2017-05-24 11:31 UTC (permalink / raw)
  To: Juri Lelli
  Cc: mingo, rjw, viresh.kumar, linux-kernel, linux-pm, tglx,
	vincent.guittot, rostedt, luca.abeni, claudio, tommaso.cucinotta,
	bristot, mathieu.poirier, tkjos, joelaf, andresoportus,
	morten.rasmussen, dietmar.eggemann, patrick.bellasi

On Wed, May 24, 2017 at 10:50:53AM +0100, Juri Lelli wrote:

> Agreed. However, problem seems to be that
> 
>  - in my opinion (current implementation) this translated into scaling
>    runtime considering current freq and cpu-max-capacity; and this is
>    required when frequency scaling is enabled and we still want to meet
>    a task's guaranteed bandwidth

Just so. The bandwidth they request is based on instructions/work. We
need to get a certain number of instructions sorted. Nobody cares that
we get an exact 10% at a random frequency if they lose their finger
because we didn't get out that final instruction that stops the saw
blade.

>  - Luca seemed instead to be inclined to say that, if we scale runtime
>    for !reclaim tasks, such tasks are basically allowed to run for more
>    time (when frequency is lower than max) by using some of the
>    bandwidth not allocated to themselves

Yes, that's a wrong view :-) We don't care about 'time', we care about
getting the instruction stream / work completed.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2017-05-24 11:32 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-23  8:53 [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Juri Lelli
2017-05-23  8:53 ` [PATCH RFC 1/8] sched/cpufreq_schedutil: make use of DEADLINE utilization signal Juri Lelli
2017-05-23  8:53 ` [PATCH RFC 2/8] sched/deadline: move cpu frequency selection triggering points Juri Lelli
2017-05-23  8:53 ` [PATCH RFC 3/8] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE Juri Lelli
2017-05-23 18:52   ` Peter Zijlstra
2017-05-24  9:31     ` Juri Lelli
2017-05-23  8:53 ` [PATCH RFC 4/8] sched/cpufreq_schedutil: split utilization signals Juri Lelli
2017-05-23 19:04   ` Peter Zijlstra
2017-05-24  9:01     ` Juri Lelli
2017-05-23 19:29   ` Peter Zijlstra
2017-05-23 23:30     ` Rafael J. Wysocki
2017-05-24  7:01       ` Peter Zijlstra
2017-05-24  9:01         ` Juri Lelli
2017-05-23  8:53 ` [PATCH RFC 5/8] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq Juri Lelli
2017-05-23  8:53 ` [PATCH RFC 6/8] sched/sched.h: remove sd arch_scale_freq_capacity parameter Juri Lelli
2017-05-23  8:53 ` [PATCH RFC 7/8] sched/sched.h: move arch_scale_{freq,cpu}_capacity outside CONFIG_SMP Juri Lelli
2017-05-23  8:53 ` [PATCH RFC 8/8] sched/deadline: make bandwidth enforcement scale-invariant Juri Lelli
2017-05-23 20:23 ` [PATCH RFC 0/8] SCHED_DEADLINE freq/cpu invariance and OPP selection Peter Zijlstra
2017-05-23 20:29   ` Peter Zijlstra
2017-05-24  9:25   ` Juri Lelli
2017-05-24  9:41     ` Peter Zijlstra
2017-05-24  9:50       ` Juri Lelli
2017-05-24 11:31         ` Peter Zijlstra
2017-05-24 10:01     ` Luca Abeni
2017-05-24 11:29       ` Peter Zijlstra
