* [PATCH v3 0/5] Capacity awareness for SCHED_DEADLINE
@ 2020-05-20 13:42 Dietmar Eggemann
  2020-05-20 13:42 ` [PATCH v3 1/5] sched/deadline: Optimize dl_bw_cpus() Dietmar Eggemann
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Dietmar Eggemann @ 2020-05-20 13:42 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli
  Cc: Vincent Guittot, Steven Rostedt, Luca Abeni,
	Daniel Bristot de Oliveira, Wei Wang, Quentin Perret,
	Alessio Balsini, Pavan Kondeti, Patrick Bellasi,
	Morten Rasmussen, Valentin Schneider, Qais Yousef, linux-kernel

The SCHED_DEADLINE (DL) Admission Control (AC) and task placement do
not work correctly on heterogeneous (asymmetric CPU capacity) systems
such as Arm big.LITTLE or DynamIQ.

Let's fix this by explicitly considering CPU capacity in AC and task
placement.

The DL sched class now attempts to avoid missing task deadlines caused
by a smaller CPU (CPU capacity < 1024) not being capable enough to
finish a task in time. It does so by trying to place a task so that
its CPU-capacity-scaled deadline is not smaller than its runtime.

This patch-set only supports capacity awareness in the idle scenario
(cpudl::free_cpus not empty). Capacity awareness for the non-idle
case should be added in a later series.

Changes v2 [1] -> v3:

There was a discussion about whether, in case 'rq->rd ==
def_root_domain', AC should be performed against the capacity of the
CPU the task is running on rather than against the rd CPU capacity sum.
Since this issue already exists w/o capacity awareness, an 'XXX Fix:'
comment was added for now.

Per-patch changes:

(1) Patch 'sched/topology: Store root domain CPU capacity sum' removed
    since rd->sum_cpu_capacity is not needed anymore [v2 patch 1/6]

(2) Redesign of dl_bw_capacity() and 'XXX Fix:' comment (mentioned 
    above) added [patch 2/5]

(3) Favor task_cpu(p) if it has max capacity of !fitting CPUs
    [patch 5/5]

Changes v1 [2] -> v2:

Discussion about capacity awareness in idle and non-idle scenarios
indicated that the current patch-set only supports the former.

Per-patch changes:

(1) Use rq->cpu_capacity_orig or capacity_orig_of() instead of
    arch_scale_cpu_capacity() [patch 1,6/6]

(2) Optimize dl_bw_cpus(), i.e. return the weight of rd->span if
    rd->span ⊆ cpu_active_mask [patch 2/6]

(3) Replace rd_capacity() with dl_bw_capacity() [patch 3/6]

Changes RFC [3] -> v1:

Only use static values for CPU bandwidth (sched_dl_entity::dl_runtime,
::dl_deadline) and CPU capacity (arch_scale_cpu_capacity()) to fix AC.

Dynamic values for CPU bandwidth (sched_dl_entity::runtime, ::deadline)
and CPU capacity (capacity_of()) are considered to be more related to
energy trade-off calculations which could be later introduced using the
Energy Model.

Since the design of the DL and RT sched classes is very similar, the
implementation follows the overall design of RT capacity awareness
(commit 804d402fb6f6 ("sched/rt: Make RT capacity-aware")).

Per-patch changes:

(1) Store CPU capacity sum in the root domain during
    build_sched_domains() [patch 1/4]

(2) Adjust to RT capacity awareness design [patch 3/4]

(3) Remove CPU capacity aware placement in switched_to_dl()
    (dl_migrate callback) [RFC patch 3/6]

    Balance callbacks (push, pull) run only in schedule_tail(),
    __schedule(), rt_mutex_setprio() or __sched_setscheduler().
    DL throttling leads to a call to __dequeue_task_dl() which is not a
    full task dequeue. The task is still considered enqueued and is
    only removed from the DL runqueue.
    So a queue_balance_callback() call in update_curr_dl()->
    __dequeue_task_dl() will not be followed by a balance_callback()
    call in one of the four functions mentioned above.

(4) Remove 'dynamic CPU bandwidth' consideration and only support
    'static CPU bandwidth' (ratio between sched_dl_entity::dl_runtime
    and ::dl_deadline) [RFC patch 4/6]

(5) Remove modification to migration logic which tried to schedule
    small tasks on LITTLE CPUs [RFC patch 6/6]

[1] https://lore.kernel.org/r/20200427083709.30262-1-dietmar.eggemann@arm.com
[2] https://lore.kernel.org/r/20200408095012.3819-1-dietmar.eggemann@arm.com
[3] https://lore.kernel.org/r/20190506044836.2914-1-luca.abeni@santannapisa.it

The following rt-app testcase, tailored to the Arm64 Hikey960:

root@h960:~# cat /sys/devices/system/cpu/cpu*/cpu_capacity
462
462
462
462
1024
1024
1024
1024

shows the expected behavior.

According to the following condition in dl_task_fits_capacity()

    cap_scale(dl_deadline, arch_scale_cpu_capacity(cpu)) >= dl_runtime

the thread0-[0-3] instances are each placed on a big CPU whereas the
thread1-[0-3] instances each run on a LITTLE CPU.
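
Plugging the rt-app parameters below into this condition (CPU
capacities 462 and 1024 as per the cpu_capacity output above):

    thread0: 16000 *  462 / 1024 =  7218 <  11000 -> no fit on LITTLE
             16000 * 1024 / 1024 = 16000 >= 11000 -> fit on big
    thread1: 16000 *  462 / 1024 =  7218 >=  6500 -> fit on LITTLE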

The 'delay' parameter for the little tasks makes sure that they start
later than the big tasks allowing the big tasks to choose big CPUs.

...
"tasks" : {
 "thread0" : {
  "policy" : "SCHED_DEADLINE",
  "instance" : 4,
  "timer" : { "ref" : "unique0", "period" : 16000, "mode" : "absolute" },
  "run" : 10000,
  "dl-runtime" : 11000,
  "dl-period" : 16000,
  "dl-deadline" : 16000
},
 "thread1" : {
  "policy" : "SCHED_DEADLINE",
  "instance" : 4,
  "delay" : 1000,
  "timer" : { "ref" : "unique1", "period" : 16000, "mode" : "absolute" },
  "run" : 5500,
  "dl-runtime" : 6500			
  "dl-period" : 16000,
  "dl-deadline" : 16000
}
...

Tests were run with Performance CPUfreq governor so that the Schedutil
CPUfreq governor DL threads (sugov:[0,4]), necessary on a
slow-switching platform like Hikey960, do not interfere with the
rt-app test tasks. Using Schedutil would require lowering the number
of tasks to 3 instances each.
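
For reference, here is a minimal standalone C sketch (not part of this
series) of what one thread0 instance does, using the raw
sched_setattr() syscall w/ the struct layout from the sched_setattr(2)
man page. Times are in nanoseconds and rt-app's calibrated run/sleep
pattern is crudely approximated by a busy loop which gets throttled by
the DL runtime:

#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6
#endif

/* Subset of the uapi struct sched_attr, cf. sched_setattr(2). */
struct sched_attr {
        uint32_t size;
        uint32_t sched_policy;
        uint64_t sched_flags;
        int32_t  sched_nice;
        uint32_t sched_priority;
        uint64_t sched_runtime;
        uint64_t sched_deadline;
        uint64_t sched_period;
};

int main(void)
{
        struct sched_attr attr = {
                .size           = sizeof(attr),
                .sched_policy   = SCHED_DEADLINE,
                .sched_runtime  = 11000 * 1000ULL, /* dl-runtime, 11ms */
                .sched_deadline = 16000 * 1000ULL, /* dl-deadline, 16ms */
                .sched_period   = 16000 * 1000ULL, /* dl-period, 16ms */
        };

        /* pid 0 = current task, no flags */
        if (syscall(SYS_sched_setattr, 0, &attr, 0)) {
                perror("sched_setattr");
                return 1;
        }

        for (;;)
                ; /* rt-app instead runs ~10ms, then waits for the next period */

        return 0;
}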

Dietmar Eggemann (2):
  sched/deadline: Optimize dl_bw_cpus()
  sched/deadline: Add dl_bw_capacity()

Luca Abeni (3):
  sched/deadline: Improve admission control for asymmetric CPU
    capacities
  sched/deadline: Make DL capacity-aware
  sched/deadline: Implement fallback mechanism for !fit case

 kernel/sched/cpudeadline.c | 24 ++++++++++
 kernel/sched/deadline.c    | 89 ++++++++++++++++++++++++++++++--------
 kernel/sched/sched.h       | 21 +++++++--
 3 files changed, 113 insertions(+), 21 deletions(-)

-- 
2.17.1


* [PATCH v3 1/5] sched/deadline: Optimize dl_bw_cpus()
  2020-05-20 13:42 [PATCH v3 0/5] Capacity awareness for SCHED_DEADLINE Dietmar Eggemann
@ 2020-05-20 13:42 ` Dietmar Eggemann
  2020-05-22 14:57   ` Juri Lelli
  2020-06-16 12:21   ` [tip: sched/core] " tip-bot2 for Dietmar Eggemann
  2020-05-20 13:42 ` [PATCH v3 2/5] sched/deadline: Add dl_bw_capacity() Dietmar Eggemann
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Dietmar Eggemann @ 2020-05-20 13:42 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli
  Cc: Vincent Guittot, Steven Rostedt, Luca Abeni,
	Daniel Bristot de Oliveira, Wei Wang, Quentin Perret,
	Alessio Balsini, Pavan Kondeti, Patrick Bellasi,
	Morten Rasmussen, Valentin Schneider, Qais Yousef, linux-kernel

Return the weight of the root domain (rd) span in case it is a subset
of the cpu_active_mask, since then every CPU of the rd span counts and
the per-CPU walk can be avoided.

Continue to compute the number of CPUs over the intersection of the rd
span and cpu_active_mask while a hotplug operation is in progress.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 kernel/sched/deadline.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 504d2f51b0d6..4ae22bfc37ae 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -54,10 +54,16 @@ static inline struct dl_bw *dl_bw_of(int i)
 static inline int dl_bw_cpus(int i)
 {
 	struct root_domain *rd = cpu_rq(i)->rd;
-	int cpus = 0;
+	int cpus;
 
 	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held(),
 			 "sched RCU must be held");
+
+	if (cpumask_subset(rd->span, cpu_active_mask))
+		return cpumask_weight(rd->span);
+
+	cpus = 0;
+
 	for_each_cpu_and(i, rd->span, cpu_active_mask)
 		cpus++;
 
-- 
2.17.1


* [PATCH v3 2/5] sched/deadline: Add dl_bw_capacity()
  2020-05-20 13:42 [PATCH v3 0/5] Capacity awareness for SCHED_DEADLINE Dietmar Eggemann
  2020-05-20 13:42 ` [PATCH v3 1/5] sched/deadline: Optimize dl_bw_cpus() Dietmar Eggemann
@ 2020-05-20 13:42 ` Dietmar Eggemann
  2020-05-22 14:58   ` Juri Lelli
  2020-06-16 12:21   ` [tip: sched/core] " tip-bot2 for Dietmar Eggemann
  2020-05-20 13:42 ` [PATCH v3 3/5] sched/deadline: Improve admission control for asymmetric CPU capacities Dietmar Eggemann
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Dietmar Eggemann @ 2020-05-20 13:42 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli
  Cc: Vincent Guittot, Steven Rostedt, Luca Abeni,
	Daniel Bristot de Oliveira, Wei Wang, Quentin Perret,
	Alessio Balsini, Pavan Kondeti, Patrick Bellasi,
	Morten Rasmussen, Valentin Schneider, Qais Yousef, linux-kernel

Capacity-aware SCHED_DEADLINE Admission Control (AC) needs root domain
(rd) CPU capacity sum.

Introduce dl_bw_capacity() which for a symmetric rd w/ a CPU capacity
of SCHED_CAPACITY_SCALE simply relies on dl_bw_cpus() to return #CPUs
multiplied by SCHED_CAPACITY_SCALE.

For an asymmetric rd or a CPU capacity < SCHED_CAPACITY_SCALE it
computes the CPU capacity sum over rd span and cpu_active_mask.
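
E.g. on the Hikey960 used for testing in the cover letter, an rd
spanning all 8 CPUs yields 4 * 462 + 4 * 1024 = 5944, whereas a
symmetric 8 CPU rd w/ CPU capacity 1024 yields 8 << 10 = 8192 via
dl_bw_cpus().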

An 'XXX Fix:' comment was added to highlight that if 'rq->rd ==
def_root_domain', AC should be performed against the capacity of the
CPU the task is running on rather than against the rd CPU capacity
sum. This issue already exists w/o capacity awareness.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 kernel/sched/deadline.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 4ae22bfc37ae..ea7282ce484c 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -69,6 +69,34 @@ static inline int dl_bw_cpus(int i)
 
 	return cpus;
 }
+
+static inline unsigned long __dl_bw_capacity(int i)
+{
+	struct root_domain *rd = cpu_rq(i)->rd;
+	unsigned long cap = 0;
+
+	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held(),
+			 "sched RCU must be held");
+
+	for_each_cpu_and(i, rd->span, cpu_active_mask)
+		cap += capacity_orig_of(i);
+
+	return cap;
+}
+
+/*
+ * XXX Fix: If 'rq->rd == def_root_domain' perform AC against capacity
+ * of the CPU the task is running on rather rd's \Sum CPU capacity.
+ */
+static inline unsigned long dl_bw_capacity(int i)
+{
+	if (!static_branch_unlikely(&sched_asym_cpucapacity) &&
+	    capacity_orig_of(i) == SCHED_CAPACITY_SCALE) {
+		return dl_bw_cpus(i) << SCHED_CAPACITY_SHIFT;
+	} else {
+		return __dl_bw_capacity(i);
+	}
+}
 #else
 static inline struct dl_bw *dl_bw_of(int i)
 {
@@ -79,6 +107,11 @@ static inline int dl_bw_cpus(int i)
 {
 	return 1;
 }
+
+static inline unsigned long dl_bw_capacity(int i)
+{
+	return SCHED_CAPACITY_SCALE;
+}
 #endif
 
 static inline
-- 
2.17.1


* [PATCH v3 3/5] sched/deadline: Improve admission control for asymmetric CPU capacities
  2020-05-20 13:42 [PATCH v3 0/5] Capacity awareness for SCHED_DEADLINE Dietmar Eggemann
  2020-05-20 13:42 ` [PATCH v3 1/5] sched/deadline: Optimize dl_bw_cpus() Dietmar Eggemann
  2020-05-20 13:42 ` [PATCH v3 2/5] sched/deadline: Add dl_bw_capacity() Dietmar Eggemann
@ 2020-05-20 13:42 ` Dietmar Eggemann
  2020-05-22 14:58   ` Juri Lelli
  2020-06-16 12:21   ` [tip: sched/core] " tip-bot2 for Luca Abeni
  2020-05-20 13:42 ` [PATCH v3 4/5] sched/deadline: Make DL capacity-aware Dietmar Eggemann
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Dietmar Eggemann @ 2020-05-20 13:42 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli
  Cc: Vincent Guittot, Steven Rostedt, Luca Abeni,
	Daniel Bristot de Oliveira, Wei Wang, Quentin Perret,
	Alessio Balsini, Pavan Kondeti, Patrick Bellasi,
	Morten Rasmussen, Valentin Schneider, Qais Yousef, linux-kernel

From: Luca Abeni <luca.abeni@santannapisa.it>

The current SCHED_DEADLINE (DL) admission control ensures that

    sum of reserved CPU bandwidth < x * M

where

    x = /proc/sys/kernel/sched_rt_{runtime,period}_us
    M = # CPUs in root domain.

DL admission control works well for homogeneous systems where the
capacity of all CPUs is equal (1024). I.e. bounded tardiness for DL
and non-starvation of non-DL tasks is guaranteed.

But on heterogeneous systems where the capacities of the CPUs differ
it can fail by over-allocating CPU time on smaller capacity CPUs.

On an Arm big.LITTLE/DynamIQ system DL tasks can easily starve other
tasks, making it unusable.

Fix this by explicitly considering the CPU capacity in the DL admission
test by replacing M with the root domain CPU capacity sum.
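
E.g. w/ the default x = 950000/1000000 = 0.95 on an 8 CPU Hikey960
(CPU capacity sum 4 * 462 + 4 * 1024 = 5944), the current test admits
up to 0.95 * 8 = 7.6 CPUs worth of bandwidth although only
5944/1024 ~= 5.8 CPUs worth of capacity exist; the fixed test bounds
admission by 0.95 * 5944/1024 ~= 5.5.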

Signed-off-by: Luca Abeni <luca.abeni@santannapisa.it>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 kernel/sched/deadline.c | 30 +++++++++++++++++-------------
 kernel/sched/sched.h    |  6 +++---
 2 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index ea7282ce484c..fa8566517715 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2590,11 +2590,12 @@ void sched_dl_do_global(void)
 int sched_dl_overflow(struct task_struct *p, int policy,
 		      const struct sched_attr *attr)
 {
-	struct dl_bw *dl_b = dl_bw_of(task_cpu(p));
 	u64 period = attr->sched_period ?: attr->sched_deadline;
 	u64 runtime = attr->sched_runtime;
 	u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0;
-	int cpus, err = -1;
+	int cpus, err = -1, cpu = task_cpu(p);
+	struct dl_bw *dl_b = dl_bw_of(cpu);
+	unsigned long cap;
 
 	if (attr->sched_flags & SCHED_FLAG_SUGOV)
 		return 0;
@@ -2609,15 +2610,17 @@ int sched_dl_overflow(struct task_struct *p, int policy,
 	 * allocated bandwidth of the container.
 	 */
 	raw_spin_lock(&dl_b->lock);
-	cpus = dl_bw_cpus(task_cpu(p));
+	cpus = dl_bw_cpus(cpu);
+	cap = dl_bw_capacity(cpu);
+
 	if (dl_policy(policy) && !task_has_dl_policy(p) &&
-	    !__dl_overflow(dl_b, cpus, 0, new_bw)) {
+	    !__dl_overflow(dl_b, cap, 0, new_bw)) {
 		if (hrtimer_active(&p->dl.inactive_timer))
 			__dl_sub(dl_b, p->dl.dl_bw, cpus);
 		__dl_add(dl_b, new_bw, cpus);
 		err = 0;
 	} else if (dl_policy(policy) && task_has_dl_policy(p) &&
-		   !__dl_overflow(dl_b, cpus, p->dl.dl_bw, new_bw)) {
+		   !__dl_overflow(dl_b, cap, p->dl.dl_bw, new_bw)) {
 		/*
 		 * XXX this is slightly incorrect: when the task
 		 * utilization decreases, we should delay the total
@@ -2753,19 +2756,19 @@ bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr)
 #ifdef CONFIG_SMP
 int dl_task_can_attach(struct task_struct *p, const struct cpumask *cs_cpus_allowed)
 {
+	unsigned long flags, cap;
 	unsigned int dest_cpu;
 	struct dl_bw *dl_b;
 	bool overflow;
-	int cpus, ret;
-	unsigned long flags;
+	int ret;
 
 	dest_cpu = cpumask_any_and(cpu_active_mask, cs_cpus_allowed);
 
 	rcu_read_lock_sched();
 	dl_b = dl_bw_of(dest_cpu);
 	raw_spin_lock_irqsave(&dl_b->lock, flags);
-	cpus = dl_bw_cpus(dest_cpu);
-	overflow = __dl_overflow(dl_b, cpus, 0, p->dl.dl_bw);
+	cap = dl_bw_capacity(dest_cpu);
+	overflow = __dl_overflow(dl_b, cap, 0, p->dl.dl_bw);
 	if (overflow) {
 		ret = -EBUSY;
 	} else {
@@ -2775,6 +2778,8 @@ int dl_task_can_attach(struct task_struct *p, const struct cpumask *cs_cpus_allo
 		 * We will free resources in the source root_domain
 		 * later on (see set_cpus_allowed_dl()).
 		 */
+		int cpus = dl_bw_cpus(dest_cpu);
+
 		__dl_add(dl_b, p->dl.dl_bw, cpus);
 		ret = 0;
 	}
@@ -2807,16 +2812,15 @@ int dl_cpuset_cpumask_can_shrink(const struct cpumask *cur,
 
 bool dl_cpu_busy(unsigned int cpu)
 {
-	unsigned long flags;
+	unsigned long flags, cap;
 	struct dl_bw *dl_b;
 	bool overflow;
-	int cpus;
 
 	rcu_read_lock_sched();
 	dl_b = dl_bw_of(cpu);
 	raw_spin_lock_irqsave(&dl_b->lock, flags);
-	cpus = dl_bw_cpus(cpu);
-	overflow = __dl_overflow(dl_b, cpus, 0, 0);
+	cap = dl_bw_capacity(cpu);
+	overflow = __dl_overflow(dl_b, cap, 0, 0);
 	raw_spin_unlock_irqrestore(&dl_b->lock, flags);
 	rcu_read_unlock_sched();
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 21416b30c520..14cb6a97e2d2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -310,11 +310,11 @@ void __dl_add(struct dl_bw *dl_b, u64 tsk_bw, int cpus)
 	__dl_update(dl_b, -((s32)tsk_bw / cpus));
 }
 
-static inline
-bool __dl_overflow(struct dl_bw *dl_b, int cpus, u64 old_bw, u64 new_bw)
+static inline bool __dl_overflow(struct dl_bw *dl_b, unsigned long cap,
+				 u64 old_bw, u64 new_bw)
 {
 	return dl_b->bw != -1 &&
-	       dl_b->bw * cpus < dl_b->total_bw - old_bw + new_bw;
+	       cap_scale(dl_b->bw, cap) < dl_b->total_bw - old_bw + new_bw;
 }
 
 extern void init_dl_bw(struct dl_bw *dl_b);
-- 
2.17.1


* [PATCH v3 4/5] sched/deadline: Make DL capacity-aware
  2020-05-20 13:42 [PATCH v3 0/5] Capacity awareness for SCHED_DEADLINE Dietmar Eggemann
                   ` (2 preceding siblings ...)
  2020-05-20 13:42 ` [PATCH v3 3/5] sched/deadline: Improve admission control for asymmetric CPU capacities Dietmar Eggemann
@ 2020-05-20 13:42 ` Dietmar Eggemann
  2020-05-22 14:58   ` Juri Lelli
  2020-06-16 12:21   ` [tip: sched/core] " tip-bot2 for Luca Abeni
  2020-05-20 13:42 ` [PATCH v3 5/5] sched/deadline: Implement fallback mechanism for !fit case Dietmar Eggemann
  2020-06-10 10:26 ` [PATCH v3 0/5] Capacity awareness for SCHED_DEADLINE Peter Zijlstra
  5 siblings, 2 replies; 17+ messages in thread
From: Dietmar Eggemann @ 2020-05-20 13:42 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli
  Cc: Vincent Guittot, Steven Rostedt, Luca Abeni,
	Daniel Bristot de Oliveira, Wei Wang, Quentin Perret,
	Alessio Balsini, Pavan Kondeti, Patrick Bellasi,
	Morten Rasmussen, Valentin Schneider, Qais Yousef, linux-kernel

From: Luca Abeni <luca.abeni@santannapisa.it>

The current SCHED_DEADLINE (DL) scheduler uses a global EDF scheduling
algorithm w/o considering CPU capacity or task utilization.
This works well on homogeneous systems where DL tasks are guaranteed
to have a bounded tardiness but presents issues on heterogeneous
systems.

A DL task can migrate to a CPU which does not have enough CPU capacity
to correctly serve the task (e.g. a task w/ 70ms runtime and 100ms
period on a CPU w/ 512 capacity).

Add the DL fitness function dl_task_fits_capacity() for DL admission
control on heterogeneous systems. A task fits onto a CPU if:

    CPU original capacity / 1024 >= task runtime / task deadline
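
For the 70ms/100ms task above this reads: 512/1024 = 0.5 < 70/100 =
0.7, i.e. the task does not fit on the 512 capacity CPU.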

Use this function on heterogeneous systems to try to find a CPU which
meets this criterion during task wakeup, push and offline migration.

On homogeneous systems the original behavior of the DL admission
control should be retained.

Signed-off-by: Luca Abeni <luca.abeni@santannapisa.it>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 kernel/sched/cpudeadline.c | 14 +++++++++++++-
 kernel/sched/deadline.c    | 18 ++++++++++++++----
 kernel/sched/sched.h       | 15 +++++++++++++++
 3 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 5cc4012572ec..8630f2a40a3f 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -121,7 +121,19 @@ int cpudl_find(struct cpudl *cp, struct task_struct *p,
 
 	if (later_mask &&
 	    cpumask_and(later_mask, cp->free_cpus, p->cpus_ptr)) {
-		return 1;
+		int cpu;
+
+		if (!static_branch_unlikely(&sched_asym_cpucapacity))
+			return 1;
+
+		/* Ensure the capacity of the CPUs fits the task. */
+		for_each_cpu(cpu, later_mask) {
+			if (!dl_task_fits_capacity(p, cpu))
+				cpumask_clear_cpu(cpu, later_mask);
+		}
+
+		if (!cpumask_empty(later_mask))
+			return 1;
 	} else {
 		int best_cpu = cpudl_maximum(cp);
 
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index fa8566517715..f2e8f5a36707 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1643,6 +1643,7 @@ static int
 select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags)
 {
 	struct task_struct *curr;
+	bool select_rq;
 	struct rq *rq;
 
 	if (sd_flag != SD_BALANCE_WAKE)
@@ -1662,10 +1663,19 @@ select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags)
 	 * other hand, if it has a shorter deadline, we
 	 * try to make it stay here, it might be important.
 	 */
-	if (unlikely(dl_task(curr)) &&
-	    (curr->nr_cpus_allowed < 2 ||
-	     !dl_entity_preempt(&p->dl, &curr->dl)) &&
-	    (p->nr_cpus_allowed > 1)) {
+	select_rq = unlikely(dl_task(curr)) &&
+		    (curr->nr_cpus_allowed < 2 ||
+		     !dl_entity_preempt(&p->dl, &curr->dl)) &&
+		    p->nr_cpus_allowed > 1;
+
+	/*
+	 * Take the capacity of the CPU into account to
+	 * ensure it fits the requirement of the task.
+	 */
+	if (static_branch_unlikely(&sched_asym_cpucapacity))
+		select_rq |= !dl_task_fits_capacity(p, cpu);
+
+	if (select_rq) {
 		int target = find_later_rq(p);
 
 		if (target != -1 &&
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 14cb6a97e2d2..6ebbb1f353c4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -317,6 +317,21 @@ static inline bool __dl_overflow(struct dl_bw *dl_b, unsigned long cap,
 	       cap_scale(dl_b->bw, cap) < dl_b->total_bw - old_bw + new_bw;
 }
 
+/*
+ * Verify the fitness of task @p to run on @cpu taking into account the
+ * CPU original capacity and the runtime/deadline ratio of the task.
+ *
+ * The function will return true if the CPU original capacity of the
+ * @cpu scaled by SCHED_CAPACITY_SCALE >= runtime/deadline ratio of the
+ * task and false otherwise.
+ */
+static inline bool dl_task_fits_capacity(struct task_struct *p, int cpu)
+{
+	unsigned long cap = arch_scale_cpu_capacity(cpu);
+
+	return cap_scale(p->dl.dl_deadline, cap) >= p->dl.dl_runtime;
+}
+
 extern void init_dl_bw(struct dl_bw *dl_b);
 extern int  sched_dl_global_validate(void);
 extern void sched_dl_do_global(void);
-- 
2.17.1


* [PATCH v3 5/5] sched/deadline: Implement fallback mechanism for !fit case
  2020-05-20 13:42 [PATCH v3 0/5] Capacity awareness for SCHED_DEADLINE Dietmar Eggemann
                   ` (3 preceding siblings ...)
  2020-05-20 13:42 ` [PATCH v3 4/5] sched/deadline: Make DL capacity-aware Dietmar Eggemann
@ 2020-05-20 13:42 ` Dietmar Eggemann
  2020-05-22 14:59   ` Juri Lelli
  2020-06-16 12:21   ` [tip: sched/core] " tip-bot2 for Luca Abeni
  2020-06-10 10:26 ` [PATCH v3 0/5] Capacity awareness for SCHED_DEADLINE Peter Zijlstra
  5 siblings, 2 replies; 17+ messages in thread
From: Dietmar Eggemann @ 2020-05-20 13:42 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli
  Cc: Vincent Guittot, Steven Rostedt, Luca Abeni,
	Daniel Bristot de Oliveira, Wei Wang, Quentin Perret,
	Alessio Balsini, Pavan Kondeti, Patrick Bellasi,
	Morten Rasmussen, Valentin Schneider, Qais Yousef, linux-kernel

From: Luca Abeni <luca.abeni@santannapisa.it>

When a task has a runtime that cannot be served within the scheduling
deadline by any of the idle CPUs (later_mask), the task is doomed to
miss its deadline.

This can happen since the SCHED_DEADLINE admission control guarantees
only bounded tardiness and not that every single deadline is met.
In this case try to select the idle CPU with the largest CPU capacity
to minimize tardiness.

Favor task_cpu(p) if it has the maximum capacity among the non-fitting
(!fit) CPUs so that find_later_rq() can potentially still return it
(most likely cache-hot) early.
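
E.g. (hypothetical numbers) a task w/ a 900ms runtime and a 1000ms
deadline needs a CPU capacity of at least 1024 * 900/1000 ~= 922. If
only LITTLE CPUs (capacity 462) are idle, no CPU fits and the idle CPU
w/ the largest capacity is kept in later_mask instead, w/ task_cpu(p)
favored among CPUs of equal capacity.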

Signed-off-by: Luca Abeni <luca.abeni@santannapisa.it>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 kernel/sched/cpudeadline.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 8630f2a40a3f..8cb06c8c7eb1 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -121,19 +121,31 @@ int cpudl_find(struct cpudl *cp, struct task_struct *p,
 
 	if (later_mask &&
 	    cpumask_and(later_mask, cp->free_cpus, p->cpus_ptr)) {
-		int cpu;
+		unsigned long cap, max_cap = 0;
+		int cpu, max_cpu = -1;
 
 		if (!static_branch_unlikely(&sched_asym_cpucapacity))
 			return 1;
 
 		/* Ensure the capacity of the CPUs fits the task. */
 		for_each_cpu(cpu, later_mask) {
-			if (!dl_task_fits_capacity(p, cpu))
+			if (!dl_task_fits_capacity(p, cpu)) {
 				cpumask_clear_cpu(cpu, later_mask);
+
+				cap = capacity_orig_of(cpu);
+
+				if (cap > max_cap ||
+				    (cpu == task_cpu(p) && cap == max_cap)) {
+					max_cap = cap;
+					max_cpu = cpu;
+				}
+			}
 		}
 
-		if (!cpumask_empty(later_mask))
-			return 1;
+		if (cpumask_empty(later_mask))
+			cpumask_set_cpu(max_cpu, later_mask);
+
+		return 1;
 	} else {
 		int best_cpu = cpudl_maximum(cp);
 
-- 
2.17.1


* Re: [PATCH v3 1/5] sched/deadline: Optimize dl_bw_cpus()
  2020-05-20 13:42 ` [PATCH v3 1/5] sched/deadline: Optimize dl_bw_cpus() Dietmar Eggemann
@ 2020-05-22 14:57   ` Juri Lelli
  2020-06-16 12:21   ` [tip: sched/core] " tip-bot2 for Dietmar Eggemann
  1 sibling, 0 replies; 17+ messages in thread
From: Juri Lelli @ 2020-05-22 14:57 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Steven Rostedt,
	Luca Abeni, Daniel Bristot de Oliveira, Wei Wang, Quentin Perret,
	Alessio Balsini, Pavan Kondeti, Patrick Bellasi,
	Morten Rasmussen, Valentin Schneider, Qais Yousef, linux-kernel

On 20/05/20 15:42, Dietmar Eggemann wrote:
> Return the weight of the root domain (rd) span in case it is a subset
> of the cpu_active_mask.
> 
> Continue to compute the number of CPUs over rd span and cpu_active_mask
> when in hotplug.
> 
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
>  kernel/sched/deadline.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 504d2f51b0d6..4ae22bfc37ae 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -54,10 +54,16 @@ static inline struct dl_bw *dl_bw_of(int i)
>  static inline int dl_bw_cpus(int i)
>  {
>  	struct root_domain *rd = cpu_rq(i)->rd;
> -	int cpus = 0;
> +	int cpus;
>  
>  	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held(),
>  			 "sched RCU must be held");
> +
> +	if (cpumask_subset(rd->span, cpu_active_mask))
> +		return cpumask_weight(rd->span);
> +
> +	cpus = 0;
> +
>  	for_each_cpu_and(i, rd->span, cpu_active_mask)
>  		cpus++;
>  
> -- 

Acked-by: Juri Lelli <juri.lelli@redhat.com>


* Re: [PATCH v3 2/5] sched/deadline: Add dl_bw_capacity()
  2020-05-20 13:42 ` [PATCH v3 2/5] sched/deadline: Add dl_bw_capacity() Dietmar Eggemann
@ 2020-05-22 14:58   ` Juri Lelli
  2020-06-16 12:21   ` [tip: sched/core] " tip-bot2 for Dietmar Eggemann
  1 sibling, 0 replies; 17+ messages in thread
From: Juri Lelli @ 2020-05-22 14:58 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Steven Rostedt,
	Luca Abeni, Daniel Bristot de Oliveira, Wei Wang, Quentin Perret,
	Alessio Balsini, Pavan Kondeti, Patrick Bellasi,
	Morten Rasmussen, Valentin Schneider, Qais Yousef, linux-kernel

On 20/05/20 15:42, Dietmar Eggemann wrote:
> Capacity-aware SCHED_DEADLINE Admission Control (AC) needs root domain
> (rd) CPU capacity sum.
> 
> Introduce dl_bw_capacity() which for a symmetric rd w/ a CPU capacity
> of SCHED_CAPACITY_SCALE simply relies on dl_bw_cpus() to return #CPUs
> multiplied by SCHED_CAPACITY_SCALE.
> 
> For an asymmetric rd or a CPU capacity < SCHED_CAPACITY_SCALE it
> computes the CPU capacity sum over rd span and cpu_active_mask.
> 
> A 'XXX Fix:' comment was added to highlight that if 'rq->rd ==
> def_root_domain' AC should be performed against the capacity of the
> CPU the task is running on rather the rd CPU capacity sum. This
> issue already exists w/o capacity awareness.
> 
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
>  kernel/sched/deadline.c | 33 +++++++++++++++++++++++++++++++++
>  1 file changed, 33 insertions(+)
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 4ae22bfc37ae..ea7282ce484c 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -69,6 +69,34 @@ static inline int dl_bw_cpus(int i)
>  
>  	return cpus;
>  }
> +
> +static inline unsigned long __dl_bw_capacity(int i)
> +{
> +	struct root_domain *rd = cpu_rq(i)->rd;
> +	unsigned long cap = 0;
> +
> +	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held(),
> +			 "sched RCU must be held");
> +
> +	for_each_cpu_and(i, rd->span, cpu_active_mask)
> +		cap += capacity_orig_of(i);
> +
> +	return cap;
> +}
> +
> +/*
> + * XXX Fix: If 'rq->rd == def_root_domain' perform AC against capacity
> + * of the CPU the task is running on rather rd's \Sum CPU capacity.
> + */
> +static inline unsigned long dl_bw_capacity(int i)
> +{
> +	if (!static_branch_unlikely(&sched_asym_cpucapacity) &&
> +	    capacity_orig_of(i) == SCHED_CAPACITY_SCALE) {
> +		return dl_bw_cpus(i) << SCHED_CAPACITY_SHIFT;
> +	} else {
> +		return __dl_bw_capacity(i);
> +	}
> +}
>  #else
>  static inline struct dl_bw *dl_bw_of(int i)
>  {
> @@ -79,6 +107,11 @@ static inline int dl_bw_cpus(int i)
>  {
>  	return 1;
>  }
> +
> +static inline unsigned long dl_bw_capacity(int i)
> +{
> +	return SCHED_CAPACITY_SCALE;
> +}
>  #endif
>  
>  static inline
> -- 

Acked-by: Juri Lelli <juri.lelli@redhat.com>


* Re: [PATCH v3 3/5] sched/deadline: Improve admission control for asymmetric CPU capacities
  2020-05-20 13:42 ` [PATCH v3 3/5] sched/deadline: Improve admission control for asymmetric CPU capacities Dietmar Eggemann
@ 2020-05-22 14:58   ` Juri Lelli
  2020-06-16 12:21   ` [tip: sched/core] " tip-bot2 for Luca Abeni
  1 sibling, 0 replies; 17+ messages in thread
From: Juri Lelli @ 2020-05-22 14:58 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Steven Rostedt,
	Luca Abeni, Daniel Bristot de Oliveira, Wei Wang, Quentin Perret,
	Alessio Balsini, Pavan Kondeti, Patrick Bellasi,
	Morten Rasmussen, Valentin Schneider, Qais Yousef, linux-kernel

On 20/05/20 15:42, Dietmar Eggemann wrote:
> From: Luca Abeni <luca.abeni@santannapisa.it>
> 
> The current SCHED_DEADLINE (DL) admission control ensures that
> 
>     sum of reserved CPU bandwidth < x * M
> 
> where
> 
>     x = /proc/sys/kernel/sched_rt_{runtime,period}_us
>     M = # CPUs in root domain.
> 
> DL admission control works well for homogeneous systems where the
> capacity of all CPUs are equal (1024). I.e. bounded tardiness for DL
> and non-starvation of non-DL tasks is guaranteed.
> 
> But on heterogeneous systems where capacity of CPUs are different it
> could fail by over-allocating CPU time on smaller capacity CPUs.
> 
> On an Arm big.LITTLE/DynamIQ system DL tasks can easily starve other
> tasks making it unusable.
> 
> Fix this by explicitly considering the CPU capacity in the DL admission
> test by replacing M with the root domain CPU capacity sum.
> 
> Signed-off-by: Luca Abeni <luca.abeni@santannapisa.it>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
>  kernel/sched/deadline.c | 30 +++++++++++++++++-------------
>  kernel/sched/sched.h    |  6 +++---
>  2 files changed, 20 insertions(+), 16 deletions(-)
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index ea7282ce484c..fa8566517715 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -2590,11 +2590,12 @@ void sched_dl_do_global(void)
>  int sched_dl_overflow(struct task_struct *p, int policy,
>  		      const struct sched_attr *attr)
>  {
> -	struct dl_bw *dl_b = dl_bw_of(task_cpu(p));
>  	u64 period = attr->sched_period ?: attr->sched_deadline;
>  	u64 runtime = attr->sched_runtime;
>  	u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0;
> -	int cpus, err = -1;
> +	int cpus, err = -1, cpu = task_cpu(p);
> +	struct dl_bw *dl_b = dl_bw_of(cpu);
> +	unsigned long cap;
>  
>  	if (attr->sched_flags & SCHED_FLAG_SUGOV)
>  		return 0;
> @@ -2609,15 +2610,17 @@ int sched_dl_overflow(struct task_struct *p, int policy,
>  	 * allocated bandwidth of the container.
>  	 */
>  	raw_spin_lock(&dl_b->lock);
> -	cpus = dl_bw_cpus(task_cpu(p));
> +	cpus = dl_bw_cpus(cpu);
> +	cap = dl_bw_capacity(cpu);
> +
>  	if (dl_policy(policy) && !task_has_dl_policy(p) &&
> -	    !__dl_overflow(dl_b, cpus, 0, new_bw)) {
> +	    !__dl_overflow(dl_b, cap, 0, new_bw)) {
>  		if (hrtimer_active(&p->dl.inactive_timer))
>  			__dl_sub(dl_b, p->dl.dl_bw, cpus);
>  		__dl_add(dl_b, new_bw, cpus);
>  		err = 0;
>  	} else if (dl_policy(policy) && task_has_dl_policy(p) &&
> -		   !__dl_overflow(dl_b, cpus, p->dl.dl_bw, new_bw)) {
> +		   !__dl_overflow(dl_b, cap, p->dl.dl_bw, new_bw)) {
>  		/*
>  		 * XXX this is slightly incorrect: when the task
>  		 * utilization decreases, we should delay the total
> @@ -2753,19 +2756,19 @@ bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr)
>  #ifdef CONFIG_SMP
>  int dl_task_can_attach(struct task_struct *p, const struct cpumask *cs_cpus_allowed)
>  {
> +	unsigned long flags, cap;
>  	unsigned int dest_cpu;
>  	struct dl_bw *dl_b;
>  	bool overflow;
> -	int cpus, ret;
> -	unsigned long flags;
> +	int ret;
>  
>  	dest_cpu = cpumask_any_and(cpu_active_mask, cs_cpus_allowed);
>  
>  	rcu_read_lock_sched();
>  	dl_b = dl_bw_of(dest_cpu);
>  	raw_spin_lock_irqsave(&dl_b->lock, flags);
> -	cpus = dl_bw_cpus(dest_cpu);
> -	overflow = __dl_overflow(dl_b, cpus, 0, p->dl.dl_bw);
> +	cap = dl_bw_capacity(dest_cpu);
> +	overflow = __dl_overflow(dl_b, cap, 0, p->dl.dl_bw);
>  	if (overflow) {
>  		ret = -EBUSY;
>  	} else {
> @@ -2775,6 +2778,8 @@ int dl_task_can_attach(struct task_struct *p, const struct cpumask *cs_cpus_allo
>  		 * We will free resources in the source root_domain
>  		 * later on (see set_cpus_allowed_dl()).
>  		 */
> +		int cpus = dl_bw_cpus(dest_cpu);
> +
>  		__dl_add(dl_b, p->dl.dl_bw, cpus);
>  		ret = 0;
>  	}
> @@ -2807,16 +2812,15 @@ int dl_cpuset_cpumask_can_shrink(const struct cpumask *cur,
>  
>  bool dl_cpu_busy(unsigned int cpu)
>  {
> -	unsigned long flags;
> +	unsigned long flags, cap;
>  	struct dl_bw *dl_b;
>  	bool overflow;
> -	int cpus;
>  
>  	rcu_read_lock_sched();
>  	dl_b = dl_bw_of(cpu);
>  	raw_spin_lock_irqsave(&dl_b->lock, flags);
> -	cpus = dl_bw_cpus(cpu);
> -	overflow = __dl_overflow(dl_b, cpus, 0, 0);
> +	cap = dl_bw_capacity(cpu);
> +	overflow = __dl_overflow(dl_b, cap, 0, 0);
>  	raw_spin_unlock_irqrestore(&dl_b->lock, flags);
>  	rcu_read_unlock_sched();
>  
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 21416b30c520..14cb6a97e2d2 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -310,11 +310,11 @@ void __dl_add(struct dl_bw *dl_b, u64 tsk_bw, int cpus)
>  	__dl_update(dl_b, -((s32)tsk_bw / cpus));
>  }
>  
> -static inline
> -bool __dl_overflow(struct dl_bw *dl_b, int cpus, u64 old_bw, u64 new_bw)
> +static inline bool __dl_overflow(struct dl_bw *dl_b, unsigned long cap,
> +				 u64 old_bw, u64 new_bw)
>  {
>  	return dl_b->bw != -1 &&
> -	       dl_b->bw * cpus < dl_b->total_bw - old_bw + new_bw;
> +	       cap_scale(dl_b->bw, cap) < dl_b->total_bw - old_bw + new_bw;
>  }
>  
>  extern void init_dl_bw(struct dl_bw *dl_b);
> -- 

Acked-by: Juri Lelli <juri.lelli@redhat.com>


* Re: [PATCH v3 4/5] sched/deadline: Make DL capacity-aware
  2020-05-20 13:42 ` [PATCH v3 4/5] sched/deadline: Make DL capacity-aware Dietmar Eggemann
@ 2020-05-22 14:58   ` Juri Lelli
  2020-06-16 12:21   ` [tip: sched/core] " tip-bot2 for Luca Abeni
  1 sibling, 0 replies; 17+ messages in thread
From: Juri Lelli @ 2020-05-22 14:58 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Steven Rostedt,
	Luca Abeni, Daniel Bristot de Oliveira, Wei Wang, Quentin Perret,
	Alessio Balsini, Pavan Kondeti, Patrick Bellasi,
	Morten Rasmussen, Valentin Schneider, Qais Yousef, linux-kernel

On 20/05/20 15:42, Dietmar Eggemann wrote:
> From: Luca Abeni <luca.abeni@santannapisa.it>
> 
> The current SCHED_DEADLINE (DL) scheduler uses a global EDF scheduling
> algorithm w/o considering CPU capacity or task utilization.
> This works well on homogeneous systems where DL tasks are guaranteed
> to have a bounded tardiness but presents issues on heterogeneous
> systems.
> 
> A DL task can migrate to a CPU which does not have enough CPU capacity
> to correctly serve the task (e.g. a task w/ 70ms runtime and 100ms
> period on a CPU w/ 512 capacity).
> 
> Add the DL fitness function dl_task_fits_capacity() for DL admission
> control on heterogeneous systems. A task fits onto a CPU if:
> 
>     CPU original capacity / 1024 >= task runtime / task deadline
> 
> Use this function on heterogeneous systems to try to find a CPU which
> meets this criterion during task wakeup, push and offline migration.
> 
> On homogeneous systems the original behavior of the DL admission
> control should be retained.
> 
> Signed-off-by: Luca Abeni <luca.abeni@santannapisa.it>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
>  kernel/sched/cpudeadline.c | 14 +++++++++++++-
>  kernel/sched/deadline.c    | 18 ++++++++++++++----
>  kernel/sched/sched.h       | 15 +++++++++++++++
>  3 files changed, 42 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
> index 5cc4012572ec..8630f2a40a3f 100644
> --- a/kernel/sched/cpudeadline.c
> +++ b/kernel/sched/cpudeadline.c
> @@ -121,7 +121,19 @@ int cpudl_find(struct cpudl *cp, struct task_struct *p,
>  
>  	if (later_mask &&
>  	    cpumask_and(later_mask, cp->free_cpus, p->cpus_ptr)) {
> -		return 1;
> +		int cpu;
> +
> +		if (!static_branch_unlikely(&sched_asym_cpucapacity))
> +			return 1;
> +
> +		/* Ensure the capacity of the CPUs fits the task. */
> +		for_each_cpu(cpu, later_mask) {
> +			if (!dl_task_fits_capacity(p, cpu))
> +				cpumask_clear_cpu(cpu, later_mask);
> +		}
> +
> +		if (!cpumask_empty(later_mask))
> +			return 1;
>  	} else {
>  		int best_cpu = cpudl_maximum(cp);
>  
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index fa8566517715..f2e8f5a36707 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1643,6 +1643,7 @@ static int
>  select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags)
>  {
>  	struct task_struct *curr;
> +	bool select_rq;
>  	struct rq *rq;
>  
>  	if (sd_flag != SD_BALANCE_WAKE)
> @@ -1662,10 +1663,19 @@ select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags)
>  	 * other hand, if it has a shorter deadline, we
>  	 * try to make it stay here, it might be important.
>  	 */
> -	if (unlikely(dl_task(curr)) &&
> -	    (curr->nr_cpus_allowed < 2 ||
> -	     !dl_entity_preempt(&p->dl, &curr->dl)) &&
> -	    (p->nr_cpus_allowed > 1)) {
> +	select_rq = unlikely(dl_task(curr)) &&
> +		    (curr->nr_cpus_allowed < 2 ||
> +		     !dl_entity_preempt(&p->dl, &curr->dl)) &&
> +		    p->nr_cpus_allowed > 1;
> +
> +	/*
> +	 * Take the capacity of the CPU into account to
> +	 * ensure it fits the requirement of the task.
> +	 */
> +	if (static_branch_unlikely(&sched_asym_cpucapacity))
> +		select_rq |= !dl_task_fits_capacity(p, cpu);
> +
> +	if (select_rq) {
>  		int target = find_later_rq(p);
>  
>  		if (target != -1 &&
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 14cb6a97e2d2..6ebbb1f353c4 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -317,6 +317,21 @@ static inline bool __dl_overflow(struct dl_bw *dl_b, unsigned long cap,
>  	       cap_scale(dl_b->bw, cap) < dl_b->total_bw - old_bw + new_bw;
>  }
>  
> +/*
> + * Verify the fitness of task @p to run on @cpu taking into account the
> + * CPU original capacity and the runtime/deadline ratio of the task.
> + *
> + * The function will return true if the CPU original capacity of the
> + * @cpu scaled by SCHED_CAPACITY_SCALE >= runtime/deadline ratio of the
> + * task and false otherwise.
> + */
> +static inline bool dl_task_fits_capacity(struct task_struct *p, int cpu)
> +{
> +	unsigned long cap = arch_scale_cpu_capacity(cpu);
> +
> +	return cap_scale(p->dl.dl_deadline, cap) >= p->dl.dl_runtime;
> +}
> +
>  extern void init_dl_bw(struct dl_bw *dl_b);
>  extern int  sched_dl_global_validate(void);
>  extern void sched_dl_do_global(void);
> -- 

Acked-by: Juri Lelli <juri.lelli@redhat.com>


* Re: [PATCH v3 5/5] sched/deadline: Implement fallback mechanism for !fit case
  2020-05-20 13:42 ` [PATCH v3 5/5] sched/deadline: Implement fallback mechanism for !fit case Dietmar Eggemann
@ 2020-05-22 14:59   ` Juri Lelli
  2020-06-16 12:21   ` [tip: sched/core] " tip-bot2 for Luca Abeni
  1 sibling, 0 replies; 17+ messages in thread
From: Juri Lelli @ 2020-05-22 14:59 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Steven Rostedt,
	Luca Abeni, Daniel Bristot de Oliveira, Wei Wang, Quentin Perret,
	Alessio Balsini, Pavan Kondeti, Patrick Bellasi,
	Morten Rasmussen, Valentin Schneider, Qais Yousef, linux-kernel

On 20/05/20 15:42, Dietmar Eggemann wrote:
> From: Luca Abeni <luca.abeni@santannapisa.it>
> 
> When a task has a runtime that cannot be served within the scheduling
> deadline by any of the idle CPU (later_mask) the task is doomed to miss
> its deadline.
> 
> This can happen since the SCHED_DEADLINE admission control guarantees
> only bounded tardiness and not the hard respect of all deadlines.
> In this case try to select the idle CPU with the largest CPU capacity
> to minimize tardiness.
> 
> Favor task_cpu(p) if it has max capacity of !fitting CPUs so that
> find_later_rq() can potentially still return it (most likely cache-hot)
> early.
> 
> Signed-off-by: Luca Abeni <luca.abeni@santannapisa.it>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
>  kernel/sched/cpudeadline.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
> index 8630f2a40a3f..8cb06c8c7eb1 100644
> --- a/kernel/sched/cpudeadline.c
> +++ b/kernel/sched/cpudeadline.c
> @@ -121,19 +121,31 @@ int cpudl_find(struct cpudl *cp, struct task_struct *p,
>  
>  	if (later_mask &&
>  	    cpumask_and(later_mask, cp->free_cpus, p->cpus_ptr)) {
> -		int cpu;
> +		unsigned long cap, max_cap = 0;
> +		int cpu, max_cpu = -1;
>  
>  		if (!static_branch_unlikely(&sched_asym_cpucapacity))
>  			return 1;
>  
>  		/* Ensure the capacity of the CPUs fits the task. */
>  		for_each_cpu(cpu, later_mask) {
> -			if (!dl_task_fits_capacity(p, cpu))
> +			if (!dl_task_fits_capacity(p, cpu)) {
>  				cpumask_clear_cpu(cpu, later_mask);
> +
> +				cap = capacity_orig_of(cpu);
> +
> +				if (cap > max_cap ||
> +				    (cpu == task_cpu(p) && cap == max_cap)) {
> +					max_cap = cap;
> +					max_cpu = cpu;
> +				}
> +			}
>  		}
>  
> -		if (!cpumask_empty(later_mask))
> -			return 1;
> +		if (cpumask_empty(later_mask))
> +			cpumask_set_cpu(max_cpu, later_mask);
> +
> +		return 1;
>  	} else {
>  		int best_cpu = cpudl_maximum(cp);
>  
> -- 

Acked-by: Juri Lelli <juri.lelli@redhat.com>


* Re: [PATCH v3 0/5] Capacity awareness for SCHED_DEADLINE
  2020-05-20 13:42 [PATCH v3 0/5] Capacity awareness for SCHED_DEADLINE Dietmar Eggemann
                   ` (4 preceding siblings ...)
  2020-05-20 13:42 ` [PATCH v3 5/5] sched/deadline: Implement fallback mechanism for !fit case Dietmar Eggemann
@ 2020-06-10 10:26 ` Peter Zijlstra
  5 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2020-06-10 10:26 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Ingo Molnar, Juri Lelli, Vincent Guittot, Steven Rostedt,
	Luca Abeni, Daniel Bristot de Oliveira, Wei Wang, Quentin Perret,
	Alessio Balsini, Pavan Kondeti, Patrick Bellasi,
	Morten Rasmussen, Valentin Schneider, Qais Yousef, linux-kernel

On Wed, May 20, 2020 at 03:42:38PM +0200, Dietmar Eggemann wrote:
> Dietmar Eggemann (2):
>   sched/deadline: Optimize dl_bw_cpus()
>   sched/deadline: Add dl_bw_capacity()
> 
> Luca Abeni (3):
>   sched/deadline: Improve admission control for asymmetric CPU
>     capacities
>   sched/deadline: Make DL capacity-aware
>   sched/deadline: Implement fallback mechanism for !fit case
> 
>  kernel/sched/cpudeadline.c | 24 ++++++++++
>  kernel/sched/deadline.c    | 89 ++++++++++++++++++++++++++++++--------
>  kernel/sched/sched.h       | 21 +++++++--
>  3 files changed, 113 insertions(+), 21 deletions(-)

Thanks!

* [tip: sched/core] sched/deadline: Make DL capacity-aware
  2020-05-20 13:42 ` [PATCH v3 4/5] sched/deadline: Make DL capacity-aware Dietmar Eggemann
  2020-05-22 14:58   ` Juri Lelli
@ 2020-06-16 12:21   ` tip-bot2 for Luca Abeni
  1 sibling, 0 replies; 17+ messages in thread
From: tip-bot2 for Luca Abeni @ 2020-06-16 12:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Luca Abeni, Dietmar Eggemann, Peter Zijlstra (Intel),
	Juri Lelli, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     b4118988fdcb4554ea6687dd8ff68bcab690b8ea
Gitweb:        https://git.kernel.org/tip/b4118988fdcb4554ea6687dd8ff68bcab690b8ea
Author:        Luca Abeni <luca.abeni@santannapisa.it>
AuthorDate:    Wed, 20 May 2020 15:42:42 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 15 Jun 2020 14:10:05 +02:00

sched/deadline: Make DL capacity-aware

The current SCHED_DEADLINE (DL) scheduler uses a global EDF scheduling
algorithm w/o considering CPU capacity or task utilization.
This works well on homogeneous systems where DL tasks are guaranteed
to have a bounded tardiness but presents issues on heterogeneous
systems.

A DL task can migrate to a CPU which does not have enough CPU capacity
to correctly serve the task (e.g. a task w/ 70ms runtime and 100ms
period on a CPU w/ 512 capacity).

Add the DL fitness function dl_task_fits_capacity() for DL admission
control on heterogeneous systems. A task fits onto a CPU if:

    CPU original capacity / 1024 >= task runtime / task deadline

Use this function on heterogeneous systems to try to find a CPU which
meets this criterion during task wakeup, push and offline migration.

On homogeneous systems the original behavior of the DL admission
control should be retained.

Signed-off-by: Luca Abeni <luca.abeni@santannapisa.it>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200520134243.19352-5-dietmar.eggemann@arm.com
---
 kernel/sched/cpudeadline.c | 14 +++++++++++++-
 kernel/sched/deadline.c    | 18 ++++++++++++++----
 kernel/sched/sched.h       | 15 +++++++++++++++
 3 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 5cc4012..8630f2a 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -121,7 +121,19 @@ int cpudl_find(struct cpudl *cp, struct task_struct *p,
 
 	if (later_mask &&
 	    cpumask_and(later_mask, cp->free_cpus, p->cpus_ptr)) {
-		return 1;
+		int cpu;
+
+		if (!static_branch_unlikely(&sched_asym_cpucapacity))
+			return 1;
+
+		/* Ensure the capacity of the CPUs fits the task. */
+		for_each_cpu(cpu, later_mask) {
+			if (!dl_task_fits_capacity(p, cpu))
+				cpumask_clear_cpu(cpu, later_mask);
+		}
+
+		if (!cpumask_empty(later_mask))
+			return 1;
 	} else {
 		int best_cpu = cpudl_maximum(cp);
 
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 9ebd0a9..84e84ba 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1643,6 +1643,7 @@ static int
 select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags)
 {
 	struct task_struct *curr;
+	bool select_rq;
 	struct rq *rq;
 
 	if (sd_flag != SD_BALANCE_WAKE)
@@ -1662,10 +1663,19 @@ select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags)
 	 * other hand, if it has a shorter deadline, we
 	 * try to make it stay here, it might be important.
 	 */
-	if (unlikely(dl_task(curr)) &&
-	    (curr->nr_cpus_allowed < 2 ||
-	     !dl_entity_preempt(&p->dl, &curr->dl)) &&
-	    (p->nr_cpus_allowed > 1)) {
+	select_rq = unlikely(dl_task(curr)) &&
+		    (curr->nr_cpus_allowed < 2 ||
+		     !dl_entity_preempt(&p->dl, &curr->dl)) &&
+		    p->nr_cpus_allowed > 1;
+
+	/*
+	 * Take the capacity of the CPU into account to
+	 * ensure it fits the requirement of the task.
+	 */
+	if (static_branch_unlikely(&sched_asym_cpucapacity))
+		select_rq |= !dl_task_fits_capacity(p, cpu);
+
+	if (select_rq) {
 		int target = find_later_rq(p);
 
 		if (target != -1 &&
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 91b250f..3368876 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -317,6 +317,21 @@ static inline bool __dl_overflow(struct dl_bw *dl_b, unsigned long cap,
 	       cap_scale(dl_b->bw, cap) < dl_b->total_bw - old_bw + new_bw;
 }
 
+/*
+ * Verify the fitness of task @p to run on @cpu taking into account the
+ * CPU original capacity and the runtime/deadline ratio of the task.
+ *
+ * The function will return true if the CPU original capacity of the
+ * @cpu scaled by SCHED_CAPACITY_SCALE >= runtime/deadline ratio of the
+ * task and false otherwise.
+ */
+static inline bool dl_task_fits_capacity(struct task_struct *p, int cpu)
+{
+	unsigned long cap = arch_scale_cpu_capacity(cpu);
+
+	return cap_scale(p->dl.dl_deadline, cap) >= p->dl.dl_runtime;
+}
+
 extern void init_dl_bw(struct dl_bw *dl_b);
 extern int  sched_dl_global_validate(void);
 extern void sched_dl_do_global(void);

* [tip: sched/core] sched/deadline: Improve admission control for asymmetric CPU capacities
  2020-05-20 13:42 ` [PATCH v3 3/5] sched/deadline: Improve admission control for asymmetric CPU capacities Dietmar Eggemann
  2020-05-22 14:58   ` Juri Lelli
@ 2020-06-16 12:21   ` tip-bot2 for Luca Abeni
  1 sibling, 0 replies; 17+ messages in thread
From: tip-bot2 for Luca Abeni @ 2020-06-16 12:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Luca Abeni, Dietmar Eggemann, Peter Zijlstra (Intel),
	Juri Lelli, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     60ffd5edc5e4fa69622c125c54ef8e7d5d894af8
Gitweb:        https://git.kernel.org/tip/60ffd5edc5e4fa69622c125c54ef8e7d5d894af8
Author:        Luca Abeni <luca.abeni@santannapisa.it>
AuthorDate:    Wed, 20 May 2020 15:42:41 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 15 Jun 2020 14:10:05 +02:00

sched/deadline: Improve admission control for asymmetric CPU capacities

The current SCHED_DEADLINE (DL) admission control ensures that

    sum of reserved CPU bandwidth < x * M

where

    x = /proc/sys/kernel/sched_rt_{runtime,period}_us
    M = # CPUs in root domain.

DL admission control works well for homogeneous systems where the
capacity of all CPUs is equal (1024). I.e. bounded tardiness for DL
and non-starvation of non-DL tasks is guaranteed.

But on heterogeneous systems where the capacities of the CPUs differ
it can fail by over-allocating CPU time on smaller capacity CPUs.

On an Arm big.LITTLE/DynamIQ system DL tasks can easily starve other
tasks, making it unusable.

Fix this by explicitly considering the CPU capacity in the DL admission
test by replacing M with the root domain CPU capacity sum.

Signed-off-by: Luca Abeni <luca.abeni@santannapisa.it>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200520134243.19352-4-dietmar.eggemann@arm.com
---
 kernel/sched/deadline.c | 30 +++++++++++++++++-------------
 kernel/sched/sched.h    |  6 +++---
 2 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 01f474a..9ebd0a9 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2590,11 +2590,12 @@ void sched_dl_do_global(void)
 int sched_dl_overflow(struct task_struct *p, int policy,
 		      const struct sched_attr *attr)
 {
-	struct dl_bw *dl_b = dl_bw_of(task_cpu(p));
 	u64 period = attr->sched_period ?: attr->sched_deadline;
 	u64 runtime = attr->sched_runtime;
 	u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0;
-	int cpus, err = -1;
+	int cpus, err = -1, cpu = task_cpu(p);
+	struct dl_bw *dl_b = dl_bw_of(cpu);
+	unsigned long cap;
 
 	if (attr->sched_flags & SCHED_FLAG_SUGOV)
 		return 0;
@@ -2609,15 +2610,17 @@ int sched_dl_overflow(struct task_struct *p, int policy,
 	 * allocated bandwidth of the container.
 	 */
 	raw_spin_lock(&dl_b->lock);
-	cpus = dl_bw_cpus(task_cpu(p));
+	cpus = dl_bw_cpus(cpu);
+	cap = dl_bw_capacity(cpu);
+
 	if (dl_policy(policy) && !task_has_dl_policy(p) &&
-	    !__dl_overflow(dl_b, cpus, 0, new_bw)) {
+	    !__dl_overflow(dl_b, cap, 0, new_bw)) {
 		if (hrtimer_active(&p->dl.inactive_timer))
 			__dl_sub(dl_b, p->dl.dl_bw, cpus);
 		__dl_add(dl_b, new_bw, cpus);
 		err = 0;
 	} else if (dl_policy(policy) && task_has_dl_policy(p) &&
-		   !__dl_overflow(dl_b, cpus, p->dl.dl_bw, new_bw)) {
+		   !__dl_overflow(dl_b, cap, p->dl.dl_bw, new_bw)) {
 		/*
 		 * XXX this is slightly incorrect: when the task
 		 * utilization decreases, we should delay the total
@@ -2772,19 +2775,19 @@ bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr)
 #ifdef CONFIG_SMP
 int dl_task_can_attach(struct task_struct *p, const struct cpumask *cs_cpus_allowed)
 {
+	unsigned long flags, cap;
 	unsigned int dest_cpu;
 	struct dl_bw *dl_b;
 	bool overflow;
-	int cpus, ret;
-	unsigned long flags;
+	int ret;
 
 	dest_cpu = cpumask_any_and(cpu_active_mask, cs_cpus_allowed);
 
 	rcu_read_lock_sched();
 	dl_b = dl_bw_of(dest_cpu);
 	raw_spin_lock_irqsave(&dl_b->lock, flags);
-	cpus = dl_bw_cpus(dest_cpu);
-	overflow = __dl_overflow(dl_b, cpus, 0, p->dl.dl_bw);
+	cap = dl_bw_capacity(dest_cpu);
+	overflow = __dl_overflow(dl_b, cap, 0, p->dl.dl_bw);
 	if (overflow) {
 		ret = -EBUSY;
 	} else {
@@ -2794,6 +2797,8 @@ int dl_task_can_attach(struct task_struct *p, const struct cpumask *cs_cpus_allo
 		 * We will free resources in the source root_domain
 		 * later on (see set_cpus_allowed_dl()).
 		 */
+		int cpus = dl_bw_cpus(dest_cpu);
+
 		__dl_add(dl_b, p->dl.dl_bw, cpus);
 		ret = 0;
 	}
@@ -2826,16 +2831,15 @@ int dl_cpuset_cpumask_can_shrink(const struct cpumask *cur,
 
 bool dl_cpu_busy(unsigned int cpu)
 {
-	unsigned long flags;
+	unsigned long flags, cap;
 	struct dl_bw *dl_b;
 	bool overflow;
-	int cpus;
 
 	rcu_read_lock_sched();
 	dl_b = dl_bw_of(cpu);
 	raw_spin_lock_irqsave(&dl_b->lock, flags);
-	cpus = dl_bw_cpus(cpu);
-	overflow = __dl_overflow(dl_b, cpus, 0, 0);
+	cap = dl_bw_capacity(cpu);
+	overflow = __dl_overflow(dl_b, cap, 0, 0);
 	raw_spin_unlock_irqrestore(&dl_b->lock, flags);
 	rcu_read_unlock_sched();
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 8d5d068..91b250f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -310,11 +310,11 @@ void __dl_add(struct dl_bw *dl_b, u64 tsk_bw, int cpus)
 	__dl_update(dl_b, -((s32)tsk_bw / cpus));
 }
 
-static inline
-bool __dl_overflow(struct dl_bw *dl_b, int cpus, u64 old_bw, u64 new_bw)
+static inline bool __dl_overflow(struct dl_bw *dl_b, unsigned long cap,
+				 u64 old_bw, u64 new_bw)
 {
 	return dl_b->bw != -1 &&
-	       dl_b->bw * cpus < dl_b->total_bw - old_bw + new_bw;
+	       cap_scale(dl_b->bw, cap) < dl_b->total_bw - old_bw + new_bw;
 }
 
 extern void init_dl_bw(struct dl_bw *dl_b);

* [tip: sched/core] sched/deadline: Implement fallback mechanism for !fit case
  2020-05-20 13:42 ` [PATCH v3 5/5] sched/deadline: Implement fallback mechanism for !fit case Dietmar Eggemann
  2020-05-22 14:59   ` Juri Lelli
@ 2020-06-16 12:21   ` tip-bot2 for Luca Abeni
  1 sibling, 0 replies; 17+ messages in thread
From: tip-bot2 for Luca Abeni @ 2020-06-16 12:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Luca Abeni, Dietmar Eggemann, Peter Zijlstra (Intel),
	Juri Lelli, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     23e71d8ba42933bff12e453858fd68c073bc5258
Gitweb:        https://git.kernel.org/tip/23e71d8ba42933bff12e453858fd68c073bc5258
Author:        Luca Abeni <luca.abeni@santannapisa.it>
AuthorDate:    Wed, 20 May 2020 15:42:43 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 15 Jun 2020 14:10:05 +02:00

sched/deadline: Implement fallback mechanism for !fit case

When a task has a runtime that cannot be served within the scheduling
deadline by any of the idle CPUs (later_mask), the task is doomed to
miss its deadline.

This can happen since SCHED_DEADLINE admission control guarantees only
bounded tardiness, not that every deadline is met. In this case, try
to select the idle CPU with the largest capacity to minimize tardiness.

Favor task_cpu(p) if it has the maximum capacity among the non-fitting
CPUs, so that find_later_rq() can potentially still return it (most
likely cache-hot) early.

Signed-off-by: Luca Abeni <luca.abeni@santannapisa.it>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200520134243.19352-6-dietmar.eggemann@arm.com
---
 kernel/sched/cpudeadline.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 8630f2a..8cb06c8 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -121,19 +121,31 @@ int cpudl_find(struct cpudl *cp, struct task_struct *p,
 
 	if (later_mask &&
 	    cpumask_and(later_mask, cp->free_cpus, p->cpus_ptr)) {
-		int cpu;
+		unsigned long cap, max_cap = 0;
+		int cpu, max_cpu = -1;
 
 		if (!static_branch_unlikely(&sched_asym_cpucapacity))
 			return 1;
 
 		/* Ensure the capacity of the CPUs fits the task. */
 		for_each_cpu(cpu, later_mask) {
-			if (!dl_task_fits_capacity(p, cpu))
+			if (!dl_task_fits_capacity(p, cpu)) {
 				cpumask_clear_cpu(cpu, later_mask);
+
+				cap = capacity_orig_of(cpu);
+
+				if (cap > max_cap ||
+				    (cpu == task_cpu(p) && cap == max_cap)) {
+					max_cap = cap;
+					max_cpu = cpu;
+				}
+			}
 		}
 
-		if (!cpumask_empty(later_mask))
-			return 1;
+		if (cpumask_empty(later_mask))
+			cpumask_set_cpu(max_cpu, later_mask);
+
+		return 1;
 	} else {
 		int best_cpu = cpudl_maximum(cp);
 

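The tie-break above can be illustrated with a small self-contained
sketch (hypothetical capacities; the fits[] array stands in for
dl_task_fits_capacity()):

#include <stdio.h>

#define NR_CPUS 4

int main(void)
{
	/* Hypothetical idle CPUs in later_mask: 2 LITTLE + 2 big. */
	unsigned long capacity[NR_CPUS] = { 446, 446, 1024, 1024 };
	int fits[NR_CPUS] = { 0, 0, 0, 0 };	/* no CPU fits the task */
	int task_cpu = 3;			/* task currently runs here */
	unsigned long cap, max_cap = 0;
	int cpu, max_cpu = -1;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (fits[cpu])
			continue;	/* would stay in later_mask */

		cap = capacity[cpu];
		/* Prefer higher capacity; on a tie, prefer task_cpu. */
		if (cap > max_cap || (cpu == task_cpu && cap == max_cap)) {
			max_cap = cap;
			max_cpu = cpu;
		}
	}

	/* CPU2 raises max_cap to 1024 first; CPU3 ties and wins as
	 * task_cpu(p), so the (likely cache-hot) CPU3 is the fallback. */
	printf("fallback CPU: %d\n", max_cpu);
	return 0;
}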

* [tip: sched/core] sched/deadline: Add dl_bw_capacity()
  2020-05-20 13:42 ` [PATCH v3 2/5] sched/deadline: Add dl_bw_capacity() Dietmar Eggemann
  2020-05-22 14:58   ` Juri Lelli
@ 2020-06-16 12:21   ` tip-bot2 for Dietmar Eggemann
  1 sibling, 0 replies; 17+ messages in thread
From: tip-bot2 for Dietmar Eggemann @ 2020-06-16 12:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Dietmar Eggemann, Peter Zijlstra (Intel), Juri Lelli, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     fc9dc698472aa460a8b3b036d9b1d0b751f12f58
Gitweb:        https://git.kernel.org/tip/fc9dc698472aa460a8b3b036d9b1d0b751f12f58
Author:        Dietmar Eggemann <dietmar.eggemann@arm.com>
AuthorDate:    Wed, 20 May 2020 15:42:40 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 15 Jun 2020 14:10:05 +02:00

sched/deadline: Add dl_bw_capacity()

Capacity-aware SCHED_DEADLINE Admission Control (AC) needs root domain
(rd) CPU capacity sum.

Introduce dl_bw_capacity(), which for a symmetric rd whose CPUs all
have a capacity of SCHED_CAPACITY_SCALE simply relies on dl_bw_cpus()
and returns the number of CPUs multiplied by SCHED_CAPACITY_SCALE.

For an asymmetric rd, or for CPUs with capacity < SCHED_CAPACITY_SCALE,
it computes the CPU capacity sum over the rd span and cpu_active_mask.

A 'XXX Fix:' comment was added to highlight that if 'rq->rd ==
def_root_domain', AC should be performed against the capacity of the
CPU the task is running on rather than the rd CPU capacity sum. This
issue already exists w/o capacity awareness.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200520134243.19352-3-dietmar.eggemann@arm.com
---
 kernel/sched/deadline.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index ec90265..01f474a 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -69,6 +69,34 @@ static inline int dl_bw_cpus(int i)
 
 	return cpus;
 }
+
+static inline unsigned long __dl_bw_capacity(int i)
+{
+	struct root_domain *rd = cpu_rq(i)->rd;
+	unsigned long cap = 0;
+
+	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held(),
+			 "sched RCU must be held");
+
+	for_each_cpu_and(i, rd->span, cpu_active_mask)
+		cap += capacity_orig_of(i);
+
+	return cap;
+}
+
+/*
+ * XXX Fix: If 'rq->rd == def_root_domain' perform AC against capacity
+ * of the CPU the task is running on rather than rd's \Sum CPU capacity.
+ */
+static inline unsigned long dl_bw_capacity(int i)
+{
+	if (!static_branch_unlikely(&sched_asym_cpucapacity) &&
+	    capacity_orig_of(i) == SCHED_CAPACITY_SCALE) {
+		return dl_bw_cpus(i) << SCHED_CAPACITY_SHIFT;
+	} else {
+		return __dl_bw_capacity(i);
+	}
+}
 #else
 static inline struct dl_bw *dl_bw_of(int i)
 {
@@ -79,6 +107,11 @@ static inline int dl_bw_cpus(int i)
 {
 	return 1;
 }
+
+static inline unsigned long dl_bw_capacity(int i)
+{
+	return SCHED_CAPACITY_SCALE;
+}
 #endif
 
 static inline

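A quick worked example of the two return paths, with hypothetical
capacities:

#include <stdio.h>

#define SCHED_CAPACITY_SHIFT 10

int main(void)
{
	/* Symmetric rd: 4 CPUs, each at SCHED_CAPACITY_SCALE (1024). */
	printf("fast path: %d\n", 4 << SCHED_CAPACITY_SHIFT);	/* 4096 */

	/* Asymmetric rd: 4 LITTLEs (446) + 4 bigs (1024). */
	printf("slow path: %d\n", 4 * 446 + 4 * 1024);		/* 5880 */

	return 0;
}

On a symmetric rd both paths would return the same value, which is why
the cheaper dl_bw_cpus() shift is valid there.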

* [tip: sched/core] sched/deadline: Optimize dl_bw_cpus()
  2020-05-20 13:42 ` [PATCH v3 1/5] sched/deadline: Optimize dl_bw_cpus() Dietmar Eggemann
  2020-05-22 14:57   ` Juri Lelli
@ 2020-06-16 12:21   ` tip-bot2 for Dietmar Eggemann
  1 sibling, 0 replies; 17+ messages in thread
From: tip-bot2 for Dietmar Eggemann @ 2020-06-16 12:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Dietmar Eggemann, Peter Zijlstra (Intel), Juri Lelli, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     c81b89329933c6c0be809d4c0d2cb57c49153ee3
Gitweb:        https://git.kernel.org/tip/c81b89329933c6c0be809d4c0d2cb57c49153ee3
Author:        Dietmar Eggemann <dietmar.eggemann@arm.com>
AuthorDate:    Wed, 20 May 2020 15:42:39 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 15 Jun 2020 14:10:04 +02:00

sched/deadline: Optimize dl_bw_cpus()

Return the weight of the root domain (rd) span if it is a subset of
the cpu_active_mask.

Otherwise, i.e. during hotplug, keep computing the number of CPUs over
the intersection of the rd span and cpu_active_mask.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200520134243.19352-2-dietmar.eggemann@arm.com
---
 kernel/sched/deadline.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index f31964a..ec90265 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -54,10 +54,16 @@ static inline struct dl_bw *dl_bw_of(int i)
 static inline int dl_bw_cpus(int i)
 {
 	struct root_domain *rd = cpu_rq(i)->rd;
-	int cpus = 0;
+	int cpus;
 
 	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held(),
 			 "sched RCU must be held");
+
+	if (cpumask_subset(rd->span, cpu_active_mask))
+		return cpumask_weight(rd->span);
+
+	cpus = 0;
+
 	for_each_cpu_and(i, rd->span, cpu_active_mask)
 		cpus++;
 

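The fast path is a plain subset test. A user-space analogue over
64-bit CPU masks (hypothetical masks; __builtin_popcountll() stands in
for cpumask_weight()):

#include <stdio.h>
#include <stdint.h>

static int dl_bw_cpus_sketch(uint64_t span, uint64_t active)
{
	/* Fast path: rd->span is a subset of cpu_active_mask. */
	if ((span & ~active) == 0)
		return __builtin_popcountll(span);

	/* Slow path (hotplug): count the intersection. */
	return __builtin_popcountll(span & active);
}

int main(void)
{
	uint64_t span = 0xf;				/* CPUs 0-3 in rd */

	printf("%d\n", dl_bw_cpus_sketch(span, 0xff));	/* 4: all active */
	printf("%d\n", dl_bw_cpus_sketch(span, 0x7));	/* 3: CPU3 offline */
	return 0;
}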

end of thread, other threads:[~2020-06-16 12:23 UTC | newest]

Thread overview: 17+ messages
2020-05-20 13:42 [PATCH v3 0/5] Capacity awareness for SCHED_DEADLINE Dietmar Eggemann
2020-05-20 13:42 ` [PATCH v3 1/5] sched/deadline: Optimize dl_bw_cpus() Dietmar Eggemann
2020-05-22 14:57   ` Juri Lelli
2020-06-16 12:21   ` [tip: sched/core] " tip-bot2 for Dietmar Eggemann
2020-05-20 13:42 ` [PATCH v3 2/5] sched/deadline: Add dl_bw_capacity() Dietmar Eggemann
2020-05-22 14:58   ` Juri Lelli
2020-06-16 12:21   ` [tip: sched/core] " tip-bot2 for Dietmar Eggemann
2020-05-20 13:42 ` [PATCH v3 3/5] sched/deadline: Improve admission control for asymmetric CPU capacities Dietmar Eggemann
2020-05-22 14:58   ` Juri Lelli
2020-06-16 12:21   ` [tip: sched/core] " tip-bot2 for Luca Abeni
2020-05-20 13:42 ` [PATCH v3 4/5] sched/deadline: Make DL capacity-aware Dietmar Eggemann
2020-05-22 14:58   ` Juri Lelli
2020-06-16 12:21   ` [tip: sched/core] " tip-bot2 for Luca Abeni
2020-05-20 13:42 ` [PATCH v3 5/5] sched/deadline: Implement fallback mechanism for !fit case Dietmar Eggemann
2020-05-22 14:59   ` Juri Lelli
2020-06-16 12:21   ` [tip: sched/core] " tip-bot2 for Luca Abeni
2020-06-10 10:26 ` [PATCH v3 0/5] Capacity awareness for SCHED_DEADLINE Peter Zijlstra
