* [RFC PATCH 1/7] sched: Add WF_TTWU, WF_EXEC wakeup flags
2019-12-11 16:43 [RFC PATCH 0/7] sched: Streamline select_task_rq() & select_task_rq_fair() Valentin Schneider
@ 2019-12-11 16:43 ` Valentin Schneider
2019-12-11 16:43 ` [RFC PATCH 2/7] sched: Kill select_task_rq()'s sd_flag parameter Valentin Schneider
` (6 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: Valentin Schneider @ 2019-12-11 16:43 UTC (permalink / raw)
To: linux-kernel
Cc: mingo, peterz, vincent.guittot, dietmar.eggemann, Juri Lelli,
Steven Rostedt
To remove the sd_flag parameter of select_task_rq(), we need another way of
encoding wakeup kinds. There already is a WF_FORK flag, add the missing
ones.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
kernel/sched/sched.h | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 280a3c735935..1936ab2faff0 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1639,11 +1639,13 @@ static inline int task_on_rq_migrating(struct task_struct *p)
}
/*
- * wake flags
+ * Wake flags
*/
#define WF_SYNC 0x01 /* Waker goes to sleep after wakeup */
-#define WF_FORK 0x02 /* Child wakeup after fork */
-#define WF_MIGRATED 0x4 /* Internal use, task got migrated */
+#define WF_TTWU 0x02 /* Regular task wakeup */
+#define WF_FORK 0x04 /* Child wakeup after fork */
+#define WF_EXEC 0x08 /* "Fake" wakeup at exec */
+#define WF_MIGRATED 0x10 /* Internal use, task got migrated */
/*
* To aid in avoiding the subversion of "niceness" due to uneven distribution
--
2.24.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [RFC PATCH 2/7] sched: Kill select_task_rq()'s sd_flag parameter
2019-12-11 16:43 [RFC PATCH 0/7] sched: Streamline select_task_rq() & select_task_rq_fair() Valentin Schneider
2019-12-11 16:43 ` [RFC PATCH 1/7] sched: Add WF_TTWU, WF_EXEC wakeup flags Valentin Schneider
@ 2019-12-11 16:43 ` Valentin Schneider
2019-12-11 16:43 ` [RFC PATCH 3/7] sched/fair: find_idlest_group(): Remove unused " Valentin Schneider
` (5 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: Valentin Schneider @ 2019-12-11 16:43 UTC (permalink / raw)
To: linux-kernel
Cc: mingo, peterz, vincent.guittot, dietmar.eggemann, Juri Lelli,
Steven Rostedt
Only select_task_rq_fair() uses that parameter to do an actual domain
search, other classes only care about what kind of wakeup is happening
(fork, exec, or "regular") and thus just translate the flag into a wakeup
type.
WF_TTWU and WF_EXEC have just been added, use these along with WF_FORK to
encode the wakeup types we care about. This cleans up the API a
bit, but adds an extra conversion layer within select_task_rq_fair().
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
kernel/sched/core.c | 10 +++++-----
kernel/sched/deadline.c | 4 ++--
kernel/sched/fair.c | 18 +++++++++++++++---
kernel/sched/idle.c | 2 +-
kernel/sched/rt.c | 4 ++--
kernel/sched/sched.h | 2 +-
kernel/sched/stop_task.c | 2 +-
7 files changed, 27 insertions(+), 15 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 90e4b00ace89..38b434846d5e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2101,12 +2101,12 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
* The caller (fork, wakeup) owns p->pi_lock, ->cpus_ptr is stable.
*/
static inline
-int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
+int select_task_rq(struct task_struct *p, int cpu, int wake_flags)
{
lockdep_assert_held(&p->pi_lock);
if (p->nr_cpus_allowed > 1)
- cpu = p->sched_class->select_task_rq(p, cpu, sd_flags, wake_flags);
+ cpu = p->sched_class->select_task_rq(p, cpu, wake_flags);
else
cpu = cpumask_any(p->cpus_ptr);
@@ -2625,7 +2625,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
atomic_dec(&task_rq(p)->nr_iowait);
}
- cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);
+ cpu = select_task_rq(p, p->wake_cpu, wake_flags | WF_TTWU);
if (task_cpu(p) != cpu) {
wake_flags |= WF_MIGRATED;
psi_ttwu_dequeue(p);
@@ -2958,7 +2958,7 @@ void wake_up_new_task(struct task_struct *p)
* as we're not fully set-up yet.
*/
p->recent_used_cpu = task_cpu(p);
- __set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
+ __set_task_cpu(p, select_task_rq(p, task_cpu(p), WF_FORK));
#endif
rq = __task_rq_lock(p, &rf);
update_rq_clock(rq);
@@ -3499,7 +3499,7 @@ void sched_exec(void)
int dest_cpu;
raw_spin_lock_irqsave(&p->pi_lock, flags);
- dest_cpu = p->sched_class->select_task_rq(p, task_cpu(p), SD_BALANCE_EXEC, 0);
+ dest_cpu = p->sched_class->select_task_rq(p, task_cpu(p), WF_EXEC);
if (dest_cpu == smp_processor_id())
goto unlock;
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 43323f875cb9..53216fffb871 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1599,12 +1599,12 @@ static void yield_task_dl(struct rq *rq)
static int find_later_rq(struct task_struct *task);
static int
-select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags)
+select_task_rq_dl(struct task_struct *p, int cpu, int flags)
{
struct task_struct *curr;
struct rq *rq;
- if (sd_flag != SD_BALANCE_WAKE)
+ if (!(flags & WF_TTWU))
goto out;
rq = cpu_rq(cpu);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 08a233e97a01..a614e4794024 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6325,7 +6325,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
/*
* select_task_rq_fair: Select target runqueue for the waking task in domains
- * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
+ * that have the relevant SD flag set. In practice, this is SD_BALANCE_WAKE,
* SD_BALANCE_FORK, or SD_BALANCE_EXEC.
*
* Balances load by selecting the idlest CPU in the idlest group, or under
@@ -6336,13 +6336,25 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
* preempt must be disabled.
*/
static int
-select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_flags)
+select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
{
+ int sync = (wake_flags & WF_SYNC) && !(current->flags & PF_EXITING);
struct sched_domain *tmp, *sd = NULL;
int cpu = smp_processor_id();
int new_cpu = prev_cpu;
int want_affine = 0;
- int sync = (wake_flags & WF_SYNC) && !(current->flags & PF_EXITING);
+ int sd_flag;
+
+ switch (wake_flags & (WF_TTWU | WF_FORK | WF_EXEC)) {
+ case WF_TTWU:
+ sd_flag = SD_BALANCE_WAKE;
+ break;
+ case WF_FORK:
+ sd_flag = SD_BALANCE_FORK;
+ break;
+ default:
+ sd_flag = SD_BALANCE_EXEC;
+ }
if (sd_flag & SD_BALANCE_WAKE) {
record_wakee(p);
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index ffa959e91227..416185886e7f 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -367,7 +367,7 @@ void cpu_startup_entry(enum cpuhp_state state)
#ifdef CONFIG_SMP
static int
-select_task_rq_idle(struct task_struct *p, int cpu, int sd_flag, int flags)
+select_task_rq_idle(struct task_struct *p, int cpu, int flags)
{
return task_cpu(p); /* IDLE tasks as never migrated */
}
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index e591d40fd645..ae579d79ddc0 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1387,13 +1387,13 @@ static void yield_task_rt(struct rq *rq)
static int find_lowest_rq(struct task_struct *task);
static int
-select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
+select_task_rq_rt(struct task_struct *p, int cpu, int flags)
{
struct task_struct *curr;
struct rq *rq;
/* For anything but wake ups, just return the task_cpu */
- if (sd_flag != SD_BALANCE_WAKE && sd_flag != SD_BALANCE_FORK)
+ if (!(flags & (WF_TTWU | WF_FORK)))
goto out;
rq = cpu_rq(cpu);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1936ab2faff0..62efc0443cac 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1722,7 +1722,7 @@ struct sched_class {
#ifdef CONFIG_SMP
int (*balance)(struct rq *rq, struct task_struct *prev, struct rq_flags *rf);
- int (*select_task_rq)(struct task_struct *p, int task_cpu, int sd_flag, int flags);
+ int (*select_task_rq)(struct task_struct *p, int task_cpu, int flags);
void (*migrate_task_rq)(struct task_struct *p, int new_cpu);
void (*task_woken)(struct rq *this_rq, struct task_struct *task);
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 4c9e9975684f..4f061ddf8470 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -11,7 +11,7 @@
#ifdef CONFIG_SMP
static int
-select_task_rq_stop(struct task_struct *p, int cpu, int sd_flag, int flags)
+select_task_rq_stop(struct task_struct *p, int cpu, int flags)
{
return task_cpu(p); /* stop tasks as never migrate */
}
--
2.24.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [RFC PATCH 3/7] sched/fair: find_idlest_group(): Remove unused sd_flag parameter
2019-12-11 16:43 [RFC PATCH 0/7] sched: Streamline select_task_rq() & select_task_rq_fair() Valentin Schneider
2019-12-11 16:43 ` [RFC PATCH 1/7] sched: Add WF_TTWU, WF_EXEC wakeup flags Valentin Schneider
2019-12-11 16:43 ` [RFC PATCH 2/7] sched: Kill select_task_rq()'s sd_flag parameter Valentin Schneider
@ 2019-12-11 16:43 ` Valentin Schneider
2019-12-11 16:43 ` [RFC PATCH 4/7] sched/fair: Dissociate wakeup decisions from SD flag value Valentin Schneider
` (4 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: Valentin Schneider @ 2019-12-11 16:43 UTC (permalink / raw)
To: linux-kernel; +Cc: mingo, peterz, vincent.guittot, dietmar.eggemann
The last use of that parameter was removed by commit
57abff067a08 ("sched/fair: Rework find_idlest_group()")
Get rid of the parameter.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
kernel/sched/fair.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a614e4794024..f0b2b403bebb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5575,8 +5575,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
}
static struct sched_group *
-find_idlest_group(struct sched_domain *sd, struct task_struct *p,
- int this_cpu, int sd_flag);
+find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu);
/*
* find_idlest_group_cpu - find the idlest CPU among the CPUs in the group.
@@ -5665,7 +5664,7 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
continue;
}
- group = find_idlest_group(sd, p, cpu, sd_flag);
+ group = find_idlest_group(sd, p, cpu);
if (!group) {
sd = sd->child;
continue;
@@ -8382,8 +8381,7 @@ static bool update_pick_idlest(struct sched_group *idlest,
* Assumes p is allowed on at least one CPU in sd.
*/
static struct sched_group *
-find_idlest_group(struct sched_domain *sd, struct task_struct *p,
- int this_cpu, int sd_flag)
+find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
{
struct sched_group *idlest = NULL, *local = NULL, *group = sd->groups;
struct sg_lb_stats local_sgs, tmp_sgs;
--
2.24.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [RFC PATCH 4/7] sched/fair: Dissociate wakeup decisions from SD flag value
2019-12-11 16:43 [RFC PATCH 0/7] sched: Streamline select_task_rq() & select_task_rq_fair() Valentin Schneider
` (2 preceding siblings ...)
2019-12-11 16:43 ` [RFC PATCH 3/7] sched/fair: find_idlest_group(): Remove unused " Valentin Schneider
@ 2019-12-11 16:43 ` Valentin Schneider
2019-12-11 16:53 ` Valentin Schneider
2019-12-11 16:43 ` [RFC PATCH 5/7] sched/topology: Make {lowest/highest}_flag_domain() work with > 1 flags Valentin Schneider
` (3 subsequent siblings)
7 siblings, 1 reply; 10+ messages in thread
From: Valentin Schneider @ 2019-12-11 16:43 UTC (permalink / raw)
To: linux-kernel; +Cc: mingo, peterz, vincent.guittot, dietmar.eggemann
The CFS wakeup code will only ever go through EAS / its fast path on
"regular" wakeups (i.e. not on forks or execs). These are currently gated
by a check against 'sd_flag', which would be SD_BALANCE_WAKE at wakeup.
However, we now have a flag that explicitly tells us whether a wakeup is a
"regular" one, so hinge those conditions on that flag instead.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
kernel/sched/fair.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f0b2b403bebb..30e8d357a24f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6355,7 +6355,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
sd_flag = SD_BALANCE_EXEC;
}
- if (sd_flag & SD_BALANCE_WAKE) {
+ if (wake_flags & WF_TTWU) {
record_wakee(p);
if (sched_energy_enabled()) {
@@ -6396,9 +6396,8 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
if (unlikely(sd)) {
/* Slow path */
new_cpu = find_idlest_cpu(sd, p, cpu, prev_cpu, sd_flag);
- } else if (sd_flag & SD_BALANCE_WAKE) { /* XXX always ? */
+ } else if (wake_flags & WF_TTWU) { /* XXX always ? */
/* Fast path */
-
new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
if (want_affine)
--
2.24.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 4/7] sched/fair: Dissociate wakeup decisions from SD flag value
2019-12-11 16:43 ` [RFC PATCH 4/7] sched/fair: Dissociate wakeup decisions from SD flag value Valentin Schneider
@ 2019-12-11 16:53 ` Valentin Schneider
0 siblings, 0 replies; 10+ messages in thread
From: Valentin Schneider @ 2019-12-11 16:53 UTC (permalink / raw)
To: linux-kernel; +Cc: mingo, peterz, vincent.guittot, dietmar.eggemann
On 11/12/2019 16:43, Valentin Schneider wrote:
> @@ -6396,9 +6396,8 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
> if (unlikely(sd)) {
> /* Slow path */
> new_cpu = find_idlest_cpu(sd, p, cpu, prev_cpu, sd_flag);
> - } else if (sd_flag & SD_BALANCE_WAKE) { /* XXX always ? */
> + } else if (wake_flags & WF_TTWU) { /* XXX always ? */
While I'm at it, Dietmar pointed out to me that this is only really
relevant to forkees and execees when a NULL domain is attached to the CPU
(since sd_init() unconditionally sets SD_BALANCE_{FORK, EXEC}). So this
only makes a difference when the SD hierarchy hasn't been built / is being
rebuilt, or when cpusets are involved.
So perhaps we could make that an unconditional else, or make forkees/execees
bail out earlier.
> /* Fast path */
> -
> new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
>
> if (want_affine)
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* [RFC PATCH 5/7] sched/topology: Make {lowest/highest}_flag_domain() work with > 1 flags
2019-12-11 16:43 [RFC PATCH 0/7] sched: Streamline select_task_rq() & select_task_rq_fair() Valentin Schneider
` (3 preceding siblings ...)
2019-12-11 16:43 ` [RFC PATCH 4/7] sched/fair: Dissociate wakeup decisions from SD flag value Valentin Schneider
@ 2019-12-11 16:43 ` Valentin Schneider
2019-12-11 16:44 ` [RFC PATCH 6/7] sched/fair: Split select_task_rq_fair want_affine logic Valentin Schneider
` (2 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: Valentin Schneider @ 2019-12-11 16:43 UTC (permalink / raw)
To: linux-kernel; +Cc: mingo, peterz, vincent.guittot, dietmar.eggemann
In some cases, select_task_rq_fair() ends up looking for the highest domain
with both SD_LOAD_BALANCE and one of SD_BALANCE_{WAKE, FORK, EXEC}.
This is pretty much a highest_flag_domain() call, but that latter
can only cope with a single flag.
Make the existing helpers cope with being passed more than one flag. While
at it, rename them to make their newfound powers explicit.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
kernel/sched/sched.h | 23 ++++++++++++++++-------
kernel/sched/topology.c | 8 ++++----
2 files changed, 20 insertions(+), 11 deletions(-)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 62efc0443cac..233b3d41e347 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1340,20 +1340,20 @@ extern void sched_ttwu_pending(void);
#define for_each_lower_domain(sd) for (; sd; sd = sd->child)
/**
- * highest_flag_domain - Return highest sched_domain containing flag.
+ * highest_flags_domain - Return highest sched_domain containing flags.
* @cpu: The CPU whose highest level of sched domain is to
* be returned.
- * @flag: The flag to check for the highest sched_domain
+ * @flags: The flags to check for the highest sched_domain
* for the given CPU.
*
- * Returns the highest sched_domain of a CPU which contains the given flag.
+ * Returns the highest sched_domain of a CPU which contains all given flags.
*/
-static inline struct sched_domain *highest_flag_domain(int cpu, int flag)
+static inline struct sched_domain *highest_flags_domain(int cpu, int flags)
{
struct sched_domain *sd, *hsd = NULL;
for_each_domain(cpu, sd) {
- if (!(sd->flags & flag))
+ if (!((sd->flags & flags) == flags))
break;
hsd = sd;
}
@@ -1361,12 +1361,21 @@ static inline struct sched_domain *highest_flag_domain(int cpu, int flag)
return hsd;
}
-static inline struct sched_domain *lowest_flag_domain(int cpu, int flag)
+/**
+ * lowest_flags_domain - Return lowest sched_domain containing flags.
+ * @cpu: The CPU whose lowest level of sched domain is to
+ * be returned.
+ * @flags: The flags to check for the lowest sched_domain
+ * for the given CPU.
+ *
+ * Returns the lowest sched_domain of a CPU which contains all given flags.
+ */
+static inline struct sched_domain *lowest_flags_domain(int cpu, int flags)
{
struct sched_domain *sd;
for_each_domain(cpu, sd) {
- if (sd->flags & flag)
+ if ((sd->flags & flags) == flags)
break;
}
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 6ec1e595b1d4..f0d2a15fd10b 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -631,7 +631,7 @@ static void update_top_cache_domain(int cpu)
int id = cpu;
int size = 1;
- sd = highest_flag_domain(cpu, SD_SHARE_PKG_RESOURCES);
+ sd = highest_flags_domain(cpu, SD_SHARE_PKG_RESOURCES);
if (sd) {
id = cpumask_first(sched_domain_span(sd));
size = cpumask_weight(sched_domain_span(sd));
@@ -643,13 +643,13 @@ static void update_top_cache_domain(int cpu)
per_cpu(sd_llc_id, cpu) = id;
rcu_assign_pointer(per_cpu(sd_llc_shared, cpu), sds);
- sd = lowest_flag_domain(cpu, SD_NUMA);
+ sd = lowest_flags_domain(cpu, SD_NUMA);
rcu_assign_pointer(per_cpu(sd_numa, cpu), sd);
- sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
+ sd = highest_flags_domain(cpu, SD_ASYM_PACKING);
rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd);
- sd = lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY);
+ sd = lowest_flags_domain(cpu, SD_ASYM_CPUCAPACITY);
rcu_assign_pointer(per_cpu(sd_asym_cpucapacity, cpu), sd);
}
--
2.24.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [RFC PATCH 6/7] sched/fair: Split select_task_rq_fair want_affine logic
2019-12-11 16:43 [RFC PATCH 0/7] sched: Streamline select_task_rq() & select_task_rq_fair() Valentin Schneider
` (4 preceding siblings ...)
2019-12-11 16:43 ` [RFC PATCH 5/7] sched/topology: Make {lowest/highest}_flag_domain() work with > 1 flags Valentin Schneider
@ 2019-12-11 16:44 ` Valentin Schneider
2019-12-11 16:44 ` [RFC PATCH 7/7] sched/topology: Define and use shortcut pointers for wakeup sd_flag scan Valentin Schneider
2020-01-06 8:10 ` [RFC PATCH 0/7] sched: Streamline select_task_rq() & select_task_rq_fair() Peter Zijlstra
7 siblings, 0 replies; 10+ messages in thread
From: Valentin Schneider @ 2019-12-11 16:44 UTC (permalink / raw)
To: linux-kernel; +Cc: mingo, peterz, vincent.guittot, dietmar.eggemann
The domain loop within select_task_rq_fair() depends on a few bits of
input, namely the SD flag we're looking for and whether we want_affine.
For !want_affine, the domain loop will walk up the hierarchy to reach the
highest domain with both SD_LOAD_BALANCE and the requested sd_flag
(SD_BALANCE_{WAKE, FORK, EXEC}) set.
In other words, that's a call to highest_flags_domain() for these two
flags. Note that this is a static information wrt a given SD hierarchy,
so we can cache that - but that comes in a later patch to ease reviewing.
For want_affine, we'll walk up the hierarchy to reach the first domain
with SD_LOAD_BALANCE, SD_WAKE_AFFINE, and that spans the tasks's prev_cpu.
We still save a pointer to the last visited domain that had the requested
sd_flag set (and SD_LOAD_BALANCE), which means that if we fail to go
through the affine condition (e.g. no domain had SD_WAKE_AFFINE) we'll use
the same SD as we would have found if we had !want_affine.
Split the domain loop in !want_affine and want_affine paths. As it is,
this leads to two domain walks instead of a single one, but stay tuned for
the next patch.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
kernel/sched/fair.c | 29 ++++++++++++++++++-----------
1 file changed, 18 insertions(+), 11 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 30e8d357a24f..ea875c7c82d7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6370,29 +6370,36 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
}
rcu_read_lock();
+
+ sd = highest_flags_domain(cpu, sd_flag | SD_LOAD_BALANCE);
+
+ /*
+ * If !want_affine, we just look for the highest domain where
+ * sd_flag is set.
+ */
+ if (!want_affine)
+ goto scan;
+
+ /*
+ * Otherwise we look for the lowest domain with SD_WAKE_AFFINE and that
+ * spans both 'cpu' and 'prev_cpu'.
+ */
for_each_domain(cpu, tmp) {
if (!(tmp->flags & SD_LOAD_BALANCE))
break;
- /*
- * If both 'cpu' and 'prev_cpu' are part of this domain,
- * cpu is a valid SD_WAKE_AFFINE target.
- */
- if (want_affine && (tmp->flags & SD_WAKE_AFFINE) &&
+ if ((tmp->flags & SD_WAKE_AFFINE) &&
cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
if (cpu != prev_cpu)
new_cpu = wake_affine(tmp, p, cpu, prev_cpu, sync);
- sd = NULL; /* Prefer wake_affine over balance flags */
+ /* Prefer wake_affine over SD lookup */
+ sd = NULL;
break;
}
-
- if (tmp->flags & sd_flag)
- sd = tmp;
- else if (!want_affine)
- break;
}
+scan:
if (unlikely(sd)) {
/* Slow path */
new_cpu = find_idlest_cpu(sd, p, cpu, prev_cpu, sd_flag);
--
2.24.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [RFC PATCH 7/7] sched/topology: Define and use shortcut pointers for wakeup sd_flag scan
2019-12-11 16:43 [RFC PATCH 0/7] sched: Streamline select_task_rq() & select_task_rq_fair() Valentin Schneider
` (5 preceding siblings ...)
2019-12-11 16:44 ` [RFC PATCH 6/7] sched/fair: Split select_task_rq_fair want_affine logic Valentin Schneider
@ 2019-12-11 16:44 ` Valentin Schneider
2020-01-06 8:10 ` [RFC PATCH 0/7] sched: Streamline select_task_rq() & select_task_rq_fair() Peter Zijlstra
7 siblings, 0 replies; 10+ messages in thread
From: Valentin Schneider @ 2019-12-11 16:44 UTC (permalink / raw)
To: linux-kernel; +Cc: mingo, peterz, vincent.guittot, dietmar.eggemann
Reworking select_task_rq_fair()'s domain walk exposed that !want_affine
wakeups only look for highest sched_domain with the required sd_flag
set. This is something we can cache at sched domain build time to slightly
optimize select_task_rq_fair(). Note that this isn't a "free" optimization:
it costs us 3 pointers per CPU.
Add cached per-CPU pointers for the highest domains with SD_BALANCE_WAKE,
SD_BALANCE_EXEC and SD_BALANCE_FORK. Do note that these are exclusively
used in select_task_rq_fair(), which also requires SD_LOAD_BALANCE, so the
cached pointers are also gated by SD_LOAD_BALANCE (which is somewhat icky).
There is another nasty thing that comes out of this: with the
sched_domain/* sysctl knobs, one could e.g. clear SD_LOAD_BALANCE for some
CPU(s) (AFAIK the only way to clear that flag ATM). Before this patch,
that would disable entering find_idlest_cpu() from said CPU(s). With this
patch, writing to the SD flags sysctl wouldn't lead to any cached SD
pointer update, so we'd keep going through the slow-path.
This isn't a new problem (all cached domain pointers are affected by this),
but it is a change in behaviour. It could make sense to either turn this
interface read-only, or add a callback in the flags update to properly
update our cached domain data.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
kernel/sched/fair.c | 25 +++++++++++++------------
kernel/sched/sched.h | 3 +++
kernel/sched/topology.c | 12 ++++++++++++
3 files changed, 28 insertions(+), 12 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ea875c7c82d7..5b72597d6540 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6344,17 +6344,6 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
int want_affine = 0;
int sd_flag;
- switch (wake_flags & (WF_TTWU | WF_FORK | WF_EXEC)) {
- case WF_TTWU:
- sd_flag = SD_BALANCE_WAKE;
- break;
- case WF_FORK:
- sd_flag = SD_BALANCE_FORK;
- break;
- default:
- sd_flag = SD_BALANCE_EXEC;
- }
-
if (wake_flags & WF_TTWU) {
record_wakee(p);
@@ -6371,7 +6360,19 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
rcu_read_lock();
- sd = highest_flags_domain(cpu, sd_flag | SD_LOAD_BALANCE);
+ switch (wake_flags & (WF_TTWU | WF_FORK | WF_EXEC)) {
+ case WF_TTWU:
+ sd_flag = SD_BALANCE_WAKE;
+ sd = rcu_dereference(per_cpu(sd_balance_wake, cpu));
+ break;
+ case WF_FORK:
+ sd_flag = SD_BALANCE_FORK;
+ sd = rcu_dereference(per_cpu(sd_balance_fork, cpu));
+ break;
+ default:
+ sd_flag = SD_BALANCE_EXEC;
+ sd = rcu_dereference(per_cpu(sd_balance_exec, cpu));
+ }
/*
* If !want_affine, we just look for the highest domain where
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 233b3d41e347..ba9d55085c7b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1386,6 +1386,9 @@ DECLARE_PER_CPU(struct sched_domain __rcu *, sd_llc);
DECLARE_PER_CPU(int, sd_llc_size);
DECLARE_PER_CPU(int, sd_llc_id);
DECLARE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
+DECLARE_PER_CPU(struct sched_domain __rcu *, sd_balance_wake);
+DECLARE_PER_CPU(struct sched_domain __rcu *, sd_balance_exec);
+DECLARE_PER_CPU(struct sched_domain __rcu *, sd_balance_fork);
DECLARE_PER_CPU(struct sched_domain __rcu *, sd_numa);
DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index f0d2a15fd10b..f76f40b64279 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -619,6 +619,9 @@ DEFINE_PER_CPU(struct sched_domain __rcu *, sd_llc);
DEFINE_PER_CPU(int, sd_llc_size);
DEFINE_PER_CPU(int, sd_llc_id);
DEFINE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
+DEFINE_PER_CPU(struct sched_domain __rcu *, sd_balance_wake);
+DEFINE_PER_CPU(struct sched_domain __rcu *, sd_balance_exec);
+DEFINE_PER_CPU(struct sched_domain __rcu *, sd_balance_fork);
DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa);
DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
@@ -643,6 +646,15 @@ static void update_top_cache_domain(int cpu)
per_cpu(sd_llc_id, cpu) = id;
rcu_assign_pointer(per_cpu(sd_llc_shared, cpu), sds);
+ sd = highest_flags_domain(cpu, SD_BALANCE_WAKE | SD_LOAD_BALANCE);
+ rcu_assign_pointer(per_cpu(sd_balance_wake, cpu), sd);
+
+ sd = highest_flags_domain(cpu, SD_BALANCE_FORK | SD_LOAD_BALANCE);
+ rcu_assign_pointer(per_cpu(sd_balance_fork, cpu), sd);
+
+ sd = highest_flags_domain(cpu, SD_BALANCE_EXEC | SD_LOAD_BALANCE);
+ rcu_assign_pointer(per_cpu(sd_balance_exec, cpu), sd);
+
sd = lowest_flags_domain(cpu, SD_NUMA);
rcu_assign_pointer(per_cpu(sd_numa, cpu), sd);
--
2.24.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC PATCH 0/7] sched: Streamline select_task_rq() & select_task_rq_fair()
2019-12-11 16:43 [RFC PATCH 0/7] sched: Streamline select_task_rq() & select_task_rq_fair() Valentin Schneider
` (6 preceding siblings ...)
2019-12-11 16:44 ` [RFC PATCH 7/7] sched/topology: Define and use shortcut pointers for wakeup sd_flag scan Valentin Schneider
@ 2020-01-06 8:10 ` Peter Zijlstra
7 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2020-01-06 8:10 UTC (permalink / raw)
To: Valentin Schneider
Cc: linux-kernel, mingo, vincent.guittot, dietmar.eggemann,
Juri Lelli, Steven Rostedt
On Wed, Dec 11, 2019 at 04:43:54PM +0000, Valentin Schneider wrote:
> Discussion points
> =================
>
> The use of SD_LOAD_BALANCE forced me to do a few ugly things.
>
> Patch 5 is only required because the domain walk in select_task_rq_fair()
> is ceiled by the presence of SD_LOAD_BALANCE. Thing is (I also ramble about
> this in the changelog of patch 7) AFAIK this flag is set unconditionally in
> sd_init() and the only way to clear it is via the sched_domain sysctl,
> which is a SCHED_DEBUG interface. I haven't found anything with cpusets
> that would clear it; AFAICT cpusets can end up attaching the NULL domain to
> some CPUs (with sched_load_balance=0) but that's as far as this goes:
I can't find it in a hurry, but cpusets should really be able to build
stuff without LOAD_BALANCe on, otherwise it is broken.
/me digs a little into the code and finds that. no, you're right. What
cpusets does is create non-overlapping domain trees for each parition.
Which avoids the need to clear that flag.
Hmmm.. if we double check all that, I don't suppose there is anything
against simply removing that flag. Less is more etc..
^ permalink raw reply [flat|nested] 10+ messages in thread