* [PATCH v3 0/9] sched: Streamline select_task_rq() & select_task_rq_fair()
@ 2020-04-15 21:05 Valentin Schneider
  2020-04-15 21:05 ` [PATCH v3 1/9] sched/fair: find_idlest_group(): Remove unused sd_flag parameter Valentin Schneider
                   ` (9 more replies)
  0 siblings, 10 replies; 30+ messages in thread
From: Valentin Schneider @ 2020-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, peterz, vincent.guittot, dietmar.eggemann, Juri Lelli,
	Steven Rostedt

I've been staring at select_task_rq_fair() for some time now, and have come
to "hate" the usage of the sd_flag parameter. It is used as both an
indicator of the wakeup type (ttwu/fork/exec) and as a domain walk search
target. CFS is the only class doing this, the other classes just need the
wakeup type but get passed a domain flag instead.
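
A rough before/after sketch of the API change (the actual change is in
patches 5-6 below, which also add WF_TTWU):

  /* Before: callers pass both a domain flag and the wake flags, e.g. in ttwu: */
  cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);

  /* After: the wakeup kind is encoded in the wake flags themselves: */
  cpu = select_task_rq(p, p->wake_cpu, wake_flags | WF_TTWU);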

This series gets rid of select_task_rq()'s sd_flag parameter and also tries
to optimize select_task_rq_fair().

This is based on top of v5.7-rc1.

Patches
=======

o Patch 1 is a simple dead parameter cleanup
o Patches 2-4 get rid of SD_LOAD_BALANCE
o Patches 5-6 remove select_task_rq()'s sd_flag parameter.
o Patch 7 is an extra cleanup in the select_task_rq_fair() region.
o Patches 8-9 split the want_affine vs !want_affine paths of
  select_task_rq_fair(), which unearths a small optimization. Sort of a
  single patch split in two for the sake of review.

Testing
=======

Testing was done against a slightly older tip/sched/core at:
25ac227a25ac ("sched/fair: Remove wake_cap()")

I got annoyed by the variance in my 500-iteration runs, so I scripted
something to run batches of 5k iterations. It looks pretty stable from one
batch to another. I also stared at some boxplots to convince myself I
wasn't needlessly making things worse - you too can stare at them here [1].

Note: the 'X%' stats are the percentiles, so 50% is the 50th percentile.

Juno r0
-------

2+4 big.LITTLE. SD levels are {MC, DIE}.

Hackbench
~~~~~~~~~

15000 iterations of
  $ hackbench
(lower is better):

|       |   -PATCH |   +PATCH |  DELTA |
|-------+----------+----------+--------|
| mean  | 0.622235 | 0.618834 | -0.55% |
| std   | 0.018121 | 0.017579 | -2.99% |
| min   | 0.571000 | 0.573000 | +0.35% |
| 50%   | 0.620000 | 0.617000 | -0.48% |
| 75%   | 0.633000 | 0.629000 | -0.63% |
| 99%   | 0.674000 | 0.669000 | -0.74% |
| max   | 0.818000 | 0.756000 | -7.58% |

The boxplot shows a single outlier to the very left for both commits, which
are the minimums reported above. Other than that, +PATCH has somewhat lower
outliers on the righthand side: worst cases are a tad better.

Sysbench
~~~~~~~~

15000 iterations of
  $ sysbench --max-time=5 --max-requests=-1 --test=threads --num-threads=6 run
(higher is better):

|       |       -PATCH |       +PATCH |   DELTA |
|-------+--------------+--------------+---------|
| mean  | 15318.954000 | 15628.416933 |  +2.02% |
| std   |   235.466202 |   205.162730 | -12.87% |
| min   | 13025.000000 | 13554.000000 |  +4.06% |
| 50%   | 15366.000000 | 15681.000000 |  +2.05% |
| 75%   | 15497.000000 | 15765.000000 |  +1.73% |
| 99%   | 15651.000000 | 15893.000000 |  +1.55% |
| max   | 15716.000000 | 15972.000000 |  +1.63% |

That's overall a tad better.

Dual-socket Xeon E5
-------------------

Each socket is 10 cores w/ SMT2 - 40 CPUs total. SD levels are
{SMT, MC, NUMA}.

Hackbench
~~~~~~~~~

25000 iterations of
  $ hackbench -l 1000
(lower is better):

|       |       -PATCH |       +PATCH |  DELTA |
|-------+--------------+--------------+--------|
| mean  |     0.946312 |     0.944685 | -0.17% |
| std   |     0.006419 |     0.006447 | +0.44% |
| min   |     0.906000 |     0.897000 | -0.99% |
| 50%   |     0.947000 |     0.945000 | -0.21% |
| 75%   |     0.950000 |     0.949000 | -0.11% |
| 99%   |     0.959000 |     0.958000 | -0.10% |
| max   |     0.988000 |     0.967000 | -2.13% |

The boxplot shows that the min improvement is some sort of fluke - it's a
single point standing out on the left. The mean *is* slightly lowered,
which most likely comes from +PATCH having fewer high-value outliers.

I looked into some statistical tests, but my sample distribution isn't a
normal distribution (which is a requirement for most of them). Normality
actually can't happen here by construction according to [2]: hackbench
outputs the maximum of a set of random variables, and the limit
distribution of such a maximum is Gumbel, Fréchet, or Weibull - never
normal. We could instead use the average of all sender/receiver pairs, or
even the individual time taken per each pair; that being said, I don't
think each value produced by a pair could be seen as an independent
variable, given that there'll be more than one task per CPU.

Wilcoxon's test [3] gives me a p-value of ~1e-182, so there *is* a
significant difference between the two datasets, but it does not say if the
difference is in the mean, variance, or any other parameter of the
distribution.

Sysbench
~~~~~~~~

25000 iterations of
  $ sysbench --max-time=5 --max-requests=-1 --test=threads --num-threads=40 run
(higher is better):

|       |       -PATCH |       +PATCH |   DELTA |
|-------+--------------+--------------+---------|
| mean  | 23937.937560 | 24280.668640 |  +1.43% |
| std   |   547.139948 |   484.963639 | -11.36% |
| min   | 21526.000000 | 21917.000000 |  +1.82% |
| 50%   | 24032.000000 | 24374.000000 |  +1.42% |
| 75%   | 24355.000000 | 24638.000000 |  +1.16% |
| 99%   | 24869.010000 | 25084.000000 |  +0.86% |
| max   | 25410.000000 | 25623.000000 |  +0.84% |

As with the Juno, that's overall a tad better.

Takeaway
--------

The TL;DR for those fancy stats seems to be: it's hard to say much about
hackbench, and sysbench likes it a bit. The important thing for me is to
not introduce regressions with my stupid change, and AFAICT that is the case.

Links
=====

[1]: https://htmlpreview.github.io/?https://gist.githubusercontent.com/valschneider/433b3772d1776c52214dd05be2ab2f03/raw/316fbd9f774fa381c60731511c881a3360111563/streamline_v2_bplots.html
[2]: https://en.wikipedia.org/wiki/Fisher%E2%80%93Tippett%E2%80%93Gnedenko_theorem
[3]: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.wilcoxon.html#scipy.stats.wilcoxon

Revisions
=========

v2 -> v3
--------
o Rebased on top of v5.7-rc1 (didn't re-run performance tests)
o Collected Reviewed-by (Dietmar)
o Updated changelog of 3/9 (Dietmar)

v1 -> v2
--------
o Removed the 'RFC' tag
o Made the sd->flags sysctl read-only
o Removed the SD_LOAD_BALANCE flag
o Cleaned up ugly changes thanks to the above

Valentin Schneider (9):
  sched/fair: find_idlest_group(): Remove unused sd_flag parameter
  sched/debug: Make sd->flags sysctl read-only
  sched: Remove checks against SD_LOAD_BALANCE
  sched/topology: Kill SD_LOAD_BALANCE
  sched: Add WF_TTWU, WF_EXEC wakeup flags
  sched: Kill select_task_rq()'s sd_flag parameter
  sched/fair: Dissociate wakeup decisions from SD flag value
  sched/fair: Split select_task_rq_fair want_affine logic
  sched/topology: Define and use shortcut pointers for wakeup sd_flag
    scan

 include/linux/sched/topology.h | 29 +++++++------
 kernel/sched/core.c            | 10 ++---
 kernel/sched/deadline.c        |  4 +-
 kernel/sched/debug.c           |  2 +-
 kernel/sched/fair.c            | 75 +++++++++++++++++++---------------
 kernel/sched/idle.c            |  2 +-
 kernel/sched/rt.c              |  4 +-
 kernel/sched/sched.h           | 13 ++++--
 kernel/sched/stop_task.c       |  2 +-
 kernel/sched/topology.c        | 43 +++++++++----------
 10 files changed, 98 insertions(+), 86 deletions(-)

--
2.24.0



* [PATCH v3 1/9] sched/fair: find_idlest_group(): Remove unused sd_flag parameter
  2020-04-15 21:05 [PATCH v3 0/9] sched: Streamline select_task_rq() & select_task_rq_fair() Valentin Schneider
@ 2020-04-15 21:05 ` Valentin Schneider
  2020-04-16  7:40   ` Vincent Guittot
  2020-05-01 18:22   ` [tip: sched/core] " tip-bot2 for Valentin Schneider
  2020-04-15 21:05 ` [PATCH v3 2/9] sched/debug: Make sd->flags sysctl read-only Valentin Schneider
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 30+ messages in thread
From: Valentin Schneider @ 2020-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, peterz, vincent.guittot, dietmar.eggemann

The last use of that parameter was removed by commit

  57abff067a08 ("sched/fair: Rework find_idlest_group()")

Get rid of the parameter.

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/fair.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 02f323b85b6d..98321d8dde7e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5821,8 +5821,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
 }
 
 static struct sched_group *
-find_idlest_group(struct sched_domain *sd, struct task_struct *p,
-		  int this_cpu, int sd_flag);
+find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu);
 
 /*
  * find_idlest_group_cpu - find the idlest CPU among the CPUs in the group.
@@ -5905,7 +5904,7 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
 			continue;
 		}
 
-		group = find_idlest_group(sd, p, cpu, sd_flag);
+		group = find_idlest_group(sd, p, cpu);
 		if (!group) {
 			sd = sd->child;
 			continue;
@@ -8677,8 +8676,7 @@ static bool update_pick_idlest(struct sched_group *idlest,
  * Assumes p is allowed on at least one CPU in sd.
  */
 static struct sched_group *
-find_idlest_group(struct sched_domain *sd, struct task_struct *p,
-		  int this_cpu, int sd_flag)
+find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
 {
 	struct sched_group *idlest = NULL, *local = NULL, *group = sd->groups;
 	struct sg_lb_stats local_sgs, tmp_sgs;
-- 
2.24.0



* [PATCH v3 2/9] sched/debug: Make sd->flags sysctl read-only
  2020-04-15 21:05 [PATCH v3 0/9] sched: Streamline select_task_rq() & select_task_rq_fair() Valentin Schneider
  2020-04-15 21:05 ` [PATCH v3 1/9] sched/fair: find_idlest_group(): Remove unused sd_flag parameter Valentin Schneider
@ 2020-04-15 21:05 ` Valentin Schneider
  2020-05-01 18:22   ` [tip: sched/core] " tip-bot2 for Valentin Schneider
  2020-04-15 21:05 ` [PATCH v3 3/9] sched: Remove checks against SD_LOAD_BALANCE Valentin Schneider
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Valentin Schneider @ 2020-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, peterz, vincent.guittot, dietmar.eggemann

Writing to a sched_domain's flags sysctl directly updates the value of the
field, and goes nowhere near update_top_cache_domain(). This means that
the cached domain pointers can end up containing stale data (e.g. the
domain pointed to doesn't have the relevant flag set anymore).

Explicit domain walks that check for flags will be affected by
the write, but this won't be in sync with the cached pointers which will
still point to the domains that were cached at the last sched_domain
build.

In other words, writing to this interface is playing a dangerous game. It
could be made to trigger an update of the cached sched_domain pointers when
written to, but this does not seem to be worth the trouble. Make it
read-only.
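
For reference, the cached pointers in question are only refreshed at
domain build time, in update_top_cache_domain() - abridged below from
kernel/sched/topology.c; a raw write to sd->flags never goes through it:

  static void update_top_cache_domain(int cpu)
  {
          struct sched_domain *sd;

          /* ... LLC shortcut setup elided ... */

          sd = lowest_flag_domain(cpu, SD_NUMA);
          rcu_assign_pointer(per_cpu(sd_numa, cpu), sd);

          sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
          rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd);

          /* ... likewise for the other shortcut pointers ... */
  }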

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index a562df57a86e..00796b0a2723 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -258,7 +258,7 @@ sd_alloc_ctl_domain_table(struct sched_domain *sd)
 	set_table_entry(&table[2], "busy_factor",	  &sd->busy_factor,	    sizeof(int),  0644, proc_dointvec_minmax);
 	set_table_entry(&table[3], "imbalance_pct",	  &sd->imbalance_pct,	    sizeof(int),  0644, proc_dointvec_minmax);
 	set_table_entry(&table[4], "cache_nice_tries",	  &sd->cache_nice_tries,    sizeof(int),  0644, proc_dointvec_minmax);
-	set_table_entry(&table[5], "flags",		  &sd->flags,		    sizeof(int),  0644, proc_dointvec_minmax);
+	set_table_entry(&table[5], "flags",		  &sd->flags,		    sizeof(int),  0444, proc_dointvec_minmax);
 	set_table_entry(&table[6], "max_newidle_lb_cost", &sd->max_newidle_lb_cost, sizeof(long), 0644, proc_doulongvec_minmax);
 	set_table_entry(&table[7], "name",		  sd->name,	       CORENAME_MAX_SIZE, 0444, proc_dostring);
 	/* &table[8] is terminator */
-- 
2.24.0



* [PATCH v3 3/9] sched: Remove checks against SD_LOAD_BALANCE
  2020-04-15 21:05 [PATCH v3 0/9] sched: Streamline select_task_rq() & select_task_rq_fair() Valentin Schneider
  2020-04-15 21:05 ` [PATCH v3 1/9] sched/fair: find_idlest_group(): Remove unused sd_flag parameter Valentin Schneider
  2020-04-15 21:05 ` [PATCH v3 2/9] sched/debug: Make sd->flags sysctl read-only Valentin Schneider
@ 2020-04-15 21:05 ` Valentin Schneider
  2020-05-01 18:22   ` [tip: sched/core] " tip-bot2 for Valentin Schneider
  2020-04-15 21:05 ` [PATCH v3 4/9] sched/topology: Kill SD_LOAD_BALANCE Valentin Schneider
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Valentin Schneider @ 2020-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, peterz, vincent.guittot, dietmar.eggemann

The SD_LOAD_BALANCE flag is set unconditionally for all domains in
sd_init(). By making the sched_domain->flags sysctl interface read-only, we
have removed the last piece of code that could clear that flag - as such,
it will now always be present. Rather than keep carrying it along, we
can work towards getting rid of it entirely.

cpusets don't need it because they can attach CPUs to the NULL
domain (e.g. cpuset with sched_load_balance=0), or to a partitioned
root_domain, i.e. a sched_domain hierarchy that doesn't span the entire
system (e.g. root cpuset with sched_load_balance=0 and sibling cpusets with
sched_load_balance=1).

isolcpus applies the same "trick": isolated CPUs are explicitly taken out of
the sched_domain rebuild (using housekeeping_cpumask()), so they get the
NULL domain treatment as well.

Remove the checks against SD_LOAD_BALANCE.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/fair.c     | 14 ++------------
 kernel/sched/topology.c | 28 +++++++++-------------------
 2 files changed, 11 insertions(+), 31 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 98321d8dde7e..3d34b4e4060f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6645,9 +6645,6 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 
 	rcu_read_lock();
 	for_each_domain(cpu, tmp) {
-		if (!(tmp->flags & SD_LOAD_BALANCE))
-			break;
-
 		/*
 		 * If both 'cpu' and 'prev_cpu' are part of this domain,
 		 * cpu is a valid SD_WAKE_AFFINE target.
@@ -9792,9 +9789,8 @@ static int active_load_balance_cpu_stop(void *data)
 	/* Search for an sd spanning us and the target CPU. */
 	rcu_read_lock();
 	for_each_domain(target_cpu, sd) {
-		if ((sd->flags & SD_LOAD_BALANCE) &&
-		    cpumask_test_cpu(busiest_cpu, sched_domain_span(sd)))
-				break;
+		if (cpumask_test_cpu(busiest_cpu, sched_domain_span(sd)))
+			break;
 	}
 
 	if (likely(sd)) {
@@ -9883,9 +9879,6 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
 		}
 		max_cost += sd->max_newidle_lb_cost;
 
-		if (!(sd->flags & SD_LOAD_BALANCE))
-			continue;
-
 		/*
 		 * Stop the load balance at this level. There is another
 		 * CPU in our sched group which is doing load balancing more
@@ -10474,9 +10467,6 @@ int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 		int continue_balancing = 1;
 		u64 t0, domain_cost;
 
-		if (!(sd->flags & SD_LOAD_BALANCE))
-			continue;
-
 		if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) {
 			update_next_balance(sd, &next_balance);
 			break;
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 8344757bba6e..a9dc34a0ebc1 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -33,14 +33,6 @@ static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
 	cpumask_clear(groupmask);
 
 	printk(KERN_DEBUG "%*s domain-%d: ", level, "", level);
-
-	if (!(sd->flags & SD_LOAD_BALANCE)) {
-		printk("does not load-balance\n");
-		if (sd->parent)
-			printk(KERN_ERR "ERROR: !SD_LOAD_BALANCE domain has parent");
-		return -1;
-	}
-
 	printk(KERN_CONT "span=%*pbl level=%s\n",
 	       cpumask_pr_args(sched_domain_span(sd)), sd->name);
 
@@ -151,8 +143,7 @@ static int sd_degenerate(struct sched_domain *sd)
 		return 1;
 
 	/* Following flags need at least 2 groups */
-	if (sd->flags & (SD_LOAD_BALANCE |
-			 SD_BALANCE_NEWIDLE |
+	if (sd->flags & (SD_BALANCE_NEWIDLE |
 			 SD_BALANCE_FORK |
 			 SD_BALANCE_EXEC |
 			 SD_SHARE_CPUCAPACITY |
@@ -183,15 +174,14 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
 
 	/* Flags needing groups don't count if only 1 group in parent */
 	if (parent->groups == parent->groups->next) {
-		pflags &= ~(SD_LOAD_BALANCE |
-				SD_BALANCE_NEWIDLE |
-				SD_BALANCE_FORK |
-				SD_BALANCE_EXEC |
-				SD_ASYM_CPUCAPACITY |
-				SD_SHARE_CPUCAPACITY |
-				SD_SHARE_PKG_RESOURCES |
-				SD_PREFER_SIBLING |
-				SD_SHARE_POWERDOMAIN);
+		pflags &= ~(SD_BALANCE_NEWIDLE |
+			    SD_BALANCE_FORK |
+			    SD_BALANCE_EXEC |
+			    SD_ASYM_CPUCAPACITY |
+			    SD_SHARE_CPUCAPACITY |
+			    SD_SHARE_PKG_RESOURCES |
+			    SD_PREFER_SIBLING |
+			    SD_SHARE_POWERDOMAIN);
 		if (nr_node_ids == 1)
 			pflags &= ~SD_SERIALIZE;
 	}
-- 
2.24.0



* [PATCH v3 4/9] sched/topology: Kill SD_LOAD_BALANCE
  2020-04-15 21:05 [PATCH v3 0/9] sched: Streamline select_task_rq() & select_task_rq_fair() Valentin Schneider
                   ` (2 preceding siblings ...)
  2020-04-15 21:05 ` [PATCH v3 3/9] sched: Remove checks against SD_LOAD_BALANCE Valentin Schneider
@ 2020-04-15 21:05 ` Valentin Schneider
  2020-05-01 18:22   ` [tip: sched/core] " tip-bot2 for Valentin Schneider
  2020-04-15 21:05 ` [PATCH v3 5/9] sched: Add WF_TTWU, WF_EXEC wakeup flags Valentin Schneider
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Valentin Schneider @ 2020-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, peterz, vincent.guittot, dietmar.eggemann

That flag is set unconditionally in sd_init(), and no one checks for it
anymore. Remove it.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 include/linux/sched/topology.h | 29 ++++++++++++++---------------
 kernel/sched/topology.c        |  3 +--
 2 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index af9319e4cfb9..e07380669ee5 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -11,21 +11,20 @@
  */
 #ifdef CONFIG_SMP
 
-#define SD_LOAD_BALANCE		0x0001	/* Do load balancing on this domain. */
-#define SD_BALANCE_NEWIDLE	0x0002	/* Balance when about to become idle */
-#define SD_BALANCE_EXEC		0x0004	/* Balance on exec */
-#define SD_BALANCE_FORK		0x0008	/* Balance on fork, clone */
-#define SD_BALANCE_WAKE		0x0010  /* Balance on wakeup */
-#define SD_WAKE_AFFINE		0x0020	/* Wake task to waking CPU */
-#define SD_ASYM_CPUCAPACITY	0x0040  /* Domain members have different CPU capacities */
-#define SD_SHARE_CPUCAPACITY	0x0080	/* Domain members share CPU capacity */
-#define SD_SHARE_POWERDOMAIN	0x0100	/* Domain members share power domain */
-#define SD_SHARE_PKG_RESOURCES	0x0200	/* Domain members share CPU pkg resources */
-#define SD_SERIALIZE		0x0400	/* Only a single load balancing instance */
-#define SD_ASYM_PACKING		0x0800  /* Place busy groups earlier in the domain */
-#define SD_PREFER_SIBLING	0x1000	/* Prefer to place tasks in a sibling domain */
-#define SD_OVERLAP		0x2000	/* sched_domains of this level overlap */
-#define SD_NUMA			0x4000	/* cross-node balancing */
+#define SD_BALANCE_NEWIDLE	0x0001	/* Balance when about to become idle */
+#define SD_BALANCE_EXEC		0x0002	/* Balance on exec */
+#define SD_BALANCE_FORK		0x0004	/* Balance on fork, clone */
+#define SD_BALANCE_WAKE		0x0008  /* Balance on wakeup */
+#define SD_WAKE_AFFINE		0x0010	/* Wake task to waking CPU */
+#define SD_ASYM_CPUCAPACITY	0x0020  /* Domain members have different CPU capacities */
+#define SD_SHARE_CPUCAPACITY	0x0040	/* Domain members share CPU capacity */
+#define SD_SHARE_POWERDOMAIN	0x0080	/* Domain members share power domain */
+#define SD_SHARE_PKG_RESOURCES	0x0100	/* Domain members share CPU pkg resources */
+#define SD_SERIALIZE		0x0200	/* Only a single load balancing instance */
+#define SD_ASYM_PACKING		0x0400  /* Place busy groups earlier in the domain */
+#define SD_PREFER_SIBLING	0x0800	/* Prefer to place tasks in a sibling domain */
+#define SD_OVERLAP		0x1000	/* sched_domains of this level overlap */
+#define SD_NUMA			0x2000	/* cross-node balancing */
 
 #ifdef CONFIG_SCHED_SMT
 static inline int cpu_smt_flags(void)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index a9dc34a0ebc1..1d7b446fac7d 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1341,8 +1341,7 @@ sd_init(struct sched_domain_topology_level *tl,
 
 		.cache_nice_tries	= 0,
 
-		.flags			= 1*SD_LOAD_BALANCE
-					| 1*SD_BALANCE_NEWIDLE
+		.flags			= 1*SD_BALANCE_NEWIDLE
 					| 1*SD_BALANCE_EXEC
 					| 1*SD_BALANCE_FORK
 					| 0*SD_BALANCE_WAKE
-- 
2.24.0



* [PATCH v3 5/9] sched: Add WF_TTWU, WF_EXEC wakeup flags
  2020-04-15 21:05 [PATCH v3 0/9] sched: Streamline select_task_rq() & select_task_rq_fair() Valentin Schneider
                   ` (3 preceding siblings ...)
  2020-04-15 21:05 ` [PATCH v3 4/9] sched/topology: Kill SD_LOAD_BALANCE Valentin Schneider
@ 2020-04-15 21:05 ` Valentin Schneider
  2020-04-15 21:05 ` [PATCH v3 6/9] sched: Kill select_task_rq()'s sd_flag parameter Valentin Schneider
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Valentin Schneider @ 2020-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, peterz, vincent.guittot, dietmar.eggemann, Ingo Molnar,
	Juri Lelli, Steven Rostedt

To remove the sd_flag parameter of select_task_rq(), we need another way of
encoding wakeup kinds. There already is a WF_FORK flag; add the missing
ones.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/sched.h | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index db3a57675ccf..f32c5fa229af 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1684,11 +1684,13 @@ static inline int task_on_rq_migrating(struct task_struct *p)
 }
 
 /*
- * wake flags
+ * Wake flags
  */
 #define WF_SYNC			0x01		/* Waker goes to sleep after wakeup */
-#define WF_FORK			0x02		/* Child wakeup after fork */
-#define WF_MIGRATED		0x4		/* Internal use, task got migrated */
+#define WF_TTWU                 0x02            /* Regular task wakeup */
+#define WF_FORK			0x04		/* Child wakeup after fork */
+#define WF_EXEC			0x08		/* "Fake" wakeup at exec */
+#define WF_MIGRATED		0x10		/* Internal use, task got migrated */
 
 /*
  * To aid in avoiding the subversion of "niceness" due to uneven distribution
-- 
2.24.0



* [PATCH v3 6/9] sched: Kill select_task_rq()'s sd_flag parameter
  2020-04-15 21:05 [PATCH v3 0/9] sched: Streamline select_task_rq() & select_task_rq_fair() Valentin Schneider
                   ` (4 preceding siblings ...)
  2020-04-15 21:05 ` [PATCH v3 5/9] sched: Add WF_TTWU, WF_EXEC wakeup flags Valentin Schneider
@ 2020-04-15 21:05 ` Valentin Schneider
  2020-04-16  7:42   ` Vincent Guittot
  2020-04-15 21:05 ` [PATCH v3 7/9] sched/fair: Dissociate wakeup decisions from SD flag value Valentin Schneider
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Valentin Schneider @ 2020-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, peterz, vincent.guittot, dietmar.eggemann, Ingo Molnar,
	Juri Lelli, Steven Rostedt

Only select_task_rq_fair() uses that parameter to do an actual domain
search; other classes only care about what kind of wakeup is happening
(fork, exec, or "regular") and thus just translate the flag into a wakeup
type.

WF_TTWU and WF_EXEC have just been added, use these along with WF_FORK to
encode the wakeup types we care about. This cleans up the API a
bit, but adds an extra conversion layer within select_task_rq_fair().

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/core.c      | 10 +++++-----
 kernel/sched/deadline.c  |  4 ++--
 kernel/sched/fair.c      | 18 +++++++++++++++---
 kernel/sched/idle.c      |  2 +-
 kernel/sched/rt.c        |  4 ++--
 kernel/sched/sched.h     |  2 +-
 kernel/sched/stop_task.c |  2 +-
 7 files changed, 27 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3a61a3b8eaa9..aea9badd397a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2094,12 +2094,12 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
  * The caller (fork, wakeup) owns p->pi_lock, ->cpus_ptr is stable.
  */
 static inline
-int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
+int select_task_rq(struct task_struct *p, int cpu, int wake_flags)
 {
 	lockdep_assert_held(&p->pi_lock);
 
 	if (p->nr_cpus_allowed > 1)
-		cpu = p->sched_class->select_task_rq(p, cpu, sd_flags, wake_flags);
+		cpu = p->sched_class->select_task_rq(p, cpu, wake_flags);
 	else
 		cpu = cpumask_any(p->cpus_ptr);
 
@@ -2612,7 +2612,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		atomic_dec(&task_rq(p)->nr_iowait);
 	}
 
-	cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);
+	cpu = select_task_rq(p, p->wake_cpu, wake_flags | WF_TTWU);
 	if (task_cpu(p) != cpu) {
 		wake_flags |= WF_MIGRATED;
 		psi_ttwu_dequeue(p);
@@ -2945,7 +2945,7 @@ void wake_up_new_task(struct task_struct *p)
 	 * as we're not fully set-up yet.
 	 */
 	p->recent_used_cpu = task_cpu(p);
-	__set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
+	__set_task_cpu(p, select_task_rq(p, task_cpu(p), WF_FORK));
 #endif
 	rq = __task_rq_lock(p, &rf);
 	update_rq_clock(rq);
@@ -3486,7 +3486,7 @@ void sched_exec(void)
 	int dest_cpu;
 
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
-	dest_cpu = p->sched_class->select_task_rq(p, task_cpu(p), SD_BALANCE_EXEC, 0);
+	dest_cpu = p->sched_class->select_task_rq(p, task_cpu(p), WF_EXEC);
 	if (dest_cpu == smp_processor_id())
 		goto unlock;
 
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 504d2f51b0d6..0e96b435c51b 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1601,12 +1601,12 @@ static void yield_task_dl(struct rq *rq)
 static int find_later_rq(struct task_struct *task);
 
 static int
-select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags)
+select_task_rq_dl(struct task_struct *p, int cpu, int flags)
 {
 	struct task_struct *curr;
 	struct rq *rq;
 
-	if (sd_flag != SD_BALANCE_WAKE)
+	if (!(flags & WF_TTWU))
 		goto out;
 
 	rq = cpu_rq(cpu);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3d34b4e4060f..b0bf98e6798b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6611,7 +6611,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 
 /*
  * select_task_rq_fair: Select target runqueue for the waking task in domains
- * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
+ * that have the relevant SD flag set. In practice, this is SD_BALANCE_WAKE,
  * SD_BALANCE_FORK, or SD_BALANCE_EXEC.
  *
  * Balances load by selecting the idlest CPU in the idlest group, or under
@@ -6622,13 +6622,25 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
  * preempt must be disabled.
  */
 static int
-select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_flags)
+select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 {
+	int sync = (wake_flags & WF_SYNC) && !(current->flags & PF_EXITING);
 	struct sched_domain *tmp, *sd = NULL;
 	int cpu = smp_processor_id();
 	int new_cpu = prev_cpu;
 	int want_affine = 0;
-	int sync = (wake_flags & WF_SYNC) && !(current->flags & PF_EXITING);
+	int sd_flag;
+
+	switch (wake_flags & (WF_TTWU | WF_FORK | WF_EXEC)) {
+	case WF_TTWU:
+		sd_flag = SD_BALANCE_WAKE;
+		break;
+	case WF_FORK:
+		sd_flag = SD_BALANCE_FORK;
+		break;
+	default:
+		sd_flag = SD_BALANCE_EXEC;
+	}
 
 	if (sd_flag & SD_BALANCE_WAKE) {
 		record_wakee(p);
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index b743bf38f08f..e9c6a27f0647 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -367,7 +367,7 @@ void cpu_startup_entry(enum cpuhp_state state)
 
 #ifdef CONFIG_SMP
 static int
-select_task_rq_idle(struct task_struct *p, int cpu, int sd_flag, int flags)
+select_task_rq_idle(struct task_struct *p, int cpu, int flags)
 {
 	return task_cpu(p); /* IDLE tasks as never migrated */
 }
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index df11d88c9895..88427ea0231b 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1426,14 +1426,14 @@ static void yield_task_rt(struct rq *rq)
 static int find_lowest_rq(struct task_struct *task);
 
 static int
-select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
+select_task_rq_rt(struct task_struct *p, int cpu, int flags)
 {
 	struct task_struct *curr;
 	struct rq *rq;
 	bool test;
 
 	/* For anything but wake ups, just return the task_cpu */
-	if (sd_flag != SD_BALANCE_WAKE && sd_flag != SD_BALANCE_FORK)
+	if (!(flags & (WF_TTWU | WF_FORK)))
 		goto out;
 
 	rq = cpu_rq(cpu);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f32c5fa229af..448f5d630544 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1767,7 +1767,7 @@ struct sched_class {
 
 #ifdef CONFIG_SMP
 	int (*balance)(struct rq *rq, struct task_struct *prev, struct rq_flags *rf);
-	int  (*select_task_rq)(struct task_struct *p, int task_cpu, int sd_flag, int flags);
+	int  (*select_task_rq)(struct task_struct *p, int task_cpu, int flags);
 	void (*migrate_task_rq)(struct task_struct *p, int new_cpu);
 
 	void (*task_woken)(struct rq *this_rq, struct task_struct *task);
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 4c9e9975684f..4f061ddf8470 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -11,7 +11,7 @@
 
 #ifdef CONFIG_SMP
 static int
-select_task_rq_stop(struct task_struct *p, int cpu, int sd_flag, int flags)
+select_task_rq_stop(struct task_struct *p, int cpu, int flags)
 {
 	return task_cpu(p); /* stop tasks as never migrate */
 }
-- 
2.24.0



* [PATCH v3 7/9] sched/fair: Dissociate wakeup decisions from SD flag value
  2020-04-15 21:05 [PATCH v3 0/9] sched: Streamline select_task_rq() & select_task_rq_fair() Valentin Schneider
                   ` (5 preceding siblings ...)
  2020-04-15 21:05 ` [PATCH v3 6/9] sched: Kill select_task_rq()'s sd_flag parameter Valentin Schneider
@ 2020-04-15 21:05 ` Valentin Schneider
  2020-04-15 21:05 ` [PATCH v3 8/9] sched/fair: Split select_task_rq_fair want_affine logic Valentin Schneider
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Valentin Schneider @ 2020-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, peterz, vincent.guittot, dietmar.eggemann

The CFS wakeup code will only ever go through EAS / its fast path on
"regular" wakeups (i.e. not on forks or execs). These are currently gated
by a check against 'sd_flag', which would be SD_BALANCE_WAKE at wakeup.

However, we now have a flag that explicitly tells us whether a wakeup is a
"regular" one, so hinge those conditions on that flag instead.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/fair.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b0bf98e6798b..f20e5cd6515c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6642,7 +6642,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 		sd_flag = SD_BALANCE_EXEC;
 	}
 
-	if (sd_flag & SD_BALANCE_WAKE) {
+	if (wake_flags & WF_TTWU) {
 		record_wakee(p);
 
 		if (sched_energy_enabled()) {
@@ -6679,9 +6679,8 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 	if (unlikely(sd)) {
 		/* Slow path */
 		new_cpu = find_idlest_cpu(sd, p, cpu, prev_cpu, sd_flag);
-	} else if (sd_flag & SD_BALANCE_WAKE) { /* XXX always ? */
+	} else if (wake_flags & WF_TTWU) { /* XXX always ? */
 		/* Fast path */
-
 		new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
 
 		if (want_affine)
-- 
2.24.0



* [PATCH v3 8/9] sched/fair: Split select_task_rq_fair want_affine logic
  2020-04-15 21:05 [PATCH v3 0/9] sched: Streamline select_task_rq() & select_task_rq_fair() Valentin Schneider
                   ` (6 preceding siblings ...)
  2020-04-15 21:05 ` [PATCH v3 7/9] sched/fair: Dissociate wakeup decisions from SD flag value Valentin Schneider
@ 2020-04-15 21:05 ` Valentin Schneider
  2020-04-15 21:05 ` [PATCH v3 9/9] sched/topology: Define and use shortcut pointers for wakeup sd_flag scan Valentin Schneider
  2020-04-16 10:58 ` [PATCH v3 0/9] sched: Streamline select_task_rq() & select_task_rq_fair() Peter Zijlstra
  9 siblings, 0 replies; 30+ messages in thread
From: Valentin Schneider @ 2020-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, peterz, vincent.guittot, dietmar.eggemann

The domain loop within select_task_rq_fair() depends on a few bits of
input, namely the SD flag we're looking for and whether we want_affine.

For !want_affine, the domain loop will walk up the hierarchy to reach the
highest domain with the requested sd_flag (SD_BALANCE_{WAKE, FORK, EXEC})
set. In other words, that's a call to highest_flag_domain().
Note that this is static information wrt a given SD hierarchy, so we can
cache that - but that comes in a later patch to ease reviewing.
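
For reference, highest_flag_domain() (kernel/sched/sched.h) boils down to
returning the last domain seen on the way up that still has the flag set:

  static inline struct sched_domain *highest_flag_domain(int cpu, int flag)
  {
          struct sched_domain *sd, *hsd = NULL;

          for_each_domain(cpu, sd) {
                  if (!(sd->flags & flag))
                          break;
                  hsd = sd;
          }

          return hsd;
  }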

For want_affine, we'll walk up the hierarchy to reach the first domain with
SD_WAKE_AFFINE set that spans the task's prev_cpu.  We
still save a pointer to the last visited domain that had the requested
sd_flag set, which means that if we fail to go through the affine
condition (e.g. no domain had SD_WAKE_AFFINE) we'll use the same SD as we
would have found if we had !want_affine.

Split the domain loop in !want_affine and want_affine paths. As it is,
this leads to two domain walks instead of a single one, but stay tuned for
the next patch.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/fair.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f20e5cd6515c..6f8cdb99f4a0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6656,26 +6656,33 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 	}
 
 	rcu_read_lock();
+
+	sd = highest_flag_domain(cpu, sd_flag);
+
+	/*
+	 * If !want_affine, we just look for the highest domain where
+	 * sd_flag is set.
+	 */
+	if (!want_affine)
+		goto scan;
+
+	/*
+	 * Otherwise we look for the lowest domain with SD_WAKE_AFFINE and that
+	 * spans both 'cpu' and 'prev_cpu'.
+	 */
 	for_each_domain(cpu, tmp) {
-		/*
-		 * If both 'cpu' and 'prev_cpu' are part of this domain,
-		 * cpu is a valid SD_WAKE_AFFINE target.
-		 */
-		if (want_affine && (tmp->flags & SD_WAKE_AFFINE) &&
+		if ((tmp->flags & SD_WAKE_AFFINE) &&
 		    cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
 			if (cpu != prev_cpu)
 				new_cpu = wake_affine(tmp, p, cpu, prev_cpu, sync);
 
-			sd = NULL; /* Prefer wake_affine over balance flags */
+			/* Prefer wake_affine over SD lookup */
+			sd = NULL;
 			break;
 		}
-
-		if (tmp->flags & sd_flag)
-			sd = tmp;
-		else if (!want_affine)
-			break;
 	}
 
+scan:
 	if (unlikely(sd)) {
 		/* Slow path */
 		new_cpu = find_idlest_cpu(sd, p, cpu, prev_cpu, sd_flag);
-- 
2.24.0



* [PATCH v3 9/9] sched/topology: Define and use shortcut pointers for wakeup sd_flag scan
  2020-04-15 21:05 [PATCH v3 0/9] sched: Streamline select_task_rq() & select_task_rq_fair() Valentin Schneider
                   ` (7 preceding siblings ...)
  2020-04-15 21:05 ` [PATCH v3 8/9] sched/fair: Split select_task_rq_fair want_affine logic Valentin Schneider
@ 2020-04-15 21:05 ` Valentin Schneider
  2020-04-16  7:46   ` Vincent Guittot
  2020-04-16 10:58 ` [PATCH v3 0/9] sched: Streamline select_task_rq() & select_task_rq_fair() Peter Zijlstra
  9 siblings, 1 reply; 30+ messages in thread
From: Valentin Schneider @ 2020-04-15 21:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, peterz, vincent.guittot, dietmar.eggemann

Reworking select_task_rq_fair()'s domain walk exposed that !want_affine
wakeups only look for the highest sched_domain with the required sd_flag
set. This is something we can cache at sched domain build time to slightly
optimize select_task_rq_fair(). Note that this isn't a "free" optimization:
it costs us 3 pointers per CPU.

Add cached per-CPU pointers for the highest domains with SD_BALANCE_WAKE,
SD_BALANCE_EXEC and SD_BALANCE_FORK. Use them in select_task_rq_fair().

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/fair.c     | 25 +++++++++++++------------
 kernel/sched/sched.h    |  3 +++
 kernel/sched/topology.c | 12 ++++++++++++
 3 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6f8cdb99f4a0..db4fb29a88d9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6631,17 +6631,6 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 	int want_affine = 0;
 	int sd_flag;
 
-	switch (wake_flags & (WF_TTWU | WF_FORK | WF_EXEC)) {
-	case WF_TTWU:
-		sd_flag = SD_BALANCE_WAKE;
-		break;
-	case WF_FORK:
-		sd_flag = SD_BALANCE_FORK;
-		break;
-	default:
-		sd_flag = SD_BALANCE_EXEC;
-	}
-
 	if (wake_flags & WF_TTWU) {
 		record_wakee(p);
 
@@ -6657,7 +6646,19 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 
 	rcu_read_lock();
 
-	sd = highest_flag_domain(cpu, sd_flag);
+	switch (wake_flags & (WF_TTWU | WF_FORK | WF_EXEC)) {
+	case WF_TTWU:
+		sd_flag = SD_BALANCE_WAKE;
+		sd = rcu_dereference(per_cpu(sd_balance_wake, cpu));
+		break;
+	case WF_FORK:
+		sd_flag = SD_BALANCE_FORK;
+		sd = rcu_dereference(per_cpu(sd_balance_fork, cpu));
+		break;
+	default:
+		sd_flag = SD_BALANCE_EXEC;
+		sd = rcu_dereference(per_cpu(sd_balance_exec, cpu));
+	}
 
 	/*
 	 * If !want_affine, we just look for the highest domain where
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 448f5d630544..4b014103affb 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1423,6 +1423,9 @@ DECLARE_PER_CPU(int, sd_llc_size);
 DECLARE_PER_CPU(int, sd_llc_id);
 DECLARE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
 DECLARE_PER_CPU(struct sched_domain __rcu *, sd_numa);
+DECLARE_PER_CPU(struct sched_domain __rcu *, sd_balance_wake);
+DECLARE_PER_CPU(struct sched_domain __rcu *, sd_balance_fork);
+DECLARE_PER_CPU(struct sched_domain __rcu *, sd_balance_exec);
 DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
 DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
 extern struct static_key_false sched_asym_cpucapacity;
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 1d7b446fac7d..66763c539bbd 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -618,6 +618,9 @@ DEFINE_PER_CPU(int, sd_llc_size);
 DEFINE_PER_CPU(int, sd_llc_id);
 DEFINE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
 DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa);
+DEFINE_PER_CPU(struct sched_domain __rcu *, sd_balance_wake);
+DEFINE_PER_CPU(struct sched_domain __rcu *, sd_balance_fork);
+DEFINE_PER_CPU(struct sched_domain __rcu *, sd_balance_exec);
 DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
 DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
 DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity);
@@ -644,6 +647,15 @@ static void update_top_cache_domain(int cpu)
 	sd = lowest_flag_domain(cpu, SD_NUMA);
 	rcu_assign_pointer(per_cpu(sd_numa, cpu), sd);
 
+	sd = highest_flag_domain(cpu, SD_BALANCE_WAKE);
+	rcu_assign_pointer(per_cpu(sd_balance_wake, cpu), sd);
+
+	sd = highest_flag_domain(cpu, SD_BALANCE_FORK);
+	rcu_assign_pointer(per_cpu(sd_balance_fork, cpu), sd);
+
+	sd = highest_flag_domain(cpu, SD_BALANCE_EXEC);
+	rcu_assign_pointer(per_cpu(sd_balance_exec, cpu), sd);
+
 	sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
 	rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd);
 
-- 
2.24.0



* Re: [PATCH v3 1/9] sched/fair: find_idlest_group(): Remove unused sd_flag parameter
  2020-04-15 21:05 ` [PATCH v3 1/9] sched/fair: find_idlest_group(): Remove unused sd_flag parameter Valentin Schneider
@ 2020-04-16  7:40   ` Vincent Guittot
  2020-05-01 18:22   ` [tip: sched/core] " tip-bot2 for Valentin Schneider
  1 sibling, 0 replies; 30+ messages in thread
From: Vincent Guittot @ 2020-04-16  7:40 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Dietmar Eggemann

On Wed, 15 Apr 2020 at 23:05, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
> The last use of that parameter was removed by commit
>
>   57abff067a08 ("sched/fair: Rework find_idlest_group()")
>
> Get rid of the parameter.
>
> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>

This patch is not directly related to the others

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

> ---
>  kernel/sched/fair.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 02f323b85b6d..98321d8dde7e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5821,8 +5821,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
>  }
>
>  static struct sched_group *
> -find_idlest_group(struct sched_domain *sd, struct task_struct *p,
> -                 int this_cpu, int sd_flag);
> +find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu);
>
>  /*
>   * find_idlest_group_cpu - find the idlest CPU among the CPUs in the group.
> @@ -5905,7 +5904,7 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
>                         continue;
>                 }
>
> -               group = find_idlest_group(sd, p, cpu, sd_flag);
> +               group = find_idlest_group(sd, p, cpu);
>                 if (!group) {
>                         sd = sd->child;
>                         continue;
> @@ -8677,8 +8676,7 @@ static bool update_pick_idlest(struct sched_group *idlest,
>   * Assumes p is allowed on at least one CPU in sd.
>   */
>  static struct sched_group *
> -find_idlest_group(struct sched_domain *sd, struct task_struct *p,
> -                 int this_cpu, int sd_flag)
> +find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
>  {
>         struct sched_group *idlest = NULL, *local = NULL, *group = sd->groups;
>         struct sg_lb_stats local_sgs, tmp_sgs;
> --
> 2.24.0
>


* Re: [PATCH v3 6/9] sched: Kill select_task_rq()'s sd_flag parameter
  2020-04-15 21:05 ` [PATCH v3 6/9] sched: Kill select_task_rq()'s sd_flag parameter Valentin Schneider
@ 2020-04-16  7:42   ` Vincent Guittot
  2020-04-16 10:24     ` Valentin Schneider
  2020-04-16 10:47     ` Peter Zijlstra
  0 siblings, 2 replies; 30+ messages in thread
From: Vincent Guittot @ 2020-04-16  7:42 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Dietmar Eggemann,
	Ingo Molnar, Juri Lelli, Steven Rostedt

On Wed, 15 Apr 2020 at 23:05, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
> Only select_task_rq_fair() uses that parameter to do an actual domain
> search, other classes only care about what kind of wakeup is happening
> (fork, exec, or "regular") and thus just translate the flag into a wakeup
> type.
>
> WF_TTWU and WF_EXEC have just been added, use these along with WF_FORK to
> encode the wakeup types we care about. This cleans up the API a
> bit, but adds an extra conversion layer within select_task_rq_fair().
>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
> ---
>  kernel/sched/core.c      | 10 +++++-----
>  kernel/sched/deadline.c  |  4 ++--
>  kernel/sched/fair.c      | 18 +++++++++++++++---
>  kernel/sched/idle.c      |  2 +-
>  kernel/sched/rt.c        |  4 ++--
>  kernel/sched/sched.h     |  2 +-
>  kernel/sched/stop_task.c |  2 +-
>  7 files changed, 27 insertions(+), 15 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 3a61a3b8eaa9..aea9badd397a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2094,12 +2094,12 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
>   * The caller (fork, wakeup) owns p->pi_lock, ->cpus_ptr is stable.
>   */
>  static inline
> -int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
> +int select_task_rq(struct task_struct *p, int cpu, int wake_flags)
>  {
>         lockdep_assert_held(&p->pi_lock);
>
>         if (p->nr_cpus_allowed > 1)
> -               cpu = p->sched_class->select_task_rq(p, cpu, sd_flags, wake_flags);
> +               cpu = p->sched_class->select_task_rq(p, cpu, wake_flags);
>         else
>                 cpu = cpumask_any(p->cpus_ptr);
>
> @@ -2612,7 +2612,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>                 atomic_dec(&task_rq(p)->nr_iowait);
>         }
>
> -       cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);
> +       cpu = select_task_rq(p, p->wake_cpu, wake_flags | WF_TTWU);
>         if (task_cpu(p) != cpu) {
>                 wake_flags |= WF_MIGRATED;
>                 psi_ttwu_dequeue(p);
> @@ -2945,7 +2945,7 @@ void wake_up_new_task(struct task_struct *p)
>          * as we're not fully set-up yet.
>          */
>         p->recent_used_cpu = task_cpu(p);
> -       __set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
> +       __set_task_cpu(p, select_task_rq(p, task_cpu(p), WF_FORK));
>  #endif
>         rq = __task_rq_lock(p, &rf);
>         update_rq_clock(rq);
> @@ -3486,7 +3486,7 @@ void sched_exec(void)
>         int dest_cpu;
>
>         raw_spin_lock_irqsave(&p->pi_lock, flags);
> -       dest_cpu = p->sched_class->select_task_rq(p, task_cpu(p), SD_BALANCE_EXEC, 0);
> +       dest_cpu = p->sched_class->select_task_rq(p, task_cpu(p), WF_EXEC);
>         if (dest_cpu == smp_processor_id())
>                 goto unlock;
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 504d2f51b0d6..0e96b435c51b 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1601,12 +1601,12 @@ static void yield_task_dl(struct rq *rq)
>  static int find_later_rq(struct task_struct *task);
>
>  static int
> -select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags)
> +select_task_rq_dl(struct task_struct *p, int cpu, int flags)
>  {
>         struct task_struct *curr;
>         struct rq *rq;
>
> -       if (sd_flag != SD_BALANCE_WAKE)
> +       if (!(flags & WF_TTWU))
>                 goto out;
>
>         rq = cpu_rq(cpu);
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3d34b4e4060f..b0bf98e6798b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6611,7 +6611,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>
>  /*
>   * select_task_rq_fair: Select target runqueue for the waking task in domains
> - * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
> + * that have the relevant SD flag set. In practice, this is SD_BALANCE_WAKE,
>   * SD_BALANCE_FORK, or SD_BALANCE_EXEC.
>   *
>   * Balances load by selecting the idlest CPU in the idlest group, or under
> @@ -6622,13 +6622,25 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>   * preempt must be disabled.
>   */
>  static int
> -select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_flags)
> +select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
>  {
> +       int sync = (wake_flags & WF_SYNC) && !(current->flags & PF_EXITING);
>         struct sched_domain *tmp, *sd = NULL;
>         int cpu = smp_processor_id();
>         int new_cpu = prev_cpu;
>         int want_affine = 0;
> -       int sync = (wake_flags & WF_SYNC) && !(current->flags & PF_EXITING);
> +       int sd_flag;
> +
> +       switch (wake_flags & (WF_TTWU | WF_FORK | WF_EXEC)) {

You remove a function parameter, which was directly set with the right
flag, but then you add a switch case to recreate this sd_flag
internally. Not sure we can say that it's a real benefit.

> +       case WF_TTWU:
> +               sd_flag = SD_BALANCE_WAKE;
> +               break;
> +       case WF_FORK:
> +               sd_flag = SD_BALANCE_FORK;
> +               break;
> +       default:
> +               sd_flag = SD_BALANCE_EXEC;
> +       }
>
>         if (sd_flag & SD_BALANCE_WAKE) {
>                 record_wakee(p);
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index b743bf38f08f..e9c6a27f0647 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -367,7 +367,7 @@ void cpu_startup_entry(enum cpuhp_state state)
>
>  #ifdef CONFIG_SMP
>  static int
> -select_task_rq_idle(struct task_struct *p, int cpu, int sd_flag, int flags)
> +select_task_rq_idle(struct task_struct *p, int cpu, int flags)
>  {
>         return task_cpu(p); /* IDLE tasks as never migrated */
>  }
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index df11d88c9895..88427ea0231b 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1426,14 +1426,14 @@ static void yield_task_rt(struct rq *rq)
>  static int find_lowest_rq(struct task_struct *task);
>
>  static int
> -select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
> +select_task_rq_rt(struct task_struct *p, int cpu, int flags)
>  {
>         struct task_struct *curr;
>         struct rq *rq;
>         bool test;
>
>         /* For anything but wake ups, just return the task_cpu */
> -       if (sd_flag != SD_BALANCE_WAKE && sd_flag != SD_BALANCE_FORK)
> +       if (!(flags & (WF_TTWU | WF_FORK)))
>                 goto out;
>
>         rq = cpu_rq(cpu);
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index f32c5fa229af..448f5d630544 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1767,7 +1767,7 @@ struct sched_class {
>
>  #ifdef CONFIG_SMP
>         int (*balance)(struct rq *rq, struct task_struct *prev, struct rq_flags *rf);
> -       int  (*select_task_rq)(struct task_struct *p, int task_cpu, int sd_flag, int flags);
> +       int  (*select_task_rq)(struct task_struct *p, int task_cpu, int flags);
>         void (*migrate_task_rq)(struct task_struct *p, int new_cpu);
>
>         void (*task_woken)(struct rq *this_rq, struct task_struct *task);
> diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
> index 4c9e9975684f..4f061ddf8470 100644
> --- a/kernel/sched/stop_task.c
> +++ b/kernel/sched/stop_task.c
> @@ -11,7 +11,7 @@
>
>  #ifdef CONFIG_SMP
>  static int
> -select_task_rq_stop(struct task_struct *p, int cpu, int sd_flag, int flags)
> +select_task_rq_stop(struct task_struct *p, int cpu, int flags)
>  {
>         return task_cpu(p); /* stop tasks as never migrate */
>  }
> --
> 2.24.0
>


* Re: [PATCH v3 9/9] sched/topology: Define and use shortcut pointers for wakeup sd_flag scan
  2020-04-15 21:05 ` [PATCH v3 9/9] sched/topology: Define and use shortcut pointers for wakeup sd_flag scan Valentin Schneider
@ 2020-04-16  7:46   ` Vincent Guittot
  2020-04-16 10:24     ` Valentin Schneider
  0 siblings, 1 reply; 30+ messages in thread
From: Vincent Guittot @ 2020-04-16  7:46 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Dietmar Eggemann

On Wed, 15 Apr 2020 at 23:05, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
> Reworking select_task_rq_fair()'s domain walk exposed that !want_affine
> wakeups only look for highest sched_domain with the required sd_flag
> set. This is something we can cache at sched domain build time to slightly
> optimize select_task_rq_fair(). Note that this isn't a "free" optimization:
> it costs us 3 pointers per CPU.
>
> Add cached per-CPU pointers for the highest domains with SD_BALANCE_WAKE,
> SD_BALANCE_EXEC and SD_BALANCE_FORK. Use them in select_task_rq_fair().
>
> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
> ---
>  kernel/sched/fair.c     | 25 +++++++++++++------------
>  kernel/sched/sched.h    |  3 +++
>  kernel/sched/topology.c | 12 ++++++++++++
>  3 files changed, 28 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6f8cdb99f4a0..db4fb29a88d9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6631,17 +6631,6 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
>         int want_affine = 0;
>         int sd_flag;
>
> -       switch (wake_flags & (WF_TTWU | WF_FORK | WF_EXEC)) {
> -       case WF_TTWU:
> -               sd_flag = SD_BALANCE_WAKE;
> -               break;
> -       case WF_FORK:
> -               sd_flag = SD_BALANCE_FORK;
> -               break;
> -       default:
> -               sd_flag = SD_BALANCE_EXEC;
> -       }
> -
>         if (wake_flags & WF_TTWU) {
>                 record_wakee(p);
>
> @@ -6657,7 +6646,19 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
>
>         rcu_read_lock();
>
> -       sd = highest_flag_domain(cpu, sd_flag);
> +       switch (wake_flags & (WF_TTWU | WF_FORK | WF_EXEC)) {
> +       case WF_TTWU:
> +               sd_flag = SD_BALANCE_WAKE;
> +               sd = rcu_dereference(per_cpu(sd_balance_wake, cpu));

It's worth having a direct pointer for the fast path, which we always
try to keep short, but the other paths are already slow and will not
get any benefit from this per-CPU pointer.
We should keep the loop for the slow paths.
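
A minimal sketch of that suggestion (assuming only the sd_balance_wake
shortcut is kept, with fork/exec falling back to the domain walk):

        if (wake_flags & WF_TTWU) {
                sd_flag = SD_BALANCE_WAKE;
                sd = rcu_dereference(per_cpu(sd_balance_wake, cpu));
        } else {
                sd_flag = (wake_flags & WF_FORK) ? SD_BALANCE_FORK
                                                 : SD_BALANCE_EXEC;
                sd = highest_flag_domain(cpu, sd_flag);
        }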

> +               break;
> +       case WF_FORK:
> +               sd_flag = SD_BALANCE_FORK;
> +               sd = rcu_dereference(per_cpu(sd_balance_fork, cpu));
> +               break;
> +       default:
> +               sd_flag = SD_BALANCE_EXEC;
> +               sd = rcu_dereference(per_cpu(sd_balance_exec, cpu));
> +       }
>
>         /*
>          * If !want_affine, we just look for the highest domain where
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 448f5d630544..4b014103affb 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1423,6 +1423,9 @@ DECLARE_PER_CPU(int, sd_llc_size);
>  DECLARE_PER_CPU(int, sd_llc_id);
>  DECLARE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
>  DECLARE_PER_CPU(struct sched_domain __rcu *, sd_numa);
> +DECLARE_PER_CPU(struct sched_domain __rcu *, sd_balance_wake);
> +DECLARE_PER_CPU(struct sched_domain __rcu *, sd_balance_fork);
> +DECLARE_PER_CPU(struct sched_domain __rcu *, sd_balance_exec);
>  DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
>  DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
>  extern struct static_key_false sched_asym_cpucapacity;
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 1d7b446fac7d..66763c539bbd 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -618,6 +618,9 @@ DEFINE_PER_CPU(int, sd_llc_size);
>  DEFINE_PER_CPU(int, sd_llc_id);
>  DEFINE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
>  DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa);
> +DEFINE_PER_CPU(struct sched_domain __rcu *, sd_balance_wake);
> +DEFINE_PER_CPU(struct sched_domain __rcu *, sd_balance_fork);
> +DEFINE_PER_CPU(struct sched_domain __rcu *, sd_balance_exec);
>  DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
>  DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
>  DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity);
> @@ -644,6 +647,15 @@ static void update_top_cache_domain(int cpu)
>         sd = lowest_flag_domain(cpu, SD_NUMA);
>         rcu_assign_pointer(per_cpu(sd_numa, cpu), sd);
>
> +       sd = highest_flag_domain(cpu, SD_BALANCE_WAKE);
> +       rcu_assign_pointer(per_cpu(sd_balance_wake, cpu), sd);
> +
> +       sd = highest_flag_domain(cpu, SD_BALANCE_FORK);
> +       rcu_assign_pointer(per_cpu(sd_balance_fork, cpu), sd);
> +
> +       sd = highest_flag_domain(cpu, SD_BALANCE_EXEC);
> +       rcu_assign_pointer(per_cpu(sd_balance_exec, cpu), sd);
> +
>         sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
>         rcu_assign_pointer(per_cpu(sd_asym_packing, cpu), sd);
>
> --
> 2.24.0
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 6/9] sched: Kill select_task_rq()'s sd_flag parameter
  2020-04-16  7:42   ` Vincent Guittot
@ 2020-04-16 10:24     ` Valentin Schneider
  2020-04-16 10:47     ` Peter Zijlstra
  1 sibling, 0 replies; 30+ messages in thread
From: Valentin Schneider @ 2020-04-16 10:24 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Dietmar Eggemann,
	Ingo Molnar, Juri Lelli, Steven Rostedt


Hi Vincent,

Thanks for taking a look at this.

On 16/04/20 08:42, Vincent Guittot wrote:
>> @@ -6622,13 +6622,25 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>>   * preempt must be disabled.
>>   */
>>  static int
>> -select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_flags)
>> +select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
>>  {
>> +       int sync = (wake_flags & WF_SYNC) && !(current->flags & PF_EXITING);
>>         struct sched_domain *tmp, *sd = NULL;
>>         int cpu = smp_processor_id();
>>         int new_cpu = prev_cpu;
>>         int want_affine = 0;
>> -       int sync = (wake_flags & WF_SYNC) && !(current->flags & PF_EXITING);
>> +       int sd_flag;
>> +
>> +       switch (wake_flags & (WF_TTWU | WF_FORK | WF_EXEC)) {
>
> You remove a function parameter, which was directly set with the right
> flag, but then you add a switch case to recreate this sd_flag
> internally. Not sure we can say that it's real benefit
>

It is indeed the contentious point of this series (IMO). Still, it has a
few things going for it:

1) only CFS is helped by that extra parameter

2) with patch 9, I need a control flow to pick up the right cached pointer
   anyway; the alternative would be to do something like the unsavoury:

DEFINE_PER_CPU(struct sched_domain __rcu *, sd_balance_flags[3]);
...
update_top_cache_domain()
{
        per_cpu(sd_balance_flags[0], cpu) = highest_flag_domain(cpu, SD_BALANCE_EXEC);
        per_cpu(sd_balance_flags[1], cpu) = highest_flag_domain(cpu, SD_BALANCE_FORK);
        per_cpu(sd_balance_flags[2], cpu) = highest_flag_domain(cpu, SD_BALANCE_WAKE);
}
...
select_task_rq_fair()
{
        // Whatever sort of shady constant time conversion you can think of
        int index = !!(wake_flags & WF_FORK) + 2 * !!(wake_flags & WF_TTWU);
        sd_flag = SD_BALANCE_EXEC << index;
        sd = per_cpu(sd_balance_flags[index], cpu);
}
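
As a quick sanity check of the index arithmetic above, using the SD_*
values from patch 4 (SD_BALANCE_EXEC=0x0002, SD_BALANCE_FORK=0x0004,
SD_BALANCE_WAKE=0x0008):

        /*
         * WF_EXEC: index = 0 -> SD_BALANCE_EXEC << 0 == SD_BALANCE_EXEC, sd_balance_flags[0]
         * WF_FORK: index = 1 -> SD_BALANCE_EXEC << 1 == SD_BALANCE_FORK, sd_balance_flags[1]
         * WF_TTWU: index = 2 -> SD_BALANCE_EXEC << 2 == SD_BALANCE_WAKE, sd_balance_flags[2]
         */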

>> +       case WF_TTWU:
>> +               sd_flag = SD_BALANCE_WAKE;
>> +               break;
>> +       case WF_FORK:
>> +               sd_flag = SD_BALANCE_FORK;
>> +               break;
>> +       default:
>> +               sd_flag = SD_BALANCE_EXEC;
>> +       }
>>
>>         if (sd_flag & SD_BALANCE_WAKE) {
>>                 record_wakee(p);

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 9/9] sched/topology: Define and use shortcut pointers for wakeup sd_flag scan
  2020-04-16  7:46   ` Vincent Guittot
@ 2020-04-16 10:24     ` Valentin Schneider
  2020-04-16 13:04       ` Dietmar Eggemann
  0 siblings, 1 reply; 30+ messages in thread
From: Valentin Schneider @ 2020-04-16 10:24 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Dietmar Eggemann


On 16/04/20 08:46, Vincent Guittot wrote:
>> @@ -6657,7 +6646,19 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
>>
>>         rcu_read_lock();
>>
>> -       sd = highest_flag_domain(cpu, sd_flag);
>> +       switch (wake_flags & (WF_TTWU | WF_FORK | WF_EXEC)) {
>> +       case WF_TTWU:
>> +               sd_flag = SD_BALANCE_WAKE;
>> +               sd = rcu_dereference(per_cpu(sd_balance_wake, cpu));
>
>> It's worth having a direct pointer for the fast path, which we always
>> try to keep short, but the other paths are already slow and will not
>> get any benefit from this per_cpu pointer.
>> We should keep the loop for the slow paths.
>

Which fast/slow paths are you referring to here? want_affine vs
!want_affine? If so, do you then mean that we should do the switch case
only when !want_affine, and otherwise look for the domain via the
for_each_domain() loop?

>> +               break;
>> +       case WF_FORK:
>> +               sd_flag = SD_BALANCE_FORK;
>> +               sd = rcu_dereference(per_cpu(sd_balance_fork, cpu));
>> +               break;
>> +       default:
>> +               sd_flag = SD_BALANCE_EXEC;
>> +               sd = rcu_dereference(per_cpu(sd_balance_exec, cpu));
>> +       }
>>
>>         /*
>>          * If !want_affine, we just look for the highest domain where

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 6/9] sched: Kill select_task_rq()'s sd_flag parameter
  2020-04-16  7:42   ` Vincent Guittot
  2020-04-16 10:24     ` Valentin Schneider
@ 2020-04-16 10:47     ` Peter Zijlstra
  2020-04-16 11:43       ` Valentin Schneider
  1 sibling, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2020-04-16 10:47 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Valentin Schneider, linux-kernel, Ingo Molnar, Dietmar Eggemann,
	Ingo Molnar, Juri Lelli, Steven Rostedt

On Thu, Apr 16, 2020 at 09:42:36AM +0200, Vincent Guittot wrote:
> On Wed, 15 Apr 2020 at 23:05, Valentin Schneider
> > @@ -6622,13 +6622,25 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
> >   * preempt must be disabled.
> >   */
> >  static int
> > +select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
> >  {
> > +       int sync = (wake_flags & WF_SYNC) && !(current->flags & PF_EXITING);
> >         struct sched_domain *tmp, *sd = NULL;
> >         int cpu = smp_processor_id();
> >         int new_cpu = prev_cpu;
> >         int want_affine = 0;
> > -       int sync = (wake_flags & WF_SYNC) && !(current->flags & PF_EXITING);
> > +       int sd_flag;
> > +
> > +       switch (wake_flags & (WF_TTWU | WF_FORK | WF_EXEC)) {
> 
> You remove a function parameter, which was directly set with the right
> flag, but then you add a switch case to recreate this sd_flag
> internally. Not sure we can say that it's real benefit
> 
> > +       case WF_TTWU:
> > +               sd_flag = SD_BALANCE_WAKE;
> > +               break;
> > +       case WF_FORK:
> > +               sd_flag = SD_BALANCE_FORK;
> > +               break;
> > +       default:
> > +               sd_flag = SD_BALANCE_EXEC;
> > +       }

Agreed, that's a bit yuck, how about something like so instead:


--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -11,10 +11,12 @@
  */
 #ifdef CONFIG_SMP
 
+/* First nibble of SD_flag is shared with WF_flag */
 #define SD_BALANCE_NEWIDLE	0x0001	/* Balance when about to become idle */
 #define SD_BALANCE_EXEC		0x0002	/* Balance on exec */
 #define SD_BALANCE_FORK		0x0004	/* Balance on fork, clone */
 #define SD_BALANCE_WAKE		0x0008  /* Balance on wakeup */
+
 #define SD_WAKE_AFFINE		0x0010	/* Wake task to waking CPU */
 #define SD_ASYM_CPUCAPACITY	0x0020  /* Domain members have different CPU capacities */
 #define SD_SHARE_CPUCAPACITY	0x0040	/* Domain members share CPU capacity */
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6635,16 +6635,8 @@ select_task_rq_fair(struct task_struct *
 	int want_affine = 0;
 	int sd_flag;
 
-	switch (wake_flags & (WF_TTWU | WF_FORK | WF_EXEC)) {
-	case WF_TTWU:
-		sd_flag = SD_BALANCE_WAKE;
-		break;
-	case WF_FORK:
-		sd_flag = SD_BALANCE_FORK;
-		break;
-	default:
-		sd_flag = SD_BALANCE_EXEC;
-	}
+	/* SD_flags and WF_flags share the first nibble */
+	sd_flag = wake_flags & 0xf;
 
 	if (sd_flag & SD_BALANCE_WAKE) {
 		record_wakee(p);
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1685,11 +1685,12 @@ static inline int task_on_rq_migrating(s
 /*
  * Wake flags
  */
-#define WF_SYNC			0x01		/* Waker goes to sleep after wakeup */
-#define WF_TTWU                 0x02            /* Regular task wakeup */
-#define WF_FORK			0x04		/* Child wakeup after fork */
-#define WF_EXEC			0x08		/* "Fake" wakeup at exec */
-#define WF_MIGRATED		0x10		/* Internal use, task got migrated */
+#define WF_EXEC			0x02	/* SD_BALANCE_EXEC */
+#define WF_FORK			0x04	/* SD_BALANCE_FORK */
+#define WF_TTWU			0x08	/* SD_BALANCE_WAKE */
+
+#define WF_SYNC			0x10	/* Waker goes to sleep after wakeup */
+#define WF_MIGRATED		0x20	/* Internal use, task got migrated */
 
 /*
  * To aid in avoiding the subversion of "niceness" due to uneven distribution
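
For illustration, here is how the mask resolves for each wakeup type,
assuming the renumbered WF_* values in the diff above:

	/*
	 * sd_flag = wake_flags & 0xf:
	 *   (WF_TTWU | WF_SYNC) & 0xf == 0x08 == SD_BALANCE_WAKE
	 *    WF_FORK            & 0xf == 0x04 == SD_BALANCE_FORK
	 *    WF_EXEC            & 0xf == 0x02 == SD_BALANCE_EXEC
	 */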

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 0/9] sched: Streamline select_task_rq() & select_task_rq_fair()
  2020-04-15 21:05 [PATCH v3 0/9] sched: Streamline select_task_rq() & select_task_rq_fair() Valentin Schneider
                   ` (8 preceding siblings ...)
  2020-04-15 21:05 ` [PATCH v3 9/9] sched/topology: Define and use shortcut pointers for wakeup sd_flag scan Valentin Schneider
@ 2020-04-16 10:58 ` Peter Zijlstra
  2020-04-16 11:00   ` Peter Zijlstra
  9 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2020-04-16 10:58 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, mingo, vincent.guittot, dietmar.eggemann,
	Juri Lelli, Steven Rostedt

On Wed, Apr 15, 2020 at 10:05:03PM +0100, Valentin Schneider wrote:
> Valentin Schneider (9):
>   sched/fair: find_idlest_group(): Remove unused sd_flag parameter
>   sched/debug: Make sd->flags sysctl read-only
>   sched: Remove checks against SD_LOAD_BALANCE
>   sched/topology: Kill SD_LOAD_BALANCE
>   sched: Add WF_TTWU, WF_EXEC wakeup flags

How about I queue the first 5, and you rework these last few?

>   sched: Kill select_task_rq()'s sd_flag parameter
>   sched/fair: Dissociate wakeup decisions from SD flag value
>   sched/fair: Split select_task_rq_fair want_affine logic
>   sched/topology: Define and use shortcut pointers for wakeup sd_flag scan



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 0/9] sched: Streamline select_task_rq() & select_task_rq_fair()
  2020-04-16 10:58 ` [PATCH v3 0/9] sched: Streamline select_task_rq() & select_task_rq_fair() Peter Zijlstra
@ 2020-04-16 11:00   ` Peter Zijlstra
  2020-04-16 11:02     ` Valentin Schneider
  2020-04-16 13:00     ` Vincent Guittot
  0 siblings, 2 replies; 30+ messages in thread
From: Peter Zijlstra @ 2020-04-16 11:00 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, mingo, vincent.guittot, dietmar.eggemann,
	Juri Lelli, Steven Rostedt

On Thu, Apr 16, 2020 at 12:58:28PM +0200, Peter Zijlstra wrote:
> On Wed, Apr 15, 2020 at 10:05:03PM +0100, Valentin Schneider wrote:
> > Valentin Schneider (9):
> >   sched/fair: find_idlest_group(): Remove unused sd_flag parameter
> >   sched/debug: Make sd->flags sysctl read-only
> >   sched: Remove checks against SD_LOAD_BALANCE
> >   sched/topology: Kill SD_LOAD_BALANCE
> >   sched: Add WF_TTWU, WF_EXEC wakeup flags
> 
> How about I queue the first 5, and you rework these last few?

Argh, 4 of course, that 5th patch doesn't make much sense if we have to
rework those flags like I proposed.

> >   sched: Kill select_task_rq()'s sd_flag parameter
> >   sched/fair: Dissociate wakeup decisions from SD flag value
> >   sched/fair: Split select_task_rq_fair want_affine logic
> >   sched/topology: Define and use shortcut pointers for wakeup sd_flag scan
> 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 0/9] sched: Streamline select_task_rq() & select_task_rq_fair()
  2020-04-16 11:00   ` Peter Zijlstra
@ 2020-04-16 11:02     ` Valentin Schneider
  2020-04-16 13:00     ` Vincent Guittot
  1 sibling, 0 replies; 30+ messages in thread
From: Valentin Schneider @ 2020-04-16 11:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, vincent.guittot, dietmar.eggemann,
	Juri Lelli, Steven Rostedt


On 16/04/20 12:00, Peter Zijlstra wrote:
> On Thu, Apr 16, 2020 at 12:58:28PM +0200, Peter Zijlstra wrote:
>> On Wed, Apr 15, 2020 at 10:05:03PM +0100, Valentin Schneider wrote:
>> > Valentin Schneider (9):
>> >   sched/fair: find_idlest_group(): Remove unused sd_flag parameter
>> >   sched/debug: Make sd->flags sysctl read-only
>> >   sched: Remove checks against SD_LOAD_BALANCE
>> >   sched/topology: Kill SD_LOAD_BALANCE
>> >   sched: Add WF_TTWU, WF_EXEC wakeup flags
>>
>> How about I queue the first 5, and you rework these last few?
>
> Argh, 4 of course, that 5th patch doesn't make much sense if we have to
> rework those flags like I proposed.
>

Was about to comment on that :) Sounds good to me!

>> >   sched: Kill select_task_rq()'s sd_flag parameter
>> >   sched/fair: Dissociate wakeup decisions from SD flag value
>> >   sched/fair: Split select_task_rq_fair want_affine logic
>> >   sched/topology: Define and use shortcut pointers for wakeup sd_flag scan
>>
>>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 6/9] sched: Kill select_task_rq()'s sd_flag parameter
  2020-04-16 10:47     ` Peter Zijlstra
@ 2020-04-16 11:43       ` Valentin Schneider
  0 siblings, 0 replies; 30+ messages in thread
From: Valentin Schneider @ 2020-04-16 11:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vincent Guittot, linux-kernel, Ingo Molnar, Dietmar Eggemann,
	Ingo Molnar, Juri Lelli, Steven Rostedt


On 16/04/20 11:47, Peter Zijlstra wrote:
> On Thu, Apr 16, 2020 at 09:42:36AM +0200, Vincent Guittot wrote:
>> On Wed, 15 Apr 2020 at 23:05, Valentin Schneider
>> > @@ -6622,13 +6622,25 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>> >   * preempt must be disabled.
>> >   */
>> >  static int
>> > +select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
>> >  {
>> > +       int sync = (wake_flags & WF_SYNC) && !(current->flags & PF_EXITING);
>> >         struct sched_domain *tmp, *sd = NULL;
>> >         int cpu = smp_processor_id();
>> >         int new_cpu = prev_cpu;
>> >         int want_affine = 0;
>> > -       int sync = (wake_flags & WF_SYNC) && !(current->flags & PF_EXITING);
>> > +       int sd_flag;
>> > +
>> > +       switch (wake_flags & (WF_TTWU | WF_FORK | WF_EXEC)) {
>>
>> You remove a function parameter, which was directly set with the right
>> flag, but then you add a switch case to recreate this sd_flag
>> internally. Not sure we can say that it's real benefit
>>
>> > +       case WF_TTWU:
>> > +               sd_flag = SD_BALANCE_WAKE;
>> > +               break;
>> > +       case WF_FORK:
>> > +               sd_flag = SD_BALANCE_FORK;
>> > +               break;
>> > +       default:
>> > +               sd_flag = SD_BALANCE_EXEC;
>> > +       }
>
> Agreed, that's a bit yuck, how about something like so instead:
>

Aligning the SD_* and WF_* flags sure makes it simpler; I just wasn't
daring enough to do that. I suppose we'll want some BUILD_BUG_ON() checks
just for good measure.
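
Something like the below, perhaps (a minimal sketch; BUILD_BUG_ON()
needs function context, e.g. early in sched_init()):

	/* Assert that the SD_* and WF_* low nibbles stay in sync */
	BUILD_BUG_ON(WF_EXEC != SD_BALANCE_EXEC);
	BUILD_BUG_ON(WF_FORK != SD_BALANCE_FORK);
	BUILD_BUG_ON(WF_TTWU != SD_BALANCE_WAKE);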

Also, it doesn't directly solve the switch case of 9/9, but I may get out
of it with some hackery such as what I suggested in my reply to Vincent.

>
> --- a/include/linux/sched/topology.h
> +++ b/include/linux/sched/topology.h
> @@ -11,10 +11,12 @@
>   */
>  #ifdef CONFIG_SMP
>
> +/* First nibble of SD_flag is shared with WF_flag */
>  #define SD_BALANCE_NEWIDLE	0x0001	/* Balance when about to become idle */
>  #define SD_BALANCE_EXEC		0x0002	/* Balance on exec */
>  #define SD_BALANCE_FORK		0x0004	/* Balance on fork, clone */
>  #define SD_BALANCE_WAKE		0x0008  /* Balance on wakeup */
> +
>  #define SD_WAKE_AFFINE		0x0010	/* Wake task to waking CPU */
>  #define SD_ASYM_CPUCAPACITY	0x0020  /* Domain members have different CPU capacities */
>  #define SD_SHARE_CPUCAPACITY	0x0040	/* Domain members share CPU capacity */
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6635,16 +6635,8 @@ select_task_rq_fair(struct task_struct *
>       int want_affine = 0;
>       int sd_flag;
>
> -	switch (wake_flags & (WF_TTWU | WF_FORK | WF_EXEC)) {
> -	case WF_TTWU:
> -		sd_flag = SD_BALANCE_WAKE;
> -		break;
> -	case WF_FORK:
> -		sd_flag = SD_BALANCE_FORK;
> -		break;
> -	default:
> -		sd_flag = SD_BALANCE_EXEC;
> -	}
> +	/* SD_flags and WF_flags share the first nibble */
> +	sd_flag = wake_flags & 0xf;
>
>       if (sd_flag & SD_BALANCE_WAKE) {
>               record_wakee(p);
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1685,11 +1685,12 @@ static inline int task_on_rq_migrating(s
>  /*
>   * Wake flags
>   */
> -#define WF_SYNC			0x01		/* Waker goes to sleep after wakeup */
> -#define WF_TTWU                 0x02            /* Regular task wakeup */
> -#define WF_FORK			0x04		/* Child wakeup after fork */
> -#define WF_EXEC			0x08		/* "Fake" wakeup at exec */
> -#define WF_MIGRATED		0x10		/* Internal use, task got migrated */
> +#define WF_EXEC			0x02	/* SD_BALANCE_EXEC */
> +#define WF_FORK			0x04	/* SD_BALANCE_FORK */
> +#define WF_TTWU			0x08	/* SD_BALANCE_WAKE */
> +
> +#define WF_SYNC			0x10	/* Waker goes to sleep after wakeup */
> +#define WF_MIGRATED		0x20	/* Internal use, task got migrated */
>
>  /*
>   * To aid in avoiding the subversion of "niceness" due to uneven distribution

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 0/9] sched: Streamline select_task_rq() & select_task_rq_fair()
  2020-04-16 11:00   ` Peter Zijlstra
  2020-04-16 11:02     ` Valentin Schneider
@ 2020-04-16 13:00     ` Vincent Guittot
  1 sibling, 0 replies; 30+ messages in thread
From: Vincent Guittot @ 2020-04-16 13:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Valentin Schneider, linux-kernel, Ingo Molnar, Dietmar Eggemann,
	Juri Lelli, Steven Rostedt

On Thu, 16 Apr 2020 at 13:01, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Apr 16, 2020 at 12:58:28PM +0200, Peter Zijlstra wrote:
> > On Wed, Apr 15, 2020 at 10:05:03PM +0100, Valentin Schneider wrote:
> > > Valentin Schneider (9):
> > >   sched/fair: find_idlest_group(): Remove unused sd_flag parameter
> > >   sched/debug: Make sd->flags sysctl read-only
> > >   sched: Remove checks against SD_LOAD_BALANCE
> > >   sched/topology: Kill SD_LOAD_BALANCE
> > >   sched: Add WF_TTWU, WF_EXEC wakeup flags
> >
> > How about I queue the first 5, and you rework these last few?
>
> Argh, 4 of course, that 5th patch doesn't make much sense if we have to
> rework those flags like I proposed.

Looks good to me too

>
> > >   sched: Kill select_task_rq()'s sd_flag parameter
> > >   sched/fair: Dissociate wakeup decisions from SD flag value
> > >   sched/fair: Split select_task_rq_fair want_affine logic
> > >   sched/topology: Define and use shortcut pointers for wakeup sd_flag scan
> >
> >

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 9/9] sched/topology: Define and use shortcut pointers for wakeup sd_flag scan
  2020-04-16 10:24     ` Valentin Schneider
@ 2020-04-16 13:04       ` Dietmar Eggemann
  2020-04-16 13:36         ` Vincent Guittot
  0 siblings, 1 reply; 30+ messages in thread
From: Dietmar Eggemann @ 2020-04-16 13:04 UTC (permalink / raw)
  To: Valentin Schneider, Vincent Guittot
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra

On 16.04.20 12:24, Valentin Schneider wrote:
> 
> On 16/04/20 08:46, Vincent Guittot wrote:
>>> @@ -6657,7 +6646,19 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
>>>
>>>         rcu_read_lock();
>>>
>>> -       sd = highest_flag_domain(cpu, sd_flag);
>>> +       switch (wake_flags & (WF_TTWU | WF_FORK | WF_EXEC)) {
>>> +       case WF_TTWU:
>>> +               sd_flag = SD_BALANCE_WAKE;
>>> +               sd = rcu_dereference(per_cpu(sd_balance_wake, cpu));
>>
>> It's worth having a direct pointer for the fast path, which we always
>> try to keep short, but the other paths are already slow and will not
>> get any benefit from this per_cpu pointer.
>> We should keep the loop for the slow paths.
>>
> 
> Which fast/slow paths are you referring to here? want_affine vs
> !want_affine? If so, do you then mean that we should do the switch case
> only when !want_affine, and otherwise look for the domain via the
> for_each_domain() loop?

Coming back to the v2 discussion on this patch

https://lore.kernel.org/r/20200311181601.18314-10-valentin.schneider@arm.com

SD_BALANCE_WAKE is not used in mainline anymore, so wakeups are always
fast today.

I.e. you wouldn't need a per_cpu(sd_balance_wake, cpu) since it's always
NULL.
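
For reference, sd_init()'s default flags bear this out (visible in the
patch 4 diff merged below), as the wake flag is never set:

	.flags	= 1*SD_BALANCE_NEWIDLE
		| 1*SD_BALANCE_EXEC
		| 1*SD_BALANCE_FORK
		| 0*SD_BALANCE_WAKE
		| ...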

I.e. the want_affine logic and the 'for_each_domain(cpu, tmp)' loop
aren't needed anymore.

This will dramatically simplify the code in select_task_rq_fair().

But I guess Vincent wants to keep the functionality so we're able to
enable SD_BALANCE_WAKE on certain sd's?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 9/9] sched/topology: Define and use shortcut pointers for wakeup sd_flag scan
  2020-04-16 13:04       ` Dietmar Eggemann
@ 2020-04-16 13:36         ` Vincent Guittot
  2020-04-16 15:27           ` Valentin Schneider
  0 siblings, 1 reply; 30+ messages in thread
From: Vincent Guittot @ 2020-04-16 13:36 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Valentin Schneider, linux-kernel, Ingo Molnar, Peter Zijlstra

On Thu, 16 Apr 2020 at 15:04, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>
> On 16.04.20 12:24, Valentin Schneider wrote:
> >
> > On 16/04/20 08:46, Vincent Guittot wrote:
> >>> @@ -6657,7 +6646,19 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
> >>>
> >>>         rcu_read_lock();
> >>>
> >>> -       sd = highest_flag_domain(cpu, sd_flag);
> >>> +       switch (wake_flags & (WF_TTWU | WF_FORK | WF_EXEC)) {
> >>> +       case WF_TTWU:
> >>> +               sd_flag = SD_BALANCE_WAKE;
> >>> +               sd = rcu_dereference(per_cpu(sd_balance_wake, cpu));
> >>
> >> It's worth having a direct pointer for the fast path, which we always
> >> try to keep short, but the other paths are already slow and will not
> >> get any benefit from this per_cpu pointer.
> >> We should keep the loop for the slow paths.
> >>
> >
> > Which fast/slow paths are you referring to here? want_affine vs
> > !want_affine? If so, do you then mean that we should do the switch case
> > only when !want_affine, and otherwise look for the domain via the
> > for_each_domain() loop?
>
> Coming back to the v2 discussion on this patch
>
> https://lore.kernel.org/r/20200311181601.18314-10-valentin.schneider@arm.com
>
> SD_BALANCE_WAKE is not used in mainline anymore, so wakeups are always
> fast today.
>
> I.e. you wouldn't need a per_cpu(sd_balance_wake, cpu) since it's always
> NULL.
>
> I.e. the want_affine logic and the 'for_each_domain(cpu, tmp)' loop
> aren't needed anymore.
>
> This will dramatically simplify the code in select_task_rq_fair().
>
> But I guess Vincent wants to keep the functionality so we're able to
> enable SD_BALANCE_WAKE on certain sd's?

I looked too quickly at what this patch does. I thought that it was
adding a per_cpu pointer for all cases, including the fast path with
wake affine, but it only skips the for_each_domain loop for the slow
paths, which don't need it because they are already slow.

It would be better to keep the for_each_domain loop for the slow paths
and to use a per_cpu pointer for the fast path/wake affine. Regarding
the wake_affine path, we don't really care about looping over all
domains and could directly use the highest domain: wake_affine(), which
is used in the loop, only uses the sched domain's imbalance_pct for
wake_affine_weight(), so it should not harm to use only the highest
domain. select_idle_sibling() doesn't use the domain at all; it uses
the llc or asym_capacity pointer instead.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 9/9] sched/topology: Define and use shortcut pointers for wakeup sd_flag scan
  2020-04-16 13:36         ` Vincent Guittot
@ 2020-04-16 15:27           ` Valentin Schneider
  2020-04-16 15:58             ` Vincent Guittot
  0 siblings, 1 reply; 30+ messages in thread
From: Valentin Schneider @ 2020-04-16 15:27 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Dietmar Eggemann, linux-kernel, Ingo Molnar, Peter Zijlstra


On 16/04/20 14:36, Vincent Guittot wrote:
>> Coming back to the v2 discussion on this patch
>>
>> https://lore.kernel.org/r/20200311181601.18314-10-valentin.schneider@arm.com
>>
>> SD_BALANCE_WAKE is not used in mainline anymore, so wakeups are always
>> fast today.
>>
>> I.e. you wouldn't need a per_cpu(sd_balance_wake, cpu) since it's always
>> NULL.
>>
>> I.e. the want_affine logic and the 'for_each_domain(cpu, tmp)' loop
>> aren't needed anymore.
>>
>> This will dramatically simplify the code in select_task_rq_fair().
>>
>> But I guess Vincent wants to keep the functionality so we're able to
>> enable SD_BALANCE_WAKE on certain sd's?
>
> I looked too quickly at what this patch does. I thought that it was
> adding a per_cpu pointer for all cases, including the fast path with
> wake affine, but it only skips the for_each_domain loop for the slow
> paths, which don't need it because they are already slow.
>
> It would be better to keep the for_each_domain loop for the slow paths
> and to use a per_cpu pointer for the fast path/wake affine. Regarding
> the wake_affine path, we don't really care about looping over all
> domains and could directly use the highest domain: wake_affine(), which
> is used in the loop, only uses the sched domain's imbalance_pct for
> wake_affine_weight(), so it should not harm to use only the highest
> domain. select_idle_sibling() doesn't use the domain at all; it uses
> the llc or asym_capacity pointer instead.

So Dietmar's pointing out that sd will always be NULL for want_affine,
because want_affine can only be true at wakeups and we don't have any
topologies with SD_BALANCE_WAKE anymore. We would still want to call
wake_affine() though, because that can change new_cpu.

What you are adding on top is that the only sd field used in wake_affine()
is the imbalance_pct, so we could take a shortcut and just go for the
highest domain with SD_WAKE_AFFINE; i.e. something like this:

---
if (want_affine) {
        // We can cache that at topology buildup
        sd = highest_flag_domain(cpu, SD_WAKE_AFFINE);

        if (cpumask_test_cpu(prev_cpu, sched_domain_span(sd)) &&
            cpu != prev_cpu)
                new_cpu = wake_affine();

        // Directly go to select_idle_sibling()
        goto sis;
}

// !want_affine logic here
---

As for the !want_affine part, we could either keep the current domain walk
(IIUC this is what you are suggesting) or go for the extra cached pointers
I'm introducing.

Now if we are a bit bolder than that, because there are no more
(mainline) topologies w/ SD_BALANCE_WAKE, we could even turn the above
into:

---
if (wake_flags & WF_TTWU) {
        if (want_affine) {
                // We can cache that at topology buildup
                sd = highest_flag_domain(cpu, SD_WAKE_AFFINE);

                if (cpumask_test_cpu(prev_cpu, sched_domain_span(sd)) &&
                    cpu != prev_cpu)
                        new_cpu = wake_affine();

        }
        // Directly go to select_idle_sibling()
        goto sis;
}

// !want_affine logic here
---

This in turn means we could get rid of SD_BALANCE_WAKE entirely... I'm a
bit more reluctant to do that only because the last SD_BALANCE_WAKE setter was
removed fairly recently, see
  a526d466798d ("sched/topology: Remove SD_BALANCE_WAKE on asymmetric capacity systems")

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 9/9] sched/topology: Define and use shortcut pointers for wakeup sd_flag scan
  2020-04-16 15:27           ` Valentin Schneider
@ 2020-04-16 15:58             ` Vincent Guittot
  2020-04-16 20:54               ` Valentin Schneider
  0 siblings, 1 reply; 30+ messages in thread
From: Vincent Guittot @ 2020-04-16 15:58 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Dietmar Eggemann, linux-kernel, Ingo Molnar, Peter Zijlstra

On Thu, 16 Apr 2020 at 17:27, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
>
> On 16/04/20 14:36, Vincent Guittot wrote:
> >> Coming back to the v2 discussion on this patch
> >>
> >> https://lore.kernel.org/r/20200311181601.18314-10-valentin.schneider@arm.com
> >>
> >> SD_BALANCE_WAKE is not used in mainline anymore, so wakeups are always
> >> fast today.
> >>
> >> I.e. you wouldn't need a per_cpu(sd_balance_wake, cpu) since it's always
> >> NULL.
> >>
> >> I.e. the want_affine logic and the 'for_each_domain(cpu, tmp)' loop
> >> aren't needed anymore.
> >>
> >> This will dramatically simplify the code in select_task_rq_fair().
> >>
> >> But I guess Vincent wants to keep the functionality so we're able to
> >> enable SD_BALANCE_WAKE on certain sd's?
> >
> > I looked too quickly at what this patch does. I thought that it was
> > adding a per_cpu pointer for all cases, including the fast path with
> > wake affine, but it only skips the for_each_domain loop for the slow
> > paths, which don't need it because they are already slow.
> >
> > It would be better to keep the for_each_domain loop for the slow paths
> > and to use a per_cpu pointer for the fast path/wake affine. Regarding
> > the wake_affine path, we don't really care about looping over all
> > domains and could directly use the highest domain: wake_affine(), which
> > is used in the loop, only uses the sched domain's imbalance_pct for
> > wake_affine_weight(), so it should not harm to use only the highest
> > domain. select_idle_sibling() doesn't use the domain at all; it uses
> > the llc or asym_capacity pointer instead.
>
> So Dietmar's pointing out that sd will always be NULL for want_affine,
> because want_affine can only be true at wakeups and we don't have any
> topologies with SD_BALANCE_WAKE anymore. We would still want to call
> wake_affine() though, because that can change new_cpu.
>
> What you are adding on top is that the only sd field used in wake_affine()
> is the imbalance_pct, so we could take a shortcut and just go for the
> highest domain with SD_WAKE_AFFINE; i.e. something like this:
>
> ---
> if (want_affine) {
>         // We can cache that at topology buildup
>         sd = highest_flag_domain(cpu, SD_WAKE_AFFINE);

Yes and this one should be cached at topology buildup

>
>         if (cpumask_test_cpu(prev_cpu, sched_domain_span(sd)) &&
>             cpu != prev_cpu)
>                 new_cpu = wake_affine();
>
>         // Directly go to select_idle_sibling()
>         goto sis;
> }
>
> // !want_affine logic here
> ---
>
> As for the !want_affine part, we could either keep the current domain walk
> (IIUC this is what you are suggesting) or go for the extra cached pointers
> I'm introducing.
>
> Now if we are a bit bolder than that, because there are no more
> (mainline) topologies w/ SD_BALANCE_WAKE, we could even turn the above
> into:
>
> ---
> if (wake_flags & WF_TTWU) {
>         if (want_affine) {
>                 // We can cache that at topology buildup
>                 sd = highest_flag_domain(cpu, SD_WAKE_AFFINE);
>
>                 if (cpumask_test_cpu(prev_cpu, sched_domain_span(sd)) &&
>                     cpu != prev_cpu)
>                         new_cpu = wake_affine();
>
>         }
>         // Directly go to select_idle_sibling()
>         goto sis;
> }
>
> // !want_affine logic here
> ---
>
> This in turn means we could get rid of SD_BALANCE_WAKE entirely... I'm a
> bit more reluctant to do that only because the last SD_BALANCE_WAKE setter was

For now, we should probably skip the additional test above, "if
(wake_flags & WF_TTWU) {", and keep SD_BALANCE_WAKE so we will continue
to loop in the !want_affine case.

We can imagine that, in the end, we might want to be a bit smarter
about SD_BALANCE_WAKE and the slow path... like with the latency nice
proposal, with latency-nice=19 as an example.

> removed fairly recently, see
>   a526d466798d ("sched/topology: Remove SD_BALANCE_WAKE on asymmetric capacity systems")

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 9/9] sched/topology: Define and use shortcut pointers for wakeup sd_flag scan
  2020-04-16 15:58             ` Vincent Guittot
@ 2020-04-16 20:54               ` Valentin Schneider
  0 siblings, 0 replies; 30+ messages in thread
From: Valentin Schneider @ 2020-04-16 20:54 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Dietmar Eggemann, linux-kernel, Ingo Molnar, Peter Zijlstra


On 16/04/20 16:58, Vincent Guittot wrote:
>> ---
>> if (wake_flags & WF_TTWU) {
>>         if (want_affine) {
>>                 // We can cache that at topology buildup
>>                 sd = highest_flag_domain(cpu, SD_WAKE_AFFINE);
>>
>>                 if (cpumask_test_cpu(prev_cpu, sched_domain_span(sd)) &&
>>                     cpu != prev_cpu)
>>                         new_cpu = wake_affine();
>>
>>         }
>>         // Directly go to select_idle_sibling()
>>         goto sis;
>> }
>>
>> // !want_affine logic here
>> ---
>>
>> This in turn means we could get rid of SD_BALANCE_WAKE entirely... I'm a
>> bit more reluctant to do that only because the last SD_BALANCE_WAKE setter was
>
> For now, we should probably skip the additional test above, "if
> (wake_flags & WF_TTWU) {", and keep SD_BALANCE_WAKE so we will continue
> to loop in the !want_affine case.
>
> We can imagine that, in the end, we might want to be a bit smarter
> about SD_BALANCE_WAKE and the slow path... like with the latency nice
> proposal, with latency-nice=19 as an example.
>

Good point. I'll go for the first option and see where I end up; I'd like
to cache the other domain pointers if possible, so I'll do some benchmarking
and see if I can do that without another switch case.
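
Roughly, the shape that could take (a hypothetical sketch: the
sd_wake_affine cached pointer is a made-up name, and the slow paths
keep the existing domain walk semantics):

---
if (want_affine) {
	/* Hypothetical pointer, cached at topology buildup like the other sd_* ones */
	sd = rcu_dereference(per_cpu(sd_wake_affine, cpu));

	if (sd && cpu != prev_cpu &&
	    cpumask_test_cpu(prev_cpu, sched_domain_span(sd)))
		new_cpu = wake_affine(sd, p, cpu, prev_cpu, sync);

	/* Directly go to select_idle_sibling() */
	goto sis;
}

/* Slow paths (fork/exec, !want_affine wakeups) keep the domain walk */
for_each_domain(cpu, tmp) {
	if (!(tmp->flags & sd_flag))
		break;
	sd = tmp;
}
---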

>> removed fairly recently, see
>>   a526d466798d ("sched/topology: Remove SD_BALANCE_WAKE on asymmetric capacity systems")

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [tip: sched/core] sched/topology: Kill SD_LOAD_BALANCE
  2020-04-15 21:05 ` [PATCH v3 4/9] sched/topology: Kill SD_LOAD_BALANCE Valentin Schneider
@ 2020-05-01 18:22   ` tip-bot2 for Valentin Schneider
  0 siblings, 0 replies; 30+ messages in thread
From: tip-bot2 for Valentin Schneider @ 2020-05-01 18:22 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Valentin Schneider, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     36c5bdc4387056af3840adb4478c752faeb9d15e
Gitweb:        https://git.kernel.org/tip/36c5bdc4387056af3840adb4478c752faeb9d15e
Author:        Valentin Schneider <valentin.schneider@arm.com>
AuthorDate:    Wed, 15 Apr 2020 22:05:07 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 30 Apr 2020 20:14:39 +02:00

sched/topology: Kill SD_LOAD_BALANCE

That flag is set unconditionally in sd_init(), and no one checks for it
anymore. Remove it.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200415210512.805-5-valentin.schneider@arm.com
---
 include/linux/sched/topology.h | 29 ++++++++++++++---------------
 kernel/sched/topology.c        |  3 +--
 2 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 95253ad..fb11091 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -11,21 +11,20 @@
  */
 #ifdef CONFIG_SMP
 
-#define SD_LOAD_BALANCE		0x0001	/* Do load balancing on this domain. */
-#define SD_BALANCE_NEWIDLE	0x0002	/* Balance when about to become idle */
-#define SD_BALANCE_EXEC		0x0004	/* Balance on exec */
-#define SD_BALANCE_FORK		0x0008	/* Balance on fork, clone */
-#define SD_BALANCE_WAKE		0x0010  /* Balance on wakeup */
-#define SD_WAKE_AFFINE		0x0020	/* Wake task to waking CPU */
-#define SD_ASYM_CPUCAPACITY	0x0040  /* Domain members have different CPU capacities */
-#define SD_SHARE_CPUCAPACITY	0x0080	/* Domain members share CPU capacity */
-#define SD_SHARE_POWERDOMAIN	0x0100	/* Domain members share power domain */
-#define SD_SHARE_PKG_RESOURCES	0x0200	/* Domain members share CPU pkg resources */
-#define SD_SERIALIZE		0x0400	/* Only a single load balancing instance */
-#define SD_ASYM_PACKING		0x0800  /* Place busy groups earlier in the domain */
-#define SD_PREFER_SIBLING	0x1000	/* Prefer to place tasks in a sibling domain */
-#define SD_OVERLAP		0x2000	/* sched_domains of this level overlap */
-#define SD_NUMA			0x4000	/* cross-node balancing */
+#define SD_BALANCE_NEWIDLE	0x0001	/* Balance when about to become idle */
+#define SD_BALANCE_EXEC		0x0002	/* Balance on exec */
+#define SD_BALANCE_FORK		0x0004	/* Balance on fork, clone */
+#define SD_BALANCE_WAKE		0x0008  /* Balance on wakeup */
+#define SD_WAKE_AFFINE		0x0010	/* Wake task to waking CPU */
+#define SD_ASYM_CPUCAPACITY	0x0020  /* Domain members have different CPU capacities */
+#define SD_SHARE_CPUCAPACITY	0x0040	/* Domain members share CPU capacity */
+#define SD_SHARE_POWERDOMAIN	0x0080	/* Domain members share power domain */
+#define SD_SHARE_PKG_RESOURCES	0x0100	/* Domain members share CPU pkg resources */
+#define SD_SERIALIZE		0x0200	/* Only a single load balancing instance */
+#define SD_ASYM_PACKING		0x0400  /* Place busy groups earlier in the domain */
+#define SD_PREFER_SIBLING	0x0800	/* Prefer to place tasks in a sibling domain */
+#define SD_OVERLAP		0x1000	/* sched_domains of this level overlap */
+#define SD_NUMA			0x2000	/* cross-node balancing */
 
 #ifdef CONFIG_SCHED_SMT
 static inline int cpu_smt_flags(void)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index a9dc34a..1d7b446 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1341,8 +1341,7 @@ sd_init(struct sched_domain_topology_level *tl,
 
 		.cache_nice_tries	= 0,
 
-		.flags			= 1*SD_LOAD_BALANCE
-					| 1*SD_BALANCE_NEWIDLE
+		.flags			= 1*SD_BALANCE_NEWIDLE
 					| 1*SD_BALANCE_EXEC
 					| 1*SD_BALANCE_FORK
 					| 0*SD_BALANCE_WAKE

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [tip: sched/core] sched/debug: Make sd->flags sysctl read-only
  2020-04-15 21:05 ` [PATCH v3 2/9] sched/debug: Make sd->flags sysctl read-only Valentin Schneider
@ 2020-05-01 18:22   ` tip-bot2 for Valentin Schneider
  0 siblings, 0 replies; 30+ messages in thread
From: tip-bot2 for Valentin Schneider @ 2020-05-01 18:22 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Valentin Schneider, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     9818427c6270a9ce8c52c8621026fe9cebae0f92
Gitweb:        https://git.kernel.org/tip/9818427c6270a9ce8c52c8621026fe9cebae0f92
Author:        Valentin Schneider <valentin.schneider@arm.com>
AuthorDate:    Wed, 15 Apr 2020 22:05:05 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 30 Apr 2020 20:14:39 +02:00

sched/debug: Make sd->flags sysctl read-only

Writing to the sysctl of a sched_domain->flags directly updates the value of
the field, and goes nowhere near update_top_cache_domain(). This means that
the cached domain pointers can end up containing stale data (e.g. the
domain pointed to doesn't have the relevant flag set anymore).

Explicit domain walks that check for flags will be affected by
the write, but this won't be in sync with the cached pointers which will
still point to the domains that were cached at the last sched_domain
build.

In other words, writing to this interface is playing a dangerous game. It
could be made to trigger an update of the cached sched_domain pointers when
written to, but this does not seem to be worth the trouble. Make it
read-only.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200415210512.805-3-valentin.schneider@arm.com
---
 kernel/sched/debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index b3ac1c1..c6cc02a 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -258,7 +258,7 @@ sd_alloc_ctl_domain_table(struct sched_domain *sd)
 	set_table_entry(&table[2], "busy_factor",	  &sd->busy_factor,	    sizeof(int),  0644, proc_dointvec_minmax);
 	set_table_entry(&table[3], "imbalance_pct",	  &sd->imbalance_pct,	    sizeof(int),  0644, proc_dointvec_minmax);
 	set_table_entry(&table[4], "cache_nice_tries",	  &sd->cache_nice_tries,    sizeof(int),  0644, proc_dointvec_minmax);
-	set_table_entry(&table[5], "flags",		  &sd->flags,		    sizeof(int),  0644, proc_dointvec_minmax);
+	set_table_entry(&table[5], "flags",		  &sd->flags,		    sizeof(int),  0444, proc_dointvec_minmax);
 	set_table_entry(&table[6], "max_newidle_lb_cost", &sd->max_newidle_lb_cost, sizeof(long), 0644, proc_doulongvec_minmax);
 	set_table_entry(&table[7], "name",		  sd->name,	       CORENAME_MAX_SIZE, 0444, proc_dostring);
 	/* &table[8] is terminator */

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [tip: sched/core] sched: Remove checks against SD_LOAD_BALANCE
  2020-04-15 21:05 ` [PATCH v3 3/9] sched: Remove checks against SD_LOAD_BALANCE Valentin Schneider
@ 2020-05-01 18:22   ` tip-bot2 for Valentin Schneider
  0 siblings, 0 replies; 30+ messages in thread
From: tip-bot2 for Valentin Schneider @ 2020-05-01 18:22 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Valentin Schneider, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     e669ac8ab952df2f07dee1e1efbf40647d6de332
Gitweb:        https://git.kernel.org/tip/e669ac8ab952df2f07dee1e1efbf40647d6de332
Author:        Valentin Schneider <valentin.schneider@arm.com>
AuthorDate:    Wed, 15 Apr 2020 22:05:06 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 30 Apr 2020 20:14:39 +02:00

sched: Remove checks against SD_LOAD_BALANCE

The SD_LOAD_BALANCE flag is set unconditionally for all domains in
sd_init(). By making the sched_domain->flags syctl interface read-only, we
have removed the last piece of code that could clear that flag - as such,
it will now be always present. Rather than to keep carrying it along, we
can work towards getting rid of it entirely.

cpusets don't need it because they can make CPUs be attached to the NULL
domain (e.g. cpuset with sched_load_balance=0), or to a partitioned
root_domain, i.e. a sched_domain hierarchy that doesn't span the entire
system (e.g. root cpuset with sched_load_balance=0 and sibling cpusets with
sched_load_balance=1).

isolcpus apply the same "trick": isolated CPUs are explicitly taken out of
the sched_domain rebuild (using housekeeping_cpumask()), so they get the
NULL domain treatment as well.

Remove the checks against SD_LOAD_BALANCE.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200415210512.805-4-valentin.schneider@arm.com
---
 kernel/sched/fair.c     | 14 ++------------
 kernel/sched/topology.c | 28 +++++++++-------------------
 2 files changed, 11 insertions(+), 31 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 617ca44..4b959c0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6649,9 +6649,6 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 
 	rcu_read_lock();
 	for_each_domain(cpu, tmp) {
-		if (!(tmp->flags & SD_LOAD_BALANCE))
-			break;
-
 		/*
 		 * If both 'cpu' and 'prev_cpu' are part of this domain,
 		 * cpu is a valid SD_WAKE_AFFINE target.
@@ -9790,9 +9787,8 @@ static int active_load_balance_cpu_stop(void *data)
 	/* Search for an sd spanning us and the target CPU. */
 	rcu_read_lock();
 	for_each_domain(target_cpu, sd) {
-		if ((sd->flags & SD_LOAD_BALANCE) &&
-		    cpumask_test_cpu(busiest_cpu, sched_domain_span(sd)))
-				break;
+		if (cpumask_test_cpu(busiest_cpu, sched_domain_span(sd)))
+			break;
 	}
 
 	if (likely(sd)) {
@@ -9881,9 +9877,6 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
 		}
 		max_cost += sd->max_newidle_lb_cost;
 
-		if (!(sd->flags & SD_LOAD_BALANCE))
-			continue;
-
 		/*
 		 * Stop the load balance at this level. There is another
 		 * CPU in our sched group which is doing load balancing more
@@ -10472,9 +10465,6 @@ int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 		int continue_balancing = 1;
 		u64 t0, domain_cost;
 
-		if (!(sd->flags & SD_LOAD_BALANCE))
-			continue;
-
 		if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) {
 			update_next_balance(sd, &next_balance);
 			break;
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 8344757..a9dc34a 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -33,14 +33,6 @@ static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
 	cpumask_clear(groupmask);
 
 	printk(KERN_DEBUG "%*s domain-%d: ", level, "", level);
-
-	if (!(sd->flags & SD_LOAD_BALANCE)) {
-		printk("does not load-balance\n");
-		if (sd->parent)
-			printk(KERN_ERR "ERROR: !SD_LOAD_BALANCE domain has parent");
-		return -1;
-	}
-
 	printk(KERN_CONT "span=%*pbl level=%s\n",
 	       cpumask_pr_args(sched_domain_span(sd)), sd->name);
 
@@ -151,8 +143,7 @@ static int sd_degenerate(struct sched_domain *sd)
 		return 1;
 
 	/* Following flags need at least 2 groups */
-	if (sd->flags & (SD_LOAD_BALANCE |
-			 SD_BALANCE_NEWIDLE |
+	if (sd->flags & (SD_BALANCE_NEWIDLE |
 			 SD_BALANCE_FORK |
 			 SD_BALANCE_EXEC |
 			 SD_SHARE_CPUCAPACITY |
@@ -183,15 +174,14 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
 
 	/* Flags needing groups don't count if only 1 group in parent */
 	if (parent->groups == parent->groups->next) {
-		pflags &= ~(SD_LOAD_BALANCE |
-				SD_BALANCE_NEWIDLE |
-				SD_BALANCE_FORK |
-				SD_BALANCE_EXEC |
-				SD_ASYM_CPUCAPACITY |
-				SD_SHARE_CPUCAPACITY |
-				SD_SHARE_PKG_RESOURCES |
-				SD_PREFER_SIBLING |
-				SD_SHARE_POWERDOMAIN);
+		pflags &= ~(SD_BALANCE_NEWIDLE |
+			    SD_BALANCE_FORK |
+			    SD_BALANCE_EXEC |
+			    SD_ASYM_CPUCAPACITY |
+			    SD_SHARE_CPUCAPACITY |
+			    SD_SHARE_PKG_RESOURCES |
+			    SD_PREFER_SIBLING |
+			    SD_SHARE_POWERDOMAIN);
 		if (nr_node_ids == 1)
 			pflags &= ~SD_SERIALIZE;
 	}

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [tip: sched/core] sched/fair: find_idlest_group(): Remove unused sd_flag parameter
  2020-04-15 21:05 ` [PATCH v3 1/9] sched/fair: find_idlest_group(): Remove unused sd_flag parameter Valentin Schneider
  2020-04-16  7:40   ` Vincent Guittot
@ 2020-05-01 18:22   ` tip-bot2 for Valentin Schneider
  1 sibling, 0 replies; 30+ messages in thread
From: tip-bot2 for Valentin Schneider @ 2020-05-01 18:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Valentin Schneider, Peter Zijlstra (Intel),
	Dietmar Eggemann, Vincent Guittot, x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     45da27732b0b9b7a04696653065d5e6037bc5ac0
Gitweb:        https://git.kernel.org/tip/45da27732b0b9b7a04696653065d5e6037bc5ac0
Author:        Valentin Schneider <valentin.schneider@arm.com>
AuthorDate:    Wed, 15 Apr 2020 22:05:04 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 30 Apr 2020 20:14:39 +02:00

sched/fair: find_idlest_group(): Remove unused sd_flag parameter

The last use of that parameter was removed by commit

  57abff067a08 ("sched/fair: Rework find_idlest_group()")

Get rid of the parameter.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20200415210512.805-2-valentin.schneider@arm.com
---
 kernel/sched/fair.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 63419c6..617ca44 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5825,8 +5825,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
 }
 
 static struct sched_group *
-find_idlest_group(struct sched_domain *sd, struct task_struct *p,
-		  int this_cpu, int sd_flag);
+find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu);
 
 /*
  * find_idlest_group_cpu - find the idlest CPU among the CPUs in the group.
@@ -5909,7 +5908,7 @@ static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p
 			continue;
 		}
 
-		group = find_idlest_group(sd, p, cpu, sd_flag);
+		group = find_idlest_group(sd, p, cpu);
 		if (!group) {
 			sd = sd->child;
 			continue;
@@ -8681,8 +8680,7 @@ static bool update_pick_idlest(struct sched_group *idlest,
  * Assumes p is allowed on at least one CPU in sd.
  */
 static struct sched_group *
-find_idlest_group(struct sched_domain *sd, struct task_struct *p,
-		  int this_cpu, int sd_flag)
+find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
 {
 	struct sched_group *idlest = NULL, *local = NULL, *group = sd->groups;
 	struct sg_lb_stats local_sgs, tmp_sgs;

^ permalink raw reply related	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2020-05-01 18:24 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-15 21:05 [PATCH v3 0/9] sched: Streamline select_task_rq() & select_task_rq_fair() Valentin Schneider
2020-04-15 21:05 ` [PATCH v3 1/9] sched/fair: find_idlest_group(): Remove unused sd_flag parameter Valentin Schneider
2020-04-16  7:40   ` Vincent Guittot
2020-05-01 18:22   ` [tip: sched/core] " tip-bot2 for Valentin Schneider
2020-04-15 21:05 ` [PATCH v3 2/9] sched/debug: Make sd->flags sysctl read-only Valentin Schneider
2020-05-01 18:22   ` [tip: sched/core] " tip-bot2 for Valentin Schneider
2020-04-15 21:05 ` [PATCH v3 3/9] sched: Remove checks against SD_LOAD_BALANCE Valentin Schneider
2020-05-01 18:22   ` [tip: sched/core] " tip-bot2 for Valentin Schneider
2020-04-15 21:05 ` [PATCH v3 4/9] sched/topology: Kill SD_LOAD_BALANCE Valentin Schneider
2020-05-01 18:22   ` [tip: sched/core] " tip-bot2 for Valentin Schneider
2020-04-15 21:05 ` [PATCH v3 5/9] sched: Add WF_TTWU, WF_EXEC wakeup flags Valentin Schneider
2020-04-15 21:05 ` [PATCH v3 6/9] sched: Kill select_task_rq()'s sd_flag parameter Valentin Schneider
2020-04-16  7:42   ` Vincent Guittot
2020-04-16 10:24     ` Valentin Schneider
2020-04-16 10:47     ` Peter Zijlstra
2020-04-16 11:43       ` Valentin Schneider
2020-04-15 21:05 ` [PATCH v3 7/9] sched/fair: Dissociate wakeup decisions from SD flag value Valentin Schneider
2020-04-15 21:05 ` [PATCH v3 8/9] sched/fair: Split select_task_rq_fair want_affine logic Valentin Schneider
2020-04-15 21:05 ` [PATCH v3 9/9] sched/topology: Define and use shortcut pointers for wakeup sd_flag scan Valentin Schneider
2020-04-16  7:46   ` Vincent Guittot
2020-04-16 10:24     ` Valentin Schneider
2020-04-16 13:04       ` Dietmar Eggemann
2020-04-16 13:36         ` Vincent Guittot
2020-04-16 15:27           ` Valentin Schneider
2020-04-16 15:58             ` Vincent Guittot
2020-04-16 20:54               ` Valentin Schneider
2020-04-16 10:58 ` [PATCH v3 0/9] sched: Streamline select_task_rq() & select_task_rq_fair() Peter Zijlstra
2020-04-16 11:00   ` Peter Zijlstra
2020-04-16 11:02     ` Valentin Schneider
2020-04-16 13:00     ` Vincent Guittot
