* [PATCH 0/4] Reduce scheduler migrations due to wake_affine
@ 2017-12-18  9:43 Mel Gorman
  2017-12-18  9:43 ` [PATCH 1/4] sched: Only immediately migrate tasks due to interrupts if prev and target CPUs share cache Mel Gorman
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Mel Gorman @ 2017-12-18  9:43 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Matt Fleming, Mel Gorman, LKML

wake_affine has the impossible task of figuring out when it's best for a
waker to pull a wakee towards the waker's CPU on the expectation that data
locality will offset the cost of the migration. It's hurt by the fact that
most wakeups cannot run on the waking CPU, to avoid accidentally stacking
multiple tasks on one CPU, so it depends heavily on topology and on which
nearby CPU is idle. This series special cases some wake_affine decisions.
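
For reference, the wake_affine_idle() logic being modified looks roughly
like the following before the series (a sketch reconstructed from the diff
context in the patches below, not a verbatim copy of kernel/sched/fair.c):

	static bool
	wake_affine_idle(struct sched_domain *sd, struct task_struct *p,
			 int this_cpu, int prev_cpu, int sync)
	{
		/* An idle waking CPU is used as-is, even across NUMA nodes */
		if (idle_cpu(this_cpu))
			return true;

		/* Sync wakeups may stack on a waking CPU about to go idle */
		if (sync && cpu_rq(this_cpu)->nr_running == 1)
			return true;

		return false;
	}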

Patch 1 was already posted but is a prerequisite for the other patches. It
	avoids wake_affine pulling a task to a different node if the wakeup
	source is an interrupt. This is on the basis that we have little
	knowledge of whether the CPU servicing the interrupt is relevant
	to the data locality of the task being woken. The data from the
	interrupt itself may be a tiny proportion of the task's working
	set.

Patch 2 notes that a previous CPU that is idle and cache affine with
	the waker is probably a suitable idle sibling and that a search
	in select_idle_sibling can be avoided.

Patch 3 just adds a comment for someone who doesn't know the history of
	sync wakeups

Patch 4 special cases kworkers that run on a specific CPU as they can have
	a synchronous relationship between waker and wakee

The changelogs include some data but the results are highly machine
specific.  For example, I noted a relatively small improvement from patch 1
while Mike Galbraith reported a significant gain on a different machine for
the same workload. YMMV.

 kernel/sched/fair.c     | 108 +++++++++++++++++++++++++++++++++++++++---------
 kernel/sched/features.h |   8 ++++
 2 files changed, 96 insertions(+), 20 deletions(-)

-- 
2.15.0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/4] sched: Only immediately migrate tasks due to interrupts if prev and target CPUs share cache
  2017-12-18  9:43 [PATCH 0/4] Reduce scheduler migrations due to wake_affine Mel Gorman
@ 2017-12-18  9:43 ` Mel Gorman
  2017-12-18  9:43 ` [PATCH 2/4] sched: Allow a wakee to run on the prev_cpu if it is idle and cache-affine with the waker Mel Gorman
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Mel Gorman @ 2017-12-18  9:43 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Matt Fleming, Mel Gorman, LKML

If waking from an idle CPU due to an interrupt then it's possible that
the woken task will be pulled to run on the current CPU. Unfortunately,
depending on the type of interrupt and the IRQ configuration, there may
not be a strong relationship between the CPU an interrupt was delivered
on and the CPU the task was previously running on. For example, the
interrupts could all be delivered to CPUs on one particular node due to
the machine topology or the IRQ affinity configuration. Another example
is an interrupt for an IO completion, which can be delivered to any CPU
with no guarantee that the data is either cache hot or even local.

This patch was motivated by the observation that an IO workload was
being pulled cross-node on a frequent basis when IO completed.  From a
wakeup latency perspective, it's still useful to know that an idle CPU is
immediately available, but let's only consider an automatic migration if
the CPUs share cache, to limit damage due to NUMA migrations. Migrations
may still occur if wake_affine_weight determines it's appropriate.

These are the throughput results for dbench running on ext4 comparing
4.15-rc3 and this patch on a 2-socket machine where interrupts due to IO
completions can happen on any CPU.

                          4.15.0-rc3             4.15.0-rc3
                             vanilla            lessmigrate
Hmean     1        854.64 (   0.00%)      865.01 (   1.21%)
Hmean     2       1229.60 (   0.00%)     1274.44 (   3.65%)
Hmean     4       1591.81 (   0.00%)     1628.08 (   2.28%)
Hmean     8       1845.04 (   0.00%)     1831.80 (  -0.72%)
Hmean     16      2038.61 (   0.00%)     2091.44 (   2.59%)
Hmean     32      2327.19 (   0.00%)     2430.29 (   4.43%)
Hmean     64      2570.61 (   0.00%)     2568.54 (  -0.08%)
Hmean     128     2481.89 (   0.00%)     2499.28 (   0.70%)
Stddev    1         14.31 (   0.00%)        5.35 (  62.65%)
Stddev    2         21.29 (   0.00%)       11.09 (  47.92%)
Stddev    4          7.22 (   0.00%)        6.80 (   5.92%)
Stddev    8         26.70 (   0.00%)        9.41 (  64.76%)
Stddev    16        22.40 (   0.00%)       20.01 (  10.70%)
Stddev    32        45.13 (   0.00%)       44.74 (   0.85%)
Stddev    64        93.10 (   0.00%)       93.18 (  -0.09%)
Stddev    128      184.28 (   0.00%)      177.85 (   3.49%)

Note the small increase in throughput for low thread counts, and also
that the standard deviation for each sample during the test run is lower.
The throughput figures for dbench can be misleading, so the benchmark was
modified to time the latency of processing one load file, with many
samples taken. The difference in latency is

                           4.15.0-rc3             4.15.0-rc3
                              vanilla            lessmigrate
Amean      1         21.71 (   0.00%)       21.47 (   1.08%)
Amean      2         30.89 (   0.00%)       29.58 (   4.26%)
Amean      4         47.54 (   0.00%)       46.61 (   1.97%)
Amean      8         82.71 (   0.00%)       82.81 (  -0.12%)
Amean      16       149.45 (   0.00%)      145.01 (   2.97%)
Amean      32       265.49 (   0.00%)      248.43 (   6.42%)
Amean      64       463.23 (   0.00%)      463.55 (  -0.07%)
Amean      128      933.97 (   0.00%)      935.50 (  -0.16%)
Stddev     1          1.58 (   0.00%)        1.54 (   2.26%)
Stddev     2          2.84 (   0.00%)        2.95 (  -4.15%)
Stddev     4          6.78 (   0.00%)        6.85 (  -0.99%)
Stddev     8         16.85 (   0.00%)       16.37 (   2.85%)
Stddev     16        41.59 (   0.00%)       41.04 (   1.32%)
Stddev     32       111.05 (   0.00%)      105.11 (   5.35%)
Stddev     64       285.94 (   0.00%)      288.01 (  -0.72%)
Stddev     128      803.39 (   0.00%)      809.73 (  -0.79%)

It's a small improvement, which is not surprising given that migrations
to a different node are not that common. However, it is noticeable in the
CPU migration statistics, which are reduced by 24%.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2fe3aa853e4d..4a1f7d32ecf6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5701,7 +5701,13 @@ static bool
 wake_affine_idle(struct sched_domain *sd, struct task_struct *p,
 		 int this_cpu, int prev_cpu, int sync)
 {
-	if (idle_cpu(this_cpu))
+	/*
+	 * If this_cpu is idle, it implies the wakeup is from interrupt
+	 * context. Only allow the move if cache is shared. Otherwise an
+	 * interrupt intensive workload could force all tasks onto one
+	 * node depending on the IO topology or IRQ affinity settings.
+	 */
+	if (idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
 		return true;
 
 	if (sync && cpu_rq(this_cpu)->nr_running == 1)
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/4] sched: Allow a wakee to run on the prev_cpu if it is idle and cache-affine with the waker
  2017-12-18  9:43 [PATCH 0/4] Reduce scheduler migrations due to wake_affine Mel Gorman
  2017-12-18  9:43 ` [PATCH 1/4] sched: Only immediately migrate tasks due to interrupts if prev and target CPUs share cache Mel Gorman
@ 2017-12-18  9:43 ` Mel Gorman
  2017-12-18 10:26   ` Peter Zijlstra
  2017-12-18  9:43 ` [PATCH 3/4] sched: Comment on why sync wakeups try to run on the current CPU Mel Gorman
  2017-12-18  9:43 ` [PATCH 4/4] sched: Allow tasks to stack with a workqueue on the same CPU Mel Gorman
  3 siblings, 1 reply; 14+ messages in thread
From: Mel Gorman @ 2017-12-18  9:43 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Matt Fleming, Mel Gorman, LKML

With the commit "sched: Only migrate tasks due to interrupts if prev
and target CPUs share cache", we no longer migrate a task from interrupt
context if the waking CPU and the task's previous CPU do not share cache.
However, for a normal wakeup from a cache-affine process, we can miss the
fact that prev_cpu is idle and an appropriate sibling, leading to
unnecessary searches and migrations.

This patch reworks wake_affine to return a suitable CPU to wake on, which
may be either the current or the prev CPU. If wake_affine_idle returns prev
because it is idle then select_idle_sibling will immediately return the
prev_cpu without searching. The results for dbench on ext4 are slightly
mixed, with gains when the machine is lightly loaded and a small regression,
borderline on the noise, when more than a node's worth of CPUs is used.
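
Condensed, the reworked decision flow looks roughly like this (a summary
of the diff below rather than a separate implementation; each helper now
returns a CPU number, or nr_cpumask_bits to mean "no preference"):

	static int wake_affine(struct sched_domain *sd, struct task_struct *p,
			       int prev_cpu, int sync)
	{
		int this_cpu = smp_processor_id();
		int new_cpu = nr_cpumask_bits;	/* no decision yet */

		if (sched_feat(WA_IDLE))
			new_cpu = wake_affine_idle(this_cpu, prev_cpu, sync);

		if (sched_feat(WA_IDLE) && new_cpu == nr_cpumask_bits)
			new_cpu = wake_affine_sync(this_cpu, sync);

		if (sched_feat(WA_WEIGHT) && new_cpu == nr_cpumask_bits)
			new_cpu = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);

		/* fall back to prev_cpu if nothing made a choice */
		return new_cpu == nr_cpumask_bits ? prev_cpu : new_cpu;
	}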

                          4.15.0-rc3             4.15.0-rc3
                               noirq               wakeprev
Hmean     1        865.01 (   0.00%)      834.19 (  -3.56%)
Hmean     2       1274.44 (   0.00%)     1353.09 (   6.17%)
Hmean     4       1628.08 (   0.00%)     1714.82 (   5.33%)
Hmean     8       1831.80 (   0.00%)     1855.84 (   1.31%)
Hmean     16      2091.44 (   0.00%)     1975.40 (  -5.55%)
Hmean     32      2430.29 (   0.00%)     2298.58 (  -5.42%)
Hmean     64      2568.54 (   0.00%)     2536.56 (  -1.25%)
Hmean     128     2499.28 (   0.00%)     2543.81 (   1.78%)
Stddev    1          5.35 (   0.00%)       19.39 (-262.63%)
Stddev    2         11.09 (   0.00%)        4.88 (  55.97%)
Stddev    4          6.80 (   0.00%)        9.24 ( -35.93%)
Stddev    8          9.41 (   0.00%)       28.39 (-201.82%)
Stddev    16        20.01 (   0.00%)       44.92 (-124.56%)
Stddev    32        44.74 (   0.00%)       50.14 ( -12.07%)
Stddev    64        93.18 (   0.00%)       84.97 (   8.81%)
Stddev    128      177.85 (   0.00%)      178.00 (  -0.09%)

However, system CPU usage is noticeably reduced

          4.15.0-rc3  4.15.0-rc3
               noirq    wakeprev
User         1058.32     1077.42
System       5729.22     5287.61
Elapsed      1550.69     1553.09

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c | 70 ++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 51 insertions(+), 19 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4a1f7d32ecf6..392e08b364bd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5689,17 +5689,21 @@ static int wake_wide(struct task_struct *p)
  * soonest. For the purpose of speed we only consider the waking and previous
  * CPU.
  *
- * wake_affine_idle() - only considers 'now', it check if the waking CPU is (or
- *			will be) idle.
+ * wake_affine_idle() - only considers 'now', it checks if a CPU that is
+ *			cache-affine with the waker is idle
+ *
+ * wake_affine_sync() - only considers 'now', it checks if the waking CPU
+ *			will be idle. Migrations to a different NUMA node
+ *			are allowed on the basis that sync wakeups imply
+ *			shared data between waker and wakee.
  *
  * wake_affine_weight() - considers the weight to reflect the average
  *			  scheduling latency of the CPUs. This seems to work
  *			  for the overloaded case.
  */
 
-static bool
-wake_affine_idle(struct sched_domain *sd, struct task_struct *p,
-		 int this_cpu, int prev_cpu, int sync)
+static int
+wake_affine_idle(int this_cpu, int prev_cpu, int sync)
 {
 	/*
 	 * If this_cpu is idle, it implies the wakeup is from interrupt
@@ -5710,13 +5714,36 @@ wake_affine_idle(struct sched_domain *sd, struct task_struct *p,
 	if (idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
 		return true;
 
+	/*
+	 * Prefer migration if it's an interrupt on the assumption that the
+	 * data is cache hot to the CPU receiving the interrupt.
+	 */
+	if (idle_cpu(this_cpu))
+		return this_cpu;
+
+	/*
+	 * For normal wakeups, we use the prev_cpu if it's cache-affine but,
+	 * for remote wakeups, rely on wake_affine_weight to determine
+	 * whether it's best to pull the wakee to the waker. For sync
+	 * wakeups, rely on wake_affine_sync to determine if the task
+	 * should wake up on the current CPU.
+	 */
+	if (this_cpu != prev_cpu && !sync && idle_cpu(prev_cpu))
+		return prev_cpu;
+
+	return nr_cpumask_bits;
+}
+
+static int
+wake_affine_sync(int this_cpu, int sync)
+{
 	if (sync && cpu_rq(this_cpu)->nr_running == 1)
-		return true;
+		return this_cpu;
 
-	return false;
+	return nr_cpumask_bits;
 }
 
-static bool
+static int
 wake_affine_weight(struct sched_domain *sd, struct task_struct *p,
 		   int this_cpu, int prev_cpu, int sync)
 {
@@ -5730,7 +5757,7 @@ wake_affine_weight(struct sched_domain *sd, struct task_struct *p,
 		unsigned long current_load = task_h_load(current);
 
 		if (current_load > this_eff_load)
-			return true;
+			return this_cpu;
 
 		this_eff_load -= current_load;
 	}
@@ -5747,28 +5774,34 @@ wake_affine_weight(struct sched_domain *sd, struct task_struct *p,
 		prev_eff_load *= 100 + (sd->imbalance_pct - 100) / 2;
 	prev_eff_load *= capacity_of(this_cpu);
 
-	return this_eff_load <= prev_eff_load;
+	if (this_eff_load <= prev_eff_load)
+		return this_cpu;
+	return nr_cpumask_bits;
 }
 
 static int wake_affine(struct sched_domain *sd, struct task_struct *p,
 		       int prev_cpu, int sync)
 {
 	int this_cpu = smp_processor_id();
-	bool affine = false;
+	int new_cpu = nr_cpumask_bits;
+
+	if (sched_feat(WA_IDLE))
+		new_cpu = wake_affine_idle(this_cpu, prev_cpu, sync);
 
-	if (sched_feat(WA_IDLE) && !affine)
-		affine = wake_affine_idle(sd, p, this_cpu, prev_cpu, sync);
+	if (sched_feat(WA_IDLE) && new_cpu == nr_cpumask_bits)
+		new_cpu = wake_affine_sync(this_cpu, sync);
 
-	if (sched_feat(WA_WEIGHT) && !affine)
-		affine = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
+	if (sched_feat(WA_WEIGHT) && new_cpu == nr_cpumask_bits)
+		new_cpu = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
 
 	schedstat_inc(p->se.statistics.nr_wakeups_affine_attempts);
-	if (affine) {
+	if (new_cpu != nr_cpumask_bits) {
 		schedstat_inc(sd->ttwu_move_affine);
 		schedstat_inc(p->se.statistics.nr_wakeups_affine);
+		return new_cpu;
 	}
 
-	return affine;
+	return prev_cpu;
 }
 
 static inline int task_util(struct task_struct *p);
@@ -6361,8 +6394,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 		if (cpu == prev_cpu)
 			goto pick_cpu;
 
-		if (wake_affine(affine_sd, p, prev_cpu, sync))
-			new_cpu = cpu;
+		new_cpu = wake_affine(affine_sd, p, prev_cpu, sync);
 	}
 
 	if (sd && !(sd_flag & SD_BALANCE_FORK)) {
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/4] sched: Comment on why sync wakeups try to run on the current CPU
  2017-12-18  9:43 [PATCH 0/4] Reduce scheduler migrations due to wake_affine Mel Gorman
  2017-12-18  9:43 ` [PATCH 1/4] sched: Only immediately migrate tasks due to interrupts if prev and target CPUs share cache Mel Gorman
  2017-12-18  9:43 ` [PATCH 2/4] sched: Allow a wakee to run on the prev_cpu if it is idle and cache-affine with the waker Mel Gorman
@ 2017-12-18  9:43 ` Mel Gorman
  2017-12-19 19:06   ` Peter Zijlstra
  2017-12-18  9:43 ` [PATCH 4/4] sched: Allow tasks to stack with a workqueue on the same CPU Mel Gorman
  3 siblings, 1 reply; 14+ messages in thread
From: Mel Gorman @ 2017-12-18  9:43 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Matt Fleming, Mel Gorman, LKML

The sync wakeup logic in wake_affine_idle deserves a short description.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 392e08b364bd..95b1145bc38d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5737,6 +5737,11 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
 static int
 wake_affine_sync(int this_cpu, int sync)
 {
+	/*
+	 * Consider stacking tasks if it's a sync wakeup and there is only
+	 * one task on the runqueue. Sync wakeups are expected to sleep
+	 * either immediately or shortly after the wakeup.
+	 */
 	if (sync && cpu_rq(this_cpu)->nr_running == 1)
 		return this_cpu;
 
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/4] sched: Allow tasks to stack with a workqueue on the same CPU
  2017-12-18  9:43 [PATCH 0/4] Reduce scheduler migrations due to wake_affine Mel Gorman
                   ` (2 preceding siblings ...)
  2017-12-18  9:43 ` [PATCH 3/4] sched: Comment on why sync wakeups try to run on the current CPU Mel Gorman
@ 2017-12-18  9:43 ` Mel Gorman
  2017-12-18 10:44   ` Mike Galbraith
  3 siblings, 1 reply; 14+ messages in thread
From: Mel Gorman @ 2017-12-18  9:43 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Matt Fleming, Mel Gorman, LKML

If a task wakes a kworker to do some work and is woken on completion, and
it was a per-cpu kworker that was used, then a situation can arise where
the current CPU is always active when the kworker wakes the task and
select_idle_sibling moves it. This leads to the task moving around the
socket each time a kworker is used, even though the relationship is
effectively synchronous. This patch special cases a kworker running on the
same CPU. It has a noticeable impact on migrations and on the performance
of dbench running on the XFS filesystem, but has no impact on ext4 as ext4
interacts with a kthread, not a kworker.

                          4.15.0-rc3             4.15.0-rc3
                            wakeprev                stackwq
Hmean     1        392.92 (   0.00%)     1024.22 ( 160.67%)
Hmean     2        787.09 (   0.00%)     1808.38 ( 129.75%)
Hmean     4       1559.71 (   0.00%)     2525.42 (  61.92%)
Hmean     8       2576.05 (   0.00%)     2881.12 (  11.84%)
Hmean     16      2949.28 (   0.00%)     3137.65 (   6.39%)
Hmean     32      3041.89 (   0.00%)     3147.92 (   3.49%)
Hmean     64      1655.42 (   0.00%)     1756.21 (   6.09%)
Hmean     128     1133.19 (   0.00%)     1165.39 (   2.84%)
Stddev    1          2.59 (   0.00%)       11.21 (-332.82%)
Stddev    2          8.96 (   0.00%)       13.57 ( -51.44%)
Stddev    4         20.15 (   0.00%)        8.51 (  57.75%)
Stddev    8         17.15 (   0.00%)       14.45 (  15.75%)
Stddev    16        30.29 (   0.00%)       31.30 (  -3.33%)
Stddev    32        64.45 (   0.00%)       57.22 (  11.21%)
Stddev    64        55.89 (   0.00%)       62.84 ( -12.43%)
Stddev    128       55.89 (   0.00%)       62.75 ( -12.27%)

There is also a large drop in system CPU usage;

          4.15.0-rc3  4.15.0-rc3
            wakeprev     stackwq
User         1561.85     1166.59
System       6961.89     4965.09
Elapsed      1472.05     1471.84

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c     | 29 +++++++++++++++++++++++++++--
 kernel/sched/features.h |  8 ++++++++
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 95b1145bc38d..cff55481bd19 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5684,6 +5684,19 @@ static int wake_wide(struct task_struct *p)
 	return 1;
 }
 
+/*
+ * Returns true if a wakeup is either from or to a workqueue and the tasks
+ * appear to be synchronised with each other.
+ */
+static bool
+is_wakeup_workqueue_sync(struct task_struct *p, int this_cpu, int prev_cpu)
+{
+	return sched_feat(WA_STACK_WQ) &&
+		this_cpu == prev_cpu &&
+		((p->flags & PF_WQ_WORKER) || (current->flags & PF_WQ_WORKER)) &&
+		this_rq()->nr_running <= 1;
+}
+
 /*
  * The purpose of wake_affine() is to quickly determine on which CPU we can run
  * soonest. For the purpose of speed we only consider the waking and previous
@@ -5735,7 +5748,7 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
 }
 
 static int
-wake_affine_sync(int this_cpu, int sync)
+wake_affine_sync(struct task_struct *p, int this_cpu, int prev_cpu, int sync)
 {
 	/*
 	 * Consider stacking tasks if it's a sync wakeup and there is only
@@ -5745,6 +5758,14 @@ wake_affine_sync(int this_cpu, int sync)
 	if (sync && cpu_rq(this_cpu)->nr_running == 1)
 		return this_cpu;
 
+	/*
+	 * If the waker or wakee is a workqueue and it appears to be similar
+	 * to a sync wakeup then assume the waker will sleep shortly and allow
+	 * the tasks to stack on the same CPU.
+	 */
+	if (is_wakeup_workqueue_sync(p, this_cpu, prev_cpu))
+		return this_cpu;
+
 	return nr_cpumask_bits;
 }
 
@@ -5794,7 +5815,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
 		new_cpu = wake_affine_idle(this_cpu, prev_cpu, sync);
 
 	if (sched_feat(WA_IDLE) && new_cpu == nr_cpumask_bits)
-		new_cpu = wake_affine_sync(this_cpu, sync);
+		new_cpu = wake_affine_sync(p, this_cpu, prev_cpu, sync);
 
 	if (sched_feat(WA_WEIGHT) && new_cpu == nr_cpumask_bits)
 		new_cpu = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
@@ -6240,6 +6261,10 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if (idle_cpu(target))
 		return target;
 
+	/* Allow a wakeup to stack if it looks like a synchronous workqueue */
+	if (is_wakeup_workqueue_sync(p, smp_processor_id(), target))
+		return target;
+
 	/*
 	 * If the previous cpu is cache affine and idle, don't be stupid.
 	 */
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 9552fd5854bf..c96ad246584a 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -85,3 +85,11 @@ SCHED_FEAT(ATTACH_AGE_LOAD, true)
 SCHED_FEAT(WA_IDLE, true)
 SCHED_FEAT(WA_WEIGHT, true)
 SCHED_FEAT(WA_BIAS, true)
+
+/*
+ * If true then a process may stack with a workqueue on the same CPU during
+ * wakeup instead of finding an idle sibling. This should only happen in the
+ * case where there appears to be a strong relationship between the wq and the
+ * task e.g. IO operations dispatched to a workqueue on XFS.
+ */
+SCHED_FEAT(WA_STACK_WQ, true)
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] sched: Allow a wakee to run on the prev_cpu if it is idle and cache-affine with the waker
  2017-12-18  9:43 ` [PATCH 2/4] sched: Allow a wakee to run on the prev_cpu if it is idle and cache-affine with the waker Mel Gorman
@ 2017-12-18 10:26   ` Peter Zijlstra
  2017-12-18 10:56     ` Mel Gorman
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2017-12-18 10:26 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Ingo Molnar, Matt Fleming, LKML

On Mon, Dec 18, 2017 at 09:43:25AM +0000, Mel Gorman wrote:
> With the commit "sched: Only migrate tasks due to interrupts if prev
> and target CPUs share cache", we no longer migrate a task from interrupt
> context if the waking CPU and the task's previous CPU do not share cache.
> However, for a normal wakeup from a cache-affine process, we can miss the
> fact that prev_cpu is idle and an appropriate sibling, leading to
> unnecessary searches and migrations.
> 
> This patch reworks wake_affine to return a suitable CPU to wake on, which
> may be either the current or the prev CPU. If wake_affine_idle returns prev
> because it is idle then select_idle_sibling will immediately return the
> prev_cpu without searching. The results for dbench on ext4 are slightly
> mixed, with gains when the machine is lightly loaded and a small regression,
> borderline on the noise, when more than a node's worth of CPUs is used.
> 
>                           4.15.0-rc3             4.15.0-rc3
>                                noirq               wakeprev
> Hmean     1        865.01 (   0.00%)      834.19 (  -3.56%)
> Hmean     2       1274.44 (   0.00%)     1353.09 (   6.17%)
> Hmean     4       1628.08 (   0.00%)     1714.82 (   5.33%)
> Hmean     8       1831.80 (   0.00%)     1855.84 (   1.31%)
> Hmean     16      2091.44 (   0.00%)     1975.40 (  -5.55%)
> Hmean     32      2430.29 (   0.00%)     2298.58 (  -5.42%)
> Hmean     64      2568.54 (   0.00%)     2536.56 (  -1.25%)
> Hmean     128     2499.28 (   0.00%)     2543.81 (   1.78%)
> Stddev    1          5.35 (   0.00%)       19.39 (-262.63%)
> Stddev    2         11.09 (   0.00%)        4.88 (  55.97%)
> Stddev    4          6.80 (   0.00%)        9.24 ( -35.93%)
> Stddev    8          9.41 (   0.00%)       28.39 (-201.82%)
> Stddev    16        20.01 (   0.00%)       44.92 (-124.56%)
> Stddev    32        44.74 (   0.00%)       50.14 ( -12.07%)
> Stddev    64        93.18 (   0.00%)       84.97 (   8.81%)
> Stddev    128      177.85 (   0.00%)      178.00 (  -0.09%)
> 
> However, system CPU usage is noticably reduced
> 
>           4.15.0-rc3  4.15.0-rc3
>                noirq    wakeprev
> User         1058.32     1077.42
> System       5729.22     5287.61
> Elapsed      1550.69     1553.09
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  kernel/sched/fair.c | 70 ++++++++++++++++++++++++++++++++++++++---------------
>  1 file changed, 51 insertions(+), 19 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4a1f7d32ecf6..392e08b364bd 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5689,17 +5689,21 @@ static int wake_wide(struct task_struct *p)
>   * soonest. For the purpose of speed we only consider the waking and previous
>   * CPU.
>   *
> - * wake_affine_idle() - only considers 'now', it check if the waking CPU is (or
> - *			will be) idle.
> + * wake_affine_idle() - only considers 'now', it checks if a CPU that is
> + *			cache-affine with the waker is idle
> + *

This bit belongs in the previous patch I'm thinking.

> + * wake_affine_sync() - only considers 'now', it checks if the waking CPU
> + *			will be idle. Migrations to a different NUMA node
> + *			are allowed on the basis that sync wakeups imply
> + *			shared data between waker and wakee.

And it would be nice if we can rework the return value thing in a
separate patch from adding that affine_sync thing, and then munge your
3rd patch along with the patch that introduces it.


Did you run these patches on more than just dbench? In specific I
suppose the schbench stuff from facebook would be interesting. Also that
NAS-lu benchmark.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] sched: Allow tasks to stack with a workqueue on the same CPU
  2017-12-18  9:43 ` [PATCH 4/4] sched: Allow tasks to stack with a workqueue on the same CPU Mel Gorman
@ 2017-12-18 10:44   ` Mike Galbraith
  2017-12-18 11:25     ` Mel Gorman
  0 siblings, 1 reply; 14+ messages in thread
From: Mike Galbraith @ 2017-12-18 10:44 UTC (permalink / raw)
  To: Mel Gorman, Peter Zijlstra; +Cc: Ingo Molnar, Matt Fleming, LKML

On Mon, 2017-12-18 at 09:43 +0000, Mel Gorman wrote:
> If a task wakes a kworker to do some work and is woken on completion, and
> it was a per-cpu kworker that was used, then a situation can arise where
> the current CPU is always active when the kworker wakes the task and
> select_idle_sibling moves it. This leads to the task moving around the
> socket each time a kworker is used, even though the relationship is
> effectively synchronous. This patch special cases a kworker running on the
> same CPU. It has a noticeable impact on migrations and on the performance
> of dbench running on the XFS filesystem, but has no impact on ext4 as ext4
> interacts with a kthread, not a kworker.

I think intentional stacking is a very bad idea unless you know with
absolute certainty that waker/wakee are in fact 100% synchronous.  This
is IMO the wrong way to go about combating the excessive bouncing, that
can be achieved by simple ratelimiting.
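
A minimal sketch of what such ratelimiting could look like, purely for
illustration (this is not code from this thread, and "last_migration"
would be a hypothetical new task field updated whenever the task changes
CPU):

	/*
	 * Illustrative only: refuse an affine pull if the task was
	 * migrated very recently.
	 */
	static bool wake_affine_ratelimited(struct task_struct *p)
	{
		return time_before(jiffies,
				   p->last_migration + msecs_to_jiffies(1));
	}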


>                           4.15.0-rc3             4.15.0-rc3
>                             wakeprev                stackwq
> Hmean     1        392.92 (   0.00%)     1024.22 ( 160.67%)
> Hmean     2        787.09 (   0.00%)     1808.38 ( 129.75%)
> Hmean     4       1559.71 (   0.00%)     2525.42 (  61.92%)
> Hmean     8       2576.05 (   0.00%)     2881.12 (  11.84%)
> Hmean     16      2949.28 (   0.00%)     3137.65 (   6.39%)
> Hmean     32      3041.89 (   0.00%)     3147.92 (   3.49%)
> Hmean     64      1655.42 (   0.00%)     1756.21 (   6.09%)
> Hmean     128     1133.19 (   0.00%)     1165.39 (   2.84%)
> Stddev    1          2.59 (   0.00%)       11.21 (-332.82%)
> Stddev    2          8.96 (   0.00%)       13.57 ( -51.44%)
> Stddev    4         20.15 (   0.00%)        8.51 (  57.75%)
> Stddev    8         17.15 (   0.00%)       14.45 (  15.75%)
> Stddev    16        30.29 (   0.00%)       31.30 (  -3.33%)
> Stddev    32        64.45 (   0.00%)       57.22 (  11.21%)
> Stddev    64        55.89 (   0.00%)       62.84 ( -12.43%)
> Stddev    128       55.89 (   0.00%)       62.75 ( -12.27%)
> 
> There is also a large drop in system CPU usage;
> 
>           4.15.0-rc3  4.15.0-rc3
>             wakeprev     stackwq
> User         1561.85     1166.59
> System       6961.89     4965.09
> Elapsed      1472.05     1471.84
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  kernel/sched/fair.c     | 29 +++++++++++++++++++++++++++--
>  kernel/sched/features.h |  8 ++++++++
>  2 files changed, 35 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 95b1145bc38d..cff55481bd19 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5684,6 +5684,19 @@ static int wake_wide(struct task_struct *p)
>  	return 1;
>  }
>  
> +/*
> + * Returns true if a wakeup is either from or to a workqueue and the tasks
> + * appear to be synchronised with each other.
> + */
> +static bool
> +is_wakeup_workqueue_sync(struct task_struct *p, int this_cpu, int prev_cpu)
> +{
> +	return sched_feat(WA_STACK_WQ) &&
> +		this_cpu == prev_cpu &&
> +		((p->flags & PF_WQ_WORKER) || (current->flags & PF_WQ_WORKER)) &&
> +		this_rq()->nr_running <= 1;
> +}
> +
>  /*
>   * The purpose of wake_affine() is to quickly determine on which CPU we can run
>   * soonest. For the purpose of speed we only consider the waking and previous
> @@ -5735,7 +5748,7 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
>  }
>  
>  static int
> -wake_affine_sync(int this_cpu, int sync)
> +wake_affine_sync(struct task_struct *p, int this_cpu, int prev_cpu, int sync)
>  {
>  	/*
>  	 * Consider stacking tasks if it's a sync wakeup and there is only
> @@ -5745,6 +5758,14 @@ wake_affine_sync(int this_cpu, int sync)
>  	if (sync && cpu_rq(this_cpu)->nr_running == 1)
>  		return this_cpu;
>  
> +	/*
> +	 * If the waker or wakee is a workqueue and it appears to be similar
> +	 * to a sync wakeup then assume the waker will sleep shortly and allow
> +	 * the tasks to stack on the same CPU.
> +	 */
> +	if (is_wakeup_workqueue_sync(p, this_cpu, prev_cpu))
> +		return this_cpu;
> +
>  	return nr_cpumask_bits;
>  }
>  
> @@ -5794,7 +5815,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
>  		new_cpu = wake_affine_idle(this_cpu, prev_cpu, sync);
>  
>  	if (sched_feat(WA_IDLE) && new_cpu == nr_cpumask_bits)
> -		new_cpu = wake_affine_sync(this_cpu, sync);
> +		new_cpu = wake_affine_sync(p, this_cpu, prev_cpu, sync);
>  
>  	if (sched_feat(WA_WEIGHT) && new_cpu == nr_cpumask_bits)
>  		new_cpu = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);
> @@ -6240,6 +6261,10 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>  	if (idle_cpu(target))
>  		return target;
>  
> +	/* Allow a wakeup to stack if it looks like a synchronous workqueue */
> +	if (is_wakeup_workqueue_sync(p, smp_processor_id(), target))
> +		return target;
> +
>  	/*
>  	 * If the previous cpu is cache affine and idle, don't be stupid.
>  	 */
> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
> index 9552fd5854bf..c96ad246584a 100644
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -85,3 +85,11 @@ SCHED_FEAT(ATTACH_AGE_LOAD, true)
>  SCHED_FEAT(WA_IDLE, true)
>  SCHED_FEAT(WA_WEIGHT, true)
>  SCHED_FEAT(WA_BIAS, true)
> +
> +/*
> + * If true then a process may stack with a workqueue on the same CPU during
> + * wakeup instead of finding an idle sibling. This should only happen in the
> + * case where there appears to be a strong relationship between the wq and the
> + * task e.g. IO operations dispatched to a workqueue on XFS.
> + */
> +SCHED_FEAT(WA_STACK_WQ, true)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] sched: Allow a wakee to run on the prev_cpu if it is idle and cache-affine with the waker
  2017-12-18 10:26   ` Peter Zijlstra
@ 2017-12-18 10:56     ` Mel Gorman
  2017-12-18 10:59       ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Mel Gorman @ 2017-12-18 10:56 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Matt Fleming, LKML

On Mon, Dec 18, 2017 at 11:26:10AM +0100, Peter Zijlstra wrote:
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> > ---
> >  kernel/sched/fair.c | 70 ++++++++++++++++++++++++++++++++++++++---------------
> >  1 file changed, 51 insertions(+), 19 deletions(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 4a1f7d32ecf6..392e08b364bd 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5689,17 +5689,21 @@ static int wake_wide(struct task_struct *p)
> >   * soonest. For the purpose of speed we only consider the waking and previous
> >   * CPU.
> >   *
> > - * wake_affine_idle() - only considers 'now', it check if the waking CPU is (or
> > - *			will be) idle.
> > + * wake_affine_idle() - only considers 'now', it checks if a CPU that is
> > + *			cache-affine with the waker is idle
> > + *
> 
> This bit belongs in the previous patch I'm thinking.
> 

Yes, sorry.

> > + * wake_affine_sync() - only considers 'now', it checks if the waking CPU
> > + *			will be idle. Migrations to a different NUMA node
> > + *			are allowed on the basis that sync wakeups imply
> > + *			shared data between waker and wakee.
> 
> And it would be nice if we can rework the return value thing in a
> separate patch from adding that affine_sync thing, and then munge your
> 3rd patch along with the patch that introduces it.
> 

I can do that. I also noticed that there was an error in wake_affine_idle
in the version I sent. It'll be corrected and retested.

> Did you run these patches on more than just dbench? In specific I
> suppose the schbench stuff from facebook would be interesting. Also that
> NAS-lu benchmark.
> 

A battery of tests including netperf streaming, netperf rr, sockperf,
hackbench (various configs), pipetest, pgbench, siege, tbench, kernel
building and some shellscript intensive workloads. schbench and NAS were
not included but I'll do that before posting a new revision.

Thanks

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] sched: Allow a wakee to run on the prev_cpu if it is idle and cache-affine with the waker
  2017-12-18 10:56     ` Mel Gorman
@ 2017-12-18 10:59       ` Peter Zijlstra
  0 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2017-12-18 10:59 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Ingo Molnar, Matt Fleming, LKML

On Mon, Dec 18, 2017 at 10:56:46AM +0000, Mel Gorman wrote:
> On Mon, Dec 18, 2017 at 11:26:10AM +0100, Peter Zijlstra wrote:

> > Did you run these patches on more than just dbench? In specific I
> > suppose the schbench stuff from facebook would be interesting. Also that
> > NAS-lu benchmark.
> > 
> 
> A battery of tests including netperf streaming, netperf rr, sockperf,
> hackbench (various configs), pipetest, pgbench, siege, tbench, kernel
> building and some shellscript intensive workloads. schbench and NAS were
> not included but I'll do that before posting a new revision.


Much appreciated, Thanks!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/4] sched: Allow tasks to stack with a workqueue on the same CPU
  2017-12-18 10:44   ` Mike Galbraith
@ 2017-12-18 11:25     ` Mel Gorman
  0 siblings, 0 replies; 14+ messages in thread
From: Mel Gorman @ 2017-12-18 11:25 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Peter Zijlstra, Ingo Molnar, Matt Fleming, LKML

On Mon, Dec 18, 2017 at 11:44:30AM +0100, Mike Galbraith wrote:
> On Mon, 2017-12-18 at 09:43 +0000, Mel Gorman wrote:
> > If a task wakes a kworker to do some work and is woken on completion, and
> > it was a per-cpu kworker that was used, then a situation can arise where
> > the current CPU is always active when the kworker wakes the task and
> > select_idle_sibling moves it. This leads to the task moving around the
> > socket each time a kworker is used, even though the relationship is
> > effectively synchronous. This patch special cases a kworker running on the
> > same CPU. It has a noticeable impact on migrations and on the performance
> > of dbench running on the XFS filesystem, but has no impact on ext4 as ext4
> > interacts with a kthread, not a kworker.
> 
> I think intentional stacking is a very bad idea unless you know with
> absolute certainty that waker/wakee are in fact 100% synchronous.  This
> is IMO the wrong way to go about combating the excessive bouncing, that
> can be achieved by simple ratelimiting.
> 

Grand, I thought the patch was a bit optimistic but was surprised at the
level of impact for a workload that really did have a synchronous
relationship between waker and wakee. Might be worth revisiting it in
the future, be it rate-limiting or some other mechanism.

Thanks.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] sched: Comment on why sync wakeups try to run on the current CPU
  2017-12-18  9:43 ` [PATCH 3/4] sched: Comment on why sync wakeups try to run on the current CPU Mel Gorman
@ 2017-12-19 19:06   ` Peter Zijlstra
  2017-12-19 20:25     ` Mel Gorman
  2017-12-20  4:09     ` Mike Galbraith
  0 siblings, 2 replies; 14+ messages in thread
From: Peter Zijlstra @ 2017-12-19 19:06 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Ingo Molnar, Matt Fleming, LKML

On Mon, Dec 18, 2017 at 09:43:26AM +0000, Mel Gorman wrote:
> The sync wakeup logic in wake_affine_idle deserves a short description.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  kernel/sched/fair.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 392e08b364bd..95b1145bc38d 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5737,6 +5737,11 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
>  static int
>  wake_affine_sync(int this_cpu, int sync)
>  {
> +	/*
> +	 * Consider stacking tasks if it's a sync wakeup and there is only
> +	 * one task on the runqueue. sync wakesups are expected to sleep
> +	 * either immediately or shortly after the wakeup.
> +	 */
>  	if (sync && cpu_rq(this_cpu)->nr_running == 1)
>  		return this_cpu;
>  

So I don't think this one is over the top -- it went missing from the
last posting, but I agree with Mike that 4/4 was somewhat dodgy.

Our SYNC hint does promise the caller will go away 'soon', although I'm
not sure how many of the current users actually honor that.

In any case, picked up the one new patch, thanks for the giant changelog
;-)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] sched: Comment on why sync wakeups try to run on the current CPU
  2017-12-19 19:06   ` Peter Zijlstra
@ 2017-12-19 20:25     ` Mel Gorman
  2017-12-20  4:09     ` Mike Galbraith
  1 sibling, 0 replies; 14+ messages in thread
From: Mel Gorman @ 2017-12-19 20:25 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Matt Fleming, LKML

On Tue, Dec 19, 2017 at 08:06:44PM +0100, Peter Zijlstra wrote:
> On Mon, Dec 18, 2017 at 09:43:26AM +0000, Mel Gorman wrote:
> > The sync wakeup logic in wake_affine_idle deserves a short description.
> > 
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> > ---
> >  kernel/sched/fair.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 392e08b364bd..95b1145bc38d 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5737,6 +5737,11 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
> >  static int
> >  wake_affine_sync(int this_cpu, int sync)
> >  {
> > +	/*
> > +	 * Consider stacking tasks if it's a sync wakeup and there is only
> > +	 * one task on the runqueue. Sync wakeups are expected to sleep
> > +	 * either immediately or shortly after the wakeup.
> > +	 */
> >  	if (sync && cpu_rq(this_cpu)->nr_running == 1)
> >  		return this_cpu;
> >  
> 
> So I don't think this one is over the top -- it went missing from the
> last posting, but I agree with Mike that 4/4 was somewhat dodgy.
> 

I dropped it because I wasn't altering what sync wakeup means any more
and the comment was not that insightful. I've no objection to it being
picked up of course.

> Our SYNC hint does promise the caller will go away 'soon', although I'm
> not sure how many of the current users actually honor that.
> 

How soon matters a little too. I think pipe goes to sleep immediately, and
exit definitely does.  Networking appears to be soon enough from what I can
tell. I don't think any of the current callers of
wake_up_interruptible_sync_poll are problematic, at least.
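
For reference, the pipe case looks roughly like this (paraphrased from
fs/pipe.c of around this era; the exact wait queue field and poll flags
vary by kernel version):

	/* pipe_write() waking a reader it expects to run in its place */
	wake_up_interruptible_sync_poll(&pipe->wait, POLLIN | POLLRDNORM);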

> In any case, picked up the one new patch, thanks for the giant changelog
> ;-)

Thanks!

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] sched: Comment on why sync wakeups try to run on the current CPU
  2017-12-19 19:06   ` Peter Zijlstra
  2017-12-19 20:25     ` Mel Gorman
@ 2017-12-20  4:09     ` Mike Galbraith
  2017-12-20  4:21       ` Mike Galbraith
  1 sibling, 1 reply; 14+ messages in thread
From: Mike Galbraith @ 2017-12-20  4:09 UTC (permalink / raw)
  To: Peter Zijlstra, Mel Gorman; +Cc: Ingo Molnar, Matt Fleming, LKML

On Tue, 2017-12-19 at 20:06 +0100, Peter Zijlstra wrote:
> 
> Our SYNC hint does promise the caller will go away 'soon', although I'm
> not sure how many of the current users actually honor that.

The sync hint is not a lie, or even a damn lie, it's a statistic :) 
It's very useful for...

TCP_SENDFILE
homer:..debug/tracing # cat trace|grep netperf|grep wakes|wc -l
2417
homer:..debug/tracing # cat trace|grep netperf|grep schedules|wc -l
4
TCP_STREAM
homer:..debug/tracing # cat trace|grep netperf|grep wakes|wc -l
2506
homer:..debug/tracing # cat trace|grep netperf|grep schedules|wc -l
3
TCP_MAERTS
homer:..debug/tracing # cat trace|grep netperf|grep wakes|wc -l
2465
homer:..debug/tracing # cat trace|grep netperf|grep schedules|wc -l
2

...knowing that tasks are talking, but not the least bit useful for
scheduler decisions other than "pull to same planet".  Those are single
instances, all of which exceed 180% combined util, one at 100%.  Then
there are multi-wakers, tasks that would have scheduled if they hadn't
been given more work to do etc etc.  Nope, stacking based upon that
hint is most definitely not a good idea :)

	-Mike

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] sched: Comment on why sync wakeups try to run on the current CPU
  2017-12-20  4:09     ` Mike Galbraith
@ 2017-12-20  4:21       ` Mike Galbraith
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Galbraith @ 2017-12-20  4:21 UTC (permalink / raw)
  To: Peter Zijlstra, Mel Gorman; +Cc: Ingo Molnar, Matt Fleming, LKML

On Wed, 2017-12-20 at 05:09 +0100, Mike Galbraith wrote:
>  Nope, stacking based upon that
> hint is most definitely not a good idea :)

Except when heavily loaded.  The only thing worse for communicating
hogs than being stacked is communicating hogs talking with another hog
between them.

	-Mike

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-12-20  4:22 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-18  9:43 [PATCH 0/4] Reduce scheduler migrations due to wake_affine Mel Gorman
2017-12-18  9:43 ` [PATCH 1/4] sched: Only immediately migrate tasks due to interrupts if prev and target CPUs share cache Mel Gorman
2017-12-18  9:43 ` [PATCH 2/4] sched: Allow a wakee to run on the prev_cpu if it is idle and cache-affine with the waker Mel Gorman
2017-12-18 10:26   ` Peter Zijlstra
2017-12-18 10:56     ` Mel Gorman
2017-12-18 10:59       ` Peter Zijlstra
2017-12-18  9:43 ` [PATCH 3/4] sched: Comment on why sync wakeups try to run on the current CPU Mel Gorman
2017-12-19 19:06   ` Peter Zijlstra
2017-12-19 20:25     ` Mel Gorman
2017-12-20  4:09     ` Mike Galbraith
2017-12-20  4:21       ` Mike Galbraith
2017-12-18  9:43 ` [PATCH 4/4] sched: Allow tasks to stack with a workqueue on the same CPU Mel Gorman
2017-12-18 10:44   ` Mike Galbraith
2017-12-18 11:25     ` Mel Gorman
