linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] Reduce migrations due to load imbalance and process exits
@ 2018-02-12 14:58 Mel Gorman
  2018-02-12 14:58 ` [PATCH 1/4] sched/fair: Avoid an unnecessary lookup of current CPU ID during wake_affine Mel Gorman
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Mel Gorman @ 2018-02-12 14:58 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Mike Galbraith, Matt Fleming, LKML, Mel Gorman

The following series was motivated by observing more migrations than
expected. The first two patches are minor cleanups. The third patch avoids
migrations on wakeup when two CPUs are equally loaded, which is particularly
important in the case where the measured load is 0. The last patch avoids
cross-node migrations of a parent process when a child exits. Overall,
there is a minor boost in performance on some workloads, with details
included in the changelogs.

The series was tested against Linus's tree during the 4.16 merge window
(as of Feb 8th) but applies cleanly to 4.16-rc1.

 kernel/sched/fair.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

-- 
2.15.1


* [PATCH 1/4] sched/fair: Avoid an unnecessary lookup of current CPU ID during wake_affine
  2018-02-12 14:58 [PATCH 0/4] Reduce migrations due to load imbalance and process exits Mel Gorman
@ 2018-02-12 14:58 ` Mel Gorman
  2018-02-12 14:58 ` [PATCH 2/4] sched/fair: Defer calculation of prev_eff_load in wake_affine until needed Mel Gorman
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Mel Gorman @ 2018-02-12 14:58 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Mike Galbraith, Matt Fleming, LKML, Mel Gorman

The only caller of wake_affine() knows the CPU ID. Pass it in instead of
rechecking it.
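
As a standalone illustration of the change (a userspace sketch, not the
kernel code: sched_getcpu() stands in for smp_processor_id() and the
pick_*() helpers are invented for this example):

#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>	/* sched_getcpu(), the userspace analogue */

/* Before: the callee re-derives the current CPU on every call. */
static int pick_before(int prev_cpu)
{
	int this_cpu = sched_getcpu();	/* redundant lookup */
	return prev_cpu == this_cpu ? prev_cpu : this_cpu;
}

/* After: the caller, which already knows the CPU, passes it in. */
static int pick_after(int this_cpu, int prev_cpu)
{
	return prev_cpu == this_cpu ? prev_cpu : this_cpu;
}

int main(void)
{
	int cpu = sched_getcpu();	/* looked up once, at the caller */
	printf("%d %d\n", pick_before(3), pick_after(cpu, 3));
	return 0;
}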

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5eb3ffc9be84..4b3e86eb2222 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5751,9 +5751,8 @@ wake_affine_weight(struct sched_domain *sd, struct task_struct *p,
 }
 
 static int wake_affine(struct sched_domain *sd, struct task_struct *p,
-		       int prev_cpu, int sync)
+		       int this_cpu, int prev_cpu, int sync)
 {
-	int this_cpu = smp_processor_id();
 	int target = nr_cpumask_bits;
 
 	if (sched_feat(WA_IDLE))
@@ -6376,7 +6375,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 		if (cpu == prev_cpu)
 			goto pick_cpu;
 
-		new_cpu = wake_affine(affine_sd, p, prev_cpu, sync);
+		new_cpu = wake_affine(affine_sd, p, cpu, prev_cpu, sync);
 	}
 
 	if (sd && !(sd_flag & SD_BALANCE_FORK)) {
-- 
2.15.1


* [PATCH 2/4] sched/fair: Defer calculation of prev_eff_load in wake_affine until needed
  2018-02-12 14:58 [PATCH 0/4] Reduce migrations due to load imbalance and process exits Mel Gorman
  2018-02-12 14:58 ` [PATCH 1/4] sched/fair: Avoid an unnecessary lookup of current CPU ID during wake_affine Mel Gorman
@ 2018-02-12 14:58 ` Mel Gorman
  2018-02-12 14:58 ` [PATCH 3/4] sched/fair: Do not migrate on wake_affine_weight if weights are equal Mel Gorman
  2018-02-12 14:58 ` [PATCH 4/4] sched/fair: Do not migrate due to a sync wakeup on exit Mel Gorman
  3 siblings, 0 replies; 15+ messages in thread
From: Mel Gorman @ 2018-02-12 14:58 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Mike Galbraith, Matt Fleming, LKML, Mel Gorman

On sync wakeups, the previous CPU's effective load may not be used, so
defer the calculation until it is needed.
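
As a minimal standalone sketch of the deferral pattern (the stub loads,
names, and values below are invented for illustration and are not the
kernel's): the prev-CPU load is only computed once the early sync-wakeup
return can no longer be taken.

#include <stdio.h>

/* Stubs standing in for target_load()/task_h_load()/source_load();
 * the values are made up purely to exercise both paths. */
static long this_cpu_load(void) { return 100; }
static long waker_load(void)    { return 150; }
static long prev_cpu_load(void) { return 250; }	/* the deferrable cost */

enum { PICK_THIS = 0, PICK_NONE = -1 };

static int decide(int sync)
{
	long this_eff = this_cpu_load();

	if (sync) {
		long cur = waker_load();
		/* Early exit: prev_cpu_load() is never evaluated. */
		if (cur > this_eff)
			return PICK_THIS;
		this_eff -= cur;
	}

	/* Deferred: computed only when the comparison actually happens. */
	long prev_eff = prev_cpu_load();
	return this_eff < prev_eff ? PICK_THIS : PICK_NONE;
}

int main(void)
{
	printf("sync  -> %d\n", decide(1));	/* early return, no prev load */
	printf("async -> %d\n", decide(0));	/* 100 < 250 -> PICK_THIS */
	return 0;
}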

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4b3e86eb2222..c1091cb023c4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5724,7 +5724,6 @@ wake_affine_weight(struct sched_domain *sd, struct task_struct *p,
 	unsigned long task_load;
 
 	this_eff_load = target_load(this_cpu, sd->wake_idx);
-	prev_eff_load = source_load(prev_cpu, sd->wake_idx);
 
 	if (sync) {
 		unsigned long current_load = task_h_load(current);
@@ -5742,6 +5741,7 @@ wake_affine_weight(struct sched_domain *sd, struct task_struct *p,
 		this_eff_load *= 100;
 	this_eff_load *= capacity_of(prev_cpu);
 
+	prev_eff_load = source_load(prev_cpu, sd->wake_idx);
 	prev_eff_load -= task_load;
 	if (sched_feat(WA_BIAS))
 		prev_eff_load *= 100 + (sd->imbalance_pct - 100) / 2;
-- 
2.15.1


* [PATCH 3/4] sched/fair: Do not migrate on wake_affine_weight if weights are equal
  2018-02-12 14:58 [PATCH 0/4] Reduce migrations due to load imbalance and process exits Mel Gorman
  2018-02-12 14:58 ` [PATCH 1/4] sched/fair: Avoid an unnecessary lookup of current CPU ID during wake_affine Mel Gorman
  2018-02-12 14:58 ` [PATCH 2/4] sched/fair: Defer calculation of prev_eff_load in wake_affine until needed Mel Gorman
@ 2018-02-12 14:58 ` Mel Gorman
  2018-02-12 17:29   ` Peter Zijlstra
  2018-02-12 14:58 ` [PATCH 4/4] sched/fair: Do not migrate due to a sync wakeup on exit Mel Gorman
  3 siblings, 1 reply; 15+ messages in thread
From: Mel Gorman @ 2018-02-12 14:58 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Mike Galbraith, Matt Fleming, LKML, Mel Gorman

wake_affine_weight() will consider migrating a task to, or near, the current
CPU if there is a load imbalance. If the CPUs share LLC then either CPU
is a valid search-for-idle-sibling target, and equally appropriate for
stacking two tasks on one CPU if no idle sibling is available. If they do
not share cache then a cross-node migration potentially impacts locality:
while the CPUs are equal from a capacity point of view, they are not
equal in terms of memory locality. In either case, it is more appropriate
to migrate only if there is a difference in their effective load.

This patch modifies wake_affine_weight() to consider migrating a task
only if there is a load imbalance for normal wakeups, but still allows
potential stacking if the loads are equal and it is a sync wakeup.
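
The effect of the strict comparison plus the sync bias can be modelled in
isolation (a minimal standalone sketch; pick_cpu() and the 0-load inputs
are illustrative, with NO_TARGET standing in for nr_cpumask_bits):

#include <stdio.h>

#define NO_TARGET -1	/* stands in for nr_cpumask_bits */

static int pick_cpu(unsigned long this_eff_load,
		    unsigned long prev_eff_load, int sync, int this_cpu)
{
	/*
	 * A sync waker promises to go away soon, so on a tie allow
	 * select_idle_sibling() to consider stacking the wakee on the
	 * waking CPU.
	 */
	if (sync)
		prev_eff_load += 1;

	/* Strict '<': on a tie (e.g. both loads 0), do not migrate. */
	return this_eff_load < prev_eff_load ? this_cpu : NO_TARGET;
}

int main(void)
{
	/* Both CPUs measure load 0: a plain wakeup no longer pulls... */
	printf("async tie -> %d\n", pick_cpu(0, 0, 0, 1));	/* -1 */
	/* ...but a sync wakeup still allows stacking on the waker. */
	printf("sync tie  -> %d\n", pick_cpu(0, 0, 1, 1));	/*  1 */
	return 0;
}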

For the most part, the difference in performance is marginal. For example,
on a 4-socket server running netperf UDP_STREAM on localhost the differences
are as follows:

                                     4.15.0                 4.15.0
                                      16rc0          noequal-v1r23
Hmean     send-64         355.47 (   0.00%)      349.50 (  -1.68%)
Hmean     send-128        697.98 (   0.00%)      693.35 (  -0.66%)
Hmean     send-256       1328.02 (   0.00%)     1318.77 (  -0.70%)
Hmean     send-1024      5051.83 (   0.00%)     5051.11 (  -0.01%)
Hmean     send-2048      9637.02 (   0.00%)     9601.34 (  -0.37%)
Hmean     send-3312     14355.37 (   0.00%)    14414.51 (   0.41%)
Hmean     send-4096     16464.97 (   0.00%)    16301.37 (  -0.99%)
Hmean     send-8192     26722.42 (   0.00%)    26428.95 (  -1.10%)
Hmean     send-16384    38137.81 (   0.00%)    38046.11 (  -0.24%)
Hmean     recv-64         355.47 (   0.00%)      349.50 (  -1.68%)
Hmean     recv-128        697.98 (   0.00%)      693.35 (  -0.66%)
Hmean     recv-256       1328.02 (   0.00%)     1318.77 (  -0.70%)
Hmean     recv-1024      5051.83 (   0.00%)     5051.11 (  -0.01%)
Hmean     recv-2048      9636.95 (   0.00%)     9601.30 (  -0.37%)
Hmean     recv-3312     14355.32 (   0.00%)    14414.48 (   0.41%)
Hmean     recv-4096     16464.74 (   0.00%)    16301.16 (  -0.99%)
Hmean     recv-8192     26721.63 (   0.00%)    26428.17 (  -1.10%)
Hmean     recv-16384    38136.00 (   0.00%)    38044.88 (  -0.24%)
Stddev    send-64           7.30 (   0.00%)        4.75 (  34.96%)
Stddev    send-128         15.15 (   0.00%)       22.38 ( -47.66%)
Stddev    send-256         13.99 (   0.00%)       19.14 ( -36.81%)
Stddev    send-1024       105.73 (   0.00%)       67.38 (  36.27%)
Stddev    send-2048       294.57 (   0.00%)      223.88 (  24.00%)
Stddev    send-3312       302.28 (   0.00%)      271.74 (  10.10%)
Stddev    send-4096       195.92 (   0.00%)      121.10 (  38.19%)
Stddev    send-8192       399.71 (   0.00%)      563.77 ( -41.04%)
Stddev    send-16384     1163.47 (   0.00%)     1103.68 (   5.14%)
Stddev    recv-64           7.30 (   0.00%)        4.75 (  34.96%)
Stddev    recv-128         15.15 (   0.00%)       22.38 ( -47.66%)
Stddev    recv-256         13.99 (   0.00%)       19.14 ( -36.81%)
Stddev    recv-1024       105.73 (   0.00%)       67.38 (  36.27%)
Stddev    recv-2048       294.59 (   0.00%)      223.89 (  24.00%)
Stddev    recv-3312       302.24 (   0.00%)      271.75 (  10.09%)
Stddev    recv-4096       196.03 (   0.00%)      121.14 (  38.20%)
Stddev    recv-8192       399.86 (   0.00%)      563.65 ( -40.96%)
Stddev    recv-16384     1163.79 (   0.00%)     1103.86 (   5.15%)

The difference in overall performance is marginal, but note that most
measurements are less variable. There were similar observations for other
netperf comparisons. hackbench, with sockets or pipes and with processes
or threads, showed minor differences with some reduction in migrations.
tbench showed only marginal differences that were within the noise. dbench,
regardless of filesystem, showed minor differences, all of which were
within the noise. Multiple machines, both UMA and NUMA, were tested without
any regressions showing up.

The biggest risk with a patch like this is affecting wakeup latencies.
However, the schbench load from Facebook, which is very sensitive to wakeup
latency, showed a mixed result with mostly improvements in wakeup latency:

                                     4.15.0                 4.15.0
                                      16rc0          noequal-v1r23
Lat 50.00th-qrtle-1        38.00 (   0.00%)       38.00 (   0.00%)
Lat 75.00th-qrtle-1        49.00 (   0.00%)       41.00 (  16.33%)
Lat 90.00th-qrtle-1        52.00 (   0.00%)       50.00 (   3.85%)
Lat 95.00th-qrtle-1        54.00 (   0.00%)       51.00 (   5.56%)
Lat 99.00th-qrtle-1        63.00 (   0.00%)       60.00 (   4.76%)
Lat 99.50th-qrtle-1        66.00 (   0.00%)       61.00 (   7.58%)
Lat 99.90th-qrtle-1        78.00 (   0.00%)       65.00 (  16.67%)
Lat 50.00th-qrtle-2        38.00 (   0.00%)       38.00 (   0.00%)
Lat 75.00th-qrtle-2        42.00 (   0.00%)       43.00 (  -2.38%)
Lat 90.00th-qrtle-2        46.00 (   0.00%)       48.00 (  -4.35%)
Lat 95.00th-qrtle-2        49.00 (   0.00%)       50.00 (  -2.04%)
Lat 99.00th-qrtle-2        55.00 (   0.00%)       57.00 (  -3.64%)
Lat 99.50th-qrtle-2        58.00 (   0.00%)       60.00 (  -3.45%)
Lat 99.90th-qrtle-2        65.00 (   0.00%)       68.00 (  -4.62%)
Lat 50.00th-qrtle-4        41.00 (   0.00%)       41.00 (   0.00%)
Lat 75.00th-qrtle-4        45.00 (   0.00%)       46.00 (  -2.22%)
Lat 90.00th-qrtle-4        50.00 (   0.00%)       50.00 (   0.00%)
Lat 95.00th-qrtle-4        54.00 (   0.00%)       53.00 (   1.85%)
Lat 99.00th-qrtle-4        61.00 (   0.00%)       61.00 (   0.00%)
Lat 99.50th-qrtle-4        65.00 (   0.00%)       64.00 (   1.54%)
Lat 99.90th-qrtle-4        76.00 (   0.00%)       82.00 (  -7.89%)
Lat 50.00th-qrtle-8        48.00 (   0.00%)       46.00 (   4.17%)
Lat 75.00th-qrtle-8        55.00 (   0.00%)       54.00 (   1.82%)
Lat 90.00th-qrtle-8        60.00 (   0.00%)       59.00 (   1.67%)
Lat 95.00th-qrtle-8        63.00 (   0.00%)       63.00 (   0.00%)
Lat 99.00th-qrtle-8        71.00 (   0.00%)       69.00 (   2.82%)
Lat 99.50th-qrtle-8        74.00 (   0.00%)       73.00 (   1.35%)
Lat 99.90th-qrtle-8        98.00 (   0.00%)       90.00 (   8.16%)
Lat 50.00th-qrtle-16       56.00 (   0.00%)       55.00 (   1.79%)
Lat 75.00th-qrtle-16       68.00 (   0.00%)       67.00 (   1.47%)
Lat 90.00th-qrtle-16       77.00 (   0.00%)       78.00 (  -1.30%)
Lat 95.00th-qrtle-16       82.00 (   0.00%)       84.00 (  -2.44%)
Lat 99.00th-qrtle-16       90.00 (   0.00%)       93.00 (  -3.33%)
Lat 99.50th-qrtle-16       93.00 (   0.00%)       97.00 (  -4.30%)
Lat 99.90th-qrtle-16      110.00 (   0.00%)      110.00 (   0.00%)
Lat 50.00th-qrtle-32       68.00 (   0.00%)       62.00 (   8.82%)
Lat 75.00th-qrtle-32       90.00 (   0.00%)       83.00 (   7.78%)
Lat 90.00th-qrtle-32      110.00 (   0.00%)      100.00 (   9.09%)
Lat 95.00th-qrtle-32      122.00 (   0.00%)      111.00 (   9.02%)
Lat 99.00th-qrtle-32      145.00 (   0.00%)      133.00 (   8.28%)
Lat 99.50th-qrtle-32      154.00 (   0.00%)      143.00 (   7.14%)
Lat 99.90th-qrtle-32     2316.00 (   0.00%)      515.00 (  77.76%)
Lat 50.00th-qrtle-35       69.00 (   0.00%)       72.00 (  -4.35%)
Lat 75.00th-qrtle-35       92.00 (   0.00%)       95.00 (  -3.26%)
Lat 90.00th-qrtle-35      111.00 (   0.00%)      114.00 (  -2.70%)
Lat 95.00th-qrtle-35      122.00 (   0.00%)      124.00 (  -1.64%)
Lat 99.00th-qrtle-35      142.00 (   0.00%)      144.00 (  -1.41%)
Lat 99.50th-qrtle-35      150.00 (   0.00%)      154.00 (  -2.67%)
Lat 99.90th-qrtle-35     6104.00 (   0.00%)     5640.00 (   7.60%)

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c1091cb023c4..28c8d9c91955 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5747,7 +5747,16 @@ wake_affine_weight(struct sched_domain *sd, struct task_struct *p,
 		prev_eff_load *= 100 + (sd->imbalance_pct - 100) / 2;
 	prev_eff_load *= capacity_of(this_cpu);
 
-	return this_eff_load <= prev_eff_load ? this_cpu : nr_cpumask_bits;
+	/*
+	 * If sync, adjust the weight of prev_eff_load such that if
+	 * prev_eff == this_eff that select_idle_sibling will consider
+	 * stacking the wakee on top of the waker if no other CPU is
+	 * idle.
+	 */
+	if (sync)
+		prev_eff_load += 1;
+
+	return this_eff_load < prev_eff_load ? this_cpu : nr_cpumask_bits;
 }
 
 static int wake_affine(struct sched_domain *sd, struct task_struct *p,
-- 
2.15.1


* [PATCH 4/4] sched/fair: Do not migrate due to a sync wakeup on exit
  2018-02-12 14:58 [PATCH 0/4] Reduce migrations due to load imbalance and process exits Mel Gorman
                   ` (2 preceding siblings ...)
  2018-02-12 14:58 ` [PATCH 3/4] sched/fair: Do not migrate on wake_affine_weight if weights are equal Mel Gorman
@ 2018-02-12 14:58 ` Mel Gorman
  2018-02-12 17:31   ` Peter Zijlstra
  2018-02-12 20:01   ` Andreas Mohr
  3 siblings, 2 replies; 15+ messages in thread
From: Mel Gorman @ 2018-02-12 14:58 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Mike Galbraith, Matt Fleming, LKML, Mel Gorman

When a task exits, it notifies the parent that it has exited. This is a
sync wakeup and the exiting task may pull the parent towards the waker's
CPU. Even for simple workloads like using a shell, it was observed that
the shell is pulled across nodes by exiting processes. This is daft as
the parent may be long-lived and properly placed. This patch special-cases
a sync wakeup on exit to avoid pulling tasks across nodes. Testing on a
range of workloads and machines showed very little difference in
performance, although there was a small 3% boost on some machines running
a shell-script-intensive workload (the git regression test suite).
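
A standalone model of the branch being changed (sync_wake_target() and its
boolean parameters are invented for illustration and cover only the
sync && nr_running == 1 case of wake_affine_idle()):

#include <stdio.h>
#include <stdbool.h>

/* Models only the sync && nr_running == 1 branch of wake_affine_idle(). */
static int sync_wake_target(bool waker_exiting, bool share_cache,
			    int this_cpu, int prev_cpu)
{
	/* An exiting child must not drag its parent to a new node. */
	if (waker_exiting && !share_cache)
		return prev_cpu;

	return this_cpu;
}

int main(void)
{
	/* exit() notification across nodes: the parent stays on CPU 8. */
	printf("exiting waker -> %d\n", sync_wake_target(true, false, 0, 8));
	/* An ordinary sync wakeup still pulls the wakee to the waker. */
	printf("normal sync   -> %d\n", sync_wake_target(false, false, 0, 8));
	return 0;
}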

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 28c8d9c91955..50442697b455 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5710,8 +5710,14 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
 	if (idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
 		return idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
 
-	if (sync && cpu_rq(this_cpu)->nr_running == 1)
+	if (sync && cpu_rq(this_cpu)->nr_running == 1) {
+		/* Avoid tasks exiting pulling parents to new nodes */
+		if ((current->flags & PF_EXITING) &&
+		    !cpus_share_cache(this_cpu, prev_cpu))
+			return prev_cpu;
+
 		return this_cpu;
+	}
 
 	return nr_cpumask_bits;
 }
-- 
2.15.1


* Re: [PATCH 3/4] sched/fair: Do not migrate on wake_affine_weight if weights are equal
  2018-02-12 14:58 ` [PATCH 3/4] sched/fair: Do not migrate on wake_affine_weight if weights are equal Mel Gorman
@ 2018-02-12 17:29   ` Peter Zijlstra
  2018-02-12 17:43     ` Mel Gorman
  2018-02-12 19:14     ` Mike Galbraith
  0 siblings, 2 replies; 15+ messages in thread
From: Peter Zijlstra @ 2018-02-12 17:29 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Mike Galbraith, Matt Fleming, LKML

On Mon, Feb 12, 2018 at 02:58:56PM +0000, Mel Gorman wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c1091cb023c4..28c8d9c91955 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5747,7 +5747,16 @@ wake_affine_weight(struct sched_domain *sd, struct task_struct *p,
>  		prev_eff_load *= 100 + (sd->imbalance_pct - 100) / 2;
>  	prev_eff_load *= capacity_of(this_cpu);
>  
> -	return this_eff_load <= prev_eff_load ? this_cpu : nr_cpumask_bits;
> +	/*
> +	 * If sync, adjust the weight of prev_eff_load such that if
> +	 * prev_eff == this_eff that select_idle_sibling will consider
> +	 * stacking the wakee on top of the waker if no other CPU is
> +	 * idle.
> +	 */
> +	if (sync)
> +		prev_eff_load += 1;

So where we had <= and would consistently favour pulling the task to the
waking CPU when all else was equal, you now switch to <, such that when
things are equal we do not pull.

That makes sense I suppose.

Except for sync wakeups, where you say, if everything else is equal,
pull, which also makes sense, because sync says 'current' promises to go
away.

OK.

> +
> +	return this_eff_load < prev_eff_load ? this_cpu : nr_cpumask_bits;
>  }
>  
>  static int wake_affine(struct sched_domain *sd, struct task_struct *p,
> -- 
> 2.15.1
> 


* Re: [PATCH 4/4] sched/fair: Do not migrate due to a sync wakeup on exit
  2018-02-12 14:58 ` [PATCH 4/4] sched/fair: Do not migrate due to a sync wakeup on exit Mel Gorman
@ 2018-02-12 17:31   ` Peter Zijlstra
  2018-02-12 17:47     ` Mel Gorman
  2018-02-12 20:01   ` Andreas Mohr
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2018-02-12 17:31 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Mike Galbraith, Matt Fleming, LKML

On Mon, Feb 12, 2018 at 02:58:57PM +0000, Mel Gorman wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 28c8d9c91955..50442697b455 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5710,8 +5710,14 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
>  	if (idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
>  		return idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
>  
> -	if (sync && cpu_rq(this_cpu)->nr_running == 1)
> +	if (sync && cpu_rq(this_cpu)->nr_running == 1) {
> +		/* Avoid tasks exiting pulling parents to new nodes */
> +		if ((current->flags & PF_EXITING) &&
> +		    !cpus_share_cache(this_cpu, prev_cpu))
> +			return prev_cpu;
> +

Cute, but should we not kill @sync right at the source in this case?

Something a little like this?

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5eb3ffc9be84..568ea4ce5b36 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6342,7 +6342,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 	int cpu = smp_processor_id();
 	int new_cpu = prev_cpu;
 	int want_affine = 0;
-	int sync = wake_flags & WF_SYNC;
+	int sync = (wake_flags & WF_SYNC) && !(current->flags & PF_EXITING);
 
 	if (sd_flag & SD_BALANCE_WAKE) {
 		record_wakee(p);


* Re: [PATCH 3/4] sched/fair: Do not migrate on wake_affine_weight if weights are equal
  2018-02-12 17:29   ` Peter Zijlstra
@ 2018-02-12 17:43     ` Mel Gorman
  2018-02-12 19:14     ` Mike Galbraith
  1 sibling, 0 replies; 15+ messages in thread
From: Mel Gorman @ 2018-02-12 17:43 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Mike Galbraith, Matt Fleming, LKML

On Mon, Feb 12, 2018 at 06:29:26PM +0100, Peter Zijlstra wrote:
> On Mon, Feb 12, 2018 at 02:58:56PM +0000, Mel Gorman wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index c1091cb023c4..28c8d9c91955 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5747,7 +5747,16 @@ wake_affine_weight(struct sched_domain *sd, struct task_struct *p,
> >  		prev_eff_load *= 100 + (sd->imbalance_pct - 100) / 2;
> >  	prev_eff_load *= capacity_of(this_cpu);
> >  
> > -	return this_eff_load <= prev_eff_load ? this_cpu : nr_cpumask_bits;
> > +	/*
> > +	 * If sync, adjust the weight of prev_eff_load such that if
> > +	 * prev_eff == this_eff that select_idle_sibling will consider
> > +	 * stacking the wakee on top of the waker if no other CPU is
> > +	 * idle.
> > +	 */
> > +	if (sync)
> > +		prev_eff_load += 1;
> 
> So where we had <= and would consistently favour pulling the task to the
> waking CPU when all else what equal, you now switch to <, such that when
> things are equal we do not pull.
> 
> That makes sense I suppose.
> 

Yep, with the addendum that when CPU load is equal, it does not
necessarily mean the CPUs are equal in terms of memory locality. It might
make more sense to use <= if there were more cases where we stacked
tasks on the same CPU, but we avoid that as much as possible for good
reasons.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH 4/4] sched/fair: Do not migrate due to a sync wakeup on exit
  2018-02-12 17:31   ` Peter Zijlstra
@ 2018-02-12 17:47     ` Mel Gorman
  2018-02-12 18:01       ` Mel Gorman
  0 siblings, 1 reply; 15+ messages in thread
From: Mel Gorman @ 2018-02-12 17:47 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Mike Galbraith, Matt Fleming, LKML

On Mon, Feb 12, 2018 at 06:31:47PM +0100, Peter Zijlstra wrote:
> On Mon, Feb 12, 2018 at 02:58:57PM +0000, Mel Gorman wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 28c8d9c91955..50442697b455 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5710,8 +5710,14 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
> >  	if (idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
> >  		return idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
> >  
> > -	if (sync && cpu_rq(this_cpu)->nr_running == 1)
> > +	if (sync && cpu_rq(this_cpu)->nr_running == 1) {
> > +		/* Avoid tasks exiting pulling parents to new nodes */
> > +		if ((current->flags & PF_EXITING) &&
> > +		    !cpus_share_cache(this_cpu, prev_cpu))
> > +			return prev_cpu;
> > +
> 
> Cute, but should we not kill @sync right at the source in this case?
> 
> Something a little like this?
> 

That works too, but note that it has a slight disadvantage in that we
add weight to the select_task_rq_fair() fast path for all wakeups, not
just those that are affine. It's also not 100% functionally equivalent,
as I was still allowing parents to migrate to the CPU if it shared
cache, although I expect any advantage there is marginal. I can replace
my patch with yours if you really prefer it, as the overhead in the
fast path should be marginal.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH 4/4] sched/fair: Do not migrate due to a sync wakeup on exit
  2018-02-12 17:47     ` Mel Gorman
@ 2018-02-12 18:01       ` Mel Gorman
  0 siblings, 0 replies; 15+ messages in thread
From: Mel Gorman @ 2018-02-12 18:01 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Mike Galbraith, Matt Fleming, LKML

On Mon, Feb 12, 2018 at 05:47:29PM +0000, Mel Gorman wrote:
> On Mon, Feb 12, 2018 at 06:31:47PM +0100, Peter Zijlstra wrote:
> > On Mon, Feb 12, 2018 at 02:58:57PM +0000, Mel Gorman wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 28c8d9c91955..50442697b455 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -5710,8 +5710,14 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
> > >  	if (idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
> > >  		return idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
> > >  
> > > -	if (sync && cpu_rq(this_cpu)->nr_running == 1)
> > > +	if (sync && cpu_rq(this_cpu)->nr_running == 1) {
> > > +		/* Avoid tasks exiting pulling parents to new nodes */
> > > +		if ((current->flags & PF_EXITING) &&
> > > +		    !cpus_share_cache(this_cpu, prev_cpu))
> > > +			return prev_cpu;
> > > +
> > 
> > Cute, but should we not kill @sync right at the source in this case?
> > 
> > Something a little like this?
> > 
> 
> That works too, but note that it has a slight disadvantage in that we
> add weight to the select_task_rq_fair() fast path for all wakeups, not
> just those that are affine. It's also not 100% functionally equivalent,
> as I was still allowing parents to migrate to the CPU if it shared
> cache, although I expect any advantage there is marginal. I can replace
> my patch with yours if you really prefer it, as the overhead in the
> fast path should be marginal.
> 

Clearly my brains leaked out my ears at the end of a work day. The overhead
only applies to sync wakeups and is a very small amount in the case where
this_cpu or prev_cpu was idle, so I'm happy with either version.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH 3/4] sched/fair: Do not migrate on wake_affine_weight if weights are equal
  2018-02-12 17:29   ` Peter Zijlstra
  2018-02-12 17:43     ` Mel Gorman
@ 2018-02-12 19:14     ` Mike Galbraith
  2018-02-12 19:57       ` Peter Zijlstra
  2018-02-12 20:20       ` Mel Gorman
  1 sibling, 2 replies; 15+ messages in thread
From: Mike Galbraith @ 2018-02-12 19:14 UTC (permalink / raw)
  To: Peter Zijlstra, Mel Gorman; +Cc: Matt Fleming, LKML

On Mon, 2018-02-12 at 18:29 +0100, Peter Zijlstra wrote:
> On Mon, Feb 12, 2018 at 02:58:56PM +0000, Mel Gorman wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index c1091cb023c4..28c8d9c91955 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5747,7 +5747,16 @@ wake_affine_weight(struct sched_domain *sd, struct task_struct *p,
> >  		prev_eff_load *= 100 + (sd->imbalance_pct - 100) / 2;
> >  	prev_eff_load *= capacity_of(this_cpu);
> >  
> > -	return this_eff_load <= prev_eff_load ? this_cpu : nr_cpumask_bits;
> > +	/*
> > +	 * If sync, adjust the weight of prev_eff_load such that if
> > +	 * prev_eff == this_eff that select_idle_sibling will consider
> > +	 * stacking the wakee on top of the waker if no other CPU is
> > +	 * idle.
> > +	 */
> > +	if (sync)
> > +		prev_eff_load += 1;
> 
> So where we had <= and would consistently favour pulling the task to the
> waking CPU when all else what equal, you now switch to <, such that when
> things are equal we do not pull.
> 
> That makes sense I suppose.
> 
> Except for sync wakeups, where you say, if everything else is equal,
> pull, which also makes sense, because sync says 'current' promises to go
> away.
> 
> OK.

Tasks tend to not honor that promise.. a lot.  Even if the sync hint
were a golden promise, it wouldn't be bulletproof: migrating a compute
hog based on a single "phoned home" wakeup remains a bad idea whether
schedule() is called immediately after the wakeup or not.  It's a pain;
its only useful informational content is "this is a communication wakeup".

	-Mike


* Re: [PATCH 3/4] sched/fair: Do not migrate on wake_affine_weight if weights are equal
  2018-02-12 19:14     ` Mike Galbraith
@ 2018-02-12 19:57       ` Peter Zijlstra
  2018-02-12 20:20       ` Mel Gorman
  1 sibling, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2018-02-12 19:57 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Mel Gorman, Matt Fleming, LKML

On Mon, Feb 12, 2018 at 08:14:49PM +0100, Mike Galbraith wrote:
> On Mon, 2018-02-12 at 18:29 +0100, Peter Zijlstra wrote:

> > Except for sync wakeups, where you say, if everything else is equal,
> > pull, which also makes sense, because sync says 'current' promises to go
> > away.
> 
> Tasks tend to not honor that promise.. a lot.

Yes I know.. still its something and its a very small bias in this case.


* Re: [PATCH 4/4] sched/fair: Do not migrate due to a sync wakeup on exit
  2018-02-12 14:58 ` [PATCH 4/4] sched/fair: Do not migrate due to a sync wakeup on exit Mel Gorman
  2018-02-12 17:31   ` Peter Zijlstra
@ 2018-02-12 20:01   ` Andreas Mohr
  2018-02-12 20:09     ` Mel Gorman
  1 sibling, 1 reply; 15+ messages in thread
From: Andreas Mohr @ 2018-02-12 20:01 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Peter Zijlstra, Mike Galbraith, Matt Fleming, LKML

Hello,

On Mon, Feb 12, 2018 at 02:58:57PM +0000, Mel Gorman wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 28c8d9c91955..50442697b455 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5710,8 +5710,14 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
>  	if (idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
>  		return idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
>  
> -	if (sync && cpu_rq(this_cpu)->nr_running == 1)
> +	if (sync && cpu_rq(this_cpu)->nr_running == 1) {
> +		/* Avoid tasks exiting pulling parents to new nodes */

Semi-confusing wording (WTH are "pulling parents"? :) - possibly better variant?:
Avoid having exiting tasks pull their parents to new nodes


Good catch, it seems.

HTH,

Andreas Mohr

-- 
GNU/Linux. It's not the software that's free, it's you.


* Re: [PATCH 4/4] sched/fair: Do not migrate due to a sync wakeup on exit
  2018-02-12 20:01   ` Andreas Mohr
@ 2018-02-12 20:09     ` Mel Gorman
  0 siblings, 0 replies; 15+ messages in thread
From: Mel Gorman @ 2018-02-12 20:09 UTC (permalink / raw)
  To: Andreas Mohr; +Cc: Peter Zijlstra, Mike Galbraith, Matt Fleming, LKML

On Mon, Feb 12, 2018 at 09:01:27PM +0100, Andreas Mohr wrote:
> Hello,
> 
> On Mon, Feb 12, 2018 at 02:58:57PM +0000, Mel Gorman wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 28c8d9c91955..50442697b455 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5710,8 +5710,14 @@ wake_affine_idle(int this_cpu, int prev_cpu, int sync)
> >  	if (idle_cpu(this_cpu) && cpus_share_cache(this_cpu, prev_cpu))
> >  		return idle_cpu(prev_cpu) ? prev_cpu : this_cpu;
> >  
> > -	if (sync && cpu_rq(this_cpu)->nr_running == 1)
> > +	if (sync && cpu_rq(this_cpu)->nr_running == 1) {
> > +		/* Avoid tasks exiting pulling parents to new nodes */
> 
> Semi-confusing wording (WTH are "pulling parents"? :) - possibly better variant?:
> Avoid having exiting tasks pull their parents to new nodes

That is better wording. If I don't replace the patch with Peter's
version, I'll update it.

> Good catch, it seems.
> 

Thanks.

-- 
Mel Gorman
SUSE Labs


* Re: [PATCH 3/4] sched/fair: Do not migrate on wake_affine_weight if weights are equal
  2018-02-12 19:14     ` Mike Galbraith
  2018-02-12 19:57       ` Peter Zijlstra
@ 2018-02-12 20:20       ` Mel Gorman
  1 sibling, 0 replies; 15+ messages in thread
From: Mel Gorman @ 2018-02-12 20:20 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Peter Zijlstra, Matt Fleming, LKML

On Mon, Feb 12, 2018 at 08:14:49PM +0100, Mike Galbraith wrote:
> On Mon, 2018-02-12 at 18:29 +0100, Peter Zijlstra wrote:
> > On Mon, Feb 12, 2018 at 02:58:56PM +0000, Mel Gorman wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index c1091cb023c4..28c8d9c91955 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -5747,7 +5747,16 @@ wake_affine_weight(struct sched_domain *sd, struct task_struct *p,
> > >  		prev_eff_load *= 100 + (sd->imbalance_pct - 100) / 2;
> > >  	prev_eff_load *= capacity_of(this_cpu);
> > >  
> > > -	return this_eff_load <= prev_eff_load ? this_cpu : nr_cpumask_bits;
> > > +	/*
> > > +	 * If sync, adjust the weight of prev_eff_load such that if
> > > +	 * prev_eff == this_eff that select_idle_sibling will consider
> > > +	 * stacking the wakee on top of the waker if no other CPU is
> > > +	 * idle.
> > > +	 */
> > > +	if (sync)
> > > +		prev_eff_load += 1;
> > 
> > So where we had <= and would consistently favour pulling the task to the
> > waking CPU when all else what equal, you now switch to <, such that when
> > things are equal we do not pull.
> > 
> > That makes sense I suppose.
> > 
> > Except for sync wakeups, where you say, if everything else is equal,
> > pull, which also makes sense, because sync says 'current' promises to go
> > away.
> > 
> > OK.
> 
> Tasks tend to not honor that promise.. a lot.  Even if the sync hint
> were a golden promise, it wouldn't be bullet proof: migrating a compute
> hog based on a single "phoned home" wakeup remains a bad idea whether
> schedule() is called immediately after the wakeup or not.  It's a pain,
> only useful informational content is "this is a communication wakeup".
> 

Agreed, and I'm aware of the hazard of sync wakeups being no guarantee
that the task will immediately sleep. If there is any delay at all then
stacking incurs a wakeup latency penalty, but in many cases it'll be an
idle sibling that is used. The sync hint does give a stronger hint that
the tasks are closely related though, which is why I special-cased it
slightly, and I feel it's justified. I think in all cases where it
mattered, it was due to prev_eff_load and this_eff_load being equal to 0
when waking a task via a socket.

-- 
Mel Gorman
SUSE Labs

