From: Mel Gorman <mgorman@techsingularity.net>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Aubrey Li <aubrey.li@linux.intel.com>,
	Barry Song <song.bao.hua@hisilicon.com>,
	Mike Galbraith <efault@gmx.de>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Mel Gorman <mgorman@techsingularity.net>
Subject: [PATCH 1/2] sched/fair: Couple wakee flips with heavy wakers
Date: Thu, 21 Oct 2021 15:56:02 +0100
Message-ID: <20211021145603.5313-2-mgorman@techsingularity.net> (raw)
In-Reply-To: <20211021145603.5313-1-mgorman@techsingularity.net>

This patch mitigates a problem where wake_wide() allows a heavy waker
(e.g. X) to stack an excessive number of wakees on the same CPU. This
is due to the cpu_load check in wake_affine_weight(). As noted by the
original patch author (Mike Galbraith) [1]:

	Between load updates, X, or any other waker of many, can stack
	wakees to a ludicrous depth.  Tracing kbuild vs firefox playing a
	youtube clip, I watched X stack 20 of the zillion firefox minions
	while their previous CPUs all had 1 lousy task running but a
	cpu_load() higher than the cpu_load() of X's CPU.  Most of those
	prev_cpus were where X had left them when it migrated. Each and
	every crazy depth migration was wake_affine_weight() deciding we
	should pull.

Paraphrasing Mike's test results from the original patch:

	With make -j8 running alongside firefox with two tabs, one
	containing youtube's suggestions of stuff, the other a running
	clip, if the idle tab is in focus and the mouse is left alone,
	flips decay enough for wake_wide() to lose interest; but just
	wiggle the mouse and it starts waking wide. Focus on the running
	clip, and it continuously wakes wide.

The end result is that heavy wakers are less likely to stack tasks and,
depending on the workload, migrations are reduced.

From additional tests on various servers, the impact is machine dependent
but in general this patch improves the situation.

hackbench-process-pipes
                          5.15.0-rc3             5.15.0-rc3
                             vanilla  sched-wakeeflips-v1r1
Amean     1        0.3667 (   0.00%)      0.3890 (  -6.09%)
Amean     4        0.5343 (   0.00%)      0.5217 (   2.37%)
Amean     7        0.5300 (   0.00%)      0.5387 (  -1.64%)
Amean     12       0.5737 (   0.00%)      0.5443 (   5.11%)
Amean     21       0.6727 (   0.00%)      0.6487 (   3.57%)
Amean     30       0.8583 (   0.00%)      0.8033 (   6.41%)
Amean     48       1.3977 (   0.00%)      1.2400 *  11.28%*
Amean     79       1.9790 (   0.00%)      1.8200 *   8.03%*
Amean     110      2.8020 (   0.00%)      2.5820 *   7.85%*
Amean     141      3.6683 (   0.00%)      3.2203 *  12.21%*
Amean     172      4.6687 (   0.00%)      3.8200 *  18.18%*
Amean     203      5.2183 (   0.00%)      4.3357 *  16.91%*
Amean     234      6.1077 (   0.00%)      4.8047 *  21.33%*
Amean     265      7.1313 (   0.00%)      5.1243 *  28.14%*
Amean     296      7.7557 (   0.00%)      5.5940 *  27.87%*

While different machines showed different results, in general there
were far fewer CPU migrations of tasks.

tbench4
                           5.15.0-rc3             5.15.0-rc3
                              vanilla  sched-wakeeflips-v1r1
Hmean     1         824.05 (   0.00%)      802.56 *  -2.61%*
Hmean     2        1578.49 (   0.00%)     1645.11 *   4.22%*
Hmean     4        2959.08 (   0.00%)     2984.75 *   0.87%*
Hmean     8        5080.09 (   0.00%)     5173.35 *   1.84%*
Hmean     16       8276.02 (   0.00%)     9327.17 *  12.70%*
Hmean     32      15501.61 (   0.00%)    15925.55 *   2.73%*
Hmean     64      27313.67 (   0.00%)    24107.81 * -11.74%*
Hmean     128     32928.19 (   0.00%)    36261.75 *  10.12%*
Hmean     256     35434.73 (   0.00%)    38670.61 *   9.13%*
Hmean     512     50098.34 (   0.00%)    53243.75 *   6.28%*
Hmean     1024    69503.69 (   0.00%)    67425.26 *  -2.99%*

A bit of a mixed bag, but it wins more than it loses.

A new workload was added that runs a kernel build in the background
with -jNR_CPUS while NR_CPUS pairs of tasks run netperf TCP_RR. The
intent is to see whether heavy background tasks disrupt lighter tasks.

multi subtest kernbench
                               5.15.0-rc3             5.15.0-rc3
                                  vanilla  sched-wakeeflips-v1r1
Min       elsp-256       20.80 (   0.00%)       14.89 (  28.41%)
Amean     elsp-256       24.08 (   0.00%)       20.94 (  13.05%)
Stddev    elsp-256        3.32 (   0.00%)        4.68 ( -41.16%)
CoeffVar  elsp-256       13.78 (   0.00%)       22.36 ( -62.33%)
Max       elsp-256       29.11 (   0.00%)       26.49 (   9.00%)

multi subtest netperf-tcp-rr
                        5.15.0-rc3             5.15.0-rc3
                           vanilla  sched-wakeeflips-v1r1
Min       1    48286.26 (   0.00%)    49101.48 (   1.69%)
Hmean     1    62894.82 (   0.00%)    68963.51 *   9.65%*
Stddev    1     7600.56 (   0.00%)     8804.82 ( -15.84%)
Max       1    78975.16 (   0.00%)    87124.67 (  10.32%)

The variability is higher as a result of the patch but both workloads
experienced improved performance.

[1] https://lore.kernel.org/r/02c977d239c312de5e15c77803118dcf1e11f216.camel@gmx.de

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ff69f245b939..d00af3b97d8f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5865,6 +5865,14 @@ static void record_wakee(struct task_struct *p)
 	}
 
 	if (current->last_wakee != p) {
+		int min = __this_cpu_read(sd_llc_size) << 1;
+		/*
+		 * Couple the wakee flips to the waker for the case where it
+		 * doesn't accrue flips, taking care to not push the wakee
+		 * high enough that the wake_wide() heuristic fails.
+		 */
+		if (current->wakee_flips > p->wakee_flips * min)
+			p->wakee_flips++;
 		current->last_wakee = p;
 		current->wakee_flips++;
 	}
@@ -5895,7 +5903,7 @@ static int wake_wide(struct task_struct *p)
 
 	if (master < slave)
 		swap(master, slave);
-	if (slave < factor || master < slave * factor)
+	if ((slave < factor && master < (factor>>1)*factor) || master < slave * factor)
 		return 0;
 	return 1;
 }
-- 
2.31.1


Thread overview: 26+ messages
2021-10-21 14:56 [PATCH 0/2] Reduce stacking and overscheduling Mel Gorman
2021-10-21 14:56 ` Mel Gorman [this message]
2021-10-22 10:26   ` [PATCH 1/2] sched/fair: Couple wakee flips with heavy wakers Mike Galbraith
2021-10-22 11:05     ` Mel Gorman
2021-10-22 12:00       ` Mike Galbraith
2021-10-25  6:35       ` Mike Galbraith
2021-10-26  8:18         ` Mel Gorman
2021-10-26 10:15           ` Mike Galbraith
2021-10-26 10:41             ` Mike Galbraith
2021-10-26 11:57               ` Mel Gorman
2021-10-26 12:13                 ` Mike Galbraith
2021-10-27  2:09                   ` Mike Galbraith
2021-10-27  9:00                     ` Mel Gorman
2021-10-27 10:18                       ` Mike Galbraith
2021-11-09 11:56   ` Peter Zijlstra
2021-11-09 12:55     ` Mike Galbraith
2021-10-21 14:56 ` [PATCH 2/2] sched/fair: Increase wakeup_gran if current task has not executed the minimum granularity Mel Gorman
2021-10-28  9:48 [PATCH v4 0/2] Reduce stacking and overscheduling Mel Gorman
2021-10-28  9:48 ` [PATCH 1/2] sched/fair: Couple wakee flips with heavy wakers Mel Gorman
2021-10-28 16:19   ` Tao Zhou
2021-10-29  8:42     ` Mel Gorman
2021-11-10  9:53       ` Tao Zhou
2021-11-10 15:40         ` Mike Galbraith
2021-10-29 15:17   ` Vincent Guittot
2021-10-30  3:11     ` Mike Galbraith
2021-10-30  4:12       ` Mike Galbraith
2021-11-01  8:56     ` Mel Gorman
