From: Vincent Guittot <vincent.guittot@linaro.org>
To: Phil Auld <pauld@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Quentin Perret <quentin.perret@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <Morten.Rasmussen@arm.com>,
	Hillf Danton <hdanton@sina.com>, Parth Shah <parth@linux.ibm.com>,
	Rik van Riel <riel@surriel.com>
Subject: Re: [PATCH v4 00/10] sched/fair: rework the CFS load balance
Date: Thu, 31 Oct 2019 17:41:39 +0100
Message-ID: <CAKfTPtCm5dM4G0D6jSenNQSbpnKkUpOev24Ms02aszPyxyfuuQ@mail.gmail.com>
In-Reply-To: <20191031135715.GA5738@pauld.bos.csb>

On Thu, 31 Oct 2019 at 14:57, Phil Auld <pauld@redhat.com> wrote:
>
> Hi Vincent,
>
> On Wed, Oct 30, 2019 at 06:25:49PM +0100 Vincent Guittot wrote:
> > On Wed, 30 Oct 2019 at 15:39, Phil Auld <pauld@redhat.com> wrote:
> > > > The fact that the 4-node system works well but not the 8-node one is a
> > > > bit surprising, except if this means more NUMA levels in the
> > > > sched_domain topology.
> > > > Could you give us more details about the sched domain topology?
> > > >
> > >
> > > The 8-node system has 5 sched domain levels.  The 4-node system only
> > > has 3.
> >
> > That's an interesting difference, and your additional tests on an 8-node
> > system with 3 levels tend to confirm that the number of levels makes a
> > difference.
> > I need to study a bit more how this can impact the spread of tasks.
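
To compare the two topologies, a quick way is to dump the per-CPU sched
domain names; a minimal sketch, assuming CONFIG_SCHED_DEBUG is enabled
(on newer kernels these files moved to debugfs under
/sys/kernel/debug/sched/):

  # Print one line per sched domain level for CPU 0; the 8-node box
  # should show 5 levels, the 4-node box 3.
  for d in /proc/sys/kernel/sched_domain/cpu0/domain*/name; do
      printf '%s: %s\n' "$d" "$(cat "$d")"
  done
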
>
> So I think I understand what my numbers have been showing.
>
> I believe the numa balancing is causing problems.
>
> Here are numbers from the test on 5.4-rc3+ without your series (each column
> is the average number of the benchmark's 156 threads running on that NUMA
> node):
>
> echo 1 >  /proc/sys/kernel/numa_balancing
> lu.C.x_156_GROUP_1  Average    10.87  0.00   0.00   11.49  36.69  34.26  30.59  32.10
> lu.C.x_156_GROUP_2  Average    20.15  16.32  9.49   24.91  21.07  20.93  21.63  21.50
> lu.C.x_156_GROUP_3  Average    21.27  17.23  11.84  21.80  20.91  20.68  21.11  21.16
> lu.C.x_156_GROUP_4  Average    19.44  6.53   8.71   19.72  22.95  23.16  28.85  26.64
> lu.C.x_156_GROUP_5  Average    20.59  6.20   11.32  14.63  28.73  30.36  22.20  21.98
> lu.C.x_156_NORMAL_1 Average    20.50  19.95  20.40  20.45  18.75  19.35  18.25  18.35
> lu.C.x_156_NORMAL_2 Average    17.15  19.04  18.42  18.69  21.35  21.42  20.00  19.92
> lu.C.x_156_NORMAL_3 Average    18.00  18.15  17.55  17.60  18.90  18.40  19.90  19.75
> lu.C.x_156_NORMAL_4 Average    20.53  20.05  20.21  19.11  19.00  19.47  19.37  18.26
> lu.C.x_156_NORMAL_5 Average    18.72  18.78  19.72  18.50  19.67  19.72  21.11  19.78
>
> ============156_GROUP========Mop/s===================================
> min     q1      median  q3      max
> 1564.63 3003.87 3928.23 5411.13 8386.66
> ============156_GROUP========time====================================
> min     q1      median  q3      max
> 243.12  376.82  519.06  678.79  1303.18
> ============156_NORMAL========Mop/s===================================
> min     q1      median  q3      max
> 13845.6 18013.8 18545.5 19359.9 19647.4
> ============156_NORMAL========time====================================
> min     q1      median  q3      max
> 103.78  105.32  109.95  113.19  147.27
>
> (This run above is especially bad... we don't usually see 0.00s on a node,
> but overall it's basically par for this case. The damage shows up in the
> spread of the results.)
>
>
> echo 0 >  /proc/sys/kernel/numa_balancing
> lu.C.x_156_GROUP_1  Average    17.75  19.30  21.20  21.20  20.20  20.80  18.90  16.65
> lu.C.x_156_GROUP_2  Average    18.38  19.25  21.00  20.06  20.19  20.31  19.56  17.25
> lu.C.x_156_GROUP_3  Average    21.81  21.00  18.38  16.86  20.81  21.48  18.24  17.43
> lu.C.x_156_GROUP_4  Average    20.48  20.96  19.61  17.61  17.57  19.74  18.48  21.57
> lu.C.x_156_GROUP_5  Average    23.32  21.96  19.16  14.28  21.44  22.56  17.00  16.28
> lu.C.x_156_NORMAL_1 Average    19.50  19.83  19.58  19.25  19.58  19.42  19.42  19.42
> lu.C.x_156_NORMAL_2 Average    18.90  18.40  20.00  19.80  19.70  19.30  19.80  20.10
> lu.C.x_156_NORMAL_3 Average    19.45  19.09  19.91  20.09  19.45  18.73  19.45  19.82
> lu.C.x_156_NORMAL_4 Average    19.64  19.27  19.64  19.00  19.82  19.55  19.73  19.36
> lu.C.x_156_NORMAL_5 Average    18.75  19.42  20.08  19.67  18.75  19.50  19.92  19.92
>
> ============156_GROUP========Mop/s===================================
> min     q1      median  q3      max
> 14956.3 16346.5 17505.7 18440.6 22492.7
> ============156_GROUP========time====================================
> min     q1      median  q3      max
> 90.65   110.57  116.48  124.74  136.33
> ============156_NORMAL========Mop/s===================================
> min     q1      median  q3      max
> 29801.3 30739.2 31967.5 32151.3 34036
> ============156_NORMAL========time====================================
> min     q1      median  q3      max
> 59.91   63.42   63.78   66.33   68.42
>
>
> Note there is a significant improvement already. But we are seeing imbalance
> due to using weighted load and averages. In this case it's only a 55%
> slowdown rather than the 5x. The overall performance of the benchmark is
> also much better in both cases.
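
Side note: this kind of A/B comparison is easy to script; a rough
sketch of such a harness (the benchmark invocation and log names below
are placeholders, not your actual setup):

  #!/bin/sh
  # Run the same benchmark with numa balancing on and off,
  # keeping the raw logs apart for later post-processing.
  for nb in 1 0; do
      echo $nb > /proc/sys/kernel/numa_balancing
      for i in 1 2 3 4 5; do
          ./lu.C.x > "lu.C.x_nb${nb}_run${i}.log"   # placeholder invocation
      done
  done
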
>
>
>
> Here's the same test on the same system, with the full series (lb_v4a as
> I've been calling it):
>
> echo 1 >  /proc/sys/kernel/numa_balancing
> lu.C.x_156_GROUP_1  Average    18.59  19.36  19.50  18.86  20.41  20.59  18.27  20.41
> lu.C.x_156_GROUP_2  Average    19.52  20.52  20.48  21.17  19.52  19.09  17.70  18.00
> lu.C.x_156_GROUP_3  Average    20.58  20.71  20.17  20.50  18.46  19.50  18.58  17.50
> lu.C.x_156_GROUP_4  Average    18.95  19.63  19.47  19.84  18.79  19.84  20.84  18.63
> lu.C.x_156_GROUP_5  Average    16.85  17.96  19.89  19.15  19.26  20.48  21.70  20.70
> lu.C.x_156_NORMAL_1 Average    18.04  18.48  20.00  19.72  20.72  20.48  18.48  20.08
> lu.C.x_156_NORMAL_2 Average    18.22  20.56  19.50  19.39  20.67  19.83  18.44  19.39
> lu.C.x_156_NORMAL_3 Average    17.72  19.61  19.56  19.17  20.17  19.89  20.78  19.11
> lu.C.x_156_NORMAL_4 Average    18.05  19.74  20.21  19.89  20.32  20.26  19.16  18.37
> lu.C.x_156_NORMAL_5 Average    18.89  19.95  20.21  20.63  19.84  19.26  19.26  17.95
>
> ============156_GROUP========Mop/s===================================
> min     q1      median  q3      max
> 13460.1 14949   15851.7 16391.4 18993
> ============156_GROUP========time====================================
> min     q1      median  q3      max
> 107.35  124.39  128.63  136.4   151.48
> ============156_NORMAL========Mop/s===================================
> min     q1      median  q3      max
> 14418.5 18512.4 19049.5 19682   19808.8
> ============156_NORMAL========time====================================
> min     q1      median  q3      max
> 102.93  103.6   107.04  110.14  141.42
>
>
> echo 0 >  /proc/sys/kernel/numa_balancing
> lu.C.x_156_GROUP_1  Average    19.00  19.33  19.33  19.58  20.08  19.67  19.83  19.17
> lu.C.x_156_GROUP_2  Average    18.55  19.91  20.09  19.27  18.82  19.27  19.91  20.18
> lu.C.x_156_GROUP_3  Average    18.42  19.08  19.75  19.00  19.50  20.08  20.25  19.92
> lu.C.x_156_GROUP_4  Average    18.42  19.83  19.17  19.50  19.58  19.83  19.83  19.83
> lu.C.x_156_GROUP_5  Average    19.17  19.42  20.17  19.92  19.25  18.58  19.92  19.58
> lu.C.x_156_NORMAL_1 Average    19.25  19.50  19.92  18.92  19.33  19.75  19.58  19.75
> lu.C.x_156_NORMAL_2 Average    19.42  19.25  17.83  18.17  19.83  20.50  20.42  20.58
> lu.C.x_156_NORMAL_3 Average    18.58  19.33  19.75  18.25  19.42  20.25  20.08  20.33
> lu.C.x_156_NORMAL_4 Average    19.00  19.55  19.73  18.73  19.55  20.00  19.64  19.82
> lu.C.x_156_NORMAL_5 Average    19.25  19.25  19.50  18.75  19.92  19.58  19.92  19.83
>
> ============156_GROUP========Mop/s===================================
> min     q1      median  q3      max
> 28520.1 29024.2 29042.1 29367.4 31235.2
> ============156_GROUP========time====================================
> min     q1      median  q3      max
> 65.28   69.43   70.21   70.25   71.49
> ============156_NORMAL========Mop/s===================================
> min     q1      median  q3      max
> 28974.5 29806.5 30237.1 30907.4 31830.1
> ============156_NORMAL========time====================================
> min     q1      median  q3      max
> 64.06   65.97   67.43   68.41   70.37
>
>
> This all now makes sense. Looking at the numa balancing code a bit, you can
> see that it still uses load, so it will still be subject to making bogus
> decisions based on the weighted load. In this case it has been actively
> working against the load balancer because of that.

Thanks for the tests and the interesting results.
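
One way to check how hard numa balancing is fighting the load balancer
during a run is to watch its counters; a small sketch, assuming
CONFIG_NUMA_BALANCING is enabled:

  # Sample numa balancing activity once per second. A steady stream of
  # page migrations while the load balancer is also moving tasks is a
  # hint that the two are pulling in opposite directions.
  while sleep 1; do
      grep -E 'numa_(hint_faults|pages_migrated)' /proc/vmstat
  done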

>
> I think with the three NUMA levels on this system the numa balancing was
> able to win more often. We don't see this effect to the same degree on
> systems with only one SD_NUMA level.
>
> Following the other part of this thread, I have to add that I'm of the
> opinion that the weighted load (which is all we have now, I believe) really
> should be used only in extreme cases of overload, to deal with fairness. And
> even then maybe not. As far as I can see, once fair group scheduling is
> involved, that load is basically a random number between 1 and 1024. It
> really has no bearing on how much "load" a task will put on a cpu. Any
> comparison of that to cpu capacity is pretty meaningless.
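
That's easy to demonstrate: the same spinning task contributes a very
different load depending only on its group's shares. A hypothetical
sketch, assuming cgroup v1 with the cpu controller mounted at
/sys/fs/cgroup/cpu and CONFIG_SCHED_DEBUG (the /proc/sched_debug field
names vary a bit between kernel versions):

  # Two groups, identical work, 16x difference in shares.
  mkdir /sys/fs/cgroup/cpu/lo /sys/fs/cgroup/cpu/hi
  echo 64   > /sys/fs/cgroup/cpu/lo/cpu.shares
  echo 1024 > /sys/fs/cgroup/cpu/hi/cpu.shares
  yes > /dev/null & echo $! > /sys/fs/cgroup/cpu/lo/tasks
  yes > /dev/null & echo $! > /sys/fs/cgroup/cpu/hi/tasks
  sleep 5
  # Compare the load_avg reported for the 'lo' vs 'hi' cfs_rqs and
  # group entities: same work, very different load.
  grep -E 'cfs_rq\[|load_avg' /proc/sched_debug
  # (Kill the two 'yes' spinners afterwards.)
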
>
> I'm sure there are workloads for which the numa balancing is more important. But
> even then I suspect it is making the wrong decisions more often than not. I think
> a similar rework may be needed :)

Yes, there is probably room for better collaboration between the load
balancer and the numa balancing.

>
> I've asked our perf team to try the full battery of tests with numa
> balancing disabled to see what it shows across the board.
>
>
> Good job on this and thanks for the time looking at my specific issues.

Thanks for your help


>
>
> As far as this series is concerned, and as far as it matters:
>
> Acked-by: Phil Auld <pauld@redhat.com>
>
>
>
> Cheers,
> Phil
>
> --
>
