From: Phil Auld <pauld@redhat.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Quentin Perret <quentin.perret@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <Morten.Rasmussen@arm.com>,
	Hillf Danton <hdanton@sina.com>, Parth Shah <parth@linux.ibm.com>,
	Rik van Riel <riel@surriel.com>
Subject: Re: [PATCH v4 00/10] sched/fair: rework the CFS load balance
Date: Thu, 31 Oct 2019 09:57:15 -0400
Message-ID: <20191031135715.GA5738@pauld.bos.csb>
In-Reply-To: <CAKfTPtCR93MBPjKhMSyMZJTqVS7YWBPCnk3DmSEq2Q0MVxm1ug@mail.gmail.com>

Hi Vincent,

On Wed, Oct 30, 2019 at 06:25:49PM +0100 Vincent Guittot wrote:
> On Wed, 30 Oct 2019 at 15:39, Phil Auld <pauld@redhat.com> wrote:
> > > That fact that the 4 nodes works well but not the 8 nodes is a bit
> > > surprising except if this means more NUMA level in the sched_domain
> > > topology
> > > Could you give us more details about the sched domain topology ?
> > >
> >
> > The 8-node system has 5 sched domain levels.  The 4-node system only
> > has 3.
> 
> That's an interesting difference. and your additional tests on a 8
> nodes with 3 level tends to confirm that the number of level make a
> difference
> I need to study a bit more how this can impact the spread of tasks

So I think I understand what my numbers have been showing. 

I believe the numa balancing is causing problems.

Here are numbers from the test on 5.4-rc3+ without your series:

echo 1 >  /proc/sys/kernel/numa_balancing 
lu.C.x_156_GROUP_1  Average    10.87  0.00   0.00   11.49  36.69  34.26  30.59  32.10
lu.C.x_156_GROUP_2  Average    20.15  16.32  9.49   24.91  21.07  20.93  21.63  21.50
lu.C.x_156_GROUP_3  Average    21.27  17.23  11.84  21.80  20.91  20.68  21.11  21.16
lu.C.x_156_GROUP_4  Average    19.44  6.53   8.71   19.72  22.95  23.16  28.85  26.64
lu.C.x_156_GROUP_5  Average    20.59  6.20   11.32  14.63  28.73  30.36  22.20  21.98
lu.C.x_156_NORMAL_1 Average    20.50  19.95  20.40  20.45  18.75  19.35  18.25  18.35
lu.C.x_156_NORMAL_2 Average    17.15  19.04  18.42  18.69  21.35  21.42  20.00  19.92
lu.C.x_156_NORMAL_3 Average    18.00  18.15  17.55  17.60  18.90  18.40  19.90  19.75
lu.C.x_156_NORMAL_4 Average    20.53  20.05  20.21  19.11  19.00  19.47  19.37  18.26
lu.C.x_156_NORMAL_5 Average    18.72  18.78  19.72  18.50  19.67  19.72  21.11  19.78

============156_GROUP========Mop/s===================================
min	q1	median	q3	max
1564.63	3003.87	3928.23	5411.13	8386.66
============156_GROUP========time====================================
min	q1	median	q3	max
243.12	376.82	519.06	678.79	1303.18
============156_NORMAL========Mop/s===================================
min	q1	median	q3	max
13845.6	18013.8	18545.5	19359.9	19647.4
============156_NORMAL========time====================================
min	q1	median	q3	max
103.78	105.32	109.95	113.19	147.27

(This one above is especially bad. We don't usually see 0.00s, but overall it's
basically on par with earlier runs; the imbalance shows up in the spread of the results.)
 

echo 0 >  /proc/sys/kernel/numa_balancing 
lu.C.x_156_GROUP_1  Average    17.75  19.30  21.20  21.20  20.20  20.80  18.90  16.65
lu.C.x_156_GROUP_2  Average    18.38  19.25  21.00  20.06  20.19  20.31  19.56  17.25
lu.C.x_156_GROUP_3  Average    21.81  21.00  18.38  16.86  20.81  21.48  18.24  17.43
lu.C.x_156_GROUP_4  Average    20.48  20.96  19.61  17.61  17.57  19.74  18.48  21.57
lu.C.x_156_GROUP_5  Average    23.32  21.96  19.16  14.28  21.44  22.56  17.00  16.28
lu.C.x_156_NORMAL_1 Average    19.50  19.83  19.58  19.25  19.58  19.42  19.42  19.42
lu.C.x_156_NORMAL_2 Average    18.90  18.40  20.00  19.80  19.70  19.30  19.80  20.10
lu.C.x_156_NORMAL_3 Average    19.45  19.09  19.91  20.09  19.45  18.73  19.45  19.82
lu.C.x_156_NORMAL_4 Average    19.64  19.27  19.64  19.00  19.82  19.55  19.73  19.36
lu.C.x_156_NORMAL_5 Average    18.75  19.42  20.08  19.67  18.75  19.50  19.92  19.92

============156_GROUP========Mop/s===================================
min	q1	median	q3	max
14956.3	16346.5	17505.7	18440.6	22492.7
============156_GROUP========time====================================
min	q1	median	q3	max
90.65	110.57	116.48	124.74	136.33
============156_NORMAL========Mop/s===================================
min	q1	median	q3	max
29801.3	30739.2	31967.5	32151.3	34036
============156_NORMAL========time====================================
min	q1	median	q3	max
59.91	63.42	63.78	66.33	68.42


Note there is a significant improvement already, but we are still seeing imbalance
due to the use of weighted load and averages. In this case GROUP throughput is only
about 55% of NORMAL (median 17505.7 vs 31967.5 Mop/s) rather than the 5x slowdown
seen before. The overall performance of the benchmark is also much better in both cases.



Here's the same test, same system with the full series (lb_v4a as I've been calling it):

echo 1 >  /proc/sys/kernel/numa_balancing 
lu.C.x_156_GROUP_1  Average    18.59  19.36  19.50  18.86  20.41  20.59  18.27  20.41
lu.C.x_156_GROUP_2  Average    19.52  20.52  20.48  21.17  19.52  19.09  17.70  18.00
lu.C.x_156_GROUP_3  Average    20.58  20.71  20.17  20.50  18.46  19.50  18.58  17.50
lu.C.x_156_GROUP_4  Average    18.95  19.63  19.47  19.84  18.79  19.84  20.84  18.63
lu.C.x_156_GROUP_5  Average    16.85  17.96  19.89  19.15  19.26  20.48  21.70  20.70
lu.C.x_156_NORMAL_1 Average    18.04  18.48  20.00  19.72  20.72  20.48  18.48  20.08
lu.C.x_156_NORMAL_2 Average    18.22  20.56  19.50  19.39  20.67  19.83  18.44  19.39
lu.C.x_156_NORMAL_3 Average    17.72  19.61  19.56  19.17  20.17  19.89  20.78  19.11
lu.C.x_156_NORMAL_4 Average    18.05  19.74  20.21  19.89  20.32  20.26  19.16  18.37
lu.C.x_156_NORMAL_5 Average    18.89  19.95  20.21  20.63  19.84  19.26  19.26  17.95

============156_GROUP========Mop/s===================================
min	q1	median	q3	max
13460.1	14949	15851.7	16391.4	18993
============156_GROUP========time====================================
min	q1	median	q3	max
107.35	124.39	128.63	136.4	151.48
============156_NORMAL========Mop/s===================================
min	q1	median	q3	max
14418.5	18512.4	19049.5	19682	19808.8
============156_NORMAL========time====================================
min	q1	median	q3	max
102.93	103.6	107.04	110.14	141.42


echo 0 >  /proc/sys/kernel/numa_balancing 
lu.C.x_156_GROUP_1  Average    19.00  19.33  19.33  19.58  20.08  19.67  19.83  19.17
lu.C.x_156_GROUP_2  Average    18.55  19.91  20.09  19.27  18.82  19.27  19.91  20.18
lu.C.x_156_GROUP_3  Average    18.42  19.08  19.75  19.00  19.50  20.08  20.25  19.92
lu.C.x_156_GROUP_4  Average    18.42  19.83  19.17  19.50  19.58  19.83  19.83  19.83
lu.C.x_156_GROUP_5  Average    19.17  19.42  20.17  19.92  19.25  18.58  19.92  19.58
lu.C.x_156_NORMAL_1 Average    19.25  19.50  19.92  18.92  19.33  19.75  19.58  19.75
lu.C.x_156_NORMAL_2 Average    19.42  19.25  17.83  18.17  19.83  20.50  20.42  20.58
lu.C.x_156_NORMAL_3 Average    18.58  19.33  19.75  18.25  19.42  20.25  20.08  20.33
lu.C.x_156_NORMAL_4 Average    19.00  19.55  19.73  18.73  19.55  20.00  19.64  19.82
lu.C.x_156_NORMAL_5 Average    19.25  19.25  19.50  18.75  19.92  19.58  19.92  19.83

============156_GROUP========Mop/s===================================
min	q1	median	q3	max
28520.1	29024.2	29042.1	29367.4	31235.2
============156_GROUP========time====================================
min	q1	median	q3	max
65.28	69.43	70.21	70.25	71.49
============156_NORMAL========Mop/s===================================
min	q1	median	q3	max
28974.5	29806.5	30237.1	30907.4	31830.1
============156_NORMAL========time====================================
min	q1	median	q3	max
64.06	65.97	67.43	68.41	70.37


This all now makes sense. Looking at the NUMA balancing code a bit, you can see
that it still uses load, so it is still subject to making bogus decisions based
on the weighted load. In this case it has been actively working against the
load balancer because of that.
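
To make that concrete, here is a toy sketch of the kind of capacity-scaled
comparison the NUMA placement path makes. It's modeled loosely on
load_too_imbalanced() in kernel/sched/fair.c (from my reading of 5.4); the
simplifications and numbers are mine, not the kernel's:

#include <stdio.h>
#include <stdlib.h>

/* Toy version: reject a task move if it would make the capacity-scaled
 * load imbalance between the two nodes worse than it already is. */
static int load_too_imbalanced(long src_load, long dst_load,
                               long orig_src_load, long orig_dst_load,
                               long src_capacity, long dst_capacity)
{
        /* Imbalance after the proposed move... */
        long imb = labs(dst_load * src_capacity - src_load * dst_capacity);
        /* ...and before it. */
        long old_imb = labs(orig_dst_load * src_capacity -
                            orig_src_load * dst_capacity);

        return imb > old_imb;
}

int main(void)
{
        /* Two equal-capacity nodes; moving a task whose weighted load
         * happens to be 124 gets vetoed, whatever its real CPU demand. */
        printf("move vetoed: %d\n",
               load_too_imbalanced(900, 1124, 1024, 1000, 1024, 1024));
        return 0;
}

The src_load/dst_load feeding this check are sums of weighted load, so a
cgroup task's essentially arbitrary weight can flip the verdict either way,
independent of actual CPU demand.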

I think that with the three NUMA levels on this system the NUMA balancing was
able to win more often. We don't see this effect to the same degree on systems
with only one SD_NUMA level.
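
(Aside: with CONFIG_SCHED_DEBUG you can count the levels from userspace.
A quick sketch, assuming the 5.4 /proc layout; later kernels moved this
under /sys/kernel/debug/sched/:)

#include <stdio.h>
#include <glob.h>

int main(void)
{
        glob_t g;
        size_t i;

        /* One "name" file per sched domain level for cpu0. */
        if (glob("/proc/sys/kernel/sched_domain/cpu0/domain*/name",
                 0, NULL, &g))
                return 1;

        for (i = 0; i < g.gl_pathc; i++) {
                FILE *f = fopen(g.gl_pathv[i], "r");
                char name[64] = "";

                if (f) {
                        if (fgets(name, sizeof(name), f))
                                printf("%s: %s", g.gl_pathv[i], name);
                        fclose(f);
                }
        }
        globfree(&g);
        return 0;
}

Levels named "NUMA" are the SD_NUMA ones; the 8-node box here has five
levels and the 4-node box three, as above.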

Following the other part of this thread, I have to add that I'm of the opinion
that the weighted load (which is all we have now, I believe) really should be used
only in extreme cases of overload, to deal with fairness. And even then maybe not.
As far as I can see, once fair group scheduling is involved, that load is basically
a random number between 1 and 1024. It has no real bearing on how much "load" a
task will put on a cpu, so any comparison of it to cpu capacity is pretty
meaningless.
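
As a toy illustration of what I mean (a deliberate simplification, not the
kernel's actual task_h_load() math):

#include <stdio.h>

/* Toy model: a group's shares (default 1024) get split across its
 * runnable tasks, so a task's apparent load depends on how many
 * siblings it has, not on how much CPU it actually uses. */
static unsigned long toy_task_load(unsigned long group_shares,
                                   unsigned long nr_runnable)
{
        return group_shares / nr_runnable;
}

int main(void)
{
        /* The same 100%-busy task, seen through group weighting: */
        printf("alone in its group: load ~%lu\n", toy_task_load(1024, 1));
        printf("with 39 siblings:   load ~%lu\n", toy_task_load(1024, 40));
        return 0;
}

1024 vs 25 for identical CPU demand; comparing either number against a cpu
capacity of 1024 tells you almost nothing.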

I'm sure there are workloads for which the numa balancing is more important. But 
even then I suspect it is making the wrong decisions more often than not. I think 
a similar rework may be needed :)

I've asked our perf team to try the full battery of tests with numa balancing
disabled to see what it shows across the board.


Good job on this and thanks for the time looking at my specific issues. 


As far as this series is concerned, and as far as it matters: 

Acked-by: Phil Auld <pauld@redhat.com>



Cheers,
Phil

-- 

