From: Mel Gorman <mgorman@techsingularity.net>
To: "Gautham R. Shenoy" <gautham.shenoy@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Aubrey Li <aubrey.li@linux.intel.com>,
	Barry Song <song.bao.hua@hisilicon.com>,
	Mike Galbraith <efault@gmx.de>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/2] sched/fair: Adjust the allowed NUMA imbalance when SD_NUMA spans multiple LLCs
Date: Wed, 15 Dec 2021 12:25:50 +0000	[thread overview]
Message-ID: <20211215122550.GR3366@techsingularity.net> (raw)
In-Reply-To: <YbnW/vLgE8MmQopN@BLR-5CG11610CF.amd.com>

On Wed, Dec 15, 2021 at 05:22:30PM +0530, Gautham R. Shenoy wrote:
> Hello Mel,
> 
> 
> On Mon, Dec 13, 2021 at 08:17:37PM +0530, Gautham R. Shenoy wrote:
> 
> > 
> > Thanks for the patch. I will queue this one for tonight.
> >
> 
> Getting the numbers took a bit longer than I expected.
> 

No worries.

> > > <SNIP>
> > > +				/*
> > > +				 * Set span based on top domain that places
> > > +				 * tasks in sibling domains.
> > > +				 */
> > > +				top = sd;
> > > +				top_p = top->parent;
> > > +				while (top_p && (top_p->flags & SD_PREFER_SIBLING)) {
> > > +					top = top->parent;
> > > +					top_p = top->parent;
> > > +				}
> > > +				imb_span = top_p ? top_p->span_weight : sd->span_weight;
> > >  			} else {
> > > -				sd->imb_numa_nr = imb * (sd->span_weight / imb_span);
> > > +				int factor = max(1U, (sd->span_weight / imb_span));
> > > +
> 
> 
> So for the first NUMA domain, sd->imb_numa_nr will be imb, which
> turns out to be 2 for Zen2 and Zen3 processors across all Nodes Per
> Socket (NPS) settings.
> 
> On a 2 Socket Zen3:
> 
> NPS=1
>    child=MC, llc_weight=16, sd=DIE. sd->span_weight=128 imb=max(2U, (16*16/128) / 4)=2
>    top_p = NUMA, imb_span = 256.
> 
>    NUMA: sd->span_weight =256; sd->imb_numa_nr = 2 * (256/256) = 2
> 
> NPS=2
>    child=MC, llc_weight=16, sd=NODE. sd->span_weight=64 imb=max(2U, (16*16/64) / 4) = 2
>    top_p = NUMA, imb_span = 128.
> 
>    NUMA: sd->span_weight =128; sd->imb_numa_nr = 2 * (128/128) = 2
>    NUMA: sd->span_weight =256; sd->imb_numa_nr = 2 * (256/128) = 4
> 
> NPS=4:
>    child=MC, llc_weight=16, sd=NODE. sd->span_weight=32 imb=max(2U, (16*16/32) / 4) = 2
>    top_p = NUMA, imb_span = 128.
> 
>    NUMA: sd->span_weight =128; sd->imb_numa_nr = 2 * (128/128) = 2
>    NUMA: sd->span_weight =256; sd->imb_numa_nr = 2 * (256/128) = 4
> 
> Again, we will be more aggressively load balancing across the two
> sockets in NPS=1 mode compared to NPS=2/4.
> 

Yes, but I felt it was reasonable behaviour because we have to strike
some sort of balance: allowing a NUMA imbalance up to a point prevents
communicating tasks from being pulled apart, and v3 broke that
completely. There will always be a tradeoff between tasks that want to
remain local to each other and others that prefer to spread as widely
and as quickly as possible.

> <SNIP>
> If we retain the (2,4) thresholds from v4.1 but use them in
> allow_numa_imbalance() as in v3 we get
> 
> NPS=4
> Test:	 mel-v4.2
>  Copy:	 225860.12 (498.11%)
> Scale:	 227869.07 (572.58%)
>   Add:	 278365.58 (624.93%)
> Triad:	 264315.44 (596.62%)
> 

The potential problem with this is that it will probably work for
netperf when it's a single communicating pair but may not work as well
when there are multiple communicating pairs or when the number of
communicating tasks exceeds imb_numa_nr.

> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> NPS=1
> ======
> Clients: tip-core   mel-v3    mel-v4    mel-v4.1
>     1	 633.19     619.16    632.94    619.27
>     	 (0.00%)    (-2.21%)  (-0.03%)	(-2.19%)
> 	 
>     2	 1152.48    1189.88   1184.82   1189.19
>     	 (0.00%)    (3.24%)   (2.80%)	(3.18%)
> 	 
>     4	 1946.46    2177.40   1979.56	2196.09
>     	 (0.00%)    (11.86%)  (1.70%)	(12.82%)
> 	 
>     8	 3553.29    3564.50   3678.07	3668.77
>     	 (0.00%)    (0.31%)   (3.51%)	(3.24%)
> 	 
>    16	 6217.03    6484.58   6249.29	6534.73
>    	 (0.00%)    (4.30%)   (0.51%)	(5.11%)
> 	 
>    32	 11702.59   12185.77  12005.99	11917.57
>    	 (0.00%)    (4.12%)   (2.59%)	(1.83%)
> 	 
>    64	 18394.56   19535.11  19080.19	19500.55
>    	 (0.00%)    (6.20%)   (3.72%)	(6.01%)
> 	 
>   128	 27231.02   31759.92  27200.52	30358.99
>   	 (0.00%)    (16.63%)  (-0.11%)	(11.48%)
> 	 
>   256	 33166.10   24474.30  31639.98	24788.12
>   	 (0.00%)    (-26.20%) (-4.60%)	(-25.26%)
> 	 
>   512	 41605.44   54823.57  46684.48	54559.02
>   	 (0.00%)    (31.77%)  (12.20%)	(31.13%)
> 	 
>  1024	 53650.54   56329.39  44422.99	56320.66
>  	 (0.00%)    (4.99%)   (-17.19%)	(4.97%) 
> 
> 
> We see that v4.1 performs better than v4 in most cases, except when
> the number of clients is 256, where the spread strategy seems to be
> hurting: we see degradation in both v3 and v4.1. This is true for
> the NPS=2 and NPS=4 cases as well (see below).
> 
> 

The 256 client case is a bit of a crapshoot. At that point, the NUMA
imbalancing is disabled and the machine is overloaded.

> NPS=2
> =====
> Clients: tip-core   mel-v3    mel-v4    mel-v4.1
>     1	 629.76	    620.91    629.11	631.95
>     	 (0.00%)    (-1.40%)  (-0.10%)	(0.34%)
> 	 
>     2	 1176.96    1203.12   1169.09	1186.74
>     	 (0.00%)    (2.22%)   (-0.66%)	(0.83%)
> 	 
>     4	 1990.97    2228.04   1888.19	1995.21
>     	 (0.00%)    (11.90%)  (-5.16%)	(0.21%)
> 	 
>     8	 3534.57    3617.16   3660.30	3548.09
>     	 (0.00%)    (2.33%)   (3.55%)	(0.38%)
> 	 
>    16	 6294.71    6547.80   6504.13	6470.34
>    	 (0.00%)    (4.02%)   (3.32%)	(2.79%)
> 	 
>    32	 12035.73   12143.03  11396.26	11860.91
>    	 (0.00%)    (0.89%)   (-5.31%)	(-1.45%)
> 	 
>    64	 18583.39   19439.12  17126.47	18799.54
>    	 (0.00%)    (4.60%)   (-7.83%)	(1.16%)
> 	 
>   128	 27811.89   30562.84  28090.29	27468.94
>   	 (0.00%)    (9.89%)   (1.00%)	(-1.23%)
> 	 
>   256	 28148.95   26488.57  29117.13	23628.29
>   	 (0.00%)    (-5.89%)  (3.43%)	(-16.05%)
> 	 
>   512	 43934.15   52796.38  42603.49	41725.75
>   	 (0.00%)    (20.17%)  (-3.02%)	(-5.02%)
> 	 
>  1024	 54391.65   53891.83  48419.09	43913.40
>  	 (0.00%)    (-0.91%)  (-10.98%)	(-19.26%)
> 
> In this case, v4.1 performs as well as v4 up to 64 clients. But after
> that we see degradation. The degradation is significant in the 1024
> client case.
> 
> 

Kinda the same: it's more likely to be run-to-run variance because the
machine is overloaded.

> NPS=4
> =====
> Clients: tip-core   mel-v3    mel-v4    mel-v4.1    mel-v4.2
>     1	 622.65	    617.83    667.34	644.76	    617.58
>     	 (0.00%)    (-0.77%)  (7.17%)	(3.55%)	    (-0.81%)
> 	 
>     2	 1160.62    1182.30   1294.08	1193.88	    1182.55
>     	 (0.00%)    (1.86%)   (11.49%)	(2.86%)	    (1.88%)
> 	 
>     4	 1961.14    2171.91   2477.71	1929.56	    2116.01
>     	 (0.00%)    (10.74%)  (26.34%)	(-1.61%)    (7.89%)
> 	 
>     8	 3662.94    3447.98   4067.40	3627.43	    3580.32
>     	 (0.00%)    (-5.86%)  (11.04%)	(-0.96%)    (-2.25%)
> 	 
>    16	 6490.92    5871.93   6924.32	6660.13	    6413.34
>    	 (0.00%)    (-9.53%)  (6.67%)	(2.60%)	    (-1.19%)
> 	 
>    32	 11831.81   12004.30  12709.06	12187.78    11767.46
>    	 (0.00%)    (1.45%)   (7.41%)	(3.00%)	    (-0.54%)
> 	 
>    64	 17717.36   18406.79  18785.41	18820.33    18197.86
>    	 (0.00%)    (3.89%)   (6.02%)	(6.22%)	    (2.71%)
> 	 
>   128	 27723.35   27777.34  27939.63	27399.64    24310.93
>   	 (0.00%)    (0.19%)   (0.78%)	(-1.16%)    (-12.30%)
> 	 
>   256	 30919.69   23937.03  35412.26	26780.37    24642.24
>   	 (0.00%)    (-22.58%) (14.52%)	(-13.38%)   (-20.30%)
> 	 
>   512	 43366.03   49570.65  43830.84	43654.42    41031.90
>   	 (0.00%)    (14.30%)  (1.07%)	(0.66%)	    (-5.38%)
> 	 
>  1024	 46960.83   53576.16  50557.19	43743.07    40884.98
>  	 (0.00%)    (14.08%)  (7.65%)	(-6.85%)    (-12.93%)
> 
> 
> In the NPS=4 case, clearly v4 provides the best results.
> 
> v4.1 does better than v4.2 since it is able to hold off spreading
> for a longer period.
> 

Most likely because v4.2 is disabling the allowed NUMA imbalance too
soon. This is the trade-off between favouring communicating tasks over
embarrassingly parallel problems.

-- 
Mel Gorman
SUSE Labs
