linux-kernel.vger.kernel.org archive mirror
From: Mel Gorman <mgorman@techsingularity.net>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: "Gautham R. Shenoy" <gautham.shenoy@amd.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Valentin Schneider <Valentin.Schneider@arm.com>,
	Aubrey Li <aubrey.li@linux.intel.com>,
	Barry Song <song.bao.hua@hisilicon.com>,
	Mike Galbraith <efault@gmx.de>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/2] sched/fair: Adjust the allowed NUMA imbalance when SD_NUMA spans multiple LLCs
Date: Wed, 12 Jan 2022 10:24:43 +0000
Message-ID: <20220112102443.GZ3366@techsingularity.net>
In-Reply-To: <CAKfTPtBCdgKb7gBDoFo3ictVYhgQGcneHViEtYj8o=WVH3kTaA@mail.gmail.com>

On Mon, Jan 10, 2022 at 04:53:26PM +0100, Vincent Guittot wrote:
> On Wed, 5 Jan 2022 at 11:42, Mel Gorman <mgorman@techsingularity.net> wrote:
> >
> > On Tue, Dec 21, 2021 at 06:13:15PM +0100, Vincent Guittot wrote:
> > > > <SNIP>
> > > >
> > > > @@ -9050,9 +9054,9 @@ static bool update_pick_idlest(struct sched_group *idlest,
> > > >   * This is an approximation as the number of running tasks may not be
> > > >   * related to the number of busy CPUs due to sched_setaffinity.
> > > >   */
> > > > -static inline bool allow_numa_imbalance(int dst_running, int dst_weight)
> > > > +static inline bool allow_numa_imbalance(int dst_running, int imb_numa_nr)
> > > >  {
> > > > -       return (dst_running < (dst_weight >> 2));
> > > > +       return dst_running < imb_numa_nr;
> > > >  }
> > > >
> > > >  /*
> > > >
> > > > <SNIP>
> > > >
> > > > @@ -9280,19 +9285,13 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
> > > >         }
> > > >  }
> > > >
> > > > -#define NUMA_IMBALANCE_MIN 2
> > > > -
> > > >  static inline long adjust_numa_imbalance(int imbalance,
> > > > -                               int dst_running, int dst_weight)
> > > > +                               int dst_running, int imb_numa_nr)
> > > >  {
> > > > -       if (!allow_numa_imbalance(dst_running, dst_weight))
> > > > +       if (!allow_numa_imbalance(dst_running, imb_numa_nr))
> > > >                 return imbalance;
> > > >
> > > > -       /*
> > > > -        * Allow a small imbalance based on a simple pair of communicating
> > > > -        * tasks that remain local when the destination is lightly loaded.
> > > > -        */
> > > > -       if (imbalance <= NUMA_IMBALANCE_MIN)
> > > > +       if (imbalance <= imb_numa_nr)
> > >
> > > Isn't this always true ?
> > >
> > > imbalance is "always" < dst_running as imbalance is usually the number
> > > of these tasks that we would like to migrate
> > >
> >
> > It's not necessarily true. allow_numa_imbalance is checking if
> > dst_running < imb_numa_nr and adjust_numa_imbalance is checking the
> > imbalance.
> >
> > imb_numa_nr = 4
> > dst_running = 2
> > imbalance   = 1
> >
> > In that case, imbalance of 1 is ok, but 2 is not.
> 
> I don't follow your example. Why is imbalance = 2 not ok in your
> example above? allow_numa_imbalance still returns true (dst_running <
> imb_numa_nr) and we still have imbalance <= imb_numa_nr
> 

At the time I wrote it, the comparison looked like < instead of <=.

> Also, the name dst_running is quite confusing. In the case of
> calculate_imbalance, busiest->nr_running is passed as the dst_running
> argument. But the busiest group is the src, not the dst, of the balance.
> 
> Then, imbalance < busiest->nr_running in load_balance because we try
> to even out the number of tasks running in each group without emptying it,
> and allow_numa_imbalance checks that dst_running < imb_numa_nr. So we
> have imbalance < dst_running < imb_numa_nr
> 

But either way, you have a valid point. The patch as-is is too complex,
does too much, and is failing to make progress as a result. I'm going
to go back to the drawing board and come up with a simpler version that
adjusts the cut-off depending on topology, only allows an imbalance
of NUMA_IMBALANCE_MIN, and tidies up the inconsistencies.
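
For reference, the point raised above can be checked mechanically. What
follows is a minimal userspace sketch, not the kernel code: the two helpers
copy the post-patch logic from the quoted hunk, the "return 0" /
"return imbalance" tail is an assumption because the quote stops at the
comparison, and count_second_check_rejections() is a hypothetical name
introduced only for this illustration. It counts the cases where the
second comparison would reject anything, given imbalance < dst_running.

```c
#include <stdbool.h>

/* Post-patch predicate, as quoted in the hunk above. */
static inline bool allow_numa_imbalance(int dst_running, int imb_numa_nr)
{
	return dst_running < imb_numa_nr;
}

/*
 * Post-patch adjustment, as quoted above. The two trailing returns are
 * an assumption; the quoted hunk ends at the comparison.
 */
static inline long adjust_numa_imbalance(int imbalance,
				int dst_running, int imb_numa_nr)
{
	if (!allow_numa_imbalance(dst_running, imb_numa_nr))
		return imbalance;

	if (imbalance <= imb_numa_nr)
		return 0;

	return imbalance;
}

/*
 * Hypothetical checker: over all small inputs that satisfy
 * imbalance < dst_running, count how often the first check passes
 * but the second comparison fails.
 */
static inline int count_second_check_rejections(int max_nr)
{
	int hits = 0;

	for (int imb_numa_nr = 1; imb_numa_nr <= max_nr; imb_numa_nr++)
		for (int dst_running = 0; dst_running <= max_nr; dst_running++)
			for (int imbalance = 0; imbalance < dst_running; imbalance++)
				if (allow_numa_imbalance(dst_running, imb_numa_nr) &&
				    !(imbalance <= imb_numa_nr))
					hits++;

	return hits;
}
```

Under the stated assumption that imbalance < dst_running, the count is zero
for any max_nr, so the second comparison never changes the outcome. With
imb_numa_nr = 4 and dst_running = 2, both imbalance = 1 and imbalance = 2
are reduced to 0.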

> > This?
> >
> >                                  * The 25% imbalance is an arbitrary cutoff
> >                                  * based on SMT-2 to balance between memory
> >                                  * bandwidth and avoiding premature sharing
> >                                  * of HT resources and SMT-4 or SMT-8 *may*
> >                                  * benefit from a different cutoff. nr_llcs
> >                                  * are accounted for to mitigate premature
> >                                  * cache eviction due to multiple tasks
> >                                  * using one cache while a sibling cache
> >                                  * remains relatively idle.
> >
> > > For example, why is it better than just 25% of the LLC weight ?
> >
> > Because, let's say there are 2 LLCs, then an imbalance based on just the
> > LLC weight might allow 2 tasks to share one cache while another is idle.
> > This is the original problem whereby the vanilla imbalance allowed multiple
> > LLCs on the same node to be overloaded, which hurt workloads that prefer
> > to spread wide.
> 
> In this case, shouldn't it be (llc_weight >> 2) * nr_llcs, to fill each
> LLC up to 25%, instead of dividing by nr_llcs?
> 
> As an example, you have
> 1 node with 1 LLC with 128 CPUs will get an imb_numa_nr = 32
> 1 node with 2 LLC with 64 CPUs each will get an imb_numa_nr = 8
> 1 node with 4 LLC with 32 CPUs each will get an imb_numa_nr = 2
> 
> sd->imb_numa_nr is used at NUMA level, so the more LLCs you have, the
> lower the allowed imbalance
> 

Lowering the threshold at which an imbalance is allowed as the number of
LLCs grows is deliberate, given that the motivating problem was that
embarrassingly parallel workloads on AMD suffer when some LLCs are
overloaded while others remain idle.
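
To make the two alternatives concrete, here is a small sketch of the
arithmetic. The helper names are hypothetical and this is not the kernel
implementation; the "divide" formula is inferred from the three numbers
quoted above, and the "multiply" formula is the one suggested in the
quoted question.

```c
/*
 * Hypothetical helper: 25% of one LLC's weight, divided by the number
 * of LLCs in the node. Matches the quoted values 32, 8 and 2.
 */
static inline unsigned int imb_numa_nr_divide(unsigned int llc_weight,
					      unsigned int nr_llcs)
{
	return (llc_weight >> 2) / nr_llcs;
}

/*
 * Hypothetical helper: the suggested alternative, 25% of one LLC's
 * weight multiplied by the number of LLCs, i.e. 25% of the node.
 */
static inline unsigned int imb_numa_nr_multiply(unsigned int llc_weight,
						unsigned int nr_llcs)
{
	return (llc_weight >> 2) * nr_llcs;
}
```

For a 128-CPU node, the divide form gives 32, 8 and 2 for 1, 2 and 4 LLCs
respectively, while the multiply form gives 32 in all three cases. The
divide form is the one that tightens the threshold as the LLC count grows.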

-- 
Mel Gorman
SUSE Labs

