linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@techsingularity.net>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Aubrey Li <aubrey.li@linux.intel.com>,
	Barry Song <song.bao.hua@hisilicon.com>,
	Mike Galbraith <efault@gmx.de>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	"Gautham R. Shenoy" <gautham.shenoy@amd.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/2] sched/fair: Adjust the allowed NUMA imbalance when SD_NUMA spans multiple LLCs
Date: Mon, 6 Dec 2021 15:12:06 +0000	[thread overview]
Message-ID: <20211206151206.GH3366@techsingularity.net> (raw)
In-Reply-To: <20211204104056.GR16608@worktop.programming.kicks-ass.net>

On Sat, Dec 04, 2021 at 11:40:56AM +0100, Peter Zijlstra wrote:
> On Wed, Dec 01, 2021 at 03:18:44PM +0000, Mel Gorman wrote:
> > +	/* Calculate allowed NUMA imbalance */
> > +	for_each_cpu(i, cpu_map) {
> > +		int imb_numa_nr = 0;
> > +
> > +		for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent) {
> > +			struct sched_domain *child = sd->child;
> > +
> > +			if (!(sd->flags & SD_SHARE_PKG_RESOURCES) && child &&
> > +			    (child->flags & SD_SHARE_PKG_RESOURCES)) {
> > +				int nr_groups;
> > +
> > +				nr_groups = sd->span_weight / child->span_weight;
> > +				imb_numa_nr = max(1U, ((child->span_weight) >> 1) /
> > +						(nr_groups * num_online_nodes()));
> > +			}
> > +
> > +			sd->imb_numa_nr = imb_numa_nr;
> > +		}
> 
> OK, so let's see. All domains with SHARE_PKG_RESOURCES set will have
> imb_numa_nr = 0, all domains above it will have the same value
> calculated here.
> 
> So far so good I suppose :-)
> 

Good start :)

> Then nr_groups is what it says on the tin; we could've equally well
> iterated sd->groups and gotten the same number, but this is simpler.
> 

I also thought it would be clearer.

> Now, imb_numa_nr is where the magic happens, the way it's written
> doesn't help, but it's something like:
> 
> 	(child->span_weight / 2) / (nr_groups * num_online_nodes())
> 
> With a minimum value of 1. So the larger the system is, or the smaller
> the LLCs, the smaller this number gets, right?
> 

Correct.

> So my ivb-ep that has 20 cpus in a LLC and 2 nodes, will get: (20 / 2)
> / (1 * 2) = 10, while the ivb-ex will get: (20/2) / (1*4) = 5.
> 
> But a Zen box that has only like 4 CPUs per LLC will have 1, regardless
> of how many nodes it has.
> 

The minimum of one was to allow a pair of communicating tasks to remain
on one node even if it's imbalacnced.

> Now, I'm thinking this assumes (fairly reasonable) that the level above
> LLC is a node, but I don't think we need to assume this, while also not
> assuming the balance domain spans the whole machine (yay paritions!).
> 
> 	for (top = sd; top->parent; top = top->parent)
> 		;
> 
> 	nr_llcs = top->span_weight / child->span_weight;
> 	imb_numa_nr = max(1, child->span_weight / nr_llcs);
> 
> which for my ivb-ep gets me:  20 / (40 / 20) = 10
> and the Zen system will have:  4 / (huge number) = 1
> 
> Now, the exp: a / (b / a) is equivalent to a * (a / b) or a^2/b, so we
> can also write the above as:
> 
> 	(child->span_weight * child->span_weight) / top->span_weight;
> 

Gautham had similar reasoning to calculate the imbalance at each
higher-level domain instead of using a static value throughout and
it does make sense. For each level and splitting the imbalance between
two domains, this works out as


	/*
	 * Calculate an allowed NUMA imbalance such that LLCs do not get
	 * imbalanced.
	 */
	for_each_cpu(i, cpu_map) {
		for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent) {
			struct sched_domain *child = sd->child;

			if (!(sd->flags & SD_SHARE_PKG_RESOURCES) && child &&
			    (child->flags & SD_SHARE_PKG_RESOURCES)) {
				struct sched_domain *top = sd;
				unsigned int llc_sq;

				/*
				 * nr_llcs = (top->span_weight / llc_weight);
				 * imb = (child_weight / nr_llcs) >> 1
				 *
				 * is equivalent to
				 *
				 * imb = (llc_weight^2 / top->span_weight) >> 1
				 *
				 */
				llc_sq = child->span_weight * child->span_weight;
				while (top) {
					top->imb_numa_nr = max(1U,
						(llc_sq / top->span_weight) >> 1);
					top = top->parent;
				}

				break;
			}
		}
	}

I'll test this and should have results tomorrow.

-- 
Mel Gorman
SUSE Labs

  parent reply	other threads:[~2021-12-06 15:18 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-01 15:18 [PATCH v3 0/2] Adjust NUMA imbalance for multiple LLCs Mel Gorman
2021-12-01 15:18 ` [PATCH 1/2] sched/fair: Use weight of SD_NUMA domain in find_busiest_group Mel Gorman
2021-12-03  8:38   ` Barry Song
2021-12-03  9:51     ` Gautham R. Shenoy
2021-12-03 10:53     ` Mel Gorman
2021-12-01 15:18 ` [PATCH 2/2] sched/fair: Adjust the allowed NUMA imbalance when SD_NUMA spans multiple LLCs Mel Gorman
2021-12-03  8:15   ` Barry Song
2021-12-03 10:50     ` Mel Gorman
2021-12-03 11:14       ` Barry Song
2021-12-03 13:27         ` Mel Gorman
2021-12-04 10:40   ` Peter Zijlstra
2021-12-06  8:48     ` Gautham R. Shenoy
2021-12-06 14:51       ` Peter Zijlstra
2021-12-06 15:12     ` Mel Gorman [this message]
2021-12-09 14:23       ` Valentin Schneider
2021-12-09 15:43         ` Mel Gorman
  -- strict thread matches above, loose matches on Subject: below --
2022-02-08  9:43 [PATCH v6 0/2] Adjust NUMA imbalance for " Mel Gorman
2022-02-08  9:43 ` [PATCH 2/2] sched/fair: Adjust the allowed NUMA imbalance when SD_NUMA spans " Mel Gorman
2022-02-08 16:19   ` Gautham R. Shenoy
2022-02-09  5:10   ` K Prateek Nayak
2022-02-09 10:33     ` Mel Gorman
2022-02-11 19:02       ` Jirka Hladky
2022-02-14 10:27   ` Srikar Dronamraju
2022-02-14 11:03   ` Vincent Guittot
2022-02-03 14:46 [PATCH v5 0/2] Adjust NUMA imbalance for " Mel Gorman
2022-02-03 14:46 ` [PATCH 2/2] sched/fair: Adjust the allowed NUMA imbalance when SD_NUMA spans " Mel Gorman
2022-02-04  7:06   ` Srikar Dronamraju
2022-02-04  9:04     ` Mel Gorman
2022-02-04 15:07   ` Nayak, KPrateek (K Prateek)
2022-02-04 16:45     ` Mel Gorman
2021-12-10  9:33 [PATCH v4 0/2] Adjust NUMA imbalance for " Mel Gorman
2021-12-10  9:33 ` [PATCH 2/2] sched/fair: Adjust the allowed NUMA imbalance when SD_NUMA spans " Mel Gorman
2021-12-13  8:28   ` Gautham R. Shenoy
2021-12-13 13:01     ` Mel Gorman
2021-12-13 14:47       ` Gautham R. Shenoy
2021-12-15 11:52         ` Gautham R. Shenoy
2021-12-15 12:25           ` Mel Gorman
2021-12-16 18:33             ` Gautham R. Shenoy
2021-12-20 11:12               ` Mel Gorman
2021-12-21 15:03                 ` Gautham R. Shenoy
2021-12-21 17:13                 ` Vincent Guittot
2021-12-22  8:52                   ` Jirka Hladky
2022-01-04 19:52                     ` Jirka Hladky
2022-01-05 10:42                   ` Mel Gorman
2022-01-05 10:49                     ` Mel Gorman
2022-01-10 15:53                     ` Vincent Guittot
2022-01-12 10:24                       ` Mel Gorman
2021-12-17 19:54   ` Gautham R. Shenoy
2021-11-25 15:19 [PATCH 0/2] Adjust NUMA imbalance for " Mel Gorman
2021-11-25 15:19 ` [PATCH 2/2] sched/fair: Adjust the allowed NUMA imbalance when SD_NUMA spans " Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211206151206.GH3366@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=aubrey.li@linux.intel.com \
    --cc=efault@gmx.de \
    --cc=gautham.shenoy@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=song.bao.hua@hisilicon.com \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=valentin.schneider@arm.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).