From: Peter Zijlstra <peterz@infradead.org>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Vincent Guittot <vincent.guittot@linaro.org>,
	Ingo Molnar <mingo@kernel.org>, Phil Auld <pauld@redhat.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Quentin Perret <quentin.perret@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <Morten.Rasmussen@arm.com>,
	Hillf Danton <hdanton@sina.com>, Parth Shah <parth@linux.ibm.com>,
	Rik van Riel <riel@surriel.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] sched, fair: Allow a small degree of load imbalance between SD_NUMA domains v2
Date: Tue, 7 Jan 2020 12:22:55 +0100
Message-ID: <20200107112255.GV2827@hirez.programming.kicks-ass.net>
In-Reply-To: <20200107095655.GF3466@techsingularity.net>

On Tue, Jan 07, 2020 at 09:56:55AM +0000, Mel Gorman wrote:

> Much more importantly, doing what you suggest allows an imbalance
> of more CPUs than are backed by a single LLC. On high-end AMD EPYC 2
> machines, busiest->group_weight scaled by imbalance_pct spans multiple L3
> caches. That is going to have side-effects. While I also do not account
> for the LLC group_weight, it's unlikely the cut-off I used would be
> smaller than an LLC on a large machine.
> 
> These two points are why I didn't take the group weight into account.
> 
> Now if you want, I can do what you suggest anyway as long as you are happy
> that the child domain weight is also taken into account and to bound the
> largest possible allowed imbalance to deal with the case of a node having
> multiple small LLC caches. That means that some machines will be using the
> size of the node and some machines will use the size of an LLC. It's less
> predictable overall as some machines will be "special" relative to others
> making it harder to reproduce certain problems locally but it would take
> imbalance_pct into account in a way that you're happy with.
> 
> Also bear in mind that whether LLC is accounted for or not, the final
> result should be halved similar to the other imbalance calculations to
> avoid over or under load balancing.

> +		/* Consider allowing a small imbalance between NUMA groups */
> +		if (env->sd->flags & SD_NUMA) {
> +			struct sched_domain *child = env->sd->child;

This assumes env->sd->child exists, which should be true for NUMA domains,
I suppose.
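
For illustration only, the assumption could be made explicit with a guard.
This is a minimal stand-alone sketch, not kernel code: struct
sched_domain_model and imbalance_cap() are hypothetical names, and falling
back to a caller-supplied weight when there is no child is an assumed
policy, not something the patch does.

```c
#include <stddef.h>

/* Hypothetical, reduced model of the two fields the hunk relies on. */
struct sched_domain_model {
	struct sched_domain_model *child;	/* next lower domain, or NULL */
	unsigned int span_weight;		/* number of CPUs spanned */
};

/*
 * Return the upper bound for the allowed imbalance: the child's span
 * weight when a child exists, otherwise the caller-supplied fallback
 * (an assumed policy for this sketch).
 */
static unsigned int imbalance_cap(const struct sched_domain_model *sd,
				  unsigned int fallback)
{
	return sd->child ? sd->child->span_weight : fallback;
}
```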

> +			unsigned int imbalance_adj;
> +
> +			/*
> +			 * Calculate an acceptable degree of imbalance based
> +			 * on imbalance_adj. However, do not allow a greater
> +			 * imbalance than the child domains weight to avoid
> +			 * a case where the allowed imbalance spans multiple
> +			 * LLCs.
> +			 */

That comment is a wee bit misleading, @child is not an LLC per se. This
could be the NUMA distance 2 domain, in which case @child is the NUMA
distance 1 group.

That said, even then it probably makes sense to ensure you don't idle a
whole smaller distance group.

> +			imbalance_adj = busiest->group_weight * (env->sd->imbalance_pct - 100) / 100;
> +			imbalance_adj = min(imbalance_adj, child->span_weight);
> +			imbalance_adj >>= 1;
> +
> +			/*
> +			 * Ignore small imbalances when the busiest group has
> +			 * low utilisation.
> +			 */
> +			if (busiest->sum_nr_running < imbalance_adj)
> +				env->imbalance = 0;
> +		}
> +
>  		return;
>  	}
>  
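
To make the arithmetic in the quoted hunk concrete, here is a minimal
stand-alone model of it in plain C. It is a sketch, not kernel code:
numa_imbalance_adj() and ignore_small_imbalance() are hypothetical helper
names, and the weights and the imbalance_pct value of 125 used in the note
below are assumed inputs chosen for illustration.

```c
#include <assert.h>

/*
 * Model of the quoted calculation: scale the busiest group's weight by
 * the percentage above 100, cap it at the child domain's span weight,
 * then halve it.
 */
static unsigned int numa_imbalance_adj(unsigned int group_weight,
				       unsigned int imbalance_pct,
				       unsigned int child_span_weight)
{
	unsigned int adj;

	/* Unsigned subtraction below would wrap for values under 100. */
	assert(imbalance_pct >= 100);

	adj = group_weight * (imbalance_pct - 100) / 100;
	if (adj > child_span_weight)
		adj = child_span_weight;

	return adj >> 1;
}

/* Non-zero when the hunk would set env->imbalance to 0. */
static int ignore_small_imbalance(unsigned int sum_nr_running,
				  unsigned int imbalance_adj)
{
	return sum_nr_running < imbalance_adj;
}
```

Under those assumed inputs, a busiest group of 128 CPUs with imbalance_pct
of 125 gives 128 * 25 / 100 = 32. A child span of 16 caps that at 16, and
halving leaves 8, so the imbalance would be ignored while the busiest group
runs fewer than 8 tasks. With a child span of 64 the cap does not apply and
the result is 16.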
