All of lore.kernel.org
 help / color / mirror / Atom feed
From: Valentin Schneider <valentin.schneider@arm.com>
To: Mel Gorman <mgorman@techsingularity.net>,
	Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	pauld@redhat.com, srikar@linux.vnet.ibm.com,
	quentin.perret@arm.com, dietmar.eggemann@arm.com,
	Morten.Rasmussen@arm.com, hdanton@sina.com, parth@linux.ibm.com,
	riel@surriel.com, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] sched, fair: Allow a small degree of load imbalance between SD_NUMA domains v2
Date: Fri, 20 Dec 2019 12:40:02 +0000	[thread overview]
Message-ID: <d44ae0ff-3bd7-fab1-66d0-71769c078918@arm.com> (raw)
In-Reply-To: <20191220084252.GL3178@techsingularity.net>

On 20/12/2019 08:42, Mel Gorman wrote:
> In general, the patch simply seeks to avoid unnecessarily cross-node
> migrations when a machine is lightly loaded but shows benefits for other
> workloads. While tests are still running, so far it seems to benefit
> light-utilisation smaller workloads on large machines and does not appear
> to do any harm to larger or parallelised workloads.
> 
> [valentin.schneider@arm.com: Reformat code flow, correct comment, use idle_cpus]

I think only the comment bit is still there in this version and it's not
really worth mentioning (but I do thank you for doing it!).

> @@ -8671,6 +8667,39 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
>  			return;
>  		}
>  
> +		/* Consider allowing a small imbalance between NUMA groups */
> +		if (env->sd->flags & SD_NUMA) {
> +			unsigned int imbalance_adj, imbalance_max;
> +
> +			/*
> +			 * imbalance_adj is the allowable degree of imbalance
> +			 * to exist between two NUMA domains. It's calculated
> +			 * relative to imbalance_pct with a minimum of two
> +			 * tasks or idle CPUs. The choice of two is due to
> +			 * the most basic case of two communicating tasks
> +			 * that should remain on the same NUMA node after
> +			 * wakeup.
> +			 */
> +			imbalance_adj = max(2U, (busiest->group_weight *
> +				(env->sd->imbalance_pct - 100) / 100) >> 1);
> +
> +			/*
> +			 * Ignore small imbalances unless the busiest sd has
> +			 * almost half as many busy CPUs as there are
> +			 * available CPUs in the busiest group. Note that

This is all on the busiest group, so this should be more like:

			 * Ignore small imbalances unless almost half of the
			 * busiest sg's CPUs are busy.

> +			 * it is not exactly half as imbalance_adj must be
> +			 * accounted for or the two domains do not converge
> +			 * as equally balanced if the number of busy tasks is
> +			 * roughly the size of one NUMA domain.
> +			 */
> +			imbalance_max = (busiest->group_weight >> 1) + imbalance_adj;
> +			if (env->imbalance <= imbalance_adj &&

I'm confused now, have we set env->imbalance to anything at this point? AIUI
Vincent's suggestion was to hinge this purely on the weight vs idle_cpus /
nr_running, IOW not use imbalance:

if (sd->flags & SD_NUMA) {                                                                         
	imbalance_adj = max(2U, (busiest->group_weight *                                           
				 (env->sd->imbalance_pct - 100) / 100) >> 1);                      
	imbalance_max = (busiest->group_weight >> 1) + imbalance_adj;                              
                                                                                                     
	if (busiest->idle_cpus >= imbalance_max) {                                                 
		env->imbalance = 0;                                                                
		return;                                                                            
	}                                                                                          
}                                                                                                  
       
Now, I have to say I'm not sold on the idle_cpus thing, I'd much rather use
the number of runnable tasks. We are setting up a threshold for how far we
are willing to ignore imbalances; if we have overloaded CPUs we *really*
should try to solve this. Number of tasks is the safer option IMO: when we
do have one task per CPU, it'll be the same as if we had used idle_cpus, and
when we don't have one task per CPU we'll load-balance more often that we
would have with idle_cpus.

> +			    busiest->idle_cpus >= imbalance_max) {
> +				env->imbalance = 0;
> +				return;
> +			}
> +		}
> +
>  		if (busiest->group_weight == 1 || sds->prefer_sibling) {
>  			unsigned int nr_diff = busiest->sum_nr_running;
>  			/*
> 

  reply	other threads:[~2019-12-20 12:40 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-20  8:42 [PATCH] sched, fair: Allow a small degree of load imbalance between SD_NUMA domains v2 Mel Gorman
2019-12-20 12:40 ` Valentin Schneider [this message]
2019-12-20 14:22   ` Mel Gorman
2019-12-20 15:32     ` Valentin Schneider
2019-12-21 11:25   ` Mel Gorman
2019-12-22 12:00 ` Srikar Dronamraju
2019-12-23 13:31 ` Vincent Guittot
2019-12-23 13:41   ` Vincent Guittot
2020-01-03 14:31   ` Mel Gorman
2020-01-06 13:55     ` Vincent Guittot
2020-01-06 14:52       ` Mel Gorman
2020-01-07  8:38         ` Vincent Guittot
2020-01-07  9:56           ` Mel Gorman
2020-01-07 11:17             ` Vincent Guittot
2020-01-07 11:56               ` Mel Gorman
2020-01-07 16:00                 ` Vincent Guittot
2020-01-07 20:24                   ` Mel Gorman
2020-01-08  8:25                     ` Vincent Guittot
2020-01-08  8:49                       ` Mel Gorman
2020-01-08 13:18                     ` Peter Zijlstra
2020-01-08 14:03                       ` Mel Gorman
2020-01-08 16:46                         ` Vincent Guittot
2020-01-08 18:03                           ` Mel Gorman
2020-01-07 11:22             ` Peter Zijlstra
2020-01-07 11:42               ` Mel Gorman
2020-01-07 12:29                 ` Peter Zijlstra
2020-01-07 12:28               ` Peter Zijlstra
2020-01-07 19:26             ` Phil Auld

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d44ae0ff-3bd7-fab1-66d0-71769c078918@arm.com \
    --to=valentin.schneider@arm.com \
    --cc=Morten.Rasmussen@arm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=hdanton@sina.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=parth@linux.ibm.com \
    --cc=pauld@redhat.com \
    --cc=peterz@infradead.org \
    --cc=quentin.perret@arm.com \
    --cc=riel@surriel.com \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.