From: Peter Zijlstra <peterz@infradead.org>
To: Rik van Riel <riel@redhat.com>
Cc: Artem Bityutskiy <dedekind1@gmail.com>,
	linux-kernel@vger.kernel.org, mgorman@suse.de,
	jhladky@redhat.com
Subject: Re: [PATCH] numa,sched: only consider less busy nodes as numa balancing destination
Date: Wed, 6 May 2015 19:00:38 +0200
Message-ID: <20150506170038.GB23123@twins.programming.kicks-ass.net>
In-Reply-To: <20150506114128.0c846a37@cuia.bos.redhat.com>

On Wed, May 06, 2015 at 11:41:28AM -0400, Rik van Riel wrote:

> Peter, Mel, I think it may be time to stop waiting for the impedance
> mismatch between the load balancer and NUMA balancing to be resolved,
> and try to just avoid the issue in the NUMA balancing code...

That's a wee bit unfair since we 'all' decided to let the numa thing
rest for a while. So obviously that issue didn't get resolved.

>  kernel/sched/fair.c | 30 ++++++++++++++++++++++++++++--
>  1 file changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ffeaa4105e48..480e6a35ab35 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1409,6 +1409,30 @@ static void task_numa_find_cpu(struct task_numa_env *env,
>  	}
>  }
>  
> +/* Only move tasks to a NUMA node less busy than the current node. */
> +static bool numa_has_capacity(struct task_numa_env *env)
> +{
> +	struct numa_stats *src = &env->src_stats;
> +	struct numa_stats *dst = &env->dst_stats;
> +
> +	if (src->has_free_capacity && !dst->has_free_capacity)
> +		return false;
> +
> +	/*
> +	 * Only consider a task move if the source has a higher load
> +	 * than the destination, corrected for CPU capacity on each node.
> +	 *
> +	 *      src->load                dst->load
> +	 * --------------------- vs ---------------------
> +	 * src->compute_capacity    dst->compute_capacity
> +	 */
> +	if (src->load * dst->compute_capacity >
> +	    dst->load * src->compute_capacity)
> +		return true;
> +
> +	return false;
> +}
> +
>  static int task_numa_migrate(struct task_struct *p)
>  {
>  	struct task_numa_env env = {
> @@ -1463,7 +1487,8 @@ static int task_numa_migrate(struct task_struct *p)
>  	update_numa_stats(&env.dst_stats, env.dst_nid);
>  
>  	/* Try to find a spot on the preferred nid. */
> -	task_numa_find_cpu(&env, taskimp, groupimp);
> +	if (numa_has_capacity(&env))
> +		task_numa_find_cpu(&env, taskimp, groupimp);
>  
>  	/*
>  	 * Look at other nodes in these cases:
> @@ -1494,7 +1519,8 @@ static int task_numa_migrate(struct task_struct *p)
>  			env.dist = dist;
>  			env.dst_nid = nid;
>  			update_numa_stats(&env.dst_stats, env.dst_nid);
> -			task_numa_find_cpu(&env, taskimp, groupimp);
> +			if (numa_has_capacity(&env))
> +				task_numa_find_cpu(&env, taskimp, groupimp);
>  		}
>  	}

Does this not 'duplicate' the logic we tried to implement in the
balance: section of task_numa_compare()? That is where we try to avoid
making a decision that the regular load-balancer will dislike and undo.

Alternatively, you can view that as a CPU guard and the proposed check
as a node guard, in which case should it not live inside
task_numa_find_cpu(), instead of guarding every call site?

In any case, should we mix a bit of imbalance_pct in there?

/me goes ponder this a bit further..

Thread overview: 21+ messages
2015-05-06 10:35 autoNUMA web workload regression Artem Bityutskiy
2015-05-06 10:37 ` Bityutskiy, Artem
2015-05-06 14:40 ` Rik van Riel
2015-05-06 15:41 ` [PATCH] numa,sched: only consider less busy nodes as numa balancing destination Rik van Riel
2015-05-06 17:00   ` Peter Zijlstra [this message]
2015-05-06 17:06     ` Rik van Riel
2015-05-07 13:29   ` Artem Bityutskiy
2015-05-08 13:13   ` Artem Bityutskiy
2015-05-08 20:03     ` Rik van Riel
2015-05-08 22:52       ` Rik van Riel
2015-05-11 11:11       ` Artem Bityutskiy
2015-05-11 14:20         ` Rik van Riel
2015-05-12 13:50       ` Artem Bityutskiy
2015-05-12 15:45         ` Rik van Riel
2015-05-13  6:29           ` Peter Zijlstra
2015-05-13  6:31             ` Peter Zijlstra
2015-05-13 10:59             ` Artem Bityutskiy
2015-05-13 13:51             ` Rik van Riel
2015-05-11 12:44   ` Jirka Hladky
2015-05-11 14:44     ` Rik van Riel
2015-05-26 20:29   ` Rik van Riel
