All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
To: Valentin Schneider <vschneid@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Ricardo Neri <ricardo.neri@intel.com>,
	"Ravi V. Shankar" <ravi.v.shankar@intel.com>,
	Ben Segall <bsegall@google.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Len Brown <len.brown@intel.com>, Mel Gorman <mgorman@suse.de>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	"Tim C . Chen" <tim.c.chen@intel.com>
Subject: Re: [PATCH v2 1/7] sched/fair: Generalize asym_packing logic for SMT local sched group
Date: Wed, 28 Dec 2022 20:00:03 -0800	[thread overview]
Message-ID: <20221229040003.GA11497@ranerica-svr.sc.intel.com> (raw)
In-Reply-To: <xhsmhv8m3e5sx.mognet@vschneid.remote.csb>

On Thu, Dec 22, 2022 at 04:55:58PM +0000, Valentin Schneider wrote:
> On 22/11/22 12:35, Ricardo Neri wrote:
> > @@ -8926,25 +8924,16 @@ static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
> >               return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> >       }
> >
> > -	/* @dst_cpu has SMT siblings. */
> > -
> > -	if (sg_is_smt) {
> > -		int local_busy_cpus = sds->local->group_weight -
> > -				      sds->local_stat.idle_cpus;
> > -		int busy_cpus_delta = sg_busy_cpus - local_busy_cpus;
> > -
> > -		if (busy_cpus_delta == 1)
> > -			return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> > -
> > -		return false;
> > -	}
> > -
> >       /*
> > -	 * @sg does not have SMT siblings. Ensure that @sds::local does not end
> > -	 * up with more than one busy SMT sibling and only pull tasks if there
> > -	 * are not busy CPUs (i.e., no CPU has running tasks).
> > +	 * @dst_cpu has SMT siblings. Do asym_packing load balancing only if
> > +	 * all its siblings are idle (moving tasks between physical cores in
> > +	 * which some SMT siblings are busy results in the same throughput).
> > +	 *
> > +	 * If the difference in the number of busy CPUs is two or more, let
> > +	 * find_busiest_group() take care of it. We only care if @sg has
> > +	 * exactly one busy CPU. This covers SMT and non-SMT sched groups.
> >        */
> > -	if (!sds->local_stat.sum_nr_running)
> > +	if (sg_busy_cpus == 1 && !sds->local_stat.sum_nr_running)
> >               return sched_asym_prefer(dst_cpu, sg->asym_prefer_cpu);
> >
> 
> Some of this is new to me - I had missed the original series introducing
> this. However ISTM that this is conflating two concepts: SMT occupancy
> balancing, and asym packing.
> 
> Take the !local_is_smt :: sg_busy_cpus >= 2 :: return true; path. It does
> not involve asym packing priorities at all. This can end up in an
> ASYM_PACKING load balance,

Yes, this the desired result: an idle low-priority CPU should help a high-
priority core with more than one busy SMT sibling. But yes, it does not
relate to priorities and can be implemented differently.

> which per calculate_imbalance() tries to move
> *all* tasks to the higher priority CPU - in the case of SMT balancing,
> we don't want to totally empty the core, just its siblings.

But it will not empty the core, only one of its SMT siblings. A single
sibling will be selected in find_busiest_queue(). The other siblings will
be unaffected.

> 
> Is there an ITMT/big.LITTLE (or however x86 calls it) case that invalidates
> the above?

Please see below.

> 
> Say, what's not sufficient with the below? AFAICT the only thing that isn't
> covered is the sg_busy_cpus >= 2 thing, but IMO that's SMT balancing, not
> asym packing - if the current calculate_imbalance() doesn't do it, it
> should be fixed to do it.

Agreed.

>Looking at the
> 
>   local->group_type == group_has_spare
> 
> case, it looks like it should DTRT.

I had tried (and failed) to have find_busiest_group() handle the
!local_is_smt :: sg_busy_cpus >= 2 case. Now I think I made it work.

When the busiest group is not overloaded, find_busiest_group() and
local->group = group_has_spare during an idle load balance events the
number of *idle* CPUs. However, this does not work if the local and busiest
groups have different weights. In SMT2, for instance, if busiest has 2 busy
CPUs (i.e., 0 idle CPUs) and the destination CPU is idle, the difference in
the number of idle CPUs is 1. find_busiest_group() will incorrectly goto
out_balanced.

This issue very visible in Intel hybrid processors because the big cores
have SMT but the small cores do not. It can, however, be reproduced in non-
hybrid processors by offlining the SMT siblings of some cores.

The problem can be fixed by instead balancing the number of *busy* CPUs,
which is what in general we want, IMO. (When two groups have the same
weight, it is equivalent to balancing the number of idle CPUs).

This patch worked for me:

@@ -9787,14 +9787,18 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 			lsub_positive(&nr_diff, local->sum_nr_running);
 			env->imbalance = nr_diff;
 		} else {
+			unsigned int busiest_busy_cpus, local_busy_cpus;
+
+			busiest_busy_cpus = busiest->group_weight - busiest->idle_cpus;
+			local_busy_cpus = local->group_weight - local->idle_cpus;
 
 			/*
 			 * If there is no overload, we just want to even the number of
-			 * idle cpus.
+			 * busy cpus.
 			 */
 			env->migration_type = migrate_task;
-			env->imbalance = max_t(long, 0,
-					       (local->idle_cpus - busiest->idle_cpus));
+			env->imbalance = max_t(long, 0 ,
+					       (busiest_busy_cpus -  local_busy_cpus));
 		}
 
 #ifdef CONFIG_NUMA
@@ -9981,18 +9985,24 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 			 */
 			goto out_balanced;
 
-		if (busiest->group_weight > 1 &&
-		    local->idle_cpus <= (busiest->idle_cpus + 1))
-			/*
-			 * If the busiest group is not overloaded
-			 * and there is no imbalance between this and busiest
-			 * group wrt idle CPUs, it is balanced. The imbalance
-			 * becomes significant if the diff is greater than 1
-			 * otherwise we might end up to just move the imbalance
-			 * on another group. Of course this applies only if
-			 * there is more than 1 CPU per group.
-			 */
-			goto out_balanced;
+		if (busiest->group_weight > 1) {
+			unsigned int local_busy_cpus, busiest_busy_cpus;
+
+			local_busy_cpus = local->group_weight - local->idle_cpus;
+			busiest_busy_cpus = busiest->group_weight - busiest->idle_cpus;
+
+			if (busiest_busy_cpus <= local_busy_cpus + 1)
+				/*
+				 * If the busiest group is not overloaded
+				 * and there is no imbalance between this and busiest
+				 * group wrt busy CPUs, it is balanced. The imbalance
+				 * becomes significant if the diff is greater than 1
+				 * otherwise we might end up to just move the imbalance
+				 * on another group. Of course this applies only if
+				 * there is more than 1 CPU per group.
+				 */
+				goto out_balanced;
+		}
 
 		if (busiest->sum_h_nr_running == 1)
 			/*

With this I can remove the sg_busy_cpus >=2 thing from asym_smt_can_pull_tasks().

> 
> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 224107278471f..15eb2d3cff186 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9176,12 +9176,15 @@ static inline bool
>  sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs,
>  	   struct sched_group *group)
>  {
> -	/* Only do SMT checks if either local or candidate have SMT siblings */
> -	if ((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
> -	    (group->flags & SD_SHARE_CPUCAPACITY))
> -		return asym_smt_can_pull_tasks(env->dst_cpu, sds, sgs, group);
> +	/*
> +	 * For SMT, env->idle != CPU_NOT_IDLE isn't sufficient, we need to make
> +	 * sure the whole core is idle.
> +	 */
> +	if (((sds->local->flags & SD_SHARE_CPUCAPACITY) ||
> +	     (group->flags & SD_SHARE_CPUCAPACITY)) &&
> +	    !test_idle_cores(env->dst_cpu))

But test_idle_cores() tests for *any* idle core in the highest domain with the
SD_SHARE_PKG_RESOURCES flag. Here we are only interested in the SMT siblings
of env->dst_cpu. If is_core_idle(env->dst_cpu) is used, then I agree with the
change.

Thanks and BR,
Ricardo

  reply	other threads:[~2022-12-29  3:51 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-22 20:35 [PATCH v2 0/7] x86/sched: Avoid unnecessary migrations within SMT domains Ricardo Neri
2022-11-22 20:35 ` [PATCH v2 1/7] sched/fair: Generalize asym_packing logic for SMT local sched group Ricardo Neri
2022-12-06 17:22   ` Dietmar Eggemann
2022-12-12 17:53     ` Ricardo Neri
2022-12-21 13:03       ` Dietmar Eggemann
2022-12-22  4:32         ` Ricardo Neri
2022-12-22 11:12           ` Dietmar Eggemann
2022-12-23 13:11             ` Ricardo Neri
2022-12-22 16:55   ` Valentin Schneider
2022-12-29  4:00     ` Ricardo Neri [this message]
2023-01-11 16:04       ` Valentin Schneider
2023-01-13 19:02         ` Ricardo Neri
2023-01-16  4:05           ` Ricardo Neri
2023-01-16 19:07             ` Valentin Schneider
2023-01-17 12:49               ` Ricardo Neri
2022-11-22 20:35 ` [PATCH v2 2/7] sched: Prepare sched_asym_prefer() to handle idle state of SMT siblings Ricardo Neri
2022-11-22 20:35 ` [PATCH v2 3/7] sched: Teach arch_asym_cpu_priority() the " Ricardo Neri
2022-12-06 17:54   ` Dietmar Eggemann
2022-12-12 17:54     ` Ricardo Neri
2022-12-21 17:12       ` Dietmar Eggemann
2022-12-22  4:55         ` Ricardo Neri
2022-12-22 16:56           ` Valentin Schneider
2022-11-22 20:35 ` [PATCH v2 4/7] sched/fair: Introduce sched_smt_siblings_idle() Ricardo Neri
2022-12-06 18:03   ` Dietmar Eggemann
2022-12-12 17:54     ` Ricardo Neri
2022-12-22 11:12       ` Dietmar Eggemann
2022-12-22 16:56   ` Valentin Schneider
2022-12-24  5:28     ` Ricardo Neri
2022-12-28 15:29       ` Chen Yu
2022-12-30  0:17         ` Ricardo Neri
2023-01-10 19:21       ` Valentin Schneider
2022-11-22 20:35 ` [PATCH v2 5/7] x86/sched: Remove SD_ASYM_PACKING from the "SMT" domain Ricardo Neri
2022-12-08 16:03   ` Ionela Voinescu
2022-12-14 16:59     ` Ricardo Neri
2022-12-15 16:48       ` Valentin Schneider
2022-12-20  0:42         ` Ricardo Neri
2022-12-22 16:56           ` Valentin Schneider
2022-12-29 19:02             ` Ricardo Neri
2023-01-10 19:17               ` Valentin Schneider
2023-01-13  1:31                 ` Ricardo Neri
2022-11-22 20:35 ` [PATCH v2 6/7] x86/sched/itmt: Give all SMT siblings of a core the same priority Ricardo Neri
2022-11-22 20:35 ` [PATCH v2 7/7] x86/sched/itmt: Consider the idle state of SMT siblings Ricardo Neri

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221229040003.GA11497@ranerica-svr.sc.intel.com \
    --to=ricardo.neri-calderon@linux.intel.com \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=ravi.v.shankar@intel.com \
    --cc=ricardo.neri@intel.com \
    --cc=rostedt@goodmis.org \
    --cc=srinivas.pandruvada@linux.intel.com \
    --cc=tim.c.chen@intel.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.