Re: Re: [PATCH 1/5] sched/fair: ignore SIS_UTIL when has idle core

From: Abel Wu <wuyun.abel@bytedance.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Mel Gorman <mgorman@suse.de>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Josh Don <joshdon@google.com>, Chen Yu <yu.c.chen@intel.com>,
	linux-kernel@vger.kernel.org
Subject: Re: Re: [PATCH 1/5] sched/fair: ignore SIS_UTIL when has idle core
Date: Mon, 5 Sep 2022 22:40:00 +0800
Message-ID: <1fc40679-b7c3-24f2-aa27-f1edab71228e@bytedance.com>
In-Reply-To: <20220902102528.keooutttg3hq3sy5@techsingularity.net>

On 9/2/22 6:25 PM, Mel Gorman wrote:
> For the simple case, I was expecting the static depth to *not* match load
> because it's unclear what the scaling should be for load or if it had a
> benefit. If investigating scaling the scan depth to load, it would still
> make sense to compare it to a static depth. The depth of 2 cores was to
> partially match the old SIS_PROP behaviour of the minimum depth to scan.
> 
>                  if (span_avg > 4*avg_cost)
>                          nr = div_u64(span_avg, avg_cost);
>                  else
>                          nr = 4;
> 
> nr is not proportional to cores although it could be
> https://lore.kernel.org/all/20210726102247.21437-7-mgorman@techsingularity.net/
> 
> This is not tested or properly checked for correctness but for
> illustrative purposes something like this should conduct a limited scan when
> overloaded. It has a side-effect that the has_idle_cores hint gets cleared
> for a partial scan for idle cores but the hint is probably wrong anyway.
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6089251a4720..59b27a2ef465 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6427,21 +6427,36 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>   		if (sd_share) {
>   			/* because !--nr is the condition to stop scan */
>   			nr = READ_ONCE(sd_share->nr_idle_scan) + 1;
> -			/* overloaded LLC is unlikely to have idle cpu/core */
> -			if (nr == 1)
> -				return -1;
> +
> +			/*
> +			 * Non-overloaded case: Scan full domain if there is
> +			 * 	an idle core. Otherwise, scan for an idle
> +			 * 	CPU based on nr_idle_scan
> +			 * Overloaded case: Unlikely to have an idle CPU but
> +			 * 	conduct a limited scan if there is potentially
> +			 * 	an idle core.
> +			 */
> +			if (nr > 1) {
> +				if (has_idle_core)
> +					nr = sd->span_weight;
> +			} else {
> +				if (!has_idle_core)
> +					return -1;
> +				nr = 2;
> +			}
>   		}
>   	}
>   
>   	for_each_cpu_wrap(cpu, cpus, target + 1) {
> +		if (!--nr)
> +			break;
> +
>   		if (has_idle_core) {
>   			i = select_idle_core(p, cpu, cpus, &idle_cpu);
>   			if ((unsigned int)i < nr_cpumask_bits)
>   				return i;
>   
>   		} else {
> -			if (!--nr)
> -				return -1;
>   			idle_cpu = __select_idle_cpu(cpu, p);
>   			if ((unsigned int)idle_cpu < nr_cpumask_bits)
>   				break;

I spent the last few days testing this, with 3 variations
(assuming has_idle_core):

  a) full or limited (2 cores) scan when !nr_idle_scan
  b) whether to clear sds->has_idle_core when a partial scan
     fails
  c) whether to scale the scan depth with load
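
To make the knobs concrete, here is a rough userspace sketch of
how a) and c) change the value assigned to nr in the quoted diff.
It is not kernel code: struct sis_variation, its field names and
scan_depth() are made up for illustration, and reading c) as
"keep the nr_idle_scan based depth instead of a full scan" is an
assumption. Variation b) concerns what happens after the scan,
so it is not modelled here.

```c
#include <stdbool.h>

/* Hypothetical knobs, named here for illustration only. */
struct sis_variation {
	bool full_scan_when_overloaded;	/* a) full vs. limited scan */
	bool scale_depth_with_load;	/* c) keep nr_idle_scan depth */
};

/*
 * Return the value that would be assigned to nr, or -1 to skip
 * the scan entirely.
 *
 * nr_idle_scan: stand-in for sd_share->nr_idle_scan (0 == overloaded)
 * span_weight:  stand-in for sd->span_weight (CPUs in the LLC)
 */
static int scan_depth(const struct sis_variation *v, bool has_idle_core,
		      int nr_idle_scan, int span_weight)
{
	/* +1 because !--nr is the condition to stop the scan */
	int nr = nr_idle_scan + 1;

	if (nr > 1) {
		/* Non-overloaded: full scan unless depth follows load. */
		if (has_idle_core && !v->scale_depth_with_load)
			nr = span_weight;
		return nr;
	}

	/* Overloaded: only worth scanning if the hint is set. */
	if (!has_idle_core)
		return -1;

	return v->full_scan_when_overloaded ? span_weight : 2;
}
```

With both knobs off this gives the same nr as the quoted diff.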

Some observations:

  1) Not clearing sds->has_idle_core when a partial scan
     fails always seems bad, because the partial scan is then
     repeated over and over without finding an idle core.
     (The following observations are based on clearing
     has_idle_core even in partial scans.)

  2) An unconditional full scan when has_idle_core is set is
     not good for netperf_{udp,tcp} and tbench4. This is
     probably because the SIS success rate of these workloads
     is already high enough (netperf ~= 100%, tbench4 ~= 50%,
     compared to hackbench ~= 3.5%), which negates a lot of
     the benefit a full scan brings.

  3) Scaling the scan depth with load seems good for the
     hackbench socket tests and neutral for the pipe tests.
     I think this is just the case you mentioned before:
     under fast wake-up workloads has_idle_core becomes
     less reliable, so a full scan won't always win.
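
For observation 1, the effect of clearing the hint can be shown
with a toy userspace model. This is only a sketch: struct toy_llc
and toy_scan_for_idle_core() are hypothetical names, and the
per-core idle flags stand in for what select_idle_core() would
find in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the LLC-wide state; not the kernel's structure. */
struct toy_llc {
	bool has_idle_core;	/* the hint */
	const bool *core_idle;	/* core_idle[i]: core i is fully idle */
	size_t nr_cores;
};

/*
 * Scan at most 'depth' cores starting after 'target', wrapping
 * around. Returns the idle core found, or -1. When nothing is
 * found, the hint is cleared if the scan covered the whole LLC or
 * if clear_on_partial is set.
 */
static int toy_scan_for_idle_core(struct toy_llc *llc, size_t target,
				  size_t depth, bool clear_on_partial)
{
	size_t i, core;

	/* Hint says there is nothing to find: skip the scan. */
	if (!llc->has_idle_core)
		return -1;

	if (depth > llc->nr_cores)
		depth = llc->nr_cores;

	for (i = 1; i <= depth; i++) {
		core = (target + i) % llc->nr_cores;
		if (llc->core_idle[core])
			return (int)core;
	}

	if (depth == llc->nr_cores || clear_on_partial)
		llc->has_idle_core = false;

	return -1;
}
```

With clear_on_partial unset, a failed partial scan leaves the hint
set, so the next wakeup pays for the same scan again. With it set,
later wakeups return early until the hint is set again.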

Best Regards,
Abel