From: Peter Zijlstra <peterz@infradead.org>
To: Chen Yu <yu.c.chen@intel.com>
Cc: linux-kernel@vger.kernel.org, Tim Chen <tim.c.chen@intel.com>,
	Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Barry Song <21cnbao@gmail.com>,
	Barry Song <song.bao.hua@hisilicon.com>,
	Yicong Yang <yangyicong@hisilicon.com>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Aubrey Li <aubrey.li@intel.com>, Len Brown <len.brown@intel.com>,
	Zhang Rui <rui.zhang@intel.com>
Subject: Re: [PATCH][RFC] sched: Stop searching for idle cpu if the LLC domain is overloaded
Date: Mon, 7 Feb 2022 14:52:53 +0100
Message-ID: <20220207135253.GF23216@worktop.programming.kicks-ass.net>
In-Reply-To: <20220207034013.599214-1-yu.c.chen@intel.com>

On Mon, Feb 07, 2022 at 11:40:13AM +0800, Chen Yu wrote:
> It would be ideal to have a crystal ball to predict the success rate
> of finding an idle cpu/core in the LLC. If it is doomed to fail,
> there is no need to search in the LLC domain. There are many potential
> metrics which could be used to predict the success rate, and the
> metric should be chosen carefully: it should help reduce unnecessary
> cpu runqueue scans without giving up the opportunity to find an
> idle cpu.
> 
> Choose average cpu utilization as the candidate, since util_avg is
> a metric of accumulated historic activity, which seems to be more
> accurate than instantaneous metrics (such as rq->nr_running) for
> estimating the probability of finding an idle cpu. Only when the
> average cpu utilization has reached 85% of the total cpu capacity is
> this domain regarded as overloaded. The reason to choose 85% is that
> this is the threshold of an overloaded LLC sched group
> (imbalance_pct = 117, threshold = 100 / 117, i.e. about 85%).

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 5146163bfabb..1a58befe892d 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c

> @@ -6280,6 +6281,10 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>  	if (!this_sd)
>  		return -1;
>  
> +	sd_share = rcu_dereference(per_cpu(sd_llc_shared, target));
> +	if (sd_share && READ_ONCE(sd_share->overloaded))
> +		return -1;
> +
>  	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
>  
>  	if (sched_feat(SIS_PROP) && !has_idle_core) {

> @@ -9268,6 +9275,29 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
>  		WRITE_ONCE(rd->overutilized, SG_OVERUTILIZED);
>  		trace_sched_overutilized_tp(rd, SG_OVERUTILIZED);
>  	}
> +
> +	/*
> +	 * Check if the LLC domain is overloaded. The overload hint
> +	 * could be used to skip the LLC domain idle cpu search in
> +	 * select_idle_cpu(). The update of this hint occurs during
> +	 * periodic load balancing, rather than frequent newidle balance.
> +	 */
> +	if (env->idle != CPU_NEWLY_IDLE &&
> +	    env->sd->span_weight == per_cpu(sd_llc_size, env->dst_cpu)) {
> +		struct sched_domain_shared *sd_share =
> +			rcu_dereference(per_cpu(sd_llc_shared, env->dst_cpu));
> +
> +		if (!sd_share)
> +			return;
> +
> +		/*
> +		 * Derived from group_is_overloaded(). The default imbalance_pct
> +		 * is 117 on LLC domain, which means the threshold of average
> +		 * utilization is 85%.
> +		 */
> +		WRITE_ONCE(sd_share->overloaded, (sds->total_capacity * 100) <
> +			   (sum_util * env->sd->imbalance_pct));
> +	}
>  }

So the only problem I have with this is that this is somewhat of a
binary toggle. The moment we hit that magical 85% we suddenly change
behaviour.

Would it not be possible to replace the SIS_PROP logic with something
based on this sum_util metric? Such that when sum_util is very low we
scan more, while when sum_util hits 85% we naturally stop scanning
entirely.

That way the behaviour is gradual.

Thread overview: 7+ messages
2022-02-07  3:40 [PATCH][RFC] sched: Stop searching for idle cpu if the LLC domain is overloaded Chen Yu
2022-02-07 13:52 ` Peter Zijlstra [this message]
2022-02-08  6:24   ` Chen Yu
2022-02-10  7:57 ` [sched] e9accc2386: stress-ng.dccp.ops_per_sec 46.9% improvement kernel test robot
2022-02-10  7:57   ` kernel test robot
2022-02-23 10:05 ` [PATCH][RFC] sched: Stop searching for idle cpu if the LLC domain is overloaded K Prateek Nayak
2022-02-27  3:02   ` Chen Yu
