Linux-ACPI Archive on lore.kernel.org
From: Dietmar Eggemann <dietmar.eggemann@arm.com>
To: Barry Song <song.bao.hua@hisilicon.com>,
	tim.c.chen@linux.intel.com, catalin.marinas@arm.com,
	will@kernel.org, rjw@rjwysocki.net, vincent.guittot@linaro.org,
	bp@alien8.de, tglx@linutronix.de, mingo@redhat.com,
	lenb@kernel.org, peterz@infradead.org, rostedt@goodmis.org,
	bsegall@google.com, mgorman@suse.de
Cc: msys.mizuma@gmail.com, valentin.schneider@arm.com,
	gregkh@linuxfoundation.org, jonathan.cameron@huawei.com,
	juri.lelli@redhat.com, mark.rutland@arm.com,
	sudeep.holla@arm.com, aubrey.li@linux.intel.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	x86@kernel.org, xuwei5@huawei.com, prime.zeng@hisilicon.com,
	guodong.xu@linaro.org, yangyicong@huawei.com,
	liguozhu@hisilicon.com, linuxarm@openeuler.org, hpa@zytor.com
Subject: Re: [RFC PATCH v6 3/4] scheduler: scan idle cpu in cluster for tasks within one LLC
Date: Tue, 27 Apr 2021 13:35:35 +0200
Message-ID: <80f489f9-8c88-95d8-8241-f0cfd2c2ac66@arm.com>
In-Reply-To: <20210420001844.9116-4-song.bao.hua@hisilicon.com>

On 20/04/2021 02:18, Barry Song wrote:

[...]

> @@ -5786,11 +5786,12 @@ static void record_wakee(struct task_struct *p)
>   * whatever is irrelevant, spread criteria is apparent partner count exceeds
>   * socket size.
>   */
> -static int wake_wide(struct task_struct *p)
> +static int wake_wide(struct task_struct *p, int cluster)
>  {
>  	unsigned int master = current->wakee_flips;
>  	unsigned int slave = p->wakee_flips;
> -	int factor = __this_cpu_read(sd_llc_size);
> +	int factor = cluster ? __this_cpu_read(sd_cluster_size) :
> +		__this_cpu_read(sd_llc_size);

I don't see how the wake_wide() change has any effect here. None of the
sched domains has SD_BALANCE_WAKE set, so a wakeup (WF_TTWU) can never
end up in the slow path.
When running your `lmbench stream` workload, have you seen a difference
in what wake_wide() returns when you use `sd_cluster_size` instead of
`sd_llc_size` as the factor?

I guess that for you, wakeups are now subdivided into faster (cluster = 4
CPUs) and fast (LLC = 24 CPUs) via sis() (select_idle_sibling()), not
into fast (sis()) and slow (find_idlest_cpu()).
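
To make the question about the factor concrete, here is a minimal
user-space sketch of the wake_wide() comparison as I read it in mainline.
The name wake_wide_model() and the plain integer parameters are
assumptions for illustration; the kernel reads the flip counters from
`current` and `p` and the factor from a per-CPU variable.

```c
/*
 * User-space model of the wake_wide() heuristic (illustrative only).
 * master/slave stand in for the wakee_flips of the waker and the wakee,
 * factor stands in for sd_llc_size or sd_cluster_size.
 * Returns 1 for "wide" (do not pull the wakee next to the waker),
 * 0 for "affine".
 */
static int wake_wide_model(unsigned int master, unsigned int slave,
			   unsigned int factor)
{
	/* Order the two counters so that master >= slave. */
	if (master < slave) {
		unsigned int tmp = master;

		master = slave;
		slave = tmp;
	}

	/* Too few flips relative to the factor: stay affine. */
	if (slave < factor || master < slave * factor)
		return 0;

	return 1;
}
```

With flip counts of 100 and 5, the model returns 1 for a factor of 4
(cluster) and 0 for a factor of 24 (LLC), so the smaller factor can flip
the result. Whether that matters depends on what the caller does with
want_affine afterwards.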

>  
>  	if (master < slave)
>  		swap(master, slave);

[...]

> @@ -6745,6 +6748,12 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>  	int want_affine = 0;
>  	/* SD_flags and WF_flags share the first nibble */
>  	int sd_flag = wake_flags & 0xF;
> +	/*
> +	 * if cpu and prev_cpu share LLC, consider cluster sibling rather
> +	 * than llc. this is typically true while tasks are bound within
> +	 * one numa
> +	 */
> +	int cluster = sched_cluster_active() && cpus_share_cache(cpu, prev_cpu, 0);

So you changed from scanning the cluster before the LLC to scanning
either the cluster or the LLC.

And this is based on whether `this_cpu` and `prev_cpu` share an LLC.
So you only see an effect when running the workload with
`numactl -N X ...`.
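
The either/or decision can be sketched in user space as follows. This is
a model under stated assumptions, not the patch's code: the helper names
are hypothetical, and CPUs are assumed to be numbered contiguously with
24 CPUs per LLC (the LLC size mentioned earlier in this mail).

```c
/* Which set of CPUs the idle scan is restricted to in this model. */
enum scan_scope { SCAN_LLC, SCAN_CLUSTER };

/* Assumption: 24 CPUs per LLC, numbered contiguously. */
#define MODEL_LLC_SIZE 24

/* Hypothetical stand-in for cpus_share_cache() at LLC level. */
static int model_share_llc(int a, int b)
{
	return a / MODEL_LLC_SIZE == b / MODEL_LLC_SIZE;
}

/*
 * Mirrors the shape of the new 'cluster' flag: the cluster is scanned
 * only if cluster scheduling is active and both CPUs share an LLC,
 * otherwise the LLC is scanned.
 */
static enum scan_scope model_scan_scope(int cluster_active, int this_cpu,
					int prev_cpu)
{
	int cluster = cluster_active && model_share_llc(this_cpu, prev_cpu);

	return cluster ? SCAN_CLUSTER : SCAN_LLC;
}
```

If a task is confined to one node and that node is a single LLC, the
waker's CPU and prev_cpu always share the LLC in this model, so the
cluster scope is always chosen.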

>  
>  	if (wake_flags & WF_TTWU) {
>  		record_wakee(p);
> @@ -6756,7 +6765,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>  			new_cpu = prev_cpu;
>  		}
>  
> -		want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, p->cpus_ptr);
> +		want_affine = !wake_wide(p, cluster) && cpumask_test_cpu(cpu, p->cpus_ptr);
>  	}
>  
>  	rcu_read_lock();
> @@ -6768,7 +6777,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>  		if (want_affine && (tmp->flags & SD_WAKE_AFFINE) &&
>  		    cpumask_test_cpu(prev_cpu, sched_domain_span(tmp))) {
>  			if (cpu != prev_cpu)
> -				new_cpu = wake_affine(tmp, p, cpu, prev_cpu, sync);
> +				new_cpu = wake_affine(tmp, p, cpu, prev_cpu, sync, cluster);
>  
>  			sd = NULL; /* Prefer wake_affine over balance flags */
>  			break;
> @@ -6785,7 +6794,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
>  		new_cpu = find_idlest_cpu(sd, p, cpu, prev_cpu, sd_flag);
>  	} else if (wake_flags & WF_TTWU) { /* XXX always ? */
>  		/* Fast path */
> -		new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
> +		new_cpu = select_idle_sibling(p, prev_cpu, new_cpu, cluster);
>  
>  		if (want_affine)
>  			current->recent_used_cpu = cpu;

[...]
