From: Valentin Schneider <valentin.schneider@arm.com>
To: Barry Song <song.bao.hua@hisilicon.com>,
	vincent.guittot@linaro.org, mgorman@suse.de, mingo@kernel.org,
	peterz@infradead.org, dietmar.eggemann@arm.com,
	morten.rasmussen@arm.com, linux-kernel@vger.kernel.org
Cc: linuxarm@openeuler.org, xuwei5@huawei.com,
	liguozhu@hisilicon.com, tiantao6@hisilicon.com,
	wanghuiqiang@huawei.com, prime.zeng@hisilicon.com,
	jonathan.cameron@huawei.com, guodong.xu@linaro.org,
	Barry Song <song.bao.hua@hisilicon.com>,
	Meelis Roos <mroos@linux.ee>
Subject: Re: [PATCH] sched/topology: fix the issue groups don't span domain->span for NUMA diameter > 2
Date: Tue, 02 Feb 2021 15:17:15 +0000
Message-ID: <jhj4kiu4hz8.mognet@arm.com>
In-Reply-To: <20210201033830.15040-1-song.bao.hua@hisilicon.com>

On 01/02/21 16:38, Barry Song wrote:
> @@ -964,6 +941,12 @@ static void init_overlap_sched_group(struct sched_domain *sd,
>
>       build_balance_mask(sd, sg, mask);
>       cpu = cpumask_first_and(sched_group_span(sg), mask);
> +	/*
> +	 * for the group generated by grandchild, use the sgc of 2nd cpu
> +	 * because the 1st cpu might be used by another sched_group
> +	 */
> +	if (from_grandchild && cpumask_weight(mask) > 1)
> +		cpu = cpumask_next_and(cpu, sched_group_span(sg), mask);
>
>       sg->sgc = *per_cpu_ptr(sdd->sgc, cpu);

So you are getting a (hopefully) unique ID for this group span at this
given topology level (i.e. sd->private), but as I stated in that list of
issues, this creates an sgc that isn't attached to the local group of any
sched_domain, and thus won't get its capacity values updated.
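
As a userspace toy of that ownership rule (not kernel code; the structs
and update_group_capacity() are boiled down to the bare minimum): the
periodic capacity refresh only ever writes through sd->groups, i.e. the
local group, so an sgc that is nobody's local-group sgc keeps its
build-time value forever.

#include <stdio.h>

struct sched_group_capacity { unsigned long capacity; };
struct sched_group { struct sched_group_capacity *sgc; };
struct sched_domain { struct sched_group *groups; /* local group first */ };

/* stand-in for update_group_capacity(): only touches the local group's sgc */
static void update_group_capacity(struct sched_domain *sd, unsigned long cap)
{
	sd->groups->sgc->capacity = cap;
}

int main(void)
{
	/* both start at the build-time default */
	struct sched_group_capacity owned  = { 2048 };
	struct sched_group_capacity orphan = { 2048 };
	struct sched_group local = { &owned };
	struct sched_domain sd = { &local };

	/* a remote group references 'orphan', but no sd->groups does */
	update_group_capacity(&sd, 1978);

	printf("owned sgc: %lu, orphan sgc (stale): %lu\n",
	       owned.capacity, orphan.capacity);
	return 0;
}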

This can actually be seen in the capacity values you're getting at build
time: the group spanning 4-5 at CPU0's domain-2 still carries the default
2 * 1024 = 2048, while the sgc owned by CPU4's domain-1 local group has
been properly summed up (977 + 1001 = 1978):

> [    0.868907] CPU0 attaching sched-domain(s):
...
> [    0.869542]    domain-2: span=0-5 level=NUMA
> [    0.869559]     groups: 0:{ span=0-3 cap=4002 }, 5:{ span=4-5 cap=2048 }
                                                          ^^^^^^^^^^^^^^^^
> [    0.871177] CPU4 attaching sched-domain(s):
...
> [    0.871200]   groups: 4:{ span=4 cap=977 }, 5:{ span=5 cap=1001 }
> [    0.871243]   domain-1: span=4-7 level=NUMA
> [    0.871257]    groups: 4:{ span=4-5 cap=1978 }, 6:{ span=6-7 cap=1968 }
                                ^^^^^^^^^^^^^^^^

IMO what we want to do here is to hook this CPU0-domain-2-group5 to the sgc
of CPU4-domain-1-group4. I've done that in the diff below - this gives us
groups with sgcs owned at lower topology levels, but this will only ever
be true for non-local groups. This has the added benefit of working with
single-CPU nodes. Briefly tested on your topology and the sunfire's (via
QEMU), and I didn't get screamed at.

Before the fun police come and impound my keyboard, I'd like to point out
that we could leverage this cross-level sgc referencing hack to further
change the NUMA domains and pretty much get rid of overlapping groups
(that's what I was fumbling with in [1]).

[1]: http://lore.kernel.org/r/jhjwnw11ak2.mognet@arm.com

That is, rather than building overlapping groups and fixing them whenever
that breaks (distance > 2), we could have:
- the local group being the child domain's span (as always)
- all non-local NUMA groups spanning a single node each, with the right sgc
  cross-referencing (toy sketch below).
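
As a toy sketch of that construction (plain userspace C, hardcoded to the
2-CPUs-per-node topology from your dmesg above; nothing here is kernel
API): the local group covers the child domain's span, then we emit one
single-node group per remote node in the domain's span.

#include <stdio.h>

/* toy topology: node n owns CPUs 2n and 2n+1 */
static void print_node_span(int node)
{
	printf("{ %d-%d }", 2 * node, 2 * node + 1);
}

/*
 * Hypothetical NUMA-level group list: one local group covering the
 * child domain's span, then one single-node group per remote node.
 */
static void build_groups(int first_node, int nr_nodes,
			 int child_first, int child_last)
{
	printf("local: { %d-%d }", child_first, child_last);
	for (int n = first_node; n < first_node + nr_nodes; n++) {
		if (2 * n >= child_first && 2 * n + 1 <= child_last)
			continue; /* already covered by the local group */
		printf(", remote: ");
		print_node_span(n);
	}
	printf("\n");
}

int main(void)
{
	/* CPU0's domain-2 from the dmesg above: span=0-5, child span=0-3 */
	build_groups(0, 3, 0, 3);
	return 0;
}

For this particular level the output ({ 0-3 } plus { 4-5 }) coincides with
what the current code builds; the two schemes would only diverge where
overlapping groups span several nodes.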

Thoughts?

--->8---
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index b748999c9e11..ef43abb6b1fb 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -932,21 +932,15 @@ build_group_from_child_sched_domain(struct sched_domain *sd, int cpu)
 
 static void init_overlap_sched_group(struct sched_domain *sd,
 				     struct sched_group *sg,
-				     int from_grandchild)
+				     struct sched_domain *grandchild)
 {
 	struct cpumask *mask = sched_domains_tmpmask2;
-	struct sd_data *sdd = sd->private;
+	struct sd_data *sdd = grandchild ? grandchild->private : sd->private;
 	struct cpumask *sg_span;
 	int cpu;
 
 	build_balance_mask(sd, sg, mask);
 	cpu = cpumask_first_and(sched_group_span(sg), mask);
-	/*
-	 * for the group generated by grandchild, use the sgc of 2nd cpu
-	 * because the 1st cpu might be used by another sched_group
-	 */
-	if (from_grandchild && cpumask_weight(mask) > 1)
-		cpu = cpumask_next_and(cpu, sched_group_span(sg), mask);
 
 	sg->sgc = *per_cpu_ptr(sdd->sgc, cpu);
 	if (atomic_inc_return(&sg->sgc->ref) == 1)
@@ -979,7 +973,7 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
 
 	for_each_cpu_wrap(i, span, cpu) {
 		struct cpumask *sg_span;
-		int from_grandchild = 0;
+		bool from_grandchild = false;
 
 		if (cpumask_test_cpu(i, covered))
 			continue;
@@ -1033,7 +1027,7 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
 		       !cpumask_subset(sched_domain_span(sibling->child),
 				       span)) {
 			sibling = sibling->child;
-			from_grandchild = 1;
+			from_grandchild = true;
 		}
 
 		sg = build_group_from_child_sched_domain(sibling, cpu);
@@ -1043,7 +1037,7 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
 		sg_span = sched_group_span(sg);
 		cpumask_or(covered, covered, sg_span);
 
-		init_overlap_sched_group(sd, sg, from_grandchild);
+		init_overlap_sched_group(sd, sg, from_grandchild ? sibling : NULL);
 
 		if (!first)
 			first = sg;


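To spell out what the last hunk buys us, here's a userspace toy of the
aliasing (sd_data reduced to a plain array; the real thing sits behind
*per_cpu_ptr(sdd->sgc, cpu)): indexing the grandchild's sd_data makes the
remote group's sgc the very object that level's local group owns and keeps
updated, rather than a parent-level orphan.

#include <stdio.h>

#define NR_CPUS 8

struct sched_group_capacity { unsigned long capacity; };

/* toy stand-in for sd_data: one sgc per CPU at a given topology level */
struct sd_data { struct sched_group_capacity sgc[NR_CPUS]; };

int main(void)
{
	struct sd_data parent_sdd = { 0 };
	struct sd_data grandchild_sdd = { 0 };

	/* build-time default for a 2-CPU span */
	parent_sdd.sgc[5].capacity = 2048;
	/* CPU4's domain-1 local group owns this sgc and keeps it updated */
	grandchild_sdd.sgc[4].capacity = 1978;

	/* sd->private (parent level): an sgc no local group owns -> stale */
	struct sched_group_capacity *orphan = &parent_sdd.sgc[5];
	/* grandchild->private: the sgc CPU4's local group already owns */
	struct sched_group_capacity *shared = &grandchild_sdd.sgc[4];

	printf("orphan: %lu, shared: %lu\n",
	       orphan->capacity, shared->capacity);
	return 0;
}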
