From: "Zengtao (B)" <prime.zeng@hisilicon.com> To: Valentin Schneider <valentin.schneider@arm.com>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>, Sudeep Holla <sudeep.holla@arm.com>, Robin Murphy <robin.murphy@arm.com>, Jeremy Linton <Jeremy.Linton@arm.com>, Dietmar Eggemann <dietmar.eggemann@arm.com>, Morten Rasmussen <morten.rasmussen@arm.com> Subject: RE: [PATCH] arm64: topology: Stop using MPIDR for topology information Date: Wed, 2 Sep 2020 03:24:17 +0000 [thread overview] Message-ID: <678F3D1BB717D949B966B68EAEB446ED482417F4@DGGEMM506-MBX.china.huawei.com> (raw) In-Reply-To: <20200829130016.26106-1-valentin.schneider@arm.com> Hi Valentin: > -----Original Message----- > From: Valentin Schneider [mailto:valentin.schneider@arm.com] > Sent: Saturday, August 29, 2020 9:00 PM > To: linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org > Cc: Catalin Marinas; Will Deacon; Sudeep Holla; Robin Murphy; Jeremy > Linton; Dietmar Eggemann; Morten Rasmussen; Zengtao (B) > Subject: [PATCH] arm64: topology: Stop using MPIDR for topology > information > > In the absence of ACPI or DT topology data, we fallback to haphazardly > decoding *something* out of MPIDR. Sadly, the contents of that register > are > mostly unusable due to the implementation leniancy and things like Aff0 > having to be capped to 15 (despite being encoded on 8 bits). > > Consider a simple system with a single package of 32 cores, all under the > same LLC. We ought to be shoving them in the same core_sibling mask, > but > MPIDR is going to look like: > > | CPU | 0 | ... | 15 | 16 | ... | 31 | > |------+---+-----+----+----+-----+----+ > | Aff0 | 0 | ... | 15 | 0 | ... | 15 | > | Aff1 | 0 | ... | 0 | 1 | ... | 1 | > | Aff2 | 0 | ... | 0 | 0 | ... | 0 | > > Which will eventually yield > > core_sibling(0-15) == 0-15 > core_sibling(16-31) == 16-31 > > NUMA woes > ========= > > If we try to play games with this and set up NUMA boundaries within those > groups of 16 cores via e.g. QEMU: > > # Node0: 0-9; Node1: 10-19 > $ qemu-system-aarch64 <blah> \ > -smp 20 -numa node,cpus=0-9,nodeid=0 -numa > node,cpus=10-19,nodeid=1 > > The scheduler's MC domain (all CPUs with same LLC) is going to be built via > > arch_topology.c::cpu_coregroup_mask() > > In there we try to figure out a sensible mask out of the topology > information we have. In short, here we'll pick the smallest of NUMA or > core sibling mask. > > node_mask(CPU9) == 0-9 > core_sibling(CPU9) == 0-15 > > MC mask for CPU9 will thus be 0-9, not a problem. > > node_mask(CPU10) == 10-19 > core_sibling(CPU10) == 0-15 > > MC mask for CPU10 will thus be 10-19, not a problem. > > node_mask(CPU16) == 10-19 > core_sibling(CPU16) == 16-19 > > MC mask for CPU16 will thus be 16-19... Uh oh. CPUs 16-19 are in two > different unique MC spans, and the scheduler has no idea what to make of > that. That triggers the WARN_ON() added by commit > > ccf74128d66c ("sched/topology: Assert non-NUMA topology masks > don't (partially) overlap") > > Fixing MPIDR-derived topology > ============================= > > We could try to come up with some cleverer scheme to figure out which of > the available masks to pick, but really if one of those masks resulted from > MPIDR then it should be discarded because it's bound to be bogus. > > I was hoping to give MPIDR a chance for SMT, to figure out which threads > are > in the same core using Aff1-3 as core ID, but Sudeep and Robin pointed out > to me that there are systems out there where *all* cores have non-zero > values in their higher affinity fields (e.g. RK3288 has "5" in all of its > cores' MPIDR.Aff1), which would expose a bogus core ID to userspace. > > Stop using MPIDR for topology information. When no other source of > topology > information is available, mark each CPU as its own core and its NUMA > node > as its LLC domain. I agree with your idea to remove the topology functionality of MPIDR , but I think we need also consider ARM32 and GIC. > > Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> > --- > arch/arm64/kernel/topology.c | 32 +++++++++++++++++--------------- > 1 file changed, 17 insertions(+), 15 deletions(-) > > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c > index 0801a0f3c156..ff1dd1dbfe64 100644 > --- a/arch/arm64/kernel/topology.c > +++ b/arch/arm64/kernel/topology.c > @@ -36,21 +36,23 @@ void store_cpu_topology(unsigned int cpuid) > if (mpidr & MPIDR_UP_BITMASK) > return; > > - /* Create cpu topology mapping based on MPIDR. */ > - if (mpidr & MPIDR_MT_BITMASK) { > - /* Multiprocessor system : Multi-threads per core */ > - cpuid_topo->thread_id = MPIDR_AFFINITY_LEVEL(mpidr, 0); > - cpuid_topo->core_id = MPIDR_AFFINITY_LEVEL(mpidr, 1); > - cpuid_topo->package_id = MPIDR_AFFINITY_LEVEL(mpidr, 2) | > - MPIDR_AFFINITY_LEVEL(mpidr, 3) << 8; > - } else { > - /* Multiprocessor system : Single-thread per core */ > - cpuid_topo->thread_id = -1; > - cpuid_topo->core_id = MPIDR_AFFINITY_LEVEL(mpidr, 0); > - cpuid_topo->package_id = MPIDR_AFFINITY_LEVEL(mpidr, 1) | > - MPIDR_AFFINITY_LEVEL(mpidr, 2) << 8 | > - MPIDR_AFFINITY_LEVEL(mpidr, 3) << 16; > - } > + /* > + * This would be the place to create cpu topology based on MPIDR. > + * > + * However, it cannot be trusted to depict the actual topology; some > + * pieces of the architecture enforce an artificial cap on Aff0 values > + * (e.g. GICv3's ICC_SGI1R_EL1 limits it to 15), leading to an > + * artificial cycling of Aff1, Aff2 and Aff3 values. IOW, these end up > + * having absolutely no relationship to the actual underlying system > + * topology, and cannot be reasonably used as core / package ID. > + * > + * If the MT bit is set, Aff0 *could* be used to define a thread ID, but > + * we still wouldn't be able to obtain a sane core ID. This means we > + * need to entirely ignore MPIDR for any topology deduction. > + */ > + cpuid_topo->thread_id = -1; > + cpuid_topo->core_id = cpuid; > + cpuid_topo->package_id = cpu_to_node(cpuid); > > pr_debug("CPU%u: cluster %d core %d thread %d mpidr %#016llx\n", > cpuid, cpuid_topo->package_id, cpuid_topo->core_id, > -- > 2.27.0
WARNING: multiple messages have this Message-ID (diff)
From: "Zengtao (B)" <prime.zeng@hisilicon.com> To: Valentin Schneider <valentin.schneider@arm.com>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com>, Robin Murphy <robin.murphy@arm.com>, Jeremy Linton <Jeremy.Linton@arm.com>, Morten Rasmussen <morten.rasmussen@arm.com>, Sudeep Holla <sudeep.holla@arm.com>, Will Deacon <will@kernel.org>, Dietmar Eggemann <dietmar.eggemann@arm.com> Subject: RE: [PATCH] arm64: topology: Stop using MPIDR for topology information Date: Wed, 2 Sep 2020 03:24:17 +0000 [thread overview] Message-ID: <678F3D1BB717D949B966B68EAEB446ED482417F4@DGGEMM506-MBX.china.huawei.com> (raw) In-Reply-To: <20200829130016.26106-1-valentin.schneider@arm.com> Hi Valentin: > -----Original Message----- > From: Valentin Schneider [mailto:valentin.schneider@arm.com] > Sent: Saturday, August 29, 2020 9:00 PM > To: linux-kernel@vger.kernel.org; linux-arm-kernel@lists.infradead.org > Cc: Catalin Marinas; Will Deacon; Sudeep Holla; Robin Murphy; Jeremy > Linton; Dietmar Eggemann; Morten Rasmussen; Zengtao (B) > Subject: [PATCH] arm64: topology: Stop using MPIDR for topology > information > > In the absence of ACPI or DT topology data, we fallback to haphazardly > decoding *something* out of MPIDR. Sadly, the contents of that register > are > mostly unusable due to the implementation leniancy and things like Aff0 > having to be capped to 15 (despite being encoded on 8 bits). > > Consider a simple system with a single package of 32 cores, all under the > same LLC. We ought to be shoving them in the same core_sibling mask, > but > MPIDR is going to look like: > > | CPU | 0 | ... | 15 | 16 | ... | 31 | > |------+---+-----+----+----+-----+----+ > | Aff0 | 0 | ... | 15 | 0 | ... | 15 | > | Aff1 | 0 | ... | 0 | 1 | ... | 1 | > | Aff2 | 0 | ... | 0 | 0 | ... | 0 | > > Which will eventually yield > > core_sibling(0-15) == 0-15 > core_sibling(16-31) == 16-31 > > NUMA woes > ========= > > If we try to play games with this and set up NUMA boundaries within those > groups of 16 cores via e.g. QEMU: > > # Node0: 0-9; Node1: 10-19 > $ qemu-system-aarch64 <blah> \ > -smp 20 -numa node,cpus=0-9,nodeid=0 -numa > node,cpus=10-19,nodeid=1 > > The scheduler's MC domain (all CPUs with same LLC) is going to be built via > > arch_topology.c::cpu_coregroup_mask() > > In there we try to figure out a sensible mask out of the topology > information we have. In short, here we'll pick the smallest of NUMA or > core sibling mask. > > node_mask(CPU9) == 0-9 > core_sibling(CPU9) == 0-15 > > MC mask for CPU9 will thus be 0-9, not a problem. > > node_mask(CPU10) == 10-19 > core_sibling(CPU10) == 0-15 > > MC mask for CPU10 will thus be 10-19, not a problem. > > node_mask(CPU16) == 10-19 > core_sibling(CPU16) == 16-19 > > MC mask for CPU16 will thus be 16-19... Uh oh. CPUs 16-19 are in two > different unique MC spans, and the scheduler has no idea what to make of > that. That triggers the WARN_ON() added by commit > > ccf74128d66c ("sched/topology: Assert non-NUMA topology masks > don't (partially) overlap") > > Fixing MPIDR-derived topology > ============================= > > We could try to come up with some cleverer scheme to figure out which of > the available masks to pick, but really if one of those masks resulted from > MPIDR then it should be discarded because it's bound to be bogus. > > I was hoping to give MPIDR a chance for SMT, to figure out which threads > are > in the same core using Aff1-3 as core ID, but Sudeep and Robin pointed out > to me that there are systems out there where *all* cores have non-zero > values in their higher affinity fields (e.g. RK3288 has "5" in all of its > cores' MPIDR.Aff1), which would expose a bogus core ID to userspace. > > Stop using MPIDR for topology information. When no other source of > topology > information is available, mark each CPU as its own core and its NUMA > node > as its LLC domain. I agree with your idea to remove the topology functionality of MPIDR , but I think we need also consider ARM32 and GIC. > > Signed-off-by: Valentin Schneider <valentin.schneider@arm.com> > --- > arch/arm64/kernel/topology.c | 32 +++++++++++++++++--------------- > 1 file changed, 17 insertions(+), 15 deletions(-) > > diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c > index 0801a0f3c156..ff1dd1dbfe64 100644 > --- a/arch/arm64/kernel/topology.c > +++ b/arch/arm64/kernel/topology.c > @@ -36,21 +36,23 @@ void store_cpu_topology(unsigned int cpuid) > if (mpidr & MPIDR_UP_BITMASK) > return; > > - /* Create cpu topology mapping based on MPIDR. */ > - if (mpidr & MPIDR_MT_BITMASK) { > - /* Multiprocessor system : Multi-threads per core */ > - cpuid_topo->thread_id = MPIDR_AFFINITY_LEVEL(mpidr, 0); > - cpuid_topo->core_id = MPIDR_AFFINITY_LEVEL(mpidr, 1); > - cpuid_topo->package_id = MPIDR_AFFINITY_LEVEL(mpidr, 2) | > - MPIDR_AFFINITY_LEVEL(mpidr, 3) << 8; > - } else { > - /* Multiprocessor system : Single-thread per core */ > - cpuid_topo->thread_id = -1; > - cpuid_topo->core_id = MPIDR_AFFINITY_LEVEL(mpidr, 0); > - cpuid_topo->package_id = MPIDR_AFFINITY_LEVEL(mpidr, 1) | > - MPIDR_AFFINITY_LEVEL(mpidr, 2) << 8 | > - MPIDR_AFFINITY_LEVEL(mpidr, 3) << 16; > - } > + /* > + * This would be the place to create cpu topology based on MPIDR. > + * > + * However, it cannot be trusted to depict the actual topology; some > + * pieces of the architecture enforce an artificial cap on Aff0 values > + * (e.g. GICv3's ICC_SGI1R_EL1 limits it to 15), leading to an > + * artificial cycling of Aff1, Aff2 and Aff3 values. IOW, these end up > + * having absolutely no relationship to the actual underlying system > + * topology, and cannot be reasonably used as core / package ID. > + * > + * If the MT bit is set, Aff0 *could* be used to define a thread ID, but > + * we still wouldn't be able to obtain a sane core ID. This means we > + * need to entirely ignore MPIDR for any topology deduction. > + */ > + cpuid_topo->thread_id = -1; > + cpuid_topo->core_id = cpuid; > + cpuid_topo->package_id = cpu_to_node(cpuid); > > pr_debug("CPU%u: cluster %d core %d thread %d mpidr %#016llx\n", > cpuid, cpuid_topo->package_id, cpuid_topo->core_id, > -- > 2.27.0 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2020-09-02 3:24 UTC|newest] Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-08-29 13:00 [PATCH] arm64: topology: Stop using MPIDR for topology information Valentin Schneider 2020-08-29 13:00 ` Valentin Schneider 2020-09-02 3:24 ` Zengtao (B) [this message] 2020-09-02 3:24 ` Zengtao (B) 2020-09-02 10:50 ` Sudeep Holla 2020-09-02 10:50 ` Sudeep Holla 2020-09-02 10:52 ` Valentin Schneider 2020-09-02 10:52 ` Valentin Schneider 2020-09-03 1:44 ` Zengtao (B) 2020-09-03 1:44 ` Zengtao (B) 2020-09-09 11:21 ` Valentin Schneider 2020-09-09 11:21 ` Valentin Schneider 2020-09-02 10:04 ` Sudeep Holla 2020-09-02 10:04 ` Sudeep Holla 2020-09-02 10:52 ` Valentin Schneider 2020-09-02 10:52 ` Valentin Schneider 2020-09-07 21:35 ` Will Deacon 2020-09-07 21:35 ` Will Deacon
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=678F3D1BB717D949B966B68EAEB446ED482417F4@DGGEMM506-MBX.china.huawei.com \ --to=prime.zeng@hisilicon.com \ --cc=Jeremy.Linton@arm.com \ --cc=catalin.marinas@arm.com \ --cc=dietmar.eggemann@arm.com \ --cc=linux-arm-kernel@lists.infradead.org \ --cc=linux-kernel@vger.kernel.org \ --cc=morten.rasmussen@arm.com \ --cc=robin.murphy@arm.com \ --cc=sudeep.holla@arm.com \ --cc=valentin.schneider@arm.com \ --cc=will@kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.