From: Thomas Gleixner <tglx@linutronix.de>
To: Usama Arif <usama.arif@bytedance.com>,
dwmw2@infradead.org, arjan@linux.intel.com
Cc: mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
hpa@zytor.com, x86@kernel.org, pbonzini@redhat.com,
paulmck@kernel.org, linux-kernel@vger.kernel.org,
kvm@vger.kernel.org, rcu@vger.kernel.org, mimoja@mimoja.de,
hewenliang4@huawei.com, thomas.lendacky@amd.com,
seanjc@google.com, pmenzel@molgen.mpg.de,
fam.zheng@bytedance.com, punit.agrawal@bytedance.com,
simon.evans@bytedance.com, liangma@liangbit.com,
David Woodhouse <dwmw@amazon.co.uk>
Subject: Re: [PATCH v6 01/11] x86/apic/x2apic: Fix parallel handling of cluster_mask
Date: Tue, 07 Feb 2023 00:20:43 +0100
Message-ID: <87a61qxtx0.ffs@tglx>
In-Reply-To: <20230202215625.3248306-2-usama.arif@bytedance.com>
Usama!
On Thu, Feb 02 2023 at 21:56, Usama Arif wrote:
> For each CPU being brought up, the alloc_clustermask() function
> allocates a new struct cluster_mask just in case it's needed. Then the
> target CPU actually runs, and in init_x2apic_ldr() it either uses a
> cluster_mask from a previous CPU in the same cluster, or consumes the
> "spare" one and sets the global pointer to NULL.
>
> That isn't going to parallelise stunningly well.
>
> Ditch the global variable, let alloc_clustermask() install the struct
> *directly* in the per_cpu data for the CPU being brought up. As an
> optimisation, actually make it do so for *all* present CPUs in the same
> cluster, which means only one iteration over for_each_present_cpu()
> instead of doing so repeatedly, once for each CPU.
>
> Now, in fact, there's no point in the 'node' or 'clusterid' members of
> the struct cluster_mask, so just kill it and use struct cpumask instead.
>
> This was a harmless "bug" while CPU bringup wasn't actually happening in
> parallel. It's about to become less harmless...
Just to be clear: there is no bug in today's code and therefore this:
> Fixes: 023a611748fd5 ("x86/apic/x2apic: Simplify cluster management")
tag is unjustified. It'll just cause the stable robots to backport it
for no reason.
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
How is this SOB chain correct? It's unclear to me how Paul got involved
here, but even assuming he handed the patch over to you, this still
lacks a SOB from you.
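Assuming such a handover, the chain would have to look something like:

    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Usama Arif <usama.arif@bytedance.com>

i.e. everyone who handled the patch on its way to the list adds their
own SOB at the end.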
> +/*
> + * As an optimisation during boot, set the cluster_mask for *all*
> + * present CPUs at once, to prevent *each* of them having to iterate
> + * over the others to find the existing cluster_mask.
> + */
> +static void prefill_clustermask(struct cpumask *cmsk, u32 cluster)
> +{
> +	int cpu;
> +
> +	for_each_present_cpu(cpu) {
> +		u32 apicid = apic->cpu_present_to_apicid(cpu);
Lacks a newline between declaration and code.
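i.e.:

		u32 apicid = apic->cpu_present_to_apicid(cpu);

		if (apicid != BAD_APICID && apic_cluster(apicid) == cluster) {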
> +		if (apicid != BAD_APICID && apic_cluster(apicid) == cluster) {
> +			struct cpumask **cpu_cmsk = &per_cpu(cluster_masks, cpu);
> +
> +			BUG_ON(*cpu_cmsk && *cpu_cmsk != cmsk);
While I agree that changing an in-use mask pointer would be fatal, I
really have to ask why this code would be invoked on a partially
initialized cluster at all, and why that would be correct.
	if (WARN_ON_ONCE(*cpu_cmsk == cmsk))
		return;

	BUG_ON(*cpu_cmsk);

if at all. But of course that falls over with the way this code is
invoked below.
> +			*cpu_cmsk = cmsk;
> +		}
>
> -static int alloc_clustermask(unsigned int cpu, int node)
> +static int alloc_clustermask(unsigned int cpu, u32 cluster, int node)
> {
> +	struct cpumask *cmsk = NULL;
> +	unsigned int cpu_i;
> +	u32 apicid;
> +
> 	if (per_cpu(cluster_masks, cpu))
> 		return 0;
> -	/*
> -	 * If a hotplug spare mask exists, check whether it's on the right
> -	 * node. If not, free it and allocate a new one.
> -	 */
> -	if (cluster_hotplug_mask) {
> -		if (cluster_hotplug_mask->node == node)
> -			return 0;
> -		kfree(cluster_hotplug_mask);
> +
> +	/* For the hotplug case, don't always allocate a new one */
-ENOPARSE
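Something like this, perhaps:

	/* Try to reuse an already initialized cluster mask on post boot hotplug */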
> +	if (system_state >= SYSTEM_RUNNING) {
> +		for_each_present_cpu(cpu_i) {
> +			apicid = apic->cpu_present_to_apicid(cpu_i);
> +			if (apicid != BAD_APICID && apic_cluster(apicid) == cluster) {
> +				cmsk = per_cpu(cluster_masks, cpu_i);
> +				if (cmsk)
> +					break;
> +			}
> +		}
> +	}
> +	if (!cmsk) {
> +		cmsk = kzalloc_node(sizeof(*cmsk), GFP_KERNEL, node);
> +		if (!cmsk)
> +			return -ENOMEM;
> 	}
...
> +	per_cpu(cluster_masks, cpu) = cmsk;
> +
> +	if (system_state < SYSTEM_RUNNING)
> +		prefill_clustermask(cmsk, cluster);
TBH, the logic of this code is anything but obvious. Something like the
uncompiled variant below, perhaps?
Thanks,
tglx
---
@@ -116,44 +109,90 @@
+
+/*
+ * As an optimisation during boot, set the cluster_mask for all present
+ * CPUs at once, to prevent each of them having to iterate over the others
+ * to find the existing cluster_mask.
+ */
+static void prefill_clustermask(struct cpumask *cmsk, unsigned int cpu, u32 cluster)
+{
+	int cpu_i;
-	cluster = apicid >> 16;
-	for_each_online_cpu(cpu) {
-		cmsk = per_cpu(cluster_masks, cpu);
-		/* Matching cluster found. Link and update it. */
-		if (cmsk && cmsk->clusterid == cluster)
-			goto update;
+	for_each_present_cpu(cpu_i) {
+		struct cpumask **cpu_cmsk = &per_cpu(cluster_masks, cpu_i);
+		u32 apicid = apic->cpu_present_to_apicid(cpu_i);
+
+		if (apicid == BAD_APICID || cpu_i == cpu || apic_cluster(apicid) != cluster)
+			continue;
+
+		if (WARN_ON_ONCE(*cpu_cmsk == cmsk))
+			continue;
+
+		BUG_ON(*cpu_cmsk);
+		*cpu_cmsk = cmsk;
 	}
-	cmsk = cluster_hotplug_mask;
-	cmsk->clusterid = cluster;
-	cluster_hotplug_mask = NULL;
-update:
-	this_cpu_write(cluster_masks, cmsk);
-	cpumask_set_cpu(smp_processor_id(), &cmsk->mask);
}
-static int alloc_clustermask(unsigned int cpu, int node)
+static int alloc_clustermask(unsigned int cpu, u32 cluster, int node)
{
+	struct cpumask *cmsk;
+	unsigned int cpu_i;
+
 	if (per_cpu(cluster_masks, cpu))
 		return 0;
+
 	/*
-	 * If a hotplug spare mask exists, check whether it's on the right
-	 * node. If not, free it and allocate a new one.
+	 * At boot time CPU present mask is stable. If the cluster is not
+	 * yet initialized, allocate the mask and propagate it to all
+	 * siblings in this cluster.
 	 */
-	if (cluster_hotplug_mask) {
-		if (cluster_hotplug_mask->node == node)
-			return 0;
-		kfree(cluster_hotplug_mask);
-	}
+	if (system_state < SYSTEM_RUNNING)
+		goto alloc;
+
+	/*
+	 * On post boot hotplug iterate over the present CPUs to handle the
+	 * case of partial clusters as they might be presented by
+	 * virtualization.
+	 */
+	for_each_present_cpu(cpu_i) {
+		u32 apicid = apic->cpu_present_to_apicid(cpu_i);
+
+		if (apicid != BAD_APICID && apic_cluster(apicid) == cluster) {
+			cmsk = per_cpu(cluster_masks, cpu_i);
-	cluster_hotplug_mask = kzalloc_node(sizeof(*cluster_hotplug_mask),
-					    GFP_KERNEL, node);
-	if (!cluster_hotplug_mask)
+			/*
+			 * If the cluster is already initialized, just
+			 * store the mask and return. No point in trying to
+			 * propagate.
+			 */
+			if (cmsk) {
+				per_cpu(cluster_masks, cpu) = cmsk;
+				return 0;
+			}
+		}
+	}
+	/*
+	 * The cluster is not initialized yet. Fall through to the boot
+	 * time code which might initialize the whole cluster if it is
+	 * in the CPU present mask.
+	 */
+alloc:
+	cmsk = kzalloc_node(sizeof(*cmsk), GFP_KERNEL, node);
+	if (!cmsk)
 		return -ENOMEM;
-	cluster_hotplug_mask->node = node;
+	per_cpu(cluster_masks, cpu) = cmsk;
+	prefill_clustermask(cmsk, cpu, cluster);
+
 	return 0;
}