From: Thomas Gleixner <tglx@linutronix.de>
To: Usama Arif <usama.arif@bytedance.com>,
	dwmw2@infradead.org, arjan@linux.intel.com
Cc: mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, x86@kernel.org, pbonzini@redhat.com,
	paulmck@kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, rcu@vger.kernel.org, mimoja@mimoja.de,
	hewenliang4@huawei.com, thomas.lendacky@amd.com,
	seanjc@google.com, pmenzel@molgen.mpg.de,
	fam.zheng@bytedance.com, punit.agrawal@bytedance.com,
	simon.evans@bytedance.com, liangma@liangbit.com,
	David Woodhouse <dwmw@amazon.co.uk>
Subject: Re: [PATCH v6 01/11] x86/apic/x2apic: Fix parallel handling of cluster_mask
Date: Tue, 07 Feb 2023 00:20:43 +0100
Message-ID: <87a61qxtx0.ffs@tglx>
In-Reply-To: <20230202215625.3248306-2-usama.arif@bytedance.com>

Usama!

On Thu, Feb 02 2023 at 21:56, Usama Arif wrote:
> For each CPU being brought up, the alloc_clustermask() function
> allocates a new struct cluster_mask just in case it's needed. Then the
> target CPU actually runs, and in init_x2apic_ldr() it either uses a
> cluster_mask from a previous CPU in the same cluster, or consumes the
> "spare" one and sets the global pointer to NULL.
>
> That isn't going to parallelise stunningly well.
>
> Ditch the global variable, let alloc_clustermask() install the struct
> *directly* in the per_cpu data for the CPU being brought up. As an
> optimisation, actually make it do so for *all* present CPUs in the same
> cluster, which means only one iteration over for_each_present_cpu()
> instead of doing so repeatedly, once for each CPU.
>
> Now, in fact, there's no point in the 'node' or 'clusterid' members of
> the struct cluster_mask, so just kill it and use struct cpumask instead.
>
> This was a harmless "bug" while CPU bringup wasn't actually happening in
> parallel. It's about to become less harmless...

Just to be clear: there is no bug in today's code, and therefore this:

> Fixes: 023a611748fd5 ("x86/apic/x2apic: Simplify cluster management")

tag is unjustified. It'll just cause the stable robots to backport it
for no reason.

> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

How is this SOB chain correct? It's unclear to me how Paul got involved
here, but even assuming he handed the patch over to you, the chain still
lacks an SOB from you.
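
For reference, a handover like that would normally end up with a chain
along these lines (purely illustrative; the exact order depends on who
actually handled the patch on its way to the list):

    Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Usama Arif <usama.arif@bytedance.com>

i.e. everyone who touched or forwarded the patch appends their own SOB,
with the submitter's last.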

> +/*
> + * As an optimisation during boot, set the cluster_mask for *all*
> + * present CPUs at once, to prevent *each* of them having to iterate
> + * over the others to find the existing cluster_mask.
> + */
> +static void prefill_clustermask(struct cpumask *cmsk, u32 cluster)
> +{
> +	int cpu;
> +
> +	for_each_present_cpu(cpu) {
> +		u32 apicid = apic->cpu_present_to_apicid(cpu);

Lacks a newline between declaration and code.
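
I.e., just for the style point (only the blank line added; the BUG_ON
question below still stands):

	for_each_present_cpu(cpu) {
		u32 apicid = apic->cpu_present_to_apicid(cpu);

		if (apicid != BAD_APICID && apic_cluster(apicid) == cluster) {
			struct cpumask **cpu_cmsk = &per_cpu(cluster_masks, cpu);

			BUG_ON(*cpu_cmsk && *cpu_cmsk != cmsk);
			*cpu_cmsk = cmsk;
		}
	}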

> +		if (apicid != BAD_APICID && apic_cluster(apicid) == cluster) {
> +			struct cpumask **cpu_cmsk = &per_cpu(cluster_masks, cpu);
> +
> +			BUG_ON(*cpu_cmsk && *cpu_cmsk != cmsk);

While I agree that changing an in-use mask pointer would be fatal, I
really have to ask why this code would be invoked on a partially
initialized cluster at all, and why that would be correct.

                        if (WARN_ON_ONCE(*cpu_cmsk == cmsk))
                        	return;
                        BUG_ON(*cpu_cmsk);

if at all. But of course that falls over with the way this code is
invoked below.

> +			*cpu_cmsk = cmsk;
> +		}
>  
> -static int alloc_clustermask(unsigned int cpu, int node)
> +static int alloc_clustermask(unsigned int cpu, u32 cluster, int node)
>  {
> +	struct cpumask *cmsk = NULL;
> +	unsigned int cpu_i;
> +	u32 apicid;
> +
>  	if (per_cpu(cluster_masks, cpu))
>  		return 0;
> -	/*
> -	 * If a hotplug spare mask exists, check whether it's on the right
> -	 * node. If not, free it and allocate a new one.
> -	 */
> -	if (cluster_hotplug_mask) {
> -		if (cluster_hotplug_mask->node == node)
> -			return 0;
> -		kfree(cluster_hotplug_mask);
> +
> +	/* For the hotplug case, don't always allocate a new one */

-ENOPARSE
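
Something along these lines would at least tell the reader what the loop
below is trying to do (wording is just a suggestion):

	/*
	 * On post boot hotplug, reuse the cluster mask of an already
	 * initialized sibling in the same cluster instead of
	 * allocating a new one.
	 */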

> +	if (system_state >= SYSTEM_RUNNING) {
> +		for_each_present_cpu(cpu_i) {
> +			apicid = apic->cpu_present_to_apicid(cpu_i);
> +			if (apicid != BAD_APICID && apic_cluster(apicid) == cluster) {
> +				cmsk = per_cpu(cluster_masks, cpu_i);
> +				if (cmsk)
> +					break;
> +			}
> +		}
> +	}
> +	if (!cmsk) {
> +		cmsk = kzalloc_node(sizeof(*cmsk), GFP_KERNEL, node);
> +		if (!cmsk)
> +			return -ENOMEM;
>  	}

...
  
> +	per_cpu(cluster_masks, cpu) = cmsk;
> +
> +	if (system_state < SYSTEM_RUNNING)
> +		prefill_clustermask(cmsk, cluster);

TBH, the logic of this code is anything but obvious. Something like the
uncompiled version below, perhaps?

Thanks,

        tglx
---

@@ -116,44 +109,90 @@
+
+/*
+ * As an optimisation during boot, set the cluster_mask for all present
+ * CPUs at once, to prevent each of them having to iterate over the others
+ * to find the existing cluster_mask.
+ */
+static void prefill_clustermask(struct cpumask *cmsk, unsigned int cpu, u32 cluster)
+{
+	int cpu_i;
 
-	cluster = apicid >> 16;
-	for_each_online_cpu(cpu) {
-		cmsk = per_cpu(cluster_masks, cpu);
-		/* Matching cluster found. Link and update it. */
-		if (cmsk && cmsk->clusterid == cluster)
-			goto update;
+	for_each_present_cpu(cpu_i) {
+		struct cpumask **cpu_cmsk = &per_cpu(cluster_masks, cpu_i);
+		u32 apicid = apic->cpu_present_to_apicid(cpu_i);
+
+		if (apicid == BAD_APICID || cpu_i == cpu || apic_cluster(apicid) != cluster)
+			continue;
+
+		if (WARN_ON_ONCE(*cpu_cmsk == cmsk))
+			continue;
+
+		BUG_ON(*cpu_cmsk);
+		*cpu_cmsk = cmsk;
 	}
-	cmsk = cluster_hotplug_mask;
-	cmsk->clusterid = cluster;
-	cluster_hotplug_mask = NULL;
-update:
-	this_cpu_write(cluster_masks, cmsk);
-	cpumask_set_cpu(smp_processor_id(), &cmsk->mask);
 }
 
-static int alloc_clustermask(unsigned int cpu, int node)
+static int alloc_clustermask(unsigned int cpu, u32 cluster, int node)
 {
+	struct cpumask *cmsk;
+	unsigned int cpu_i;
+
 	if (per_cpu(cluster_masks, cpu))
 		return 0;
+
 	/*
-	 * If a hotplug spare mask exists, check whether it's on the right
-	 * node. If not, free it and allocate a new one.
+	 * At boot time CPU present mask is stable. If the cluster is not
+	 * yet initialized, allocate the mask and propagate it to all
+	 * siblings in this cluster.
 	 */
-	if (cluster_hotplug_mask) {
-		if (cluster_hotplug_mask->node == node)
-			return 0;
-		kfree(cluster_hotplug_mask);
-	}
+	if (system_state < SYSTEM_RUNNING)
+		goto alloc;
+
+	/*
+	 * On post boot hotplug iterate over the present CPUs to handle the
+	 * case of partial clusters as they might be presented by
+	 * virtualization.
+	 */
+	for_each_present_cpu(cpu_i) {
+		u32 apicid = apic->cpu_present_to_apicid(cpu_i);
+
+		if (apicid != BAD_APICID && apic_cluster(apicid) == cluster) {
+			cmsk = per_cpu(cluster_masks, cpu_i);
 
-	cluster_hotplug_mask = kzalloc_node(sizeof(*cluster_hotplug_mask),
-					    GFP_KERNEL, node);
-	if (!cluster_hotplug_mask)
+			/*
+			 * If the cluster is already initialized, just
+			 * store the mask and return. No point in trying to
+			 * propagate.
+			 */
+			if (cmsk) {
+				per_cpu(cluster_masks, cpu) = cmsk;
+				return 0;
+			}
+		}
+	}
+	/*
+	 * The cluster is not initialized yet. Fall through to the boot
+	 * time code which might initialize the whole cluster if it is
+	 * in the CPU present mask.
+	 */
+alloc:
+	cmsk = kzalloc_node(sizeof(*cmsk), GFP_KERNEL, node);
+	if (!cmsk)
 		return -ENOMEM;
-	cluster_hotplug_mask->node = node;
+	per_cpu(cluster_masks, cpu) = cmsk;
+	prefill_clustermask(cmsk, cluster);
+
 	return 0;
 }
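
For completeness: the caller side is then expected to hand in the cluster
derived from the APIC ID. Roughly along these lines - again uncompiled,
and the exact shape of x2apic_prepare_cpu() is from memory, so treat it
as a sketch only:

static int x2apic_prepare_cpu(unsigned int cpu)
{
	u32 phys_apicid = apic->cpu_present_to_apicid(cpu);
	u32 cluster = apic_cluster(phys_apicid);

	if (alloc_clustermask(cpu, cluster, cpu_to_node(cpu)) < 0)
		return -ENOMEM;
	if (!zalloc_cpumask_var(&per_cpu(ipi_mask, cpu), GFP_KERNEL))
		return -ENOMEM;

	return 0;
}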


