linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Usama Arif <usama.arif@bytedance.com>
To: dwmw2@infradead.org, tglx@linutronix.de, kim.phillips@amd.com,
	brgerst@gmail.com
Cc: piotrgorski@cachyos.org, oleksandr@natalenko.name,
	arjan@linux.intel.com, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, x86@kernel.org,
	pbonzini@redhat.com, paulmck@kernel.org,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	rcu@vger.kernel.org, mimoja@mimoja.de, hewenliang4@huawei.com,
	thomas.lendacky@amd.com, seanjc@google.com,
	pmenzel@molgen.mpg.de, fam.zheng@bytedance.com,
	punit.agrawal@bytedance.com, simon.evans@bytedance.com,
	liangma@liangbit.com, gpiccoli@igalia.com,
	David Woodhouse <dwmw@amazon.co.uk>,
	Usama Arif <usama.arif@bytedance.com>,
	Mark Rutland <mark.rutland@arm.com>
Subject: [PATCH v17 3/8] cpu/hotplug: Add CPUHP_BP_PARALLEL_STARTUP state before CPUHP_BRINGUP_CPU
Date: Tue, 28 Mar 2023 20:57:53 +0100	[thread overview]
Message-ID: <20230328195758.1049469-4-usama.arif@bytedance.com> (raw)
In-Reply-To: <20230328195758.1049469-1-usama.arif@bytedance.com>

From: David Woodhouse <dwmw@amazon.co.uk>

There is often significant latency in the early stages of CPU bringup,
and time is wasted by waking each CPU (e.g. with SIPI/INIT/INIT on x86)
and then waiting for it to make its way through hardware powerup and
through firmware before finally reaching the kernel entry point and
moving on through its startup.

Allow a platform to register a pre-bringup CPUHP state to which each
CPU can be stepped in parallel, thus absorbing some of that latency.

There is a subtlety here: even with an empty CPUHP_BP_PARALLEL_STARTUP
step, this means that *all* CPUs are brought through the prepare states
all the way to CPUHP_BP_PARALLEL_STARTUP before any of them are taken
to CPUHP_BRINGUP_CPU and then are allowed to run for themselves to
CPUHP_ONLINE.

So any combination of prepare/start calls which depend on A-B ordering
for each CPU in turn would explore horribly. As an example, the X2APIC
code prior to commit cefad862f238 ("x86/apic/x2apic: Allow CPU
cluster_mask to be populated in parallel") would allocate a new cluster
mask "just in case" and store it in a global variable in the prep stage,
then the AP would potentially consume that preallocated structure and set
the global pointer to NULL to be reallocated in CPUHP_X2APIC_PREPARE for
the next CPU. Which doesn't work at all if the prepare step is run for
all the CPUs first.

Any platform enabling the CPUHP_BP_PARALLEL_STARTUP step must be
reviewed and tested to ensure that such issues do not exist, and the
existing behaviour of each AP through to CPUHP_BP_PREPARE_DYN and then
immediately to CPUHP_BRINGUP_CPU and CPUHP_ONLINE only one at a time
does not change unless such a state is registered.

Note that this does *not* yet bring each AP to the CPUHP_BRINGUP_CPU
state at the same time, only to the new CPUHP_BP_PARALLEL_STARTUP state.
The final loop in bringup_nonboot_cpus() remains the same, bringing each
AP in turn from the CPUHP_BP_PARALLEL_STARTUP (or all the way from
CPUHP_OFFLINE) to CPUHP_BRINGUP_CPU and then waiting for that AP to do
its own processing and reach CPUHP_ONLINE before releasing the next.

Parallelising that part by bringing them all to CPUHP_BRINGUP_CPU and
then waiting for them all to run to CPUHP_ONLINE at the same time is a
more complicated exercise for the future.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64]
---
 include/linux/cpuhotplug.h | 22 ++++++++++++++++++++++
 kernel/cpu.c               | 38 +++++++++++++++++++++++++++++++++++---
 2 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index c6fab004104a..84efd33ed3a3 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -133,6 +133,28 @@ enum cpuhp_state {
 	CPUHP_MIPS_SOC_PREPARE,
 	CPUHP_BP_PREPARE_DYN,
 	CPUHP_BP_PREPARE_DYN_END		= CPUHP_BP_PREPARE_DYN + 20,
+	/*
+	 * This is an optional state if the architecture supports parallel
+	 * startup. It's used to start bringing the CPU online (e.g. send
+	 * the startup IPI) so that the APs can run in parallel through
+	 * the low level startup code instead of waking them one by one in
+	 * CPUHP_BRINGUP_CPU. This avoids waiting for the AP to react and
+	 * shortens the serialized phase of the bringup.
+	 *
+	 * If the architecture registers this state, all APs will be taken
+	 * to it (and thus through all prior states) before any is taken
+	 * to the subsequent CPUHP_BRINGUP_CPU state.
+	 */
+	CPUHP_BP_PARALLEL_STARTUP,
+
+	/*
+	 * This step brings the AP online and takes it to the point where it
+	 * manages its own state from here on. For the time being, the rest
+	 * of the AP bringup is fully serialized despite running on the AP.
+	 * If the architecture doesn't use the CPUHP_BP_PARALLEL_STARTUP
+	 * state, this step also does all the work of bringing the CPU
+	 * online.
+	 */
 	CPUHP_BRINGUP_CPU,
 
 	/*
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 43e0a77f21e8..3382273ea3f4 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1504,13 +1504,45 @@ int bringup_hibernate_cpu(unsigned int sleep_cpu)
 
 void bringup_nonboot_cpus(unsigned int setup_max_cpus)
 {
-	unsigned int cpu;
+	unsigned int cpu, n = num_online_cpus();
 
+	/*
+	 * On architectures which have setup the CPUHP_BP_PARALLEL_STARTUP
+	 * state, this invokes all BP prepare states and the parallel
+	 * startup state sends the startup IPI to each of the to be onlined
+	 * APs. This avoids waiting for each AP to respond to the startup
+	 * IPI in CPUHP_BRINGUP_CPU. The APs proceed through the low level
+	 * bringup code and then wait for the control CPU to release them
+	 * one by one for the final onlining procedure in the loop below.
+	 *
+	 * For architectures which do not support parallel bringup all
+	 * states are fully serialized in the loop below.
+	 */
+	if (!cpuhp_step_empty(true, cpuhp_get_step(CPUHP_BP_PARALLEL_STARTUP))) {
+		for_each_present_cpu(cpu) {
+			if (n++ >= setup_max_cpus)
+				break;
+			cpu_up(cpu, CPUHP_BP_PARALLEL_STARTUP);
+		}
+	}
+
+	/* Do the per CPU serialized bringup to ONLINE state */
 	for_each_present_cpu(cpu) {
 		if (num_online_cpus() >= setup_max_cpus)
 			break;
-		if (!cpu_online(cpu))
-			cpu_up(cpu, CPUHP_ONLINE);
+
+		if (!cpu_online(cpu)) {
+			struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
+			int ret = cpu_up(cpu, CPUHP_ONLINE);
+
+			/*
+			 * Due to the above preparation loop a failed online attempt
+			 * might have only rolled back to CPUHP_BP_PARALLEL_STARTUP. Do the
+			 * remaining cleanups. NOOP for the non parallel case.
+			 */
+			if (ret && can_rollback_cpu(st))
+				WARN_ON(cpuhp_invoke_callback_range(false, cpu, st, CPUHP_OFFLINE));
+		}
 	}
 }
 
-- 
2.25.1


  parent reply	other threads:[~2023-03-28 19:59 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-28 19:57 [PATCH v17 0/8] Parallel CPU bringup for x86_64 Usama Arif
2023-03-28 19:57 ` [PATCH v17 1/8] cpu/hotplug: Move idle_thread_get() to <linux/smpboot.h> Usama Arif
2023-03-28 19:57 ` [PATCH v17 2/8] cpu/hotplug: Reset task stack state in _cpu_up() Usama Arif
2023-03-29 11:09   ` Usama Arif
2023-03-28 19:57 ` Usama Arif [this message]
2023-03-28 19:57 ` [PATCH v17 4/8] x86/smpboot: Split up native_cpu_up into separate phases and document them Usama Arif
2023-03-28 19:57 ` [PATCH v17 5/8] x86/smpboot: Support parallel startup of secondary CPUs Usama Arif
2023-03-28 19:57 ` [PATCH v17 6/8] x86/smpboot: Send INIT/SIPI/SIPI to secondary CPUs in parallel Usama Arif
2023-03-30 16:46   ` Thomas Gleixner
2023-03-30 17:05     ` Borislav Petkov
2023-03-30 18:17       ` Thomas Gleixner
2023-03-31 11:40         ` [External] " Usama Arif
2023-03-28 19:57 ` [PATCH v17 7/8] x86/smpboot: Serialize topology updates for secondary bringup Usama Arif
2023-03-28 19:57 ` [PATCH v17 8/8] x86/smpboot: Allow parallel bringup for SEV-ES Usama Arif
2023-03-28 20:07   ` Borislav Petkov
2023-03-30  0:10     ` Thomas Gleixner
2023-03-30  7:51       ` Borislav Petkov
2023-03-30  8:15       ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230328195758.1049469-4-usama.arif@bytedance.com \
    --to=usama.arif@bytedance.com \
    --cc=arjan@linux.intel.com \
    --cc=bp@alien8.de \
    --cc=brgerst@gmail.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dwmw2@infradead.org \
    --cc=dwmw@amazon.co.uk \
    --cc=fam.zheng@bytedance.com \
    --cc=gpiccoli@igalia.com \
    --cc=hewenliang4@huawei.com \
    --cc=hpa@zytor.com \
    --cc=kim.phillips@amd.com \
    --cc=kvm@vger.kernel.org \
    --cc=liangma@liangbit.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mimoja@mimoja.de \
    --cc=mingo@redhat.com \
    --cc=oleksandr@natalenko.name \
    --cc=paulmck@kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=piotrgorski@cachyos.org \
    --cc=pmenzel@molgen.mpg.de \
    --cc=punit.agrawal@bytedance.com \
    --cc=rcu@vger.kernel.org \
    --cc=seanjc@google.com \
    --cc=simon.evans@bytedance.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.lendacky@amd.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).