From: Thomas Gleixner <tglx@linutronix.de>
To: LKML <linux-kernel@vger.kernel.org>
Cc: x86@kernel.org, Mario Limonciello <mario.limonciello@amd.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Tony Battersby <tonyb@cybernetics.com>,
	Ashok Raj <ashok.raj@linux.intel.com>,
	Tony Luck <tony.luck@intel.com>,
	Arjan van de Veen <arjan@linux.intel.com>,
	Eric Biederman <ebiederm@xmission.com>,
	Ashok Raj <ashok.raj@intel.com>
Subject: [patch v3 7/7] x86/smp: Put CPUs into INIT on shutdown if possible
Date: Thu, 15 Jun 2023 22:34:00 +0200 (CEST)
Message-ID: <20230615193330.608657211@linutronix.de>
In-Reply-To: 20230615190036.898273129@linutronix.de

Parking CPUs in a HLT loop is not completely safe vs. kexec() because HLT
can resume execution due to NMI, SMI and MCE. The MWAIT loop has the same
issue.

Kicking the secondary CPUs into INIT makes this safe against NMI and SMI.

A broadcast MCE will take the machine down, but a broadcast MCE which makes
HLT resume and execute overwritten text, pagetables or data will end up in
a disaster too.

So choose the lesser of two evils and kick the secondary CPUs into INIT
unless the system has installed special wakeup mechanisms which are not
using INIT.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
---
V3: Renamed the function to smp_park_other_cpus_in_init() so it can
    be reused for crash eventually.
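
For readers who want to look at the parking decision in isolation, here is
a minimal userspace sketch in C. It is not kernel code: struct model_apic,
MODEL_BAD_APICID, MODEL_MAX_CPUS and model_park_other_cpus_in_init() are
hypothetical stand-ins, and setting got_init[] takes the place of the real
send_init_sequence() call.

```c
#include <stdbool.h>

#define MODEL_BAD_APICID 0xFFFFu /* hypothetical stand-in for BAD_APICID */
#define MODEL_MAX_CPUS   8       /* arbitrary size for this model */

/* Hypothetical model of the state the patch inspects. */
struct model_apic {
	bool has_custom_wakeup;              /* models a non-NULL wakeup_secondary_cpu{,_64} */
	bool present[MODEL_MAX_CPUS];        /* models the present CPU mask */
	unsigned int apicid[MODEL_MAX_CPUS]; /* models cpu_present_to_apicid(cpu) */
	bool got_init[MODEL_MAX_CPUS];       /* records which CPUs would receive INIT */
};

/*
 * Returns false when a special wakeup mechanism is installed, so the
 * caller has to fall back to another method. Otherwise marks every
 * other present CPU with a valid APIC ID as parked and returns true.
 */
bool model_park_other_cpus_in_init(struct model_apic *apic,
				   unsigned int this_cpu)
{
	if (apic->has_custom_wakeup)
		return false;

	for (unsigned int cpu = 0; cpu < MODEL_MAX_CPUS; cpu++) {
		if (!apic->present[cpu] || cpu == this_cpu)
			continue;
		if (apic->apicid[cpu] == MODEL_BAD_APICID)
			continue;
		apic->got_init[cpu] = true; /* stands in for send_init_sequence() */
	}
	return true;
}
```

In this model, with CPUs 0-2 present, CPU 2 lacking a valid APIC ID and
CPU 0 as the caller, only CPU 1 ends up marked as parked.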
---
 arch/x86/include/asm/smp.h |    2 ++
 arch/x86/kernel/smp.c      |   39 ++++++++++++++++++++++++++++++++-------
 arch/x86/kernel/smpboot.c  |   19 +++++++++++++++++++
 3 files changed, 53 insertions(+), 7 deletions(-)

--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -139,6 +139,8 @@ void native_send_call_func_ipi(const str
 void native_send_call_func_single_ipi(int cpu);
 void x86_idle_thread_init(unsigned int cpu, struct task_struct *idle);
 
+bool smp_park_other_cpus_in_init(void);
+
 void smp_store_boot_cpu_info(void);
 void smp_store_cpu_info(int id);
 
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -131,7 +131,7 @@ static int smp_stop_nmi_callback(unsigne
 }
 
 /*
- * this function calls the 'stop' function on all other CPUs in the system.
+ * Disable virtualization, APIC etc. and park the CPU in a HLT loop
  */
 DEFINE_IDTENTRY_SYSVEC(sysvec_reboot)
 {
@@ -172,13 +172,17 @@ static void native_stop_other_cpus(int w
 	 * 2) Wait for all other CPUs to report that they reached the
 	 *    HLT loop in stop_this_cpu()
 	 *
-	 * 3) If #2 timed out send an NMI to the CPUs which did not
-	 *    yet report
+	 * 3) If the system uses INIT/STARTUP for CPU bringup, then
+	 *    send all present CPUs an INIT vector, which brings them
+	 *    completely out of the way.
 	 *
-	 * 4) Wait for all other CPUs to report that they reached the
+	 * 4) If #3 is not possible and #2 timed out send an NMI to the
+	 *    CPUs which did not yet report
+	 *
+	 * 5) Wait for all other CPUs to report that they reached the
 	 *    HLT loop in stop_this_cpu()
 	 *
-	 * #3 can obviously race against a CPU reaching the HLT loop late.
+	 * #4 can obviously race against a CPU reaching the HLT loop late.
 	 * That CPU will have reported already and the "have all CPUs
 	 * reached HLT" condition will be true despite the fact that the
 	 * other CPU is still handling the NMI. Again, there is no
@@ -194,7 +198,7 @@ static void native_stop_other_cpus(int w
 		/*
 		 * Don't wait longer than a second for IPI completion. The
 		 * wait request is not checked here because that would
-		 * prevent an NMI shutdown attempt in case that not all
+		 * prevent an NMI/INIT shutdown in case that not all
 		 * CPUs reach shutdown state.
 		 */
 		timeout = USEC_PER_SEC;
@@ -202,7 +206,27 @@ static void native_stop_other_cpus(int w
 			udelay(1);
 	}
 
-	/* if the REBOOT_VECTOR didn't work, try with the NMI */
+	/*
+	 * Park all other CPUs in INIT including "offline" CPUs, if
+	 * possible. That's a safe place where they can't resume execution
+	 * of HLT and then execute the HLT loop from overwritten text or
+	 * page tables.
+	 *
+	 * The only downside is a broadcast MCE, but up to the point where
+	 * the kexec() kernel brought all APs online again an MCE will just
+	 * make HLT resume and handle the MCE. The machine crashes and burns
+	 * due to overwritten text, page tables and data. So there is a
+	 * choice between fire and frying pan. The result is pretty much
+	 * the same. Choose frying pan until x86 provides a sane mechanism
+	 * to park a CPU.
+	 */
+	if (smp_park_other_cpus_in_init())
+		goto done;
+
+	/*
+	 * If park with INIT was not possible and the REBOOT_VECTOR didn't
+	 * take all secondary CPUs offline, try with the NMI.
+	 */
 	if (!cpumask_empty(&cpus_stop_mask)) {
 		/*
 		 * If NMI IPI is enabled, try to register the stop handler
@@ -234,6 +258,7 @@ static void native_stop_other_cpus(int w
 			udelay(1);
 	}
 
+done:
 	local_irq_save(flags);
 	disable_local_APIC();
 	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1465,6 +1465,25 @@ void arch_thaw_secondary_cpus_end(void)
 	cache_aps_init();
 }
 
+bool smp_park_other_cpus_in_init(void)
+{
+	unsigned int cpu, this_cpu = smp_processor_id();
+	unsigned int apicid;
+
+	if (apic->wakeup_secondary_cpu_64 || apic->wakeup_secondary_cpu)
+		return false;
+
+	for_each_present_cpu(cpu) {
+		if (cpu == this_cpu)
+			continue;
+		apicid = apic->cpu_present_to_apicid(cpu);
+		if (apicid == BAD_APICID)
+			continue;
+		send_init_sequence(apicid);
+	}
+	return true;
+}
+
 /*
  * Early setup to make printk work.
  */
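
The order in which native_stop_other_cpus() falls through its options after
the REBOOT_VECTOR wait can be summarized with a small userspace sketch in
C. The enum and model_pick_stop_method() are hypothetical names used only
for illustration. The sketch assumes the NMI stop handler can always be
registered, which the real code checks separately.

```c
#include <stdbool.h>

/* Hypothetical labels for what finally takes the other CPUs out. */
enum model_stop_method {
	MODEL_STOP_REBOOT_IPI, /* all CPUs reported in time, no INIT possible */
	MODEL_STOP_INIT,       /* other CPUs were parked in INIT */
	MODEL_STOP_NMI,        /* stragglers get an NMI as last resort */
};

/*
 * can_use_init models the return value of smp_park_other_cpus_in_init().
 * cpus_still_running models !cpumask_empty(&cpus_stop_mask) after the
 * REBOOT_VECTOR wait has finished or timed out.
 */
enum model_stop_method model_pick_stop_method(bool can_use_init,
					      bool cpus_still_running)
{
	/* INIT is tried first and skips the NMI path entirely. */
	if (can_use_init)
		return MODEL_STOP_INIT;

	/* Only CPUs which did not report yet are hit with an NMI. */
	if (cpus_still_running)
		return MODEL_STOP_NMI;

	return MODEL_STOP_REBOOT_IPI;
}
```

Note that in this model, as in the patch, INIT is used whenever it is
available, even if all CPUs already reported reaching the HLT loop.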

