linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/7] Unify SMP stop generic logic to common code
@ 2019-08-23 11:57 Cristian Marussi
  2019-08-23 11:57 ` [RFC PATCH 1/7] smp: add generic SMP-stop support " Cristian Marussi
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Cristian Marussi @ 2019-08-23 11:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, linux-arm-kernel, dave.martin, james.morse,
	mark.rutland, catalin.marinas, will, tglx, peterz,
	takahiro.akashi, hidehiro.kawai.ez

Hi all,

the logic underlying SMP stop and kexec crash procedures, beside containing
some arch-specific bits, is mostly generic and common across all archs:
despite this fact, such logic is now scattered across all architectures and
on some of them is flawed, in such a way that, under some specific
conditions, you can end up with a CPU left still running after a panic and
possibly lost across a subsequent kexec crash reboot. [1]

Beside the flaws on some archs, there is anyway lots of code duplication,
so this patch series attempts to move into common code all the generic SMP
stop and crash logic, fixing observed issues, and leaving only the arch
specific bits inside properly provided arch-specific helpers.

An architecture willing to rely on this SMP common logic has to define its
own helpers and set CONFIG_ARCH_USE_COMMON_SMP_STOP=y.
The series wire this up for arm64.

Behaviour is not changed for architectures not adopting this new common
logic.

Tested as follows:
- arm64:
 1. boot/reboot
 2. panic on a starting CPU within a 2 CPUs system (freezing properly)
 3. kexec reboot after a panic like 2. (not losing any CPU on reboot)
 4. kexec reboot after a panic like 2. and a simultaneous reboot
    (instrumenting code to delay the stop messages transmission
     to have time to inject a reboot -f)
- x86:
 1. boot/reboot
 2. panic
 3. kexec crash

Thanks

Cristian

[1]

[root@arch ~]# echo 1 > /sys/devices/system/cpu/cpu1/online
[root@arch ~]# [  152.583368] ------------[ cut here ]------------
[  152.583872] kernel BUG at arch/arm64/kernel/cpufeature.c:852!
[  152.584693] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[  152.585228] Modules linked in:
[  152.586040] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.3.0-rc5-00001-gcabd12118c4a-dirty #2
[  152.586218] Hardware name: Foundation-v8A (DT)
[  152.586478] pstate: 000001c5 (nzcv dAIF -PAN -UAO)
[  152.587260] pc : has_cpuid_feature+0x35c/0x360
[  152.587398] lr : verify_local_elf_hwcaps+0x6c/0xf0
[  152.587520] sp : ffff0000118bbf60
[  152.587605] x29: ffff0000118bbf60 x28: 0000000000000000
[  152.587784] x27: 0000000000000000 x26: 0000000000000000
[  152.587882] x25: ffff00001167a010 x24: ffff0000112f59f8
[  152.587992] x23: 0000000000000000 x22: 0000000000000000
[  152.588085] x21: ffff0000112ea018 x20: ffff000010fe5518
[  152.588180] x19: ffff000010ba3f30 x18: 0000000000000036
[  152.588285] x17: 0000000000000000 x16: 0000000000000000
[  152.588380] x15: 0000000000000000 x14: ffff80087a821210
[  152.588481] x13: 0000000000000000 x12: 0000000000000000
[  152.588599] x11: 0000000000000080 x10: 00400032b5503510
[  152.588709] x9 : 0000000000000000 x8 : ffff000010b93204
[  152.588810] x7 : 00000000800001d8 x6 : 0000000000000005
[  152.588910] x5 : 0000000000000000 x4 : 0000000000000000
[  152.589021] x3 : 0000000000000000 x2 : 0000000000008000
[  152.589121] x1 : 0000000000180480 x0 : 0000000000180480
[  152.589379] Call trace:
[  152.589646]  has_cpuid_feature+0x35c/0x360
[  152.589763]  verify_local_elf_hwcaps+0x6c/0xf0
[  152.589858]  check_local_cpu_capabilities+0x88/0x118
[  152.589968]  secondary_start_kernel+0xc4/0x168
[  152.590530] Code: d53801e0 17ffff58 d5380600 17ffff56 (d4210000)
[  152.592215] ---[ end trace 80ea98416149c87e ]---
[  152.592734] Kernel panic - not syncing: Attempted to kill the idle task!
[  152.593173] Kernel Offset: disabled
[  152.593501] CPU features: 0x0004,20c02008
[  152.593678] Memory Limit: none
[  152.594208] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
[root@arch ~]# bash: echo: write error: Input/output error
[root@arch ~]#
[root@arch ~]#
[root@arch ~]# echo HELO
HELO

Cristian Marussi (7):
  smp: add generic SMP-stop support to common code
  smp: unify crash_ and smp_send_stop() logic
  smp: coordinate concurrent crash/smp stop calls
  smp: address races of starting CPUs while stopping
  arm64: smp: use generic SMP stop common code
  arm64: smp: use SMP crash-stop common code
  arm64: smp: add arch specific cpu parking helper

 arch/arm64/Kconfig           |   3 +
 arch/arm64/include/asm/smp.h |   2 -
 arch/arm64/kernel/smp.c      | 127 ++++++++--------------------
 include/linux/smp.h          |  44 ++++++++++
 kernel/panic.c               |  26 ------
 kernel/smp.c                 | 158 +++++++++++++++++++++++++++++++++++
 6 files changed, 239 insertions(+), 121 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC PATCH 1/7] smp: add generic SMP-stop support to common code
  2019-08-23 11:57 [RFC PATCH 0/7] Unify SMP stop generic logic to common code Cristian Marussi
@ 2019-08-23 11:57 ` Cristian Marussi
  2019-08-23 11:57 ` [RFC PATCH 2/7] smp: unify crash_ and smp_send_stop() logic Cristian Marussi
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Cristian Marussi @ 2019-08-23 11:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, linux-arm-kernel, dave.martin, james.morse,
	mark.rutland, catalin.marinas, will, tglx, peterz,
	takahiro.akashi, hidehiro.kawai.ez

There was a lot of code duplication across architectures regarding the
SMP stop procedures' logic; moreover some of this duplicated code logic
happened to be similarly faulty across a few architectures: while fixing
such logic, move such generic logic as much as possible inside common
code.

Collect all the common logic related to SMP stop operations into the
common SMP code; any architecture willing to use such centralized logic
can select CONFIG_ARCH_USE_COMMON_STOP=y and provide the related
arch-specific helpers: in such a scenario, those architectures will
transparently start using the common code provided by smp_send_stop()
common function.

On the other side, Architectures not willing to use common code SMP stop
logic will simply leave CONFIG_ARCH_USE_COMMON_STOP undefined and carry
on executing their local arch-specific smp_send_stop() as before.

Suggested-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
---
 include/linux/smp.h | 16 ++++++++++++
 kernel/smp.c        | 63 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 6fc856c9eda5..0fea0e6d2339 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -77,6 +77,22 @@ int smp_call_function_single_async(int cpu, call_single_data_t *csd);
  */
 extern void smp_send_stop(void);
 
+#ifdef CONFIG_ARCH_USE_COMMON_SMP_STOP
+/*
+ * Any Architecture willing to use STOP common logic implementation
+ * MUST at least provide the arch_smp_stop_call() helper which is in
+ * charge of its own arch-specific CPU-stop mechanism.
+ */
+extern void arch_smp_stop_call(cpumask_t *cpus);
+
+/*
+ * An Architecture CAN also provide the arch_smp_cpus_stop_complete()
+ * dedicated helper, to perform any final arch-specific operation on
+ * the local CPU once the other CPUs have been successfully stopped.
+ */
+void arch_smp_cpus_stop_complete(void);
+#endif
+
 /*
  * sends a 'reschedule' event to another CPU:
  */
diff --git a/kernel/smp.c b/kernel/smp.c
index 7dbcb402c2fc..700502ee2546 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -20,6 +20,7 @@
 #include <linux/sched.h>
 #include <linux/sched/idle.h>
 #include <linux/hypervisor.h>
+#include <linux/delay.h>
 
 #include "smpboot.h"
 
@@ -817,3 +818,65 @@ int smp_call_on_cpu(unsigned int cpu, int (*func)(void *), void *par, bool phys)
 	return sscs.ret;
 }
 EXPORT_SYMBOL_GPL(smp_call_on_cpu);
+
+#ifdef CONFIG_ARCH_USE_COMMON_SMP_STOP
+void __weak arch_smp_cpus_stop_complete(void) { }
+
+static inline bool any_other_cpus_online(cpumask_t *mask,
+					 unsigned int this_cpu_id)
+{
+	cpumask_copy(mask, cpu_online_mask);
+	cpumask_clear_cpu(this_cpu_id, mask);
+
+	return !cpumask_empty(mask);
+}
+
+/*
+ * This centralizes the common logic to:
+ *
+ *  - evaluate which CPUs are online and needs to be notified for stop,
+ *    while considering properly the status of the calling CPU
+ *
+ *  - call the arch-specific helpers to request the effective stop
+ *
+ *  - wait for the stop operation to be completed across all involved CPUs
+ *    monitoring the cpu_online_mask
+ */
+void smp_send_stop(void)
+{
+	unsigned int this_cpu_id;
+	cpumask_t mask;
+
+	this_cpu_id = smp_processor_id();
+	if (any_other_cpus_online(&mask, this_cpu_id)) {
+		unsigned long timeout;
+		unsigned int this_cpu_online = cpu_online(this_cpu_id);
+
+		if (system_state <= SYSTEM_RUNNING)
+			pr_crit("stopping secondary CPUs\n");
+		arch_smp_stop_call(&mask);
+
+		/*
+		 * Wait up to one second for other CPUs to stop.
+		 *
+		 * Here we rely simply on cpu_online_mask to sync with
+		 * arch-specific stop code without bloating the code with an
+		 * additional atomic_t ad-hoc counter.
+		 *
+		 * As a consequence we'll need proper explicit memory barriers
+		 * in case the other CPUs running the arch-specific stop-code
+		 * would need to commit to memory some data (like saved_regs).
+		 */
+		timeout = USEC_PER_SEC;
+		while (num_online_cpus() > this_cpu_online && timeout--)
+			udelay(1);
+		/* ensure any stopping-CPUs memory access is made visible */
+		smp_rmb();
+		if (num_online_cpus() > this_cpu_online)
+			pr_warn("failed to stop secondary CPUs %*pbl\n",
+				cpumask_pr_args(cpu_online_mask));
+	}
+	/* Perform final (possibly arch-specific) work on this CPU */
+	arch_smp_cpus_stop_complete();
+}
+#endif
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 2/7] smp: unify crash_ and smp_send_stop() logic
  2019-08-23 11:57 [RFC PATCH 0/7] Unify SMP stop generic logic to common code Cristian Marussi
  2019-08-23 11:57 ` [RFC PATCH 1/7] smp: add generic SMP-stop support " Cristian Marussi
@ 2019-08-23 11:57 ` Cristian Marussi
  2019-08-23 11:57 ` [RFC PATCH 3/7] smp: coordinate concurrent crash/smp stop calls Cristian Marussi
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Cristian Marussi @ 2019-08-23 11:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, linux-arm-kernel, dave.martin, james.morse,
	mark.rutland, catalin.marinas, will, tglx, peterz,
	takahiro.akashi, hidehiro.kawai.ez

crash_smp_send_stop() logic was fairly similar to smp_send_stop():
a lot of logic and code was duplicated between the two code paths and
across a few different architectures.

Unify this underlying common logic into the existing SMP common stop
code: use a common workhorse function for both paths to perform the
common tasks while taking care to propagate to the underlying
architecture code the intent of the stop operation: a simple stop or
a crash dump stop.

Relocate the __weak crash_smp_send_stop() function from panic.c to smp.c,
since it is the crash dump entry point for the crash stop process and now
calls into this new common logic (only if this latter is enabled by
ARCH_USE_COMMON_SMP_STOP=y).

Introduce a few more helpers so that the architectures willing to use
the common code logic can provide their arch-specific bits to handle
the differences between a stop and a crash stop; architectures can
anyway decide to override as a whole the common logic providing their
own custom solution in crash_smp_send_stop() (as it was before).

Provide also a new common code method to inquiry on the outcome of an
ongoing crash_stop procedure: smp_crash_stop_failed().

Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
---
 include/linux/smp.h | 25 +++++++++++++++++++
 kernel/panic.c      | 26 --------------------
 kernel/smp.c        | 58 ++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 80 insertions(+), 29 deletions(-)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 0fea0e6d2339..21c97c175dad 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -91,8 +91,27 @@ extern void arch_smp_stop_call(cpumask_t *cpus);
  * the local CPU once the other CPUs have been successfully stopped.
  */
 void arch_smp_cpus_stop_complete(void);
+
+/*
+ * An Architecture CAN additionally provide the arch_smp_crash_call()
+ * helper which implements the arch-specific crash dump related operations.
+ *
+ * If such arch wants to fully support crash dump, this MUST be provided;
+ * when not provided the crash dump procedure will fallback to behave like
+ * a normal stop. (no saved regs, no arch-specific features disabled)
+ */
+extern void arch_smp_crash_call(cpumask_t *cpus);
+
+/* Helper to query the outcome of an ongoing crash_stop operation */
+bool smp_crash_stop_failed(void);
 #endif
 
+/*
+ * stops all CPUs but the current one propagating to all other CPUs
+ * the information that a crash_kexec is ongoing:
+ */
+void crash_smp_send_stop(void);
+
 /*
  * sends a 'reschedule' event to another CPU:
  */
@@ -156,6 +175,12 @@ static inline int get_boot_cpu_id(void)
 
 static inline void smp_send_stop(void) { }
 
+static inline void crash_smp_send_stop(void) { }
+
+#ifdef CONFIG_ARCH_USE_COMMON_SMP_STOP
+static inline bool smp_crash_stop_failed(void) { }
+#endif
+
 /*
  *	These macros fold the SMP functionality into a single CPU system
  */
diff --git a/kernel/panic.c b/kernel/panic.c
index 057540b6eee9..bc0dbf9c9b75 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -86,32 +86,6 @@ void __weak nmi_panic_self_stop(struct pt_regs *regs)
 	panic_smp_self_stop();
 }
 
-/*
- * Stop other CPUs in panic.  Architecture dependent code may override this
- * with more suitable version.  For example, if the architecture supports
- * crash dump, it should save registers of each stopped CPU and disable
- * per-CPU features such as virtualization extensions.
- */
-void __weak crash_smp_send_stop(void)
-{
-	static int cpus_stopped;
-
-	/*
-	 * This function can be called twice in panic path, but obviously
-	 * we execute this only once.
-	 */
-	if (cpus_stopped)
-		return;
-
-	/*
-	 * Note smp_send_stop is the usual smp shutdown function, which
-	 * unfortunately means it may not be hardened to work in a panic
-	 * situation.
-	 */
-	smp_send_stop();
-	cpus_stopped = 1;
-}
-
 atomic_t panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID);
 
 /*
diff --git a/kernel/smp.c b/kernel/smp.c
index 700502ee2546..31e981eb9bd8 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -820,6 +820,7 @@ int smp_call_on_cpu(unsigned int cpu, int (*func)(void *), void *par, bool phys)
 EXPORT_SYMBOL_GPL(smp_call_on_cpu);
 
 #ifdef CONFIG_ARCH_USE_COMMON_SMP_STOP
+
 void __weak arch_smp_cpus_stop_complete(void) { }
 
 static inline bool any_other_cpus_online(cpumask_t *mask,
@@ -831,6 +832,12 @@ static inline bool any_other_cpus_online(cpumask_t *mask,
 	return !cpumask_empty(mask);
 }
 
+void __weak arch_smp_crash_call(cpumask_t *cpus)
+{
+	pr_debug("SMP: Using generic %s() as SMP crash call.\n", __func__);
+	arch_smp_stop_call(cpus);
+}
+
 /*
  * This centralizes the common logic to:
  *
@@ -842,7 +849,7 @@ static inline bool any_other_cpus_online(cpumask_t *mask,
  *  - wait for the stop operation to be completed across all involved CPUs
  *    monitoring the cpu_online_mask
  */
-void smp_send_stop(void)
+static inline void __smp_send_stop_all(bool reason_crash)
 {
 	unsigned int this_cpu_id;
 	cpumask_t mask;
@@ -854,8 +861,11 @@ void smp_send_stop(void)
 
 		if (system_state <= SYSTEM_RUNNING)
 			pr_crit("stopping secondary CPUs\n");
-		arch_smp_stop_call(&mask);
-
+		/* smp and crash arch-backends helpers are kept distinct */
+		if (!reason_crash)
+			arch_smp_stop_call(&mask);
+		else
+			arch_smp_crash_call(&mask);
 		/*
 		 * Wait up to one second for other CPUs to stop.
 		 *
@@ -879,4 +889,46 @@ void smp_send_stop(void)
 	/* Perform final (possibly arch-specific) work on this CPU */
 	arch_smp_cpus_stop_complete();
 }
+
+void smp_send_stop(void)
+{
+	__smp_send_stop_all(false);
+}
+
+bool __weak smp_crash_stop_failed(void)
+{
+	return (num_online_cpus() > cpu_online(smp_processor_id()));
+}
+#endif
+
+/*
+ * Stop other CPUs while passing down the additional information that a
+ * crash_kexec is ongoing: it's up to the architecture implementation
+ * decide what to do.
+ *
+ * For example, Architectures supporting crash dump should provide
+ * specialized support for saving registers and disabling per-CPU features
+ * like virtualization extensions.
+ *
+ * Behaviour in the CONFIG_ARCH_USE_COMMON_SMP_STOP=n case is preserved
+ * as it was before.
+ */
+void __weak crash_smp_send_stop(void)
+{
+	static int cpus_stopped;
+
+	/*
+	 * This function can be called twice in panic path, but obviously
+	 * we execute this only once.
+	 */
+	if (cpus_stopped)
+		return;
+
+#ifdef CONFIG_ARCH_USE_COMMON_SMP_STOP
+	__smp_send_stop_all(true);
+#else
+	smp_send_stop();
 #endif
+
+	cpus_stopped = 1;
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 3/7] smp: coordinate concurrent crash/smp stop calls
  2019-08-23 11:57 [RFC PATCH 0/7] Unify SMP stop generic logic to common code Cristian Marussi
  2019-08-23 11:57 ` [RFC PATCH 1/7] smp: add generic SMP-stop support " Cristian Marussi
  2019-08-23 11:57 ` [RFC PATCH 2/7] smp: unify crash_ and smp_send_stop() logic Cristian Marussi
@ 2019-08-23 11:57 ` Cristian Marussi
  2019-08-23 11:57 ` [RFC PATCH 4/7] smp: address races of starting CPUs while stopping Cristian Marussi
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Cristian Marussi @ 2019-08-23 11:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, linux-arm-kernel, dave.martin, james.morse,
	mark.rutland, catalin.marinas, will, tglx, peterz,
	takahiro.akashi, hidehiro.kawai.ez

Once a stop request is in progress on one CPU, it must be carefully
evaluated what to do if another stop request is issued concurrently
on another CPU.

Given that panic and crash dump code flows are mutually exclusive,
the only alternative possible scenario which instead could lead to
concurrent stop requests, is represented by the simultaneous stop
requests possibly triggered by a concurrent halt/reboot/shutdown.

In such a case, prioritize the panic/crash procedure and most importantly
immediately park the offending CPU that was attempting the concurrent stop
request: force it to idle quietly, waiting for the ongoing stop/dump
requests to arrive.

Failing to do so would result in the offending CPU being effectively lost
on the next possible reboot triggered by the crash dump. [1]

Another scenario, where the crash/stop code path was known to be executed
twice, was during a normal panic/crash with crash_kexec_post_notifiers=1;
since this issue is similar, fold also this case-handling into this new
logic.

[1]
<<<<<---------- TRIGGERED PANIC
[  225.676014] ------------[ cut here ]------------
[  225.676656] kernel BUG at arch/arm64/kernel/cpufeature.c:852!
[  225.677253] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[  225.677660] Modules linked in:
[  225.678458] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Not tainted 5.3.0-rc5-00004-gb8a210cd3c32-dirty #2
[  225.678621] Hardware name: Foundation-v8A (DT)
[  225.678868] pstate: 000001c5 (nzcv dAIF -PAN -UAO)
[  225.679523] pc : has_cpuid_feature+0x35c/0x360
[  225.679649] lr : verify_local_elf_hwcaps+0x6c/0xf0
[  225.679759] sp : ffff0000118cbf60
[  225.679855] x29: ffff0000118cbf60 x28: 0000000000000000
[  225.680026] x27: 0000000000000000 x26: 0000000000000000
[  225.680115] x25: ffff00001167a010 x24: ffff0000112f59f8
[  225.680207] x23: 0000000000000000 x22: 0000000000000000
[  225.680290] x21: ffff0000112ea018 x20: ffff000010fe5538
[  225.680372] x19: ffff000010ba3f30 x18: 000000000000001e
[  225.680465] x17: 0000000000000000 x16: 0000000000000000
[  225.680546] x15: 0000000000000000 x14: 0000000000000008
[  225.680629] x13: 0209018b7a9404f4 x12: 0000000000000001
[  225.680719] x11: 0000000000000080 x10: 00400032b5503510
[  225.680803] x9 : 0000000000000000 x8 : ffff000010b93204
[  225.680884] x7 : 00000000800001d8 x6 : 0000000000000005
[  225.680975] x5 : 0000000000000000 x4 : 0000000000000000
[  225.681056] x3 : 0000000000000000 x2 : 0000000000008000
[  225.681139] x1 : 0000000000180480 x0 : 0000000000180480
[  225.681423] Call trace:
[  225.681669]  has_cpuid_feature+0x35c/0x360
[  225.681762]  verify_local_elf_hwcaps+0x6c/0xf0
[  225.681836]  check_local_cpu_capabilities+0x88/0x118
[  225.681939]  secondary_start_kernel+0xc4/0x168
[  225.682432] Code: d53801e0 17ffff58 d5380600 17ffff56 (d4210000)
[  225.683998] smp: stopping secondary CPUs
[  225.684130] Delaying stop....   <<<<<------ INSTRUMENTED DEBUG_DELAY

Rebooting.                         <<<<<------ MANUAL SIMULTANEOUS REBOOT
[  232.647472] reboot: Restarting system
[  232.648363] Reboot failed -- System halted
[  239.552413] smp: failed to stop secondary CPUs 0
[  239.554730] Starting crashdump kernel...
[  239.555194] ------------[ cut here ]------------
[  239.555406] Some CPUs may be stale, kdump will be unreliable.
[  239.555802] WARNING: CPU: 3 PID: 0 at arch/arm64/kernel/machine_kexec.c:156 machine_kexec+0x3c/0x2b0
[  239.556044] Modules linked in:
[  239.556244] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Not tainted 5.3.0-rc5-00004-gb8a210cd3c32-dirty #2
[  239.556340] Hardware name: Foundation-v8A (DT)
[  239.556459] pstate: 600003c5 (nZCv DAIF -PAN -UAO)
[  239.556587] pc : machine_kexec+0x3c/0x2b0
[  239.556700] lr : machine_kexec+0x3c/0x2b0
[  239.556775] sp : ffff0000118cbad0
[  239.556876] x29: ffff0000118cbad0 x28: ffff80087a8c3700
[  239.557012] x27: 0000000000000000 x26: 0000000000000000
[  239.557278] x25: ffff000010fe3ef0 x24: 00000000000003c0
....
[  239.561568] Bye!
[    0.000000] Booting Linux on physical CPU 0x0000000003 [0x410fd0f0]
[    0.000000] Linux version 5.2.0-rc4-00001-g93bd4bc234d5-dirty
[    0.000000] Machine model: Foundation-v8A
...
[    0.197991] smp: Bringing up secondary CPUs ...
[    0.232643] psci: failed to boot CPU1 (-22)  <<<<--- LOST CPU ON REBOOT
[    0.232861] CPU1: failed to boot: -22
[    0.306291] Detected PIPT I-cache on CPU2
[    0.310524] GICv3: CPU2: found redistributor 1 region 0:0x000000002f120000
[    0.315618] CPU2: Booted secondary processor 0x0000000001 [0x410fd0f0]
[    0.395576] Detected PIPT I-cache on CPU3
[    0.400431] GICv3: CPU3: found redistributor 2 region 0:0x000000002f140000
[    0.407252] CPU3: Booted secondary processor 0x0000000002 [0x410fd0f0]
[    0.431811] smp: Brought up 1 node, 3 CPUs
[    0.439898] SMP: Total of 3 processors activated.

Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
---
 include/linux/smp.h |  3 +++
 kernel/smp.c        | 48 ++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 21c97c175dad..781b18b7fec0 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -104,6 +104,9 @@ extern void arch_smp_crash_call(cpumask_t *cpus);
 
 /* Helper to query the outcome of an ongoing crash_stop operation */
 bool smp_crash_stop_failed(void);
+
+/* A generic cpu parking helper, possibly overridden by architecture code */
+void arch_smp_cpu_park(void) __noreturn;
 #endif
 
 /*
diff --git a/kernel/smp.c b/kernel/smp.c
index 31e981eb9bd8..ea8a1cc0ec3e 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -821,6 +821,12 @@ EXPORT_SYMBOL_GPL(smp_call_on_cpu);
 
 #ifdef CONFIG_ARCH_USE_COMMON_SMP_STOP
 
+void __weak arch_smp_cpu_park(void)
+{
+	while (1)
+		cpu_relax();
+}
+
 void __weak arch_smp_cpus_stop_complete(void) { }
 
 static inline bool any_other_cpus_online(cpumask_t *mask,
@@ -838,6 +844,9 @@ void __weak arch_smp_crash_call(cpumask_t *cpus)
 	arch_smp_stop_call(cpus);
 }
 
+#define	REASON_STOP	1
+#define	REASON_CRASH	2
+
 /*
  * This centralizes the common logic to:
  *
@@ -853,8 +862,38 @@ static inline void __smp_send_stop_all(bool reason_crash)
 {
 	unsigned int this_cpu_id;
 	cpumask_t mask;
+	static atomic_t stopping;
+	int was_reason;
 
 	this_cpu_id = smp_processor_id();
+	/* make sure that concurrent stop requests are handled properly */
+	was_reason = atomic_cmpxchg(&stopping, 0,
+				    reason_crash ? REASON_CRASH : REASON_STOP);
+	if (was_reason) {
+		/*
+		 * This function can be called twice in panic path if
+		 * crash_kexec is called with crash_kexec_post_notifiers=1,
+		 * but obviously we execute this only once.
+		 */
+		if (reason_crash && was_reason == REASON_CRASH)
+			return;
+		/*
+		 * In case of other concurrent STOP calls (like in a REBOOT
+		 * started simultaneously as an ongoing PANIC/CRASH/REBOOT)
+		 * we want to prioritize the ongoing PANIC/KEXEC call and
+		 * force here the offending CPU that was attempting the
+		 * concurrent STOP to just sit idle waiting to die.
+		 * Failing to do so would result in a lost CPU on the next
+		 * possible reboot triggered by crash_kexec().
+		 */
+		if (!reason_crash) {
+			pr_crit("CPU%d - kernel already stopping, parking myself.\n",
+				this_cpu_id);
+			local_irq_enable();
+			/* does not return */
+			arch_smp_cpu_park();
+		}
+	}
 	if (any_other_cpus_online(&mask, this_cpu_id)) {
 		unsigned long timeout;
 		unsigned int this_cpu_online = cpu_online(this_cpu_id);
@@ -915,6 +954,9 @@ bool __weak smp_crash_stop_failed(void)
  */
 void __weak crash_smp_send_stop(void)
 {
+#ifdef CONFIG_ARCH_USE_COMMON_SMP_STOP
+	__smp_send_stop_all(true);
+#else
 	static int cpus_stopped;
 
 	/*
@@ -924,11 +966,7 @@ void __weak crash_smp_send_stop(void)
 	if (cpus_stopped)
 		return;
 
-#ifdef CONFIG_ARCH_USE_COMMON_SMP_STOP
-	__smp_send_stop_all(true);
-#else
 	smp_send_stop();
-#endif
-
 	cpus_stopped = 1;
+#endif
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 4/7] smp: address races of starting CPUs while stopping
  2019-08-23 11:57 [RFC PATCH 0/7] Unify SMP stop generic logic to common code Cristian Marussi
                   ` (2 preceding siblings ...)
  2019-08-23 11:57 ` [RFC PATCH 3/7] smp: coordinate concurrent crash/smp stop calls Cristian Marussi
@ 2019-08-23 11:57 ` Cristian Marussi
  2019-08-23 11:57 ` [RFC PATCH 5/7] arm64: smp: use generic SMP stop common code Cristian Marussi
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Cristian Marussi @ 2019-08-23 11:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, linux-arm-kernel, dave.martin, james.morse,
	mark.rutland, catalin.marinas, will, tglx, peterz,
	takahiro.akashi, hidehiro.kawai.ez

Add to SMP stop common code a best-effort retry logic, re-issuing a stop
request when any CPU is detected to be still online after the first
stop request cycle has completed.

Address the case in which some CPUs could be still starting up when the
stop process is started, remaining so undetected and coming fully online
only after the SMP stop procedure was already started: such officially
still offline CPUs would be missed by an ongoing stop procedure.
(not considering here the special case of stuck CPUs)

Using here a best effort approach, though, it is not anyway guaranteed
to be able to stop any CPU that happened to finally come online after
the whole SMP stop retry cycle has completed.
(i.e. the race-window is reduced but not eliminated)

Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
---
A more deterministic approach has been also attempted in order to catch
any concurrently starting CPUs at the very last moment and make them
kill themselves by:

- adding a new set_cpus_stopping() in cpumask.h used to set a
  __cpus_stopping atomic internal flag

- modifying set_cpu_online() to check on __cpus_stopping only when
  coming online, and force the offending CPU to kill itself in that case

Anyway it has proved tricky and complex (beside faulty) to implement the
above 'kill-myself' phase in a reliable way while remaining architecture
agnostic and still distingushing properly regular stops from crash kexec.

So given that the main idea underlying this patch series was instead of
simplifying and unifying code and the residual races not caught by the
best-effort logic seemed not very likely, this more deterministic approach
has been dropped in favour of the best effort retry logic.
---
 kernel/smp.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index ea8a1cc0ec3e..10d3120494f2 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -847,6 +847,8 @@ void __weak arch_smp_crash_call(cpumask_t *cpus)
 #define	REASON_STOP	1
 #define	REASON_CRASH	2
 
+#define	MAX_STOP_RETRIES	2
+
 /*
  * This centralizes the common logic to:
  *
@@ -860,7 +862,7 @@ void __weak arch_smp_crash_call(cpumask_t *cpus)
  */
 static inline void __smp_send_stop_all(bool reason_crash)
 {
-	unsigned int this_cpu_id;
+	unsigned int this_cpu_id, retries = MAX_STOP_RETRIES;
 	cpumask_t mask;
 	static atomic_t stopping;
 	int was_reason;
@@ -894,7 +896,7 @@ static inline void __smp_send_stop_all(bool reason_crash)
 			arch_smp_cpu_park();
 		}
 	}
-	if (any_other_cpus_online(&mask, this_cpu_id)) {
+	while (retries-- && any_other_cpus_online(&mask, this_cpu_id)) {
 		unsigned long timeout;
 		unsigned int this_cpu_online = cpu_online(this_cpu_id);
 
@@ -921,9 +923,12 @@ static inline void __smp_send_stop_all(bool reason_crash)
 			udelay(1);
 		/* ensure any stopping-CPUs memory access is made visible */
 		smp_rmb();
-		if (num_online_cpus() > this_cpu_online)
+		if (num_online_cpus() > this_cpu_online) {
 			pr_warn("failed to stop secondary CPUs %*pbl\n",
 				cpumask_pr_args(cpu_online_mask));
+			if (retries)
+				pr_warn("Retrying SMP stop call...\n");
+		}
 	}
 	/* Perform final (possibly arch-specific) work on this CPU */
 	arch_smp_cpus_stop_complete();
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 5/7] arm64: smp: use generic SMP stop common code
  2019-08-23 11:57 [RFC PATCH 0/7] Unify SMP stop generic logic to common code Cristian Marussi
                   ` (3 preceding siblings ...)
  2019-08-23 11:57 ` [RFC PATCH 4/7] smp: address races of starting CPUs while stopping Cristian Marussi
@ 2019-08-23 11:57 ` Cristian Marussi
  2019-08-26 15:32   ` Christoph Hellwig
  2019-08-23 11:57 ` [RFC PATCH 6/7] arm64: smp: use SMP crash-stop " Cristian Marussi
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Cristian Marussi @ 2019-08-23 11:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, linux-arm-kernel, dave.martin, james.morse,
	mark.rutland, catalin.marinas, will, tglx, peterz,
	takahiro.akashi, hidehiro.kawai.ez

Make arm64 use the generic SMP-stop logic provided by common code
unified smp_send_stop() function.

arm64 smp_send_stop() logic had a bug in it: it failed to consider the
online status of the calling CPU when evaluating if any stop message
needed to be sent to other CPus at all: this resulted, on a 2-CPUs
system, in the failure to stop all cpus if one paniced while starting
up, leaving such system in an unexpected lively state.

[root@arch ~]# echo 1 > /sys/devices/system/cpu/cpu1/online
[root@arch ~]# [  152.583368] ------------[ cut here ]------------
[  152.583872] kernel BUG at arch/arm64/kernel/cpufeature.c:852!
[  152.584693] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[  152.585228] Modules linked in:
[  152.586040] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.3.0-rc5-00001-gcabd12118c4a-dirty #2
[  152.586218] Hardware name: Foundation-v8A (DT)
[  152.586478] pstate: 000001c5 (nzcv dAIF -PAN -UAO)
[  152.587260] pc : has_cpuid_feature+0x35c/0x360
[  152.587398] lr : verify_local_elf_hwcaps+0x6c/0xf0
[  152.587520] sp : ffff0000118bbf60
[  152.587605] x29: ffff0000118bbf60 x28: 0000000000000000
[  152.587784] x27: 0000000000000000 x26: 0000000000000000
[  152.587882] x25: ffff00001167a010 x24: ffff0000112f59f8
[  152.587992] x23: 0000000000000000 x22: 0000000000000000
[  152.588085] x21: ffff0000112ea018 x20: ffff000010fe5518
[  152.588180] x19: ffff000010ba3f30 x18: 0000000000000036
[  152.588285] x17: 0000000000000000 x16: 0000000000000000
[  152.588380] x15: 0000000000000000 x14: ffff80087a821210
[  152.588481] x13: 0000000000000000 x12: 0000000000000000
[  152.588599] x11: 0000000000000080 x10: 00400032b5503510
[  152.588709] x9 : 0000000000000000 x8 : ffff000010b93204
[  152.588810] x7 : 00000000800001d8 x6 : 0000000000000005
[  152.588910] x5 : 0000000000000000 x4 : 0000000000000000
[  152.589021] x3 : 0000000000000000 x2 : 0000000000008000
[  152.589121] x1 : 0000000000180480 x0 : 0000000000180480
[  152.589379] Call trace:
[  152.589646]  has_cpuid_feature+0x35c/0x360
[  152.589763]  verify_local_elf_hwcaps+0x6c/0xf0
[  152.589858]  check_local_cpu_capabilities+0x88/0x118
[  152.589968]  secondary_start_kernel+0xc4/0x168
[  152.590530] Code: d53801e0 17ffff58 d5380600 17ffff56 (d4210000)
[  152.592215] ---[ end trace 80ea98416149c87e ]---
[  152.592734] Kernel panic - not syncing: Attempted to kill the idle task!
[  152.593173] Kernel Offset: disabled
[  152.593501] CPU features: 0x0004,20c02008
[  152.593678] Memory Limit: none
[  152.594208] ---[ end Kernel panic - not syncing: Attempted to kill the idle
task! ]---
[root@arch ~]# bash: echo: write error: Input/output error
[root@arch ~]#
[root@arch ~]#
[root@arch ~]# echo HELO
HELO

Get rid of such bug, switching arm64 to use the common SMP stop code.

Reported-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
---
 arch/arm64/Kconfig      |  3 +++
 arch/arm64/kernel/smp.c | 29 ++++++-----------------------
 2 files changed, 9 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3adcec05b1f6..3baa69ca4c55 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -276,6 +276,9 @@ config ARCH_ENABLE_MEMORY_HOTPLUG
 config SMP
 	def_bool y
 
+config ARCH_USE_COMMON_SMP_STOP
+	def_bool y if SMP
+
 config KERNEL_MODE_NEON
 	def_bool y
 
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 018a33e01b0e..473be3c23df7 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -953,33 +953,16 @@ void tick_broadcast(const struct cpumask *mask)
 }
 #endif
 
-void smp_send_stop(void)
+void arch_smp_cpus_stop_complete(void)
 {
-	unsigned long timeout;
-
-	if (num_online_cpus() > 1) {
-		cpumask_t mask;
-
-		cpumask_copy(&mask, cpu_online_mask);
-		cpumask_clear_cpu(smp_processor_id(), &mask);
-
-		if (system_state <= SYSTEM_RUNNING)
-			pr_crit("SMP: stopping secondary CPUs\n");
-		smp_cross_call(&mask, IPI_CPU_STOP);
-	}
-
-	/* Wait up to one second for other CPUs to stop */
-	timeout = USEC_PER_SEC;
-	while (num_online_cpus() > 1 && timeout--)
-		udelay(1);
-
-	if (num_online_cpus() > 1)
-		pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
-			   cpumask_pr_args(cpu_online_mask));
-
 	sdei_mask_local_cpu();
 }
 
+void arch_smp_stop_call(cpumask_t *cpus)
+{
+	smp_cross_call(cpus, IPI_CPU_STOP);
+}
+
 #ifdef CONFIG_KEXEC_CORE
 void crash_smp_send_stop(void)
 {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 6/7] arm64: smp: use SMP crash-stop common code
  2019-08-23 11:57 [RFC PATCH 0/7] Unify SMP stop generic logic to common code Cristian Marussi
                   ` (4 preceding siblings ...)
  2019-08-23 11:57 ` [RFC PATCH 5/7] arm64: smp: use generic SMP stop common code Cristian Marussi
@ 2019-08-23 11:57 ` Cristian Marussi
  2019-08-23 11:57 ` [RFC PATCH 7/7] arm64: smp: add arch specific cpu parking helper Cristian Marussi
  2019-08-26 15:34 ` [RFC PATCH 0/7] Unify SMP stop generic logic to common code Christoph Hellwig
  7 siblings, 0 replies; 14+ messages in thread
From: Cristian Marussi @ 2019-08-23 11:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, linux-arm-kernel, dave.martin, james.morse,
	mark.rutland, catalin.marinas, will, tglx, peterz,
	takahiro.akashi, hidehiro.kawai.ez

Make arm64 use the SMP common implementation of crash_smp_send_stop() and
its generic logic, by removing the arm64 crash_smp_send_stop() definition
and providing the needed arch specific helpers.

Additionally, simplify the arch-specific stop and crash dump ISRs backends
(which are in charge of effectively receiving and interpreting the
stop/crash messages) and unify them as much as possible.

Using the SMP common code, it is no more needed to make use of an atomic_t
counter to make sure that each CPU had time to perform its crash dump
related shutdown-ops before the world ends: simply take care to synchronize
on cpu_online_mask, and add proper explicit memory barriers where needed.

Moreover, remove arm64 specific smp_crash_stop_failed() helper as a whole
and rely on the common code provided homonym function to lookup the state
of an ongoing crash_stop operation.

Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
---
 arch/arm64/include/asm/smp.h |   2 -
 arch/arm64/kernel/smp.c      | 100 +++++++++--------------------------
 2 files changed, 26 insertions(+), 76 deletions(-)

diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index a0c8a0b65259..d98c409f9225 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -150,8 +150,6 @@ static inline void cpu_panic_kernel(void)
  */
 bool cpus_are_stuck_in_kernel(void);
 
-extern void crash_smp_send_stop(void);
-extern bool smp_crash_stop_failed(void);
 
 #endif /* ifndef __ASSEMBLY__ */
 
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 473be3c23df7..c7497a885cfd 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -825,12 +825,30 @@ void arch_irq_work_raise(void)
 }
 #endif
 
-static void local_cpu_stop(void)
+static void local_cpu_crash_or_stop(struct pt_regs *crash_regs)
 {
-	set_cpu_online(smp_processor_id(), false);
+	unsigned int cpu = smp_processor_id();
 
-	local_daif_mask();
+	if (IS_ENABLED(CONFIG_KEXEC_CORE) && crash_regs) {
+#ifdef CONFIG_KEXEC_CORE
+		/* crash stop requested: save regs before going offline */
+		crash_save_cpu(crash_regs, cpu);
+#endif
+		local_irq_disable();
+	} else {
+		local_daif_mask();
+	}
 	sdei_mask_local_cpu();
+	/* ensure dumped regs are visible once cpu is seen offline */
+	smp_wmb();
+	set_cpu_online(cpu, false);
+	/* ensure all writes are globally visible before cpu parks */
+	wmb();
+#if defined(CONFIG_KEXEC_CORE) && defined(CONFIG_HOTPLUG_CPU)
+	if (cpu_ops[cpu]->cpu_die)
+		cpu_ops[cpu]->cpu_die(cpu);
+#endif
+	/* just in case */
 	cpu_park_loop();
 }
 
@@ -841,31 +859,7 @@ static void local_cpu_stop(void)
  */
 void panic_smp_self_stop(void)
 {
-	local_cpu_stop();
-}
-
-#ifdef CONFIG_KEXEC_CORE
-static atomic_t waiting_for_crash_ipi = ATOMIC_INIT(0);
-#endif
-
-static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
-{
-#ifdef CONFIG_KEXEC_CORE
-	crash_save_cpu(regs, cpu);
-
-	atomic_dec(&waiting_for_crash_ipi);
-
-	local_irq_disable();
-	sdei_mask_local_cpu();
-
-#ifdef CONFIG_HOTPLUG_CPU
-	if (cpu_ops[cpu]->cpu_die)
-		cpu_ops[cpu]->cpu_die(cpu);
-#endif
-
-	/* just in case */
-	cpu_park_loop();
-#endif
+	local_cpu_crash_or_stop(NULL);
 }
 
 /*
@@ -894,14 +888,14 @@ void handle_IPI(int ipinr, struct pt_regs *regs)
 
 	case IPI_CPU_STOP:
 		irq_enter();
-		local_cpu_stop();
+		local_cpu_crash_or_stop(NULL);
 		irq_exit();
 		break;
 
 	case IPI_CPU_CRASH_STOP:
 		if (IS_ENABLED(CONFIG_KEXEC_CORE)) {
 			irq_enter();
-			ipi_cpu_crash_stop(cpu, regs);
+			local_cpu_crash_or_stop(regs);
 
 			unreachable();
 		}
@@ -963,52 +957,10 @@ void arch_smp_stop_call(cpumask_t *cpus)
 	smp_cross_call(cpus, IPI_CPU_STOP);
 }
 
-#ifdef CONFIG_KEXEC_CORE
-void crash_smp_send_stop(void)
+void arch_smp_crash_call(cpumask_t *cpus)
 {
-	static int cpus_stopped;
-	cpumask_t mask;
-	unsigned long timeout;
-
-	/*
-	 * This function can be called twice in panic path, but obviously
-	 * we execute this only once.
-	 */
-	if (cpus_stopped)
-		return;
-
-	cpus_stopped = 1;
-
-	if (num_online_cpus() == 1) {
-		sdei_mask_local_cpu();
-		return;
-	}
-
-	cpumask_copy(&mask, cpu_online_mask);
-	cpumask_clear_cpu(smp_processor_id(), &mask);
-
-	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
-
-	pr_crit("SMP: stopping secondary CPUs\n");
-	smp_cross_call(&mask, IPI_CPU_CRASH_STOP);
-
-	/* Wait up to one second for other CPUs to stop */
-	timeout = USEC_PER_SEC;
-	while ((atomic_read(&waiting_for_crash_ipi) > 0) && timeout--)
-		udelay(1);
-
-	if (atomic_read(&waiting_for_crash_ipi) > 0)
-		pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
-			   cpumask_pr_args(&mask));
-
-	sdei_mask_local_cpu();
-}
-
-bool smp_crash_stop_failed(void)
-{
-	return (atomic_read(&waiting_for_crash_ipi) > 0);
+	smp_cross_call(cpus, IPI_CPU_CRASH_STOP);
 }
-#endif
 
 /*
  * not supported here
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH 7/7] arm64: smp: add arch specific cpu parking helper
  2019-08-23 11:57 [RFC PATCH 0/7] Unify SMP stop generic logic to common code Cristian Marussi
                   ` (5 preceding siblings ...)
  2019-08-23 11:57 ` [RFC PATCH 6/7] arm64: smp: use SMP crash-stop " Cristian Marussi
@ 2019-08-23 11:57 ` Cristian Marussi
  2019-08-26 15:34 ` [RFC PATCH 0/7] Unify SMP stop generic logic to common code Christoph Hellwig
  7 siblings, 0 replies; 14+ messages in thread
From: Cristian Marussi @ 2019-08-23 11:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, linux-arm-kernel, dave.martin, james.morse,
	mark.rutland, catalin.marinas, will, tglx, peterz,
	takahiro.akashi, hidehiro.kawai.ez

Add an arm64 specific helper which parks the cpu in a more architecture
efficient way.

Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
---
 arch/arm64/kernel/smp.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index c7497a885cfd..a3fb22b4c870 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -947,6 +947,12 @@ void tick_broadcast(const struct cpumask *mask)
 }
 #endif
 
+void arch_smp_cpu_park(void)
+{
+	while (1)
+		cpu_park_loop();
+}
+
 void arch_smp_cpus_stop_complete(void)
 {
 	sdei_mask_local_cpu();
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 5/7] arm64: smp: use generic SMP stop common code
  2019-08-23 11:57 ` [RFC PATCH 5/7] arm64: smp: use generic SMP stop common code Cristian Marussi
@ 2019-08-26 15:32   ` Christoph Hellwig
  2019-08-26 19:58     ` Cristian Marussi
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2019-08-26 15:32 UTC (permalink / raw)
  To: Cristian Marussi
  Cc: linux-kernel, linux-arch, mark.rutland, peterz, catalin.marinas,
	takahiro.akashi, james.morse, hidehiro.kawai.ez, tglx, will,
	dave.martin, linux-arm-kernel

> +config ARCH_USE_COMMON_SMP_STOP
> +	def_bool y if SMP

The option belongs into common code and the arch code shoud only
select it.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/7] Unify SMP stop generic logic to common code
  2019-08-23 11:57 [RFC PATCH 0/7] Unify SMP stop generic logic to common code Cristian Marussi
                   ` (6 preceding siblings ...)
  2019-08-23 11:57 ` [RFC PATCH 7/7] arm64: smp: add arch specific cpu parking helper Cristian Marussi
@ 2019-08-26 15:34 ` Christoph Hellwig
  2019-08-26 19:33   ` Cristian Marussi
  7 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2019-08-26 15:34 UTC (permalink / raw)
  To: Cristian Marussi
  Cc: linux-kernel, linux-arch, mark.rutland, peterz, catalin.marinas,
	takahiro.akashi, james.morse, hidehiro.kawai.ez, tglx, will,
	dave.martin, linux-arm-kernel

On Fri, Aug 23, 2019 at 12:57:13PM +0100, Cristian Marussi wrote:
> An architecture willing to rely on this SMP common logic has to define its
> own helpers and set CONFIG_ARCH_USE_COMMON_SMP_STOP=y.
> The series wire this up for arm64.
> 
> Behaviour is not changed for architectures not adopting this new common
> logic.

Seens like this common code only covers arm64.  I think we should
generally have at least two users for common code.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 0/7] Unify SMP stop generic logic to common code
  2019-08-26 15:34 ` [RFC PATCH 0/7] Unify SMP stop generic logic to common code Christoph Hellwig
@ 2019-08-26 19:33   ` Cristian Marussi
  0 siblings, 0 replies; 14+ messages in thread
From: Cristian Marussi @ 2019-08-26 19:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, linux-arch, mark.rutland, peterz, catalin.marinas,
	takahiro.akashi, james.morse, hidehiro.kawai.ez, tglx, will,
	dave.martin, linux-arm-kernel

Hi Christoph

thanks for the review.

On 8/26/19 4:34 PM, Christoph Hellwig wrote:
> On Fri, Aug 23, 2019 at 12:57:13PM +0100, Cristian Marussi wrote:
>> An architecture willing to rely on this SMP common logic has to define its
>> own helpers and set CONFIG_ARCH_USE_COMMON_SMP_STOP=y.
>> The series wire this up for arm64.
>>
>> Behaviour is not changed for architectures not adopting this new common
>> logic.
> 
> Seens like this common code only covers arm64.  I think we should
> generally have at least two users for common code.
> 

Yes absolutely, but this RFC was an attempt at first to explore if this
approach was deemed sensible upstream or not, so I wired up only arm64 for now.

Thanks

Cristian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 5/7] arm64: smp: use generic SMP stop common code
  2019-08-26 15:32   ` Christoph Hellwig
@ 2019-08-26 19:58     ` Cristian Marussi
  2019-08-26 22:26       ` Thomas Gleixner
  0 siblings, 1 reply; 14+ messages in thread
From: Cristian Marussi @ 2019-08-26 19:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-kernel, linux-arch, mark.rutland, peterz, catalin.marinas,
	takahiro.akashi, james.morse, hidehiro.kawai.ez, tglx, will,
	dave.martin, linux-arm-kernel

Hi

On 8/26/19 4:32 PM, Christoph Hellwig wrote:
>> +config ARCH_USE_COMMON_SMP_STOP
>> +	def_bool y if SMP
> 
> The option belongs into common code and the arch code shoud only
> select it.
>

In fact that was my first approach, but then I noticed that in kernel/ topdir
there was no generic Kconfig but only subsystem specific ones:

Kconfig.freezer  Kconfig.hz       Kconfig.locks    Kconfig.preempt

while instead looking into archs top level Kconfig, beside the usual arch/Kconfig selects,
I could find this similar sort of "reversed" approach in which the arch defined and
selected a CONFIG which was indeed then used only in common code like in:

20:37 $ egrep -R ARCH_HAS_CACHE_LINE_SIZE .
./arch/arc/Kconfig:config ARCH_HAS_CACHE_LINE_SIZE
./arch/x86/Kconfig:config ARCH_HAS_CACHE_LINE_SIZE
./arch/arm64/Kconfig:config ARCH_HAS_CACHE_LINE_SIZE
./include/linux/cache.h:#ifndef CONFIG_ARCH_HAS_CACHE_LINE_SIZE

20:39 $ egrep -R ARCH_HAS_KEXEC_PURGATORY .
./arch/powerpc/Kconfig:config ARCH_HAS_KEXEC_PURGATORY
./arch/x86/Kconfig:config ARCH_HAS_KEXEC_PURGATORY
./arch/s390/Kconfig:config ARCH_HAS_KEXEC_PURGATORY
./arch/s390/purgatory/Makefile:obj-$(CONFIG_ARCH_HAS_KEXEC_PURGATORY) += kexec-purgatory.o
./arch/s390/Kbuild:obj-$(CONFIG_ARCH_HAS_KEXEC_PURGATORY) += purgatory/
./kernel/kexec_file.c:	if (!IS_ENABLED(CONFIG_ARCH_HAS_KEXEC_PURGATORY))

so I thought it was an acceptable option and I went for it, not to introduce a new kernel/Kconfig.smp
just for this new config option; but in fact I could have missed the real reason underlying these two
different choices.

Thanks

Cristian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 5/7] arm64: smp: use generic SMP stop common code
  2019-08-26 19:58     ` Cristian Marussi
@ 2019-08-26 22:26       ` Thomas Gleixner
  2019-08-27 14:34         ` Cristian Marussi
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2019-08-26 22:26 UTC (permalink / raw)
  To: Cristian Marussi
  Cc: Christoph Hellwig, linux-kernel, linux-arch, mark.rutland,
	peterz, catalin.marinas, takahiro.akashi, james.morse,
	hidehiro.kawai.ez, will, dave.martin, linux-arm-kernel

On Mon, 26 Aug 2019, Cristian Marussi wrote:
> On 8/26/19 4:32 PM, Christoph Hellwig wrote:
> > > +config ARCH_USE_COMMON_SMP_STOP
> > > +	def_bool y if SMP
> > 
> > The option belongs into common code and the arch code shoud only
> > select it.
> > 
> 
> In fact that was my first approach, but then I noticed that in kernel/ topdir
> there was no generic Kconfig but only subsystem specific ones:
> 
> Kconfig.freezer  Kconfig.hz       Kconfig.locks    Kconfig.preempt

arch/Kconfig

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH 5/7] arm64: smp: use generic SMP stop common code
  2019-08-26 22:26       ` Thomas Gleixner
@ 2019-08-27 14:34         ` Cristian Marussi
  0 siblings, 0 replies; 14+ messages in thread
From: Cristian Marussi @ 2019-08-27 14:34 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Christoph Hellwig, linux-kernel, linux-arch, mark.rutland,
	peterz, catalin.marinas, takahiro.akashi, james.morse,
	hidehiro.kawai.ez, will, dave.martin, linux-arm-kernel

Hi

On 26/08/2019 23:26, Thomas Gleixner wrote:
> On Mon, 26 Aug 2019, Cristian Marussi wrote:
>> On 8/26/19 4:32 PM, Christoph Hellwig wrote:
>>>> +config ARCH_USE_COMMON_SMP_STOP
>>>> +	def_bool y if SMP
>>>
>>> The option belongs into common code and the arch code shoud only
>>> select it.
>>>
>>
>> In fact that was my first approach, but then I noticed that in kernel/ topdir
>> there was no generic Kconfig but only subsystem specific ones:
>>
>> Kconfig.freezer  Kconfig.hz       Kconfig.locks    Kconfig.preempt
> 
> arch/Kconfig
> 

Ok I'll move it there in v2.

Thanks for the review.

Cristian

> Thanks,
> 
> 	tglx
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-08-27 14:34 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-23 11:57 [RFC PATCH 0/7] Unify SMP stop generic logic to common code Cristian Marussi
2019-08-23 11:57 ` [RFC PATCH 1/7] smp: add generic SMP-stop support " Cristian Marussi
2019-08-23 11:57 ` [RFC PATCH 2/7] smp: unify crash_ and smp_send_stop() logic Cristian Marussi
2019-08-23 11:57 ` [RFC PATCH 3/7] smp: coordinate concurrent crash/smp stop calls Cristian Marussi
2019-08-23 11:57 ` [RFC PATCH 4/7] smp: address races of starting CPUs while stopping Cristian Marussi
2019-08-23 11:57 ` [RFC PATCH 5/7] arm64: smp: use generic SMP stop common code Cristian Marussi
2019-08-26 15:32   ` Christoph Hellwig
2019-08-26 19:58     ` Cristian Marussi
2019-08-26 22:26       ` Thomas Gleixner
2019-08-27 14:34         ` Cristian Marussi
2019-08-23 11:57 ` [RFC PATCH 6/7] arm64: smp: use SMP crash-stop " Cristian Marussi
2019-08-23 11:57 ` [RFC PATCH 7/7] arm64: smp: add arch specific cpu parking helper Cristian Marussi
2019-08-26 15:34 ` [RFC PATCH 0/7] Unify SMP stop generic logic to common code Christoph Hellwig
2019-08-26 19:33   ` Cristian Marussi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).