* [PATCH 0/4] arm64: ensure CPUs are quiescent before patching
@ 2021-12-03 10:47 Mark Rutland
  2021-12-03 10:47 ` [PATCH 1/4] arm64: alternative: wait for other CPUs " Mark Rutland
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Mark Rutland @ 2021-12-03 10:47 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: andre.przywara, ardb, catalin.marinas, james.morse, joey.gouly,
	mark.rutland, suzuki.poulose, will

On arm64, certain instructions cannot be patched while they are being
concurrently executed, and in these cases we use stop_machine() to
ensure that while one CPU is patching instructions all other CPUs are in
a quiescent state. We have two distinct sequences for this, one used for
boot-time patching of alternatives, and one used for runtime patching
(e.g. kprobes).

Both sequences wait for patching to be complete before CPUs exit the
quiescent state, but we don't wait for CPUs to be quiescent *before* we
start patching, and so we may patch code which is still being executed
(e.g. portions of stop_machine() itself).

These patches fix this problem by updating the sequences to wait for
CPUs to become quiescent before patching starts. The first two patches
are potentially backportable fixes for the individual sequences, and the
third patch unifies them behind an arm64-specific patch_machine() helper.
The last patch prevents CPUs from taking asynchronous exceptions while in
the quiescent state (just DAIF for now; I'm not sure exactly how to handle
SDEI).

The architecture documentation is a little vague on how to ensure
completion of prior execution (i.e. the point after which patching from
another CPU cannot possibly affect that execution and cause UNPREDICTABLE
behaviour). For the
moment I'm assuming that an atomic store cannot become visible until all
prior execution has completed, but I suspect that we *might* need to add
barriers into patch_machine() prior to signalling quiescence.
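
If that assumption turns out to be too weak, the sort of change I have in
mind is along the lines of the below. This is purely a sketch against the
patch_machine() helper added later in this series, and which barrier (if
any) is actually required is exactly the open question:

	/* secondary CPU side of do_patch_machine(): hypothetical */
	dsb(ish);			/* placeholder: complete prior execution/accesses? */
	arch_atomic_dec(&pmi->active);	/* ... then signal quiescence */
	while (!pmi->done)
		cpu_relax();
	isb();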

This series does not intend to address the more general problem that our
patching sequences may use directly-patchable or instrumentable code,
and I intend to address that with subsequent patches. Fixing
that will require a more substantial rework (e.g. of the insn code).

Thanks,
Mark.

Mark Rutland (4):
  arm64: alternative: wait for other CPUs before patching
  arm64: insn: wait for other CPUs before patching
  arm64: patching: unify stop_machine() patch synchronization
  arm64: patching: mask exceptions in patch_machine()

 arch/arm64/include/asm/patching.h |  4 ++
 arch/arm64/kernel/alternative.c   | 33 +++--------
 arch/arm64/kernel/patching.c      | 94 +++++++++++++++++++++++++------
 3 files changed, 89 insertions(+), 42 deletions(-)

-- 
2.30.2



* [PATCH 1/4] arm64: alternative: wait for other CPUs before patching
  2021-12-03 10:47 [PATCH 0/4] arm64: ensure CPUs are quiescent before patching Mark Rutland
@ 2021-12-03 10:47 ` Mark Rutland
  2021-12-10 14:49   ` Catalin Marinas
  2021-12-13 13:31   ` Will Deacon
  2021-12-03 10:47 ` [PATCH 2/4] arm64: insn: " Mark Rutland
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 13+ messages in thread
From: Mark Rutland @ 2021-12-03 10:47 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: andre.przywara, ardb, catalin.marinas, james.morse, joey.gouly,
	mark.rutland, suzuki.poulose, will

In __apply_alternatives_multi_stop() we have a "really simple polling
protocol" to avoid patching code that is concurrently executed on other
CPUs. Secondary CPUs wait for the boot CPU to signal that patching is
complete, but the boot CPU doesn't wait for secondaries to enter the
polling loop, and it's possible that patching starts while secondaries
are still within the stop_machine logic.

Let's fix this by adding a vaguely simple polling protocol where the
boot CPU waits for secondaries to signal that they have entered the
unpatchable stop function. We can use the arch_atomic_*() functions for
this, as they are not patched with alternatives.

At the same time, let's make `all_alternatives_applied` local to
__apply_alternatives_multi_stop(), since it is only used there, and this
makes the code a little clearer.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/alternative.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index 3fb79b76e9d9..4f32d4425aac 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -21,9 +21,6 @@
 #define ALT_ORIG_PTR(a)		__ALT_PTR(a, orig_offset)
 #define ALT_REPL_PTR(a)		__ALT_PTR(a, alt_offset)
 
-/* Volatile, as we may be patching the guts of READ_ONCE() */
-static volatile int all_alternatives_applied;
-
 static DECLARE_BITMAP(applied_alternatives, ARM64_NCAPS);
 
 struct alt_region {
@@ -193,11 +190,17 @@ static void __nocfi __apply_alternatives(struct alt_region *region, bool is_modu
 }
 
 /*
- * We might be patching the stop_machine state machine, so implement a
- * really simple polling protocol here.
+ * Apply alternatives, ensuring that no CPUs are concurrently executing code
+ * being patched.
+ *
+ * We might be patching the stop_machine state machine or READ_ONCE(), so
+ * we implement a simple polling protocol.
  */
 static int __apply_alternatives_multi_stop(void *unused)
 {
+	/* Volatile, as we may be patching the guts of READ_ONCE() */
+	static volatile int all_alternatives_applied;
+	static atomic_t stopped_cpus = ATOMIC_INIT(0);
 	struct alt_region region = {
 		.begin	= (struct alt_instr *)__alt_instructions,
 		.end	= (struct alt_instr *)__alt_instructions_end,
@@ -205,12 +208,16 @@ static int __apply_alternatives_multi_stop(void *unused)
 
 	/* We always have a CPU 0 at this point (__init) */
 	if (smp_processor_id()) {
+		arch_atomic_inc(&stopped_cpus);
 		while (!all_alternatives_applied)
 			cpu_relax();
 		isb();
 	} else {
 		DECLARE_BITMAP(remaining_capabilities, ARM64_NPATCHABLE);
 
+		while (arch_atomic_read(&stopped_cpus) != num_online_cpus() - 1)
+			cpu_relax();
+
 		bitmap_complement(remaining_capabilities, boot_capabilities,
 				  ARM64_NPATCHABLE);
 
-- 
2.30.2



* [PATCH 2/4] arm64: insn: wait for other CPUs before patching
  2021-12-03 10:47 [PATCH 0/4] arm64: ensure CPUs are quiescent before patching Mark Rutland
  2021-12-03 10:47 ` [PATCH 1/4] arm64: alternative: wait for other CPUs " Mark Rutland
@ 2021-12-03 10:47 ` Mark Rutland
  2021-12-03 10:47 ` [PATCH 3/4] arm64: patching: unify stop_machine() patch synchronization Mark Rutland
  2021-12-03 10:47 ` [PATCH 4/4] arm64: patching: mask exceptions in patch_machine() Mark Rutland
  3 siblings, 0 replies; 13+ messages in thread
From: Mark Rutland @ 2021-12-03 10:47 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: andre.przywara, ardb, catalin.marinas, james.morse, joey.gouly,
	mark.rutland, suzuki.poulose, will

In aarch64_insn_patch_text_cb(), secondary CPUs wait for the master CPU
to signal that patching is complete, but the master CPU doesn't wait for
secondaries to enter the polling loop, and it's possible that patching
starts while secondaries are still within the stop_machine logic.

Let's fix this by having the last CPU to enter the stop function become
the master: each CPU increments the counter on entry, so when the count
reaches num_online_cpus() all other CPUs are known to have entered the
unpatchable stop function. We can use the arch_atomic_*() functions for
this, as these are inlined and not instrumented. These will not be
patched concurrently with this code executing.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/patching.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/patching.c b/arch/arm64/kernel/patching.c
index 771f543464e0..c0d51340c913 100644
--- a/arch/arm64/kernel/patching.c
+++ b/arch/arm64/kernel/patching.c
@@ -116,16 +116,17 @@ static int __kprobes aarch64_insn_patch_text_cb(void *arg)
 {
 	int i, ret = 0;
 	struct aarch64_insn_patch *pp = arg;
+	int num_cpus = num_online_cpus();
 
-	/* The first CPU becomes master */
-	if (atomic_inc_return(&pp->cpu_count) == 1) {
+	/* The last CPU becomes master */
+	if (arch_atomic_inc_return(&pp->cpu_count) == num_cpus) {
 		for (i = 0; ret == 0 && i < pp->insn_cnt; i++)
 			ret = aarch64_insn_patch_text_nosync(pp->text_addrs[i],
 							     pp->new_insns[i]);
 		/* Notify other processors with an additional increment. */
 		atomic_inc(&pp->cpu_count);
 	} else {
-		while (atomic_read(&pp->cpu_count) <= num_online_cpus())
+		while (arch_atomic_read(&pp->cpu_count) <= num_cpus)
 			cpu_relax();
 		isb();
 	}
-- 
2.30.2



* [PATCH 3/4] arm64: patching: unify stop_machine() patch synchronization
  2021-12-03 10:47 [PATCH 0/4] arm64: ensure CPUs are quiescent before patching Mark Rutland
  2021-12-03 10:47 ` [PATCH 1/4] arm64: alternative: wait for other CPUs " Mark Rutland
  2021-12-03 10:47 ` [PATCH 2/4] arm64: insn: " Mark Rutland
@ 2021-12-03 10:47 ` Mark Rutland
  2021-12-03 10:47 ` [PATCH 4/4] arm64: patching: mask exceptions in patch_machine() Mark Rutland
  3 siblings, 0 replies; 13+ messages in thread
From: Mark Rutland @ 2021-12-03 10:47 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: andre.przywara, ardb, catalin.marinas, james.morse, joey.gouly,
	mark.rutland, suzuki.poulose, will

Some instruction sequences cannot be safely modified while they may be
concurrently executed, and so it's necessary to temporarily stop all
CPUs while performing the modification. We have separate implementations
of this for alternatives and kprobes.

This patch unifies these with a common patch_machine() helper function
which handles the necessary synchronization to ensure that CPUs are
stopped during patching. This separates the synchronization logic from
the patching logic, making it
easier to understand, and means that we only have to maintain one
synchronization algorithm.

The synchronization logic in do_patch_machine() only uses unpatchable
functions, and the function itself is marked `noinstr` to prevent
instrumentation. The patch_machine() helper is left instrumentable as
stop_machine() is instrumentable, and therefore there is no benefit to
forbidding instrumentation.

As with the prior alternative patching sequence, the CPU that will apply
the patch is chosen early so that the choice is deterministic.

Since __apply_alternatives_stopped() is only ever called once under
apply_alternatives_all(), the `all_alternatives_applied` variable and
warning are redundant and therefore removed.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/patching.h |  4 ++
 arch/arm64/kernel/alternative.c   | 40 +++-----------
 arch/arm64/kernel/patching.c      | 91 +++++++++++++++++++++++++------
 3 files changed, 84 insertions(+), 51 deletions(-)

diff --git a/arch/arm64/include/asm/patching.h b/arch/arm64/include/asm/patching.h
index 6bf5adc56295..25c199bc55d2 100644
--- a/arch/arm64/include/asm/patching.h
+++ b/arch/arm64/include/asm/patching.h
@@ -10,4 +10,8 @@ int aarch64_insn_write(void *addr, u32 insn);
 int aarch64_insn_patch_text_nosync(void *addr, u32 insn);
 int aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt);
 
+typedef int (*patch_machine_func_t)(void *);
+int patch_machine_cpuslocked(patch_machine_func_t func, void *arg);
+int patch_machine(patch_machine_func_t func, void *arg);
+
 #endif	/* __ASM_PATCHING_H */
diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index 4f32d4425aac..d2b4b9e6a0e4 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -14,8 +14,8 @@
 #include <asm/alternative.h>
 #include <asm/cpufeature.h>
 #include <asm/insn.h>
+#include <asm/patching.h>
 #include <asm/sections.h>
-#include <linux/stop_machine.h>
 
 #define __ALT_PTR(a, f)		((void *)&(a)->f + (a)->f)
 #define ALT_ORIG_PTR(a)		__ALT_PTR(a, orig_offset)
@@ -189,43 +189,17 @@ static void __nocfi __apply_alternatives(struct alt_region *region, bool is_modu
 	}
 }
 
-/*
- * Apply alternatives, ensuring that no CPUs are concurrently executing code
- * being patched.
- *
- * We might be patching the stop_machine state machine or READ_ONCE(), so
- * we implement a simple polling protocol.
- */
-static int __apply_alternatives_multi_stop(void *unused)
+static int __apply_alternatives_stopped(void *unused)
 {
-	/* Volatile, as we may be patching the guts of READ_ONCE() */
-	static volatile int all_alternatives_applied;
-	static atomic_t stopped_cpus = ATOMIC_INIT(0);
 	struct alt_region region = {
 		.begin	= (struct alt_instr *)__alt_instructions,
 		.end	= (struct alt_instr *)__alt_instructions_end,
 	};
+	DECLARE_BITMAP(remaining_capabilities, ARM64_NPATCHABLE);
 
-	/* We always have a CPU 0 at this point (__init) */
-	if (smp_processor_id()) {
-		arch_atomic_inc(&stopped_cpus);
-		while (!all_alternatives_applied)
-			cpu_relax();
-		isb();
-	} else {
-		DECLARE_BITMAP(remaining_capabilities, ARM64_NPATCHABLE);
-
-		while (arch_atomic_read(&stopped_cpus) != num_online_cpus() - 1)
-			cpu_relax();
-
-		bitmap_complement(remaining_capabilities, boot_capabilities,
-				  ARM64_NPATCHABLE);
-
-		BUG_ON(all_alternatives_applied);
-		__apply_alternatives(&region, false, remaining_capabilities);
-		/* Barriers provided by the cache flushing */
-		all_alternatives_applied = 1;
-	}
+	bitmap_complement(remaining_capabilities, boot_capabilities,
+			  ARM64_NPATCHABLE);
+	__apply_alternatives(&region, false, remaining_capabilities);
 
 	return 0;
 }
@@ -233,7 +207,7 @@ static int __apply_alternatives_multi_stop(void *unused)
 void __init apply_alternatives_all(void)
 {
 	/* better not try code patching on a live SMP system */
-	stop_machine(__apply_alternatives_multi_stop, NULL, cpu_online_mask);
+	patch_machine(__apply_alternatives_stopped, NULL);
 }
 
 /*
diff --git a/arch/arm64/kernel/patching.c b/arch/arm64/kernel/patching.c
index c0d51340c913..04497dbf14e2 100644
--- a/arch/arm64/kernel/patching.c
+++ b/arch/arm64/kernel/patching.c
@@ -105,31 +105,88 @@ int __kprobes aarch64_insn_patch_text_nosync(void *addr, u32 insn)
 	return ret;
 }
 
+struct patch_machine_info {
+	patch_machine_func_t func;
+	void *arg;
+	int cpu;
+	atomic_t active;
+	volatile int done;
+};
+
+/*
+ * Run a code patching function on a single CPU, ensuring that no CPUs are
+ * concurrently executing code being patched.
+ *
+ * We wait for other CPUs to become quiescent before starting patching, and
+ * wait until patching is completed before other CPUs are woken.
+ *
+ * The patching function is responsible for any barriers necessary to make new
+ * instructions visible to other CPUs. The other CPUs will issue an ISB upon
+ * being woken to ensure they use the new instructions.
+ */
+static int noinstr do_patch_machine(void *arg)
+{
+	struct patch_machine_info *pmi = arg;
+	int cpu = smp_processor_id();
+	int ret = 0;
+
+	if (pmi->cpu == cpu) {
+		while (arch_atomic_read(&pmi->active))
+			cpu_relax();
+		ret = pmi->func(pmi->arg);
+		pmi->done = 1;
+	} else {
+		arch_atomic_dec(&pmi->active);
+		while (!pmi->done)
+			cpu_relax();
+		isb();
+	}
+
+	return ret;
+}
+
+/*
+ * Run a code patching function on a single CPU, ensuring that no CPUs are
+ * concurrently executing code being patched.
+ */
+int patch_machine_cpuslocked(patch_machine_func_t func, void *arg)
+{
+	struct patch_machine_info pmi = {
+		.func = func,
+		.arg = arg,
+		.cpu = raw_smp_processor_id(),
+		.active = ATOMIC_INIT(num_online_cpus() - 1),
+		.done = 0,
+	};
+
+	return stop_machine_cpuslocked(do_patch_machine, &pmi, cpu_online_mask);
+}
+
+int patch_machine(patch_machine_func_t func, void *arg)
+{
+	int ret;
+
+	cpus_read_lock();
+	ret = patch_machine_cpuslocked(func, arg);
+	cpus_read_unlock();
+
+	return ret;
+}
+
 struct aarch64_insn_patch {
 	void		**text_addrs;
 	u32		*new_insns;
 	int		insn_cnt;
-	atomic_t	cpu_count;
 };
 
 static int __kprobes aarch64_insn_patch_text_cb(void *arg)
 {
 	int i, ret = 0;
 	struct aarch64_insn_patch *pp = arg;
-	int num_cpus = num_online_cpus();
-
-	/* The last CPU becomes master */
-	if (arch_atomic_inc_return(&pp->cpu_count) == num_cpus) {
-		for (i = 0; ret == 0 && i < pp->insn_cnt; i++)
-			ret = aarch64_insn_patch_text_nosync(pp->text_addrs[i],
-							     pp->new_insns[i]);
-		/* Notify other processors with an additional increment. */
-		atomic_inc(&pp->cpu_count);
-	} else {
-		while (arch_atomic_read(&pp->cpu_count) <= num_cpus)
-			cpu_relax();
-		isb();
-	}
+
+	for (i = 0; ret == 0 && i < pp->insn_cnt; i++)
+		ret = aarch64_insn_patch_text_nosync(pp->text_addrs[i],
+						     pp->new_insns[i]);
 
 	return ret;
 }
@@ -140,12 +197,10 @@ int __kprobes aarch64_insn_patch_text(void *addrs[], u32 insns[], int cnt)
 		.text_addrs = addrs,
 		.new_insns = insns,
 		.insn_cnt = cnt,
-		.cpu_count = ATOMIC_INIT(0),
 	};
 
 	if (cnt <= 0)
 		return -EINVAL;
 
-	return stop_machine_cpuslocked(aarch64_insn_patch_text_cb, &patch,
-				       cpu_online_mask);
+	return patch_machine_cpuslocked(aarch64_insn_patch_text_cb, &patch);
 }
-- 
2.30.2



* [PATCH 4/4] arm64: patching: mask exceptions in patch_machine()
  2021-12-03 10:47 [PATCH 0/4] arm64: ensure CPUs are quiescent before patching Mark Rutland
                   ` (2 preceding siblings ...)
  2021-12-03 10:47 ` [PATCH 3/4] arm64: patching: unify stop_machine() patch synchronization Mark Rutland
@ 2021-12-03 10:47 ` Mark Rutland
  3 siblings, 0 replies; 13+ messages in thread
From: Mark Rutland @ 2021-12-03 10:47 UTC (permalink / raw)
  To: linux-arm-kernel, James Morse
  Cc: andre.przywara, ardb, catalin.marinas, joey.gouly, mark.rutland,
	suzuki.poulose, will

To ensure that CPUs remain quiescent during patching, they must not take
exceptions. Ensure this by masking DAIF during the core patch_machine()
logic.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/patching.c | 6 ++++++
 1 file changed, 6 insertions(+)

James, I think we need something similar for SDEI here, but I wasn't sure
what I should use for a save/restore sequence. Any thoughts?

Thanks,
Mark.

diff --git a/arch/arm64/kernel/patching.c b/arch/arm64/kernel/patching.c
index 04497dbf14e2..797bca33a13d 100644
--- a/arch/arm64/kernel/patching.c
+++ b/arch/arm64/kernel/patching.c
@@ -7,6 +7,7 @@
 #include <linux/uaccess.h>
 
 #include <asm/cacheflush.h>
+#include <asm/daifflags.h>
 #include <asm/fixmap.h>
 #include <asm/insn.h>
 #include <asm/kprobes.h>
@@ -127,9 +128,12 @@ struct patch_machine_info {
 static int noinstr do_patch_machine(void *arg)
 {
 	struct patch_machine_info *pmi = arg;
+	unsigned long flags;
 	int cpu = smp_processor_id();
 	int ret = 0;
 
+	flags = local_daif_save();
+
 	if (pmi->cpu == cpu) {
 		while (arch_atomic_read(&pmi->active))
 			cpu_relax();
@@ -142,6 +146,8 @@ static int noinstr do_patch_machine(void *arg)
 		isb();
 	}
 
+	local_daif_restore(flags);
+
 	return ret;
 }
 
-- 
2.30.2



* Re: [PATCH 1/4] arm64: alternative: wait for other CPUs before patching
  2021-12-03 10:47 ` [PATCH 1/4] arm64: alternative: wait for other CPUs " Mark Rutland
@ 2021-12-10 14:49   ` Catalin Marinas
  2021-12-13 13:01     ` Mark Rutland
  2021-12-13 13:31   ` Will Deacon
  1 sibling, 1 reply; 13+ messages in thread
From: Catalin Marinas @ 2021-12-10 14:49 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, andre.przywara, ardb, james.morse, joey.gouly,
	suzuki.poulose, will

On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote:
> In __apply_alternatives_multi_stop() we have a "really simple polling
> protocol" to avoid patching code that is concurrently executed on other
> CPUs. Secondary CPUs wait for the boot CPU to signal that patching is
> complete, but the boot CPU doesn't wait for secondaries to enter the
> polling loop, and it's possible that patching starts while secondaries
> are still within the stop_machine logic.
> 
> Let's fix this by adding a vaguely simple polling protocol where the
> boot CPU waits for secondaries to signal that they have entered the
> unpatchable stop function. We can use the arch_atomic_*() functions for
> this, as they are not patched with alternatives.
> 
> At the same time, let's make `all_alternatives_applied` local to
> __apply_alternatives_multi_stop(), since it is only used there, and this
> makes the code a little clearer.

Doesn't the stop_machine() mechanism wait for the CPUs to get in the
same state before calling our function, or do we need another stop at a
lower level in the arch code?

-- 
Catalin


* Re: [PATCH 1/4] arm64: alternative: wait for other CPUs before patching
  2021-12-10 14:49   ` Catalin Marinas
@ 2021-12-13 13:01     ` Mark Rutland
  2021-12-13 13:27       ` Will Deacon
  0 siblings, 1 reply; 13+ messages in thread
From: Mark Rutland @ 2021-12-13 13:01 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arm-kernel, andre.przywara, ardb, james.morse, joey.gouly,
	suzuki.poulose, will

On Fri, Dec 10, 2021 at 02:49:59PM +0000, Catalin Marinas wrote:
> On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote:
> > In __apply_alternatives_multi_stop() we have a "really simple polling
> > protocol" to avoid patching code that is concurrently executed on other
> > CPUs. Secondary CPUs wait for the boot CPU to signal that patching is
> > complete, but the boot CPU doesn't wait for secondaries to enter the
> > polling loop, and it's possible that patching starts while secondaries
> > are still within the stop_machine logic.
> > 
> > Let's fix this by adding a vaguely simple polling protocol where the
> > boot CPU waits for secondaries to signal that they have entered the
> > unpatchable stop function. We can use the arch_atomic_*() functions for
> > this, as they are not patched with alternatives.
> > 
> > At the same time, let's make `all_alternatives_applied` local to
> > __apply_alternatives_multi_stop(), since it is only used there, and this
> > makes the code a little clearer.
> 
> Doesn't the stop_machine() mechanism wait for the CPUs to get in the
> same state before calling our function, or do we need another stop at a
> lower level in the arch code?

The stop_machine() logic doesn't wait on the way in; it queues some work on
each CPU sequentially to *enter* the stop function (in this case
__apply_alternatives_multi_stop()), but there's no existing logic to ensure
that all CPUs have entered by some point. On the way out, stop_machine() waits
for all CPUs to *exit* the stop function before returning.

We need to synchronize on the way into __apply_alternatives_multi_stop() to be
sure that no CPUs are executing bits which might be patched -- portions of
stop_machine() itself, anything necessary to dequeue the work, or anything that
might be running before the CPU spots the queued work.
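
Concretely, with the current code the following interleaving is possible
(a simplified sketch of __apply_alternatives_multi_stop() as it stands):

	/* boot CPU */
	__apply_alternatives(&region, false, remaining_capabilities);
	all_alternatives_applied = 1;

	/*
	 * ... while a secondary CPU may still be inside multi_cpu_stop()
	 * or the stopper dequeue path, executing instructions the boot
	 * CPU is rewriting, before it ever reaches its polling loop:
	 */
	while (!all_alternatives_applied)
		cpu_relax();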

Thanks,
Mark.


* Re: [PATCH 1/4] arm64: alternative: wait for other CPUs before patching
  2021-12-13 13:01     ` Mark Rutland
@ 2021-12-13 13:27       ` Will Deacon
  0 siblings, 0 replies; 13+ messages in thread
From: Will Deacon @ 2021-12-13 13:27 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Catalin Marinas, linux-arm-kernel, andre.przywara, ardb,
	james.morse, joey.gouly, suzuki.poulose

On Mon, Dec 13, 2021 at 01:01:25PM +0000, Mark Rutland wrote:
> On Fri, Dec 10, 2021 at 02:49:59PM +0000, Catalin Marinas wrote:
> > On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote:
> > > In __apply_alternatives_multi_stop() we have a "really simple polling
> > > protocol" to avoid patching code that is concurrently executed on other
> > > CPUs. Secondary CPUs wait for the boot CPU to signal that patching is
> > > complete, but the boot CPU doesn't wait for secondaries to enter the
> > > polling loop, and it's possible that patching starts while secondaries
> > > are still within the stop_machine logic.
> > > 
> > > Let's fix this by adding a vaguely simple polling protocol where the
> > > boot CPU waits for secondaries to signal that they have entered the
> > > unpatchable stop function. We can use the arch_atomic_*() functions for
> > > this, as they are not patched with alternatives.
> > > 
> > > At the same time, let's make `all_alternatives_applied` local to
> > > __apply_alternatives_multi_stop(), since it is only used there, and this
> > > makes the code a little clearer.
> > 
> > Doesn't the stop_machine() mechanism wait for the CPUs to get in the
> > same state before calling our function, or do we need another stop at a
> > lower level in the arch code?
> 
> The stop_machine() logic doesn't wait on the way in; it queues some work on
> each CPU sequentially to *enter* the stop function (in this case
> __apply_alternatives_multi_stop()), but there's no existing logic to ensure
> that all CPUs have entered by some point. On the way out, stop_machine() waits
> for all CPUs to *exit* the stop function before returning.
> 
> We need to synchronize on the way into __apply_alternatives_multi_stop() to be
> sure that no CPUs are executing bits which might be patched -- portions of
> stop_machine() itself, anything necessary to dequeue the work, or anything that
> might be running before the CPU spots the queued work.

I remember looking at this when I added the current polling loop for arm64
and, at the time, the loop around the state machine in multi_cpu_stop() was
straightforward enough that it wasn't a problem. Since then, however, it
appears to have grown a READ_ONCE(), so it looks like we'll need to so
something.

I'll have a look at your patches.

Will


* Re: [PATCH 1/4] arm64: alternative: wait for other CPUs before patching
  2021-12-03 10:47 ` [PATCH 1/4] arm64: alternative: wait for other CPUs " Mark Rutland
  2021-12-10 14:49   ` Catalin Marinas
@ 2021-12-13 13:31   ` Will Deacon
  2021-12-13 13:41     ` Will Deacon
  2021-12-13 13:49     ` Mark Rutland
  1 sibling, 2 replies; 13+ messages in thread
From: Will Deacon @ 2021-12-13 13:31 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, andre.przywara, ardb, catalin.marinas,
	james.morse, joey.gouly, suzuki.poulose

On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote:
> In __apply_alternatives_multi_stop() we have a "really simple polling
> protocol" to avoid patching code that is concurrently executed on other
> CPUs. Secondary CPUs wait for the boot CPU to signal that patching is
> complete, but the boot CPU doesn't wait for secondaries to enter the
> polling loop, and it's possible that patching starts while secondaries
> are still within the stop_machine logic.
> 
> Let's fix this by adding a vaguely simple polling protocol where the
> boot CPU waits for secondaries to signal that they have entered the
> unpatchable stop function. We can use the arch_atomic_*() functions for
> this, as they are not patched with alternatives.
> 
> At the same time, let's make `all_alternatives_applied` local to
> __apply_alternatives_multi_stop(), since it is only used there, and this
> makes the code a little clearer.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Andre Przywara <andre.przywara@arm.com>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: James Morse <james.morse@arm.com>
> Cc: Joey Gouly <joey.gouly@arm.com>
> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/kernel/alternative.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
> index 3fb79b76e9d9..4f32d4425aac 100644
> --- a/arch/arm64/kernel/alternative.c
> +++ b/arch/arm64/kernel/alternative.c
> @@ -21,9 +21,6 @@
>  #define ALT_ORIG_PTR(a)		__ALT_PTR(a, orig_offset)
>  #define ALT_REPL_PTR(a)		__ALT_PTR(a, alt_offset)
>  
> -/* Volatile, as we may be patching the guts of READ_ONCE() */
> -static volatile int all_alternatives_applied;
> -
>  static DECLARE_BITMAP(applied_alternatives, ARM64_NCAPS);
>  
>  struct alt_region {
> @@ -193,11 +190,17 @@ static void __nocfi __apply_alternatives(struct alt_region *region, bool is_modu
>  }
>  
>  /*
> - * We might be patching the stop_machine state machine, so implement a
> - * really simple polling protocol here.
> + * Apply alternatives, ensuring that no CPUs are concurrently executing code
> + * being patched.
> + *
> + * We might be patching the stop_machine state machine or READ_ONCE(), so
> + * we implement a simple polling protocol.
>   */
>  static int __apply_alternatives_multi_stop(void *unused)
>  {
> +	/* Volatile, as we may be patching the guts of READ_ONCE() */
> +	static volatile int all_alternatives_applied;
> +	static atomic_t stopped_cpus = ATOMIC_INIT(0);
>  	struct alt_region region = {
>  		.begin	= (struct alt_instr *)__alt_instructions,
>  		.end	= (struct alt_instr *)__alt_instructions_end,
> @@ -205,12 +208,16 @@ static int __apply_alternatives_multi_stop(void *unused)
>  
>  	/* We always have a CPU 0 at this point (__init) */
>  	if (smp_processor_id()) {
> +		arch_atomic_inc(&stopped_cpus);

Why can't we use normal atomic_inc() here?

>  		while (!all_alternatives_applied)
>  			cpu_relax();
>  		isb();
>  	} else {
>  		DECLARE_BITMAP(remaining_capabilities, ARM64_NPATCHABLE);
>  
> +		while (arch_atomic_read(&stopped_cpus) != num_online_cpus() - 1)

and normal atomic_read() here?

Will


* Re: [PATCH 1/4] arm64: alternative: wait for other CPUs before patching
  2021-12-13 13:31   ` Will Deacon
@ 2021-12-13 13:41     ` Will Deacon
  2021-12-13 13:54       ` Mark Rutland
  2021-12-13 13:49     ` Mark Rutland
  1 sibling, 1 reply; 13+ messages in thread
From: Will Deacon @ 2021-12-13 13:41 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, andre.przywara, ardb, catalin.marinas,
	james.morse, joey.gouly, suzuki.poulose

On Mon, Dec 13, 2021 at 01:31:52PM +0000, Will Deacon wrote:
> On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote:
> > In __apply_alternatives_multi_stop() we have a "really simple polling
> > protocol" to avoid patching code that is concurrently executed on other
> > CPUs. Secondary CPUs wait for the boot CPU to signal that patching is
> > complete, but the boot CPU doesn't wait for secondaries to enter the
> > polling loop, and it's possible that patching starts while secondaries
> > are still within the stop_machine logic.
> > 
> > Let's fix this by adding a vaguely simple polling protocol where the
> > boot CPU waits for secondaries to signal that they have entered the
> > unpatchable stop function. We can use the arch_atomic_*() functions for
> > this, as they are not patched with alternatives.
> > 
> > At the same time, let's make `all_alternatives_applied` local to
> > __apply_alternatives_multi_stop(), since it is only used there, and this
> > makes the code a little clearer.
> > 
> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > Cc: Andre Przywara <andre.przywara@arm.com>
> > Cc: Ard Biesheuvel <ardb@kernel.org>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: James Morse <james.morse@arm.com>
> > Cc: Joey Gouly <joey.gouly@arm.com>
> > Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/kernel/alternative.c | 17 ++++++++++++-----
> >  1 file changed, 12 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
> > index 3fb79b76e9d9..4f32d4425aac 100644
> > --- a/arch/arm64/kernel/alternative.c
> > +++ b/arch/arm64/kernel/alternative.c
> > @@ -21,9 +21,6 @@
> >  #define ALT_ORIG_PTR(a)		__ALT_PTR(a, orig_offset)
> >  #define ALT_REPL_PTR(a)		__ALT_PTR(a, alt_offset)
> >  
> > -/* Volatile, as we may be patching the guts of READ_ONCE() */
> > -static volatile int all_alternatives_applied;
> > -
> >  static DECLARE_BITMAP(applied_alternatives, ARM64_NCAPS);
> >  
> >  struct alt_region {
> > @@ -193,11 +190,17 @@ static void __nocfi __apply_alternatives(struct alt_region *region, bool is_modu
> >  }
> >  
> >  /*
> > - * We might be patching the stop_machine state machine, so implement a
> > - * really simple polling protocol here.
> > + * Apply alternatives, ensuring that no CPUs are concurrently executing code
> > + * being patched.
> > + *
> > + * We might be patching the stop_machine state machine or READ_ONCE(), so
> > + * we implement a simple polling protocol.
> >   */
> >  static int __apply_alternatives_multi_stop(void *unused)
> >  {
> > +	/* Volatile, as we may be patching the guts of READ_ONCE() */
> > +	static volatile int all_alternatives_applied;
> > +	static atomic_t stopped_cpus = ATOMIC_INIT(0);
> >  	struct alt_region region = {
> >  		.begin	= (struct alt_instr *)__alt_instructions,
> >  		.end	= (struct alt_instr *)__alt_instructions_end,
> > @@ -205,12 +208,16 @@ static int __apply_alternatives_multi_stop(void *unused)
> >  
> >  	/* We always have a CPU 0 at this point (__init) */
> >  	if (smp_processor_id()) {
> > +		arch_atomic_inc(&stopped_cpus);
> 
> Why can't we use normal atomic_inc() here?

Ah, ok, this is to deal with instrumentation and you add 'noinstr' when you
factor this out later on. It does, however, mean that we need to be really
careful with this because we're relying on (a) our atomics patching using
static keys and (b) static key patching not requiring stop_machine().

In particular, we cannot backport this to kernels where the atomics were
patched directly.

> 
> >  		while (!all_alternatives_applied)
> >  			cpu_relax();
> >  		isb();
> >  	} else {
> >  		DECLARE_BITMAP(remaining_capabilities, ARM64_NPATCHABLE);
> >  
> > +		while (arch_atomic_read(&stopped_cpus) != num_online_cpus() - 1)
> 
> and normal atomic_read() here?

This one I'm still thinking doesn't need the arch_ prefix.

Will


* Re: [PATCH 1/4] arm64: alternative: wait for other CPUs before patching
  2021-12-13 13:31   ` Will Deacon
  2021-12-13 13:41     ` Will Deacon
@ 2021-12-13 13:49     ` Mark Rutland
  1 sibling, 0 replies; 13+ messages in thread
From: Mark Rutland @ 2021-12-13 13:49 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, andre.przywara, ardb, catalin.marinas,
	james.morse, joey.gouly, suzuki.poulose

On Mon, Dec 13, 2021 at 01:31:52PM +0000, Will Deacon wrote:
> On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote:
> > In __apply_alternatives_multi_stop() we have a "really simple polling
> > protocol" to avoid patching code that is concurrently executed on other
> > CPUs. Secondary CPUs wait for the boot CPU to signal that patching is
> > complete, but the boot CPU doesn't wait for secondaries to enter the
> > polling loop, and it's possible that patching starts while secondaries
> > are still within the stop_machine logic.
> > 
> > Let's fix this by adding a vaguely simple polling protocol where the
> > boot CPU waits for secondaries to signal that they have entered the
> > unpatchable stop function. We can use the arch_atomic_*() functions for
> > this, as they are not patched with alternatives.
> > 
> > At the same time, let's make `all_alternatives_applied` local to
> > __apply_alternatives_multi_stop(), since it is only used there, and this
> > makes the code a little clearer.
> > 
> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > Cc: Andre Przywara <andre.przywara@arm.com>
> > Cc: Ard Biesheuvel <ardb@kernel.org>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: James Morse <james.morse@arm.com>
> > Cc: Joey Gouly <joey.gouly@arm.com>
> > Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/kernel/alternative.c | 17 ++++++++++++-----
> >  1 file changed, 12 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
> > index 3fb79b76e9d9..4f32d4425aac 100644
> > --- a/arch/arm64/kernel/alternative.c
> > +++ b/arch/arm64/kernel/alternative.c
> > @@ -21,9 +21,6 @@
> >  #define ALT_ORIG_PTR(a)		__ALT_PTR(a, orig_offset)
> >  #define ALT_REPL_PTR(a)		__ALT_PTR(a, alt_offset)
> >  
> > -/* Volatile, as we may be patching the guts of READ_ONCE() */
> > -static volatile int all_alternatives_applied;
> > -
> >  static DECLARE_BITMAP(applied_alternatives, ARM64_NCAPS);
> >  
> >  struct alt_region {
> > @@ -193,11 +190,17 @@ static void __nocfi __apply_alternatives(struct alt_region *region, bool is_modu
> >  }
> >  
> >  /*
> > - * We might be patching the stop_machine state machine, so implement a
> > - * really simple polling protocol here.
> > + * Apply alternatives, ensuring that no CPUs are concurrently executing code
> > + * being patched.
> > + *
> > + * We might be patching the stop_machine state machine or READ_ONCE(), so
> > + * we implement a simple polling protocol.
> >   */
> >  static int __apply_alternatives_multi_stop(void *unused)
> >  {
> > +	/* Volatile, as we may be patching the guts of READ_ONCE() */
> > +	static volatile int all_alternatives_applied;
> > +	static atomic_t stopped_cpus = ATOMIC_INIT(0);
> >  	struct alt_region region = {
> >  		.begin	= (struct alt_instr *)__alt_instructions,
> >  		.end	= (struct alt_instr *)__alt_instructions_end,
> > @@ -205,12 +208,16 @@ static int __apply_alternatives_multi_stop(void *unused)
> >  
> >  	/* We always have a CPU 0 at this point (__init) */
> >  	if (smp_processor_id()) {
> > +		arch_atomic_inc(&stopped_cpus);
> 
> Why can't we use normal atomic_inc() here?

In case there's any explicit instrumentation enabled in the atomic_inc()
wrapper, since the instrumentation code may call into patchable code.

Today we'd get away with using atomic_inc(), since currently all the
instrumentation happens to be prior to the actual AMO, but generally to avoid
instrumentation we're supposed to use the arch_atomic_*() ops.
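
For reference, the instrumented wrappers are roughly of the following
shape (a sketch of the generated atomic-instrumented headers; the exact
helper names vary by kernel version):

	static __always_inline void
	atomic_inc(atomic_t *v)
	{
		/* explicit instrumentation, which may call into patchable
		 * and/or instrumentable code ... */
		instrument_atomic_read_write(v, sizeof(*v));
		/* ... before the real AMO */
		arch_atomic_inc(v);
	}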

There are some other latent issues with calling into instrumentable code here,
which I plan to address in future patches, so if you want I can make this a
regular atomic_inc() for now and tackle that as a separate problem. Otherwise,
I can elaborate on the mention in the commit message to make that clearer.

> >  		while (!all_alternatives_applied)
> >  			cpu_relax();
> >  		isb();
> >  	} else {
> >  		DECLARE_BITMAP(remaining_capabilities, ARM64_NPATCHABLE);
> >  
> > +		while (arch_atomic_read(&stopped_cpus) != num_online_cpus() - 1)
> 
> and normal atomic_read() here?

Same story as above.

Thanks,
Mark.


* Re: [PATCH 1/4] arm64: alternative: wait for other CPUs before patching
  2021-12-13 13:41     ` Will Deacon
@ 2021-12-13 13:54       ` Mark Rutland
  2021-12-14 16:01         ` Will Deacon
  0 siblings, 1 reply; 13+ messages in thread
From: Mark Rutland @ 2021-12-13 13:54 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, andre.przywara, ardb, catalin.marinas,
	james.morse, joey.gouly, suzuki.poulose

On Mon, Dec 13, 2021 at 01:41:46PM +0000, Will Deacon wrote:
> On Mon, Dec 13, 2021 at 01:31:52PM +0000, Will Deacon wrote:
> > On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote:
> > > In __apply_alternatives_multi_stop() we have a "really simple polling
> > > protocol" to avoid patching code that is concurrently executed on other
> > > CPUs. Secondary CPUs wait for the boot CPU to signal that patching is
> > > complete, but the boot CPU doesn't wait for secondaries to enter the
> > > polling loop, and it's possible that patching starts while secondaries
> > > are still within the stop_machine logic.
> > > 
> > > Let's fix this by adding a vaguely simple polling protocol where the
> > > boot CPU waits for secondaries to signal that they have entered the
> > > unpatchable stop function. We can use the arch_atomic_*() functions for
> > > this, as they are not patched with alternatives.
> > > 
> > > At the same time, let's make `all_alternatives_applied` local to
> > > __apply_alternatives_multi_stop(), since it is only used there, and this
> > > makes the code a little clearer.
> > > 
> > > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > > Cc: Andre Przywara <andre.przywara@arm.com>
> > > Cc: Ard Biesheuvel <ardb@kernel.org>
> > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > Cc: James Morse <james.morse@arm.com>
> > > Cc: Joey Gouly <joey.gouly@arm.com>
> > > Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> > > Cc: Will Deacon <will@kernel.org>
> > > ---
> > >  arch/arm64/kernel/alternative.c | 17 ++++++++++++-----
> > >  1 file changed, 12 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
> > > index 3fb79b76e9d9..4f32d4425aac 100644
> > > --- a/arch/arm64/kernel/alternative.c
> > > +++ b/arch/arm64/kernel/alternative.c
> > > @@ -21,9 +21,6 @@
> > >  #define ALT_ORIG_PTR(a)		__ALT_PTR(a, orig_offset)
> > >  #define ALT_REPL_PTR(a)		__ALT_PTR(a, alt_offset)
> > >  
> > > -/* Volatile, as we may be patching the guts of READ_ONCE() */
> > > -static volatile int all_alternatives_applied;
> > > -
> > >  static DECLARE_BITMAP(applied_alternatives, ARM64_NCAPS);
> > >  
> > >  struct alt_region {
> > > @@ -193,11 +190,17 @@ static void __nocfi __apply_alternatives(struct alt_region *region, bool is_modu
> > >  }
> > >  
> > >  /*
> > > - * We might be patching the stop_machine state machine, so implement a
> > > - * really simple polling protocol here.
> > > + * Apply alternatives, ensuring that no CPUs are concurrently executing code
> > > + * being patched.
> > > + *
> > > + * We might be patching the stop_machine state machine or READ_ONCE(), so
> > > + * we implement a simple polling protocol.
> > >   */
> > >  static int __apply_alternatives_multi_stop(void *unused)
> > >  {
> > > +	/* Volatile, as we may be patching the guts of READ_ONCE() */
> > > +	static volatile int all_alternatives_applied;
> > > +	static atomic_t stopped_cpus = ATOMIC_INIT(0);
> > >  	struct alt_region region = {
> > >  		.begin	= (struct alt_instr *)__alt_instructions,
> > >  		.end	= (struct alt_instr *)__alt_instructions_end,
> > > @@ -205,12 +208,16 @@ static int __apply_alternatives_multi_stop(void *unused)
> > >  
> > >  	/* We always have a CPU 0 at this point (__init) */
> > >  	if (smp_processor_id()) {
> > > +		arch_atomic_inc(&stopped_cpus);
> > 
> > Why can't we use normal atomic_inc() here?
> 
> Ah, ok, this is to deal with instrumentation and you add 'noinstr' when you
> factor this out later on. It does, however, mean that we need to be really
> careful with this because we're relying on (a) our atomics patching using
> static keys and (b) static key patching not requiring stop_machine().
> 
> In particular, we cannot backport this to kernels where the atomics were
> patched directly.

Another option here would be to use the __ll_sc_*() atomics directly, which at
least will break the build if backported too far?

> > >  		while (!all_alternatives_applied)
> > >  			cpu_relax();
> > >  		isb();
> > >  	} else {
> > >  		DECLARE_BITMAP(remaining_capabilities, ARM64_NPATCHABLE);
> > >  
> > > +		while (arch_atomic_read(&stopped_cpus) != num_online_cpus() - 1)
> > 
> > and normal atomic_read() here?
> 
> This one I'm still thinking doesn't need the arch_ prefix.

We could use a regular atomic_read() here, yes.

I'd used the arch_atomic_*() form for consistency with the inc().

Thanks,
Mark.


* Re: [PATCH 1/4] arm64: alternative: wait for other CPUs before patching
  2021-12-13 13:54       ` Mark Rutland
@ 2021-12-14 16:01         ` Will Deacon
  0 siblings, 0 replies; 13+ messages in thread
From: Will Deacon @ 2021-12-14 16:01 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, andre.przywara, ardb, catalin.marinas,
	james.morse, joey.gouly, suzuki.poulose

On Mon, Dec 13, 2021 at 01:54:39PM +0000, Mark Rutland wrote:
> On Mon, Dec 13, 2021 at 01:41:46PM +0000, Will Deacon wrote:
> > On Mon, Dec 13, 2021 at 01:31:52PM +0000, Will Deacon wrote:
> > > On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote:
> > > > In __apply_alternatives_multi_stop() we have a "really simple polling
> > > > protocol" to avoid patching code that is concurrently executed on other
> > > > CPUs. Secondary CPUs wait for the boot CPU to signal that patching is
> > > > complete, but the boot CPU doesn't wait for secondaries to enter the
> > > > polling loop, and it's possible that patching starts while secondaries
> > > > are still within the stop_machine logic.
> > > > 
> > > > Let's fix this by adding a vaguely simple polling protocol where the
> > > > boot CPU waits for secondaries to signal that they have entered the
> > > > unpatchable stop function. We can use the arch_atomic_*() functions for
> > > > this, as they are not patched with alternatives.
> > > > 
> > > > At the same time, let's make `all_alternatives_applied` local to
> > > > __apply_alternatives_multi_stop(), since it is only used there, and this
> > > > makes the code a little clearer.
> > > > 
> > > > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > > > Cc: Andre Przywara <andre.przywara@arm.com>
> > > > Cc: Ard Biesheuvel <ardb@kernel.org>
> > > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > > Cc: James Morse <james.morse@arm.com>
> > > > Cc: Joey Gouly <joey.gouly@arm.com>
> > > > Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> > > > Cc: Will Deacon <will@kernel.org>
> > > > ---
> > > >  arch/arm64/kernel/alternative.c | 17 ++++++++++++-----
> > > >  1 file changed, 12 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
> > > > index 3fb79b76e9d9..4f32d4425aac 100644
> > > > --- a/arch/arm64/kernel/alternative.c
> > > > +++ b/arch/arm64/kernel/alternative.c
> > > > @@ -21,9 +21,6 @@
> > > >  #define ALT_ORIG_PTR(a)		__ALT_PTR(a, orig_offset)
> > > >  #define ALT_REPL_PTR(a)		__ALT_PTR(a, alt_offset)
> > > >  
> > > > -/* Volatile, as we may be patching the guts of READ_ONCE() */
> > > > -static volatile int all_alternatives_applied;
> > > > -
> > > >  static DECLARE_BITMAP(applied_alternatives, ARM64_NCAPS);
> > > >  
> > > >  struct alt_region {
> > > > @@ -193,11 +190,17 @@ static void __nocfi __apply_alternatives(struct alt_region *region, bool is_modu
> > > >  }
> > > >  
> > > >  /*
> > > > - * We might be patching the stop_machine state machine, so implement a
> > > > - * really simple polling protocol here.
> > > > + * Apply alternatives, ensuring that no CPUs are concurrently executing code
> > > > + * being patched.
> > > > + *
> > > > + * We might be patching the stop_machine state machine or READ_ONCE(), so
> > > > + * we implement a simple polling protocol.
> > > >   */
> > > >  static int __apply_alternatives_multi_stop(void *unused)
> > > >  {
> > > > +	/* Volatile, as we may be patching the guts of READ_ONCE() */
> > > > +	static volatile int all_alternatives_applied;
> > > > +	static atomic_t stopped_cpus = ATOMIC_INIT(0);
> > > >  	struct alt_region region = {
> > > >  		.begin	= (struct alt_instr *)__alt_instructions,
> > > >  		.end	= (struct alt_instr *)__alt_instructions_end,
> > > > @@ -205,12 +208,16 @@ static int __apply_alternatives_multi_stop(void *unused)
> > > >  
> > > >  	/* We always have a CPU 0 at this point (__init) */
> > > >  	if (smp_processor_id()) {
> > > > +		arch_atomic_inc(&stopped_cpus);
> > > 
> > > Why can't we use normal atomic_inc() here?
> > 
> > Ah, ok, this is to deal with instrumentation and you add 'noinstr' when you
> > factor this out later on. It does, however, mean that we need to be really
> > careful with this because we're relying on (a) our atomics patching using
> > static keys and (b) static key patching not requiring stop_machine().
> > 
> > In particular, we cannot backport this to kernels where the atomics were
> > patched directly.
> 
> Another option here would be to use the __ll_sc_*() atomics directly, which at
> least will break the build if backported too far?

Hopefully it's sufficient just to add the right Fixes: tag and stick the
kernel version on the CC stable line.

> > > >  		while (!all_alternatives_applied)
> > > >  			cpu_relax();
> > > >  		isb();
> > > >  	} else {
> > > >  		DECLARE_BITMAP(remaining_capabilities, ARM64_NPATCHABLE);
> > > >  
> > > > +		while (arch_atomic_read(&stopped_cpus) != num_online_cpus() - 1)
> > > 
> > > and normal atomic_read() here?
> > 
> > This one I'm still thinking doesn't need the arch_ prefix.
> 
> We could use a regular atomic_read() here, yes.
> 
> I'd used the arch_atomic_*() form for consistency with the inc().

I'd rather only use the arch_* forms where they are strictly needed, and
have a comment justifying each use.

Will

