* [PATCH v2 0/5] smp: don't kick CPUs running idle or nohz_full tasks
@ 2018-04-05 17:17 Yury Norov
  2018-04-05 17:17 ` [PATCH 1/5] arm64: entry: isb in el1_irq Yury Norov
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Yury Norov @ 2018-04-05 17:17 UTC (permalink / raw)
  To: Paul E. McKenney, Mark Rutland, Will Deacon, Chris Metcalf,
	Christopher Lameter, Russell King - ARM Linux, Steven Rostedt,
	Mathieu Desnoyers, Catalin Marinas, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Alexey Klimov
  Cc: Yury Norov, linux-arm-kernel, linuxppc-dev, kvm-ppc, linux-mm,
	linux-kernel

kick_all_cpus_sync() is used to broadcast IPIs to all online CPUs to force
them to synchronize caches, TLBs, etc. It is called from only 3 places: the
mm/slab, arm64 and powerpc code.

We can delay synchronization work for CPUs in extended quiescent state
(idle or nohz_full userspace). 

As Paul E. McKenney wrote: 

--

Currently, IPIs are used to force other CPUs to invalidate their TLBs
in response to a kernel virtual-memory mapping change.  This works, but 
degrades both battery lifetime (for idle CPUs) and real-time response
(for nohz_full CPUs), and in addition results in unnecessary IPIs due to
the fact that CPUs executing in usermode are unaffected by stale kernel
mappings.  It would be better to cause a CPU executing in usermode to
wait until it is entering kernel mode to do the flush, first to avoid
interrupting usermode tasks and second to handle multiple flush requests
with a single flush in the case of a long-running user task.

--
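
A rough sketch of that deferral in C, with hypothetical names (the series
itself reuses RCU's dynticks state rather than adding a new per-CPU flag):

	/* Set instead of sending an IPI to a usermode/idle CPU. */
	static DEFINE_PER_CPU(bool, kernel_sync_pending);

	/* Called early on the target CPU's next kernel entry. */
	void do_deferred_kernel_sync(void)
	{
		if (this_cpu_read(kernel_sync_pending)) {
			this_cpu_write(kernel_sync_pending, false);
			local_flush_tlb_all();	/* catch up with mapping changes */
			isb();			/* catch up with text changes */
		}
	}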

v2 is a big rework to address comments on v1:
 - the rcu_eqs_special() declaration in the public header is dropped, as it
   is not used in the new implementation. Still, I hope Paul will pick it
   up in his tree;
 - for arm64, a few isb() calls are added to ensure kernel text
   synchronization (patches 1-4);
 - rcu_get_eqs_cpus() is introduced and used to mask out EQS CPUs before
   generating broadcast IPIs;
 - RCU_DYNTICK_CTRL_MASK is not touched because a memory barrier is
   implicitly issued in the EQS exit path;
 - powerpc is no longer an exception. I think it's safe to delay
   synchronization for it as well, and I didn't get comments from the ppc
   community.
v1:
  https://lkml.org/lkml/2018/3/25/109

Based on next-20180405

Yury Norov (5):
  arm64: entry: isb in el1_irq
  arm64: entry: introduce restore_syscall_args macro
  arm64: ISB early at exit from extended quiescent state
  rcu: arm64: add rcu_dynticks_eqs_exit_sync()
  smp: Lazy synchronization for EQS CPUs in kick_all_cpus_sync()

 arch/arm64/kernel/Makefile  |  2 ++
 arch/arm64/kernel/entry.S   | 52 +++++++++++++++++++++++++++++++--------------
 arch/arm64/kernel/process.c |  7 ++++++
 arch/arm64/kernel/rcu.c     |  8 +++++++
 include/linux/rcutiny.h     |  2 ++
 include/linux/rcutree.h     |  1 +
 kernel/rcu/tiny.c           |  9 ++++++++
 kernel/rcu/tree.c           | 27 +++++++++++++++++++++++
 kernel/smp.c                | 21 +++++++++++-------
 9 files changed, 105 insertions(+), 24 deletions(-)
 create mode 100644 arch/arm64/kernel/rcu.c

-- 
2.14.1

* [PATCH 1/5] arm64: entry: isb in el1_irq
  2018-04-05 17:17 [PATCH v2 0/5] smp: don't kick CPUs running idle or nohz_full tasks Yury Norov
@ 2018-04-05 17:17 ` Yury Norov
  2018-04-06 10:02   ` James Morse
  2018-04-06 10:57   ` Mark Rutland
  2018-04-05 17:17 ` [PATCH 2/5] arm64: entry: introduce restore_syscall_args macro Yury Norov
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 15+ messages in thread
From: Yury Norov @ 2018-04-05 17:17 UTC (permalink / raw)
  To: Paul E. McKenney, Mark Rutland, Will Deacon, Chris Metcalf,
	Christopher Lameter, Russell King - ARM Linux, Steven Rostedt,
	Mathieu Desnoyers, Catalin Marinas, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Alexey Klimov
  Cc: Yury Norov, linux-arm-kernel, linuxppc-dev, kvm-ppc, linux-mm,
	linux-kernel

The kernel text patching framework relies on an IPI to ensure that other
SMP cores observe the change. The target core calls isb() in the IPI handler
path, but not at the beginning of the el1_irq entry. There's a chance that
a modified instruction will be fetched before the isb() and so will not be
observed.

This patch inserts an isb early in the el1_irq entry to close that window.
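
For reference, the pairing this relies on is roughly the following
(a paraphrase of the arch/arm64/kernel/insn.c flow, not the exact code):

	/* patching CPU: write the insn, then IPI everyone */
	ret = aarch64_insn_patch_text_nosync(addr, insn);
	if (!ret)
		kick_all_cpus_sync();

	/* target CPUs: the IPI handler path is expected to
	 * context-synchronize before patched text is executed */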

Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
---
 arch/arm64/kernel/entry.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index ec2ee720e33e..9c06b4b80060 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -593,6 +593,7 @@ ENDPROC(el1_sync)
 
 	.align	6
 el1_irq:
+	isb					// pairs with aarch64_insn_patch_text
 	kernel_entry 1
 	enable_da_f
 #ifdef CONFIG_TRACE_IRQFLAGS
-- 
2.14.1

* [PATCH 2/5] arm64: entry: introduce restore_syscall_args macro
  2018-04-05 17:17 [PATCH v2 0/5] smp: don't kick CPUs running idle or nohz_full tasks Yury Norov
  2018-04-05 17:17 ` [PATCH 1/5] arm64: entry: isb in el1_irq Yury Norov
@ 2018-04-05 17:17 ` Yury Norov
  2018-04-05 17:17 ` [PATCH 3/5] arm64: early ISB at exit from extended quiescent state Yury Norov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Yury Norov @ 2018-04-05 17:17 UTC (permalink / raw)
  To: Paul E. McKenney, Mark Rutland, Will Deacon, Chris Metcalf,
	Christopher Lameter, Russell King - ARM Linux, Steven Rostedt,
	Mathieu Desnoyers, Catalin Marinas, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Alexey Klimov
  Cc: Yury Norov, linux-arm-kernel, linuxppc-dev, kvm-ppc, linux-mm,
	linux-kernel

Syscall arguments are passed in registers x0..x7. If assembler
code has to call C functions before passing control to the syscall
handler, it should restore the original state of those registers
after the call.

Currently, restoring of the syscall arguments is open-coded in
el0_svc_naked and __sys_trace. This patch introduces the
restore_syscall_args macro and uses it there.

Also, the 'syscall = 0' parameter is removed from ct_user_exit so that
el0_svc_naked calls restore_syscall_args explicitly. This is needed
because the following patch of the series adds another call to a C
function in el0_svc_naked, and restoring the syscall args is no longer
solely a matter of ct_user_exit.

Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
---
 arch/arm64/kernel/entry.S | 37 +++++++++++++++++++++----------------
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 9c06b4b80060..c8d9ec363ddd 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -37,22 +37,29 @@
 #include <asm/unistd.h>
 
 /*
- * Context tracking subsystem.  Used to instrument transitions
- * between user and kernel mode.
+ * Save/restore needed during syscalls.  Restore syscall arguments from
+ * the values already saved on stack during kernel_entry.
  */
-	.macro ct_user_exit, syscall = 0
-#ifdef CONFIG_CONTEXT_TRACKING
-	bl	context_tracking_user_exit
-	.if \syscall == 1
-	/*
-	 * Save/restore needed during syscalls.  Restore syscall arguments from
-	 * the values already saved on stack during kernel_entry.
-	 */
+	.macro restore_syscall_args
 	ldp	x0, x1, [sp]
 	ldp	x2, x3, [sp, #S_X2]
 	ldp	x4, x5, [sp, #S_X4]
 	ldp	x6, x7, [sp, #S_X6]
-	.endif
+	.endm
+
+	.macro el0_svc_restore_syscall_args
+#if defined(CONFIG_CONTEXT_TRACKING)
+	restore_syscall_args
+#endif
+	.endm
+
+/*
+ * Context tracking subsystem.  Used to instrument transitions
+ * between user and kernel mode.
+ */
+	.macro ct_user_exit
+#ifdef CONFIG_CONTEXT_TRACKING
+	bl	context_tracking_user_exit
 #endif
 	.endm
 
@@ -943,7 +950,8 @@ alternative_else_nop_endif
 el0_svc_naked:					// compat entry point
 	stp	x0, xscno, [sp, #S_ORIG_X0]	// save the original x0 and syscall number
 	enable_daif
-	ct_user_exit 1
+	ct_user_exit
+	el0_svc_restore_syscall_args
 
 	tst	x16, #_TIF_SYSCALL_WORK		// check for syscall hooks
 	b.ne	__sys_trace
@@ -976,10 +984,7 @@ __sys_trace:
 	mov	x1, sp				// pointer to regs
 	cmp	wscno, wsc_nr			// check upper syscall limit
 	b.hs	__ni_sys_trace
-	ldp	x0, x1, [sp]			// restore the syscall args
-	ldp	x2, x3, [sp, #S_X2]
-	ldp	x4, x5, [sp, #S_X4]
-	ldp	x6, x7, [sp, #S_X6]
+	restore_syscall_args
 	ldr	x16, [stbl, xscno, lsl #3]	// address in the syscall table
 	blr	x16				// call sys_* routine
 
-- 
2.14.1

* [PATCH 3/5] arm64: early ISB at exit from extended quiescent state
  2018-04-05 17:17 [PATCH v2 0/5] smp: don't kick CPUs running idle or nohz_full tasks Yury Norov
  2018-04-05 17:17 ` [PATCH 1/5] arm64: entry: isb in el1_irq Yury Norov
  2018-04-05 17:17 ` [PATCH 2/5] arm64: entry: introduce restore_syscall_args macro Yury Norov
@ 2018-04-05 17:17 ` Yury Norov
  2018-04-06 10:06   ` James Morse
  2018-04-06 11:07   ` Mark Rutland
  2018-04-05 17:17 ` [PATCH 4/5] rcu: arm64: add rcu_dynticks_eqs_exit_sync() Yury Norov
  2018-04-05 17:18 ` [PATCH 5/5] smp: Lazy synchronization for EQS CPUs in kick_all_cpus_sync() Yury Norov
  4 siblings, 2 replies; 15+ messages in thread
From: Yury Norov @ 2018-04-05 17:17 UTC (permalink / raw)
  To: Paul E. McKenney, Mark Rutland, Will Deacon, Chris Metcalf,
	Christopher Lameter, Russell King - ARM Linux, Steven Rostedt,
	Mathieu Desnoyers, Catalin Marinas, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Alexey Klimov
  Cc: Yury Norov, linux-arm-kernel, linuxppc-dev, kvm-ppc, linux-mm,
	linux-kernel

This series enables delaying of kernel memory synchronization
for CPUs running in an extended quiescent state (EQS) until the
exit from that state.

ARM64 uses the IPI mechanism to notify all cores in an SMP system
that kernel text has changed, and the IPI handler calls isb() to
synchronize.

If we don't deliver IPIs to EQS CPUs anymore, we should add an ISB
early in the EQS exit path.

There are two such paths. One starts in the do_idle() loop, and the
other at the el0_svc entry. For do_idle(), isb() is added in the
arch_cpu_idle_exit() hook. For the SVC handler, isb is issued in
el0_svc_naked.

Suggested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
---
 arch/arm64/kernel/entry.S   | 16 +++++++++++++++-
 arch/arm64/kernel/process.c |  7 +++++++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index c8d9ec363ddd..b1e1c19b4432 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -48,7 +48,7 @@
 	.endm
 
 	.macro el0_svc_restore_syscall_args
-#if defined(CONFIG_CONTEXT_TRACKING)
+#if !defined(CONFIG_TINY_RCU) || defined(CONFIG_CONTEXT_TRACKING)
 	restore_syscall_args
 #endif
 	.endm
@@ -483,6 +483,19 @@ __bad_stack:
 	ASM_BUG()
 	.endm
 
+/*
+ * If the CPU is in an extended quiescent state, we need an isb to ensure
+ * that a possible change of kernel text is visible to the core.
+ */
+	.macro	isb_if_eqs
+#ifndef CONFIG_TINY_RCU
+	bl	rcu_is_watching
+	cbnz	x0, 1f
+	isb 					// pairs with aarch64_insn_patch_text
+1:
+#endif
+	.endm
+
 el0_sync_invalid:
 	inv_entry 0, BAD_SYNC
 ENDPROC(el0_sync_invalid)
@@ -949,6 +962,7 @@ alternative_else_nop_endif
 
 el0_svc_naked:					// compat entry point
 	stp	x0, xscno, [sp, #S_ORIG_X0]	// save the original x0 and syscall number
+	isb_if_eqs
 	enable_daif
 	ct_user_exit
 	el0_svc_restore_syscall_args
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index f08a2ed9db0d..74cad496b07b 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -88,6 +88,13 @@ void arch_cpu_idle(void)
 	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
+void arch_cpu_idle_exit(void)
+{
+	/* Pairs with aarch64_insn_patch_text() for EQS CPUs. */
+	if (!rcu_is_watching())
+		isb();
+}
+
 #ifdef CONFIG_HOTPLUG_CPU
 void arch_cpu_idle_dead(void)
 {
-- 
2.14.1

* [PATCH 4/5] rcu: arm64: add rcu_dynticks_eqs_exit_sync()
  2018-04-05 17:17 [PATCH v2 0/5] smp: don't kick CPUs running idle or nohz_full tasks Yury Norov
                   ` (2 preceding siblings ...)
  2018-04-05 17:17 ` [PATCH 3/5] arm64: early ISB at exit from extended quiescent state Yury Norov
@ 2018-04-05 17:17 ` Yury Norov
  2018-04-05 17:18 ` [PATCH 5/5] smp: Lazy synchronization for EQS CPUs in kick_all_cpus_sync() Yury Norov
  4 siblings, 0 replies; 15+ messages in thread
From: Yury Norov @ 2018-04-05 17:17 UTC (permalink / raw)
  To: Paul E. McKenney, Mark Rutland, Will Deacon, Chris Metcalf,
	Christopher Lameter, Russell King - ARM Linux, Steven Rostedt,
	Mathieu Desnoyers, Catalin Marinas, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Alexey Klimov
  Cc: Yury Norov, linux-arm-kernel, linuxppc-dev, kvm-ppc, linux-mm,
	linux-kernel

The following patch of the series enables delaying of kernel memory
synchronization for CPUs running in an extended quiescent state (EQS)
until the exit from that state.

In the previous patch, an ISB was added in the EQS exit path to ensure
that any change made by the kernel patching framework is visible. But
after that isb(), EQS is still enabled for a while, and there's a chance
that some other core will modify text in parallel and the EQS core will
not be notified about it, as EQS masks the IPI:

CPU0                            CPU1

ISB
				patch_some_text()
				kick_all_active_cpus_sync()
exit EQS

// not synchronized!
use_of_patched_text()

This patch introduces the rcu_dynticks_eqs_exit_sync() function and uses
it in arm64 code to issue an isb() after the exit from the quiescent state.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
---
 arch/arm64/kernel/Makefile | 2 ++
 arch/arm64/kernel/rcu.c    | 8 ++++++++
 kernel/rcu/tree.c          | 4 ++++
 3 files changed, 14 insertions(+)
 create mode 100644 arch/arm64/kernel/rcu.c

diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 9b55a3f24be7..c87a203524ab 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -54,6 +54,8 @@ arm64-obj-$(CONFIG_ARM64_RELOC_TEST)	+= arm64-reloc-test.o
 arm64-reloc-test-y := reloc_test_core.o reloc_test_syms.o
 arm64-obj-$(CONFIG_CRASH_DUMP)		+= crash_dump.o
 arm64-obj-$(CONFIG_ARM_SDE_INTERFACE)	+= sdei.o
+arm64-obj-$(CONFIG_TREE_RCU)		+= rcu.o
+arm64-obj-$(CONFIG_PREEMPT_RCU)		+= rcu.o
 
 arm64-obj-$(CONFIG_KVM_INDIRECT_VECTORS)+= bpi.o
 
diff --git a/arch/arm64/kernel/rcu.c b/arch/arm64/kernel/rcu.c
new file mode 100644
index 000000000000..67fe33c0ea03
--- /dev/null
+++ b/arch/arm64/kernel/rcu.c
@@ -0,0 +1,8 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <asm/barrier.h>
+
+void rcu_dynticks_eqs_exit_sync(void)
+{
+	isb();
+}
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 2a734692a581..363f91776b66 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -264,6 +264,8 @@ void rcu_bh_qs(void)
 #define rcu_eqs_special_exit() do { } while (0)
 #endif
 
+void __weak rcu_dynticks_eqs_exit_sync(void) {}
+
 static DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = {
 	.dynticks_nesting = 1,
 	.dynticks_nmi_nesting = DYNTICK_IRQ_NONIDLE,
@@ -308,6 +310,8 @@ static void rcu_dynticks_eqs_exit(void)
 	 * critical section.
 	 */
 	seq = atomic_add_return(RCU_DYNTICK_CTRL_CTR, &rdtp->dynticks);
+	rcu_dynticks_eqs_exit_sync();
+
 	WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) &&
 		     !(seq & RCU_DYNTICK_CTRL_CTR));
 	if (seq & RCU_DYNTICK_CTRL_MASK) {
-- 
2.14.1

* [PATCH 5/5] smp: Lazy synchronization for EQS CPUs in kick_all_cpus_sync()
  2018-04-05 17:17 [PATCH v2 0/5] smp: don't kick CPUs running idle or nohz_full tasks Yury Norov
                   ` (3 preceding siblings ...)
  2018-04-05 17:17 ` [PATCH 4/5] rcu: arm64: add rcu_dynticks_eqs_exit_sync() Yury Norov
@ 2018-04-05 17:18 ` Yury Norov
  4 siblings, 0 replies; 15+ messages in thread
From: Yury Norov @ 2018-04-05 17:18 UTC (permalink / raw)
  To: Paul E. McKenney, Mark Rutland, Will Deacon, Chris Metcalf,
	Christopher Lameter, Russell King - ARM Linux, Steven Rostedt,
	Mathieu Desnoyers, Catalin Marinas, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Alexey Klimov
  Cc: Yury Norov, linux-arm-kernel, linuxppc-dev, kvm-ppc, linux-mm,
	linux-kernel

kick_all_cpus_sync() forces all CPUs to sync caches by sending a broadcast
IPI.  If a CPU is in an extended quiescent state (idle task or nohz_full
userspace), this work may be done at the exit from that state. Delaying
synchronization helps to save power if the CPU is idle and decreases
latency for real-time tasks.

This patch introduces rcu_get_eqs_cpus() and uses it in
kick_all_cpus_sync() to delay synchronization.

For task isolation (https://lkml.org/lkml/2017/11/3/589), an IPI to the CPU
running an isolated task is fatal, as it breaks isolation. The lazy
synchronization approach helps to maintain the isolated state.

I've tested it with the test from the task isolation series on ThunderX2
for more than 10 hours (10k giga-ticks) without breaking isolation.

Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
---
 include/linux/rcutiny.h |  2 ++
 include/linux/rcutree.h |  1 +
 kernel/rcu/tiny.c       |  9 +++++++++
 kernel/rcu/tree.c       | 23 +++++++++++++++++++++++
 kernel/smp.c            | 21 +++++++++++++--------
 5 files changed, 48 insertions(+), 8 deletions(-)

diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h
index ce9beec35e34..dc7e2ea731fa 100644
--- a/include/linux/rcutiny.h
+++ b/include/linux/rcutiny.h
@@ -36,6 +36,8 @@ static inline int rcu_dynticks_snap(struct rcu_dynticks *rdtp)
 /* Never flag non-existent other CPUs! */
 static inline bool rcu_eqs_special_set(int cpu) { return false; }
 
+void rcu_get_eqs_cpus(struct cpumask *cpus, int choose_eqs);
+
 static inline unsigned long get_state_synchronize_rcu(void)
 {
 	return 0;
diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index fd996cdf1833..7a34eb8c0df3 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -74,6 +74,7 @@ static inline void synchronize_rcu_bh_expedited(void)
 void rcu_barrier(void);
 void rcu_barrier_bh(void);
 void rcu_barrier_sched(void);
+void rcu_get_eqs_cpus(struct cpumask *cpus, int choose_eqs);
 unsigned long get_state_synchronize_rcu(void);
 void cond_synchronize_rcu(unsigned long oldstate);
 unsigned long get_state_synchronize_sched(void);
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index a64eee0db39e..d4e94e1b0570 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -128,6 +128,15 @@ void rcu_check_callbacks(int user)
 		rcu_note_voluntary_context_switch(current);
 }
 
+/*
+ * For tiny RCU, all CPUs are active (non-EQS).
+ */
+void rcu_get_eqs_cpus(struct cpumask *cpus, int choose_eqs)
+{
+	if (!choose_eqs)
+		cpumask_copy(cpus, cpu_online_mask);
+}
+
 /*
  * Invoke the RCU callbacks on the specified rcu_ctrlkblk structure
  * whose grace period has elapsed.
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 363f91776b66..cb0d3afe7ea8 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -419,6 +419,29 @@ bool rcu_eqs_special_set(int cpu)
 	return true;
 }
 
+/*
+ * Get EQS CPUs. If @choose_eqs is 0, the set of active (non-EQS)
+ * CPUs is returned instead.
+ *
+ * Call with preemption disabled, and make sure @cpus is cleared.
+ */
+void rcu_get_eqs_cpus(struct cpumask *cpus, int choose_eqs)
+{
+	int cpu, in_eqs;
+	struct rcu_dynticks *rdtp;
+
+	for_each_online_cpu(cpu) {
+		rdtp = &per_cpu(rcu_dynticks, cpu);
+		in_eqs = rcu_dynticks_in_eqs(atomic_read(&rdtp->dynticks));
+
+		if (in_eqs && choose_eqs)
+			cpumask_set_cpu(cpu, cpus);
+
+		if (!in_eqs && !choose_eqs)
+			cpumask_set_cpu(cpu, cpus);
+	}
+}
+
 /*
  * Let the RCU core know that this CPU has gone through the scheduler,
  * which is a quiescent state.  This is called when the need for a
diff --git a/kernel/smp.c b/kernel/smp.c
index 084c8b3a2681..5e6cfb57da22 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -708,19 +708,24 @@ static void do_nothing(void *unused)
 /**
  * kick_all_cpus_sync - Force all cpus out of idle
  *
- * Used to synchronize the update of pm_idle function pointer. It's
- * called after the pointer is updated and returns after the dummy
- * callback function has been executed on all cpus. The execution of
- * the function can only happen on the remote cpus after they have
- * left the idle function which had been called via pm_idle function
- * pointer. So it's guaranteed that nothing uses the previous pointer
- * anymore.
+ * - on the current CPU, call smp_mb() explicitly;
+ * - on CPUs in an extended quiescent state (idle or nohz_full userspace),
+ *   memory is synchronized at the exit from that mode, so do nothing (it's
+ *   safe to delay synchronization because EQS CPUs don't run kernel code);
+ * - on other CPUs, fire an IPI for synchronization, which implies a barrier.
  */
 void kick_all_cpus_sync(void)
 {
+	struct cpumask active_cpus;
+
 	/* Make sure the change is visible before we kick the cpus */
 	smp_mb();
-	smp_call_function(do_nothing, NULL, 1);
+
+	cpumask_clear(&active_cpus);
+	preempt_disable();
+	rcu_get_eqs_cpus(&active_cpus, 0);
+	smp_call_function_many(&active_cpus, do_nothing, NULL, 1);
+	preempt_enable();
 }
 EXPORT_SYMBOL_GPL(kick_all_cpus_sync);
 
-- 
2.14.1

* Re: [PATCH 1/5] arm64: entry: isb in el1_irq
  2018-04-05 17:17 ` [PATCH 1/5] arm64: entry: isb in el1_irq Yury Norov
@ 2018-04-06 10:02   ` James Morse
  2018-04-06 16:54     ` Yury Norov
  2018-04-06 10:57   ` Mark Rutland
  1 sibling, 1 reply; 15+ messages in thread
From: James Morse @ 2018-04-06 10:02 UTC (permalink / raw)
  To: Yury Norov
  Cc: Paul E. McKenney, Mark Rutland, Will Deacon, Chris Metcalf,
	Christopher Lameter, Russell King - ARM Linux, Steven Rostedt,
	Mathieu Desnoyers, Catalin Marinas, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Alexey Klimov,
	linux-arm-kernel, linuxppc-dev, kvm-ppc, linux-mm, linux-kernel

Hi Yury,

On 05/04/18 18:17, Yury Norov wrote:
> The kernel text patching framework relies on an IPI to ensure that other
> SMP cores observe the change. The target core calls isb() in the IPI handler

(Odd, if it's just to synchronize the CPU, taking the IPI should be enough).


> path, but not at the beginning of the el1_irq entry. There's a chance that
> a modified instruction will be fetched before the isb() and so will not be
> observed.
> 
> This patch inserts an isb early in the el1_irq entry to close that window.


> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index ec2ee720e33e..9c06b4b80060 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -593,6 +593,7 @@ ENDPROC(el1_sync)
>  
>  	.align	6
>  el1_irq:
> +	isb					// pairs with aarch64_insn_patch_text
>  	kernel_entry 1
>  	enable_da_f
>  #ifdef CONFIG_TRACE_IRQFLAGS
> 

An ISB at the beginning of the vectors? This is odd: taking an IRQ to get in
here would be a context synchronization event too, so the ISB is superfluous.

The ARM ARM has a list of 'Context synchronization events' (Glossary on page
6480 of DDI0487B.b); paraphrasing:
* ISB
* Taking an exception
* ERET
* (...loads of debug stuff...)


Thanks,

James

* Re: [PATCH 3/5] arm64: early ISB at exit from extended quiescent state
  2018-04-05 17:17 ` [PATCH 3/5] arm64: early ISB at exit from extended quiescent state Yury Norov
@ 2018-04-06 10:06   ` James Morse
  2018-04-06 11:07   ` Mark Rutland
  1 sibling, 0 replies; 15+ messages in thread
From: James Morse @ 2018-04-06 10:06 UTC (permalink / raw)
  To: Yury Norov
  Cc: Paul E. McKenney, Mark Rutland, Will Deacon, Chris Metcalf,
	Christopher Lameter, Russell King - ARM Linux, Steven Rostedt,
	Mathieu Desnoyers, Catalin Marinas, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Alexey Klimov,
	linux-arm-kernel, linuxppc-dev, kvm-ppc, linux-mm, linux-kernel

Hi Yury,

On 05/04/18 18:17, Yury Norov wrote:
> This series enables delaying of kernel memory synchronization
> for CPUs running in an extended quiescent state (EQS) until the
> exit from that state.
> 
> ARM64 uses the IPI mechanism to notify all cores in an SMP system
> that kernel text has changed, and the IPI handler calls isb() to
> synchronize.
> 
> If we don't deliver IPIs to EQS CPUs anymore, we should add an ISB
> early in the EQS exit path.
> 
> There are two such paths. One starts in the do_idle() loop, and the
> other at the el0_svc entry. For do_idle(), isb() is added in the
> arch_cpu_idle_exit() hook. For the SVC handler, isb is issued in
> el0_svc_naked.

(I know nothing about this EQS stuff, but) there is a third path that might be
relevant.
From include/linux/context_tracking.h:guest_enter_irqoff():
|	 * KVM does not hold any references to rcu protected data when it
|	 * switches CPU into a guest mode. In fact switching to a guest mode
|	 * is very similar to exiting to userspace from rcu point of view. In
|	 * addition CPU may stay in a guest mode for quite a long time (up to
|	 * one time slice). Lets treat guest mode as quiescent state, just like
|	 * we do with user-mode execution.

For non-VHE systems, guest_enter_irqoff() is called just before we jump to EL2.
Coming back gives us an exception-return, so we have a context-synchronisation
event there, and I assume we will never patch the hyp-text on these systems.

But with VHE on the upcoming kernel version we still go on to run code at the
same EL. Do we need an ISB on the path back from the guest once we've told RCU
we've 'exited user-space'?
If this code can be patched, do we have a problem here?


> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index c8d9ec363ddd..b1e1c19b4432 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -48,7 +48,7 @@
>  	.endm
>  
>  	.macro el0_svc_restore_syscall_args
> -#if defined(CONFIG_CONTEXT_TRACKING)
> +#if !defined(CONFIG_TINY_RCU) || defined(CONFIG_CONTEXT_TRACKING)
>  	restore_syscall_args
>  #endif
>  	.endm
> @@ -483,6 +483,19 @@ __bad_stack:
>  	ASM_BUG()
>  	.endm
>  
> +/*
> + * If the CPU is in an extended quiescent state, we need an isb to ensure
> + * that a possible change of kernel text is visible to the core.
> + */
> +	.macro	isb_if_eqs
> +#ifndef CONFIG_TINY_RCU
> +	bl	rcu_is_watching
> +	cbnz	x0, 1f
> +	isb 					// pairs with aarch64_insn_patch_text
> +1:
> +#endif
> +	.endm
> +
>  el0_sync_invalid:
>  	inv_entry 0, BAD_SYNC
>  ENDPROC(el0_sync_invalid)
> @@ -949,6 +962,7 @@ alternative_else_nop_endif
>  
>  el0_svc_naked:					// compat entry point
>  	stp	x0, xscno, [sp, #S_ORIG_X0]	// save the original x0 and syscall number
> +	isb_if_eqs
>  	enable_daif
>  	ct_user_exit
>  	el0_svc_restore_syscall_args

Shouldn't this be at the point that RCU knows we've exited user-space? Otherwise
there is a gap where RCU thinks we're in user-space, we're not, and we're about
to tell it. Code-patching occurring in this gap would be missed.

This gap only contains 'enable_daif', and any exception that occurs here is
safe, but it's going to give someone a nasty surprise...

Mark points out this ISB needs to be after RCU knows we're not quiescent:
https://lkml.org/lkml/2018/4/3/378

Can't this go in the rcu exit-quiescence code? Isn't this what your
rcu_dynticks_eqs_exit_sync() hook does?


Thanks,

James

* Re: [PATCH 1/5] arm64: entry: isb in el1_irq
  2018-04-05 17:17 ` [PATCH 1/5] arm64: entry: isb in el1_irq Yury Norov
  2018-04-06 10:02   ` James Morse
@ 2018-04-06 10:57   ` Mark Rutland
  1 sibling, 0 replies; 15+ messages in thread
From: Mark Rutland @ 2018-04-06 10:57 UTC (permalink / raw)
  To: Yury Norov
  Cc: Paul E. McKenney, Will Deacon, Chris Metcalf,
	Christopher Lameter, Russell King - ARM Linux, Steven Rostedt,
	Mathieu Desnoyers, Catalin Marinas, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Alexey Klimov,
	linux-arm-kernel, linuxppc-dev, kvm-ppc, linux-mm, linux-kernel

On Thu, Apr 05, 2018 at 08:17:56PM +0300, Yury Norov wrote:
> The kernel text patching framework relies on an IPI to ensure that other
> SMP cores observe the change. The target core calls isb() in the IPI handler
> path, but not at the beginning of the el1_irq entry. There's a chance that
> a modified instruction will be fetched before the isb() and so will not be
> observed.
> 
> This patch inserts an isb early in the el1_irq entry to close that window.

As James pointed out, taking an exception is context synchronizing, so
this looks unnecessary.

Also, it's important to realise that the exception entry is not tied to a
specific interrupt. We might take an EL1 IRQ because of a timer interrupt,
then an IPI could be taken before we get to gic_handle_irq().

This means that we can race:

	CPU0				CPU1
	<take IRQ>
	ISB
					<patch text>
					<send IPI>
	<discover IPI pending>

... and thus the ISB is too early.

Only once we're in the interrupt handler can we pair an ISB with the IPI, and
any code executed before that is not guaranteed to be up-to-date.

Thanks,
Mark.

> 
> Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
> ---
>  arch/arm64/kernel/entry.S | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index ec2ee720e33e..9c06b4b80060 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -593,6 +593,7 @@ ENDPROC(el1_sync)
>  
>  	.align	6
>  el1_irq:
> +	isb					// pairs with aarch64_insn_patch_text
>  	kernel_entry 1
>  	enable_da_f
>  #ifdef CONFIG_TRACE_IRQFLAGS
> -- 
> 2.14.1
> 

* Re: [PATCH 3/5] arm64: early ISB at exit from extended quiescent state
  2018-04-05 17:17 ` [PATCH 3/5] arm64: early ISB at exit from extended quiescent state Yury Norov
  2018-04-06 10:06   ` James Morse
@ 2018-04-06 11:07   ` Mark Rutland
  1 sibling, 0 replies; 15+ messages in thread
From: Mark Rutland @ 2018-04-06 11:07 UTC (permalink / raw)
  To: Yury Norov
  Cc: Paul E. McKenney, Will Deacon, Chris Metcalf,
	Christopher Lameter, Russell King - ARM Linux, Steven Rostedt,
	Mathieu Desnoyers, Catalin Marinas, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Alexey Klimov,
	linux-arm-kernel, linuxppc-dev, kvm-ppc, linux-mm, linux-kernel

On Thu, Apr 05, 2018 at 08:17:58PM +0300, Yury Norov wrote:
> This series enables delaying of kernel memory synchronization
> for CPUs running in an extended quiescent state (EQS) until the
> exit from that state.
> 
> ARM64 uses the IPI mechanism to notify all cores in an SMP system
> that kernel text has changed, and the IPI handler calls isb() to
> synchronize.
> 
> If we don't deliver IPIs to EQS CPUs anymore, we should add an ISB
> early in the EQS exit path.
> 
> There are two such paths. One starts in the do_idle() loop, and the
> other at the el0_svc entry. For do_idle(), isb() is added in the
> arch_cpu_idle_exit() hook. For the SVC handler, isb is issued in
> el0_svc_naked.
> 
> Suggested-by: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
> ---
>  arch/arm64/kernel/entry.S   | 16 +++++++++++++++-
>  arch/arm64/kernel/process.c |  7 +++++++
>  2 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index c8d9ec363ddd..b1e1c19b4432 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -48,7 +48,7 @@
>  	.endm
>  
>  	.macro el0_svc_restore_syscall_args
> -#if defined(CONFIG_CONTEXT_TRACKING)
> +#if !defined(CONFIG_TINY_RCU) || defined(CONFIG_CONTEXT_TRACKING)
>  	restore_syscall_args
>  #endif
>  	.endm
> @@ -483,6 +483,19 @@ __bad_stack:
>  	ASM_BUG()
>  	.endm
>  
> +/*
> + * If the CPU is in an extended quiescent state, we need an isb to ensure
> + * that a possible change of kernel text is visible to the core.
> + */
> +	.macro	isb_if_eqs
> +#ifndef CONFIG_TINY_RCU
> +	bl	rcu_is_watching
> +	cbnz	x0, 1f
> +	isb 					// pairs with aarch64_insn_patch_text
> +1:
> +#endif
> +	.endm
> +
>  el0_sync_invalid:
>  	inv_entry 0, BAD_SYNC
>  ENDPROC(el0_sync_invalid)
> @@ -949,6 +962,7 @@ alternative_else_nop_endif
>  
>  el0_svc_naked:					// compat entry point
>  	stp	x0, xscno, [sp, #S_ORIG_X0]	// save the original x0 and syscall number
> +	isb_if_eqs

As I mentioned before, this is too early.

If we only kick active CPUs, then until we exit a quiescent state, we
can race with concurrent modification, and cannot reliably ensure that
instructions are up-to-date. Practically speaking, that means that we
cannot patch any code used on the path to exit a quiescent state.

Also, if this were needed in the SVC path, it would be necessary for all
exceptions from EL0. Buggy userspace can always trigger a data abort,
even if it doesn't intend to.

>  	enable_daif
>  	ct_user_exit
>  	el0_svc_restore_syscall_args
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index f08a2ed9db0d..74cad496b07b 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -88,6 +88,13 @@ void arch_cpu_idle(void)
>  	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
>  }
>  
> +void arch_cpu_idle_exit(void)
> +{
> +	/* Pairs with aarch64_insn_patch_text() for EQS CPUs. */
> +	if (!rcu_is_watching())
> +		isb();
> +}

Likewise, this is too early as we haven't left the extended quiescent
state yet.

Thanks,
Mark.

* Re: [PATCH 1/5] arm64: entry: isb in el1_irq
  2018-04-06 10:02   ` James Morse
@ 2018-04-06 16:54     ` Yury Norov
  2018-04-06 17:22       ` Mark Rutland
  0 siblings, 1 reply; 15+ messages in thread
From: Yury Norov @ 2018-04-06 16:54 UTC (permalink / raw)
  To: James Morse
  Cc: Paul E. McKenney, Mark Rutland, Will Deacon, Chris Metcalf,
	Christopher Lameter, Russell King - ARM Linux, Steven Rostedt,
	Mathieu Desnoyers, Catalin Marinas, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Alexey Klimov,
	linux-arm-kernel, linuxppc-dev, kvm-ppc, linux-mm, linux-kernel

On Fri, Apr 06, 2018 at 11:02:56AM +0100, James Morse wrote:
> Hi Yury,
> 
> An ISB at the beginning of the vectors? This is odd: taking an IRQ to get in
> here would be a context synchronization event too, so the ISB is superfluous.
> 
> The ARM ARM has a list of 'Context synchronization events' (Glossary on page
> 6480 of DDI0487B.b); paraphrasing:
> * ISB
> * Taking an exception
> * ERET
> * (...loads of debug stuff...)

Hi James, Mark,

I completely forgot that taking an exception is a context synchronization
event. Sorry for the time you spent reviewing this. It means that patches 1,
2 and 3 are not needed, except for the chunk that adds an ISB in the
do_idle() path.

It also means that for arm64 we can safely mask IPI delivery to CPUs that
run any userspace code, not only nohz_full.

In general, kick_all_cpus_sync() is needed to switch contexts. But exiting
userspace is anyway a context switch, and while in userspace we cannot do
anything wrong on the kernel side. To me this means that we can safely drop
the IPI for all userspace modes - both normal and nohz_full.

If that's correct, for v3 I would suggest:
 - in kick_all_cpus_sync(), mask out all is_idle_task() and user_mode() CPUs
   (sketched below);
 - add isb() for arm64 in the do_idle() path only - this path doesn't imply
   a context switch.
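
A minimal sketch of the first item (idle_cpu() is a real helper, but there
is no ready test for "remote CPU is in userspace", so cpu_in_user() below
is hypothetical):

	void kick_all_cpus_sync(void)
	{
		struct cpumask active_cpus;
		int cpu;

		/* Make sure the change is visible before we kick the cpus */
		smp_mb();

		cpumask_clear(&active_cpus);
		preempt_disable();
		for_each_online_cpu(cpu)
			if (!idle_cpu(cpu) && !cpu_in_user(cpu))
				cpumask_set_cpu(cpu, &active_cpus);
		smp_call_function_many(&active_cpus, do_nothing, NULL, 1);
		preempt_enable();
	}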

What do you think?

Yury

* Re: [PATCH 1/5] arm64: entry: isb in el1_irq
  2018-04-06 16:54     ` Yury Norov
@ 2018-04-06 17:22       ` Mark Rutland
  2018-04-06 17:30         ` James Morse
  2018-04-06 17:50         ` Mark Rutland
  0 siblings, 2 replies; 15+ messages in thread
From: Mark Rutland @ 2018-04-06 17:22 UTC (permalink / raw)
  To: Yury Norov
  Cc: James Morse, Paul E. McKenney, Will Deacon, Chris Metcalf,
	Christopher Lameter, Russell King - ARM Linux, Steven Rostedt,
	Mathieu Desnoyers, Catalin Marinas, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Alexey Klimov,
	linux-arm-kernel, linuxppc-dev, kvm-ppc, linux-mm, linux-kernel

On Fri, Apr 06, 2018 at 07:54:02PM +0300, Yury Norov wrote:
> In general, kick_all_cpus_sync() is needed to switch contexts. But exiting
> userspace is anyway a context switch, and while in userspace we cannot do
> anything wrong on the kernel side. To me this means that we can safely drop
> the IPI for all userspace modes - both normal and nohz_full.

This *may* be true, but only if we never have to patch text in the
windows:

* between exception entry and eqs_exit()

* between eqs_enter() and exception return

* between eqs_enter() and eqs_exit() in the idle loop.

If it's possible that we need to execute patched text in any of those
paths, we must IPI all CPUs in order to correctly serialize things.
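
Concretely, for window (1) on the SVC path, the unsafe region looks like
this (annotating the el0_svc_naked hunk from earlier in the thread, and
assuming context tracking is enabled):

	el0_svc_naked:
		stp	x0, xscno, [sp, #S_ORIG_X0]	// RCU still thinks we're in EQS;
		enable_daif				// patched text here may be stale
		ct_user_exit				// ...EQS is exited in here
		...					// only below this point is
							// patched text reliably seen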

Digging a bit, I also think that our ct_user_exit and ct_user_enter
usage is on dodgy ground today.

For example, in el0_dbg we call do_debug_exception() *before* calling
ct_user_exit. Which I believe means we'd use RCU while supposedly in an
extended quiescent period, which would be bad.

In other paths, we unmask all DAIF bits before calling ct_user_exit, so
we could similarly take an EL1 debug exception without having exited the
extended quiescent period.

I think similar applies to SDEI; we don't negotiate with RCU prior to
invoking handlers, which might need RCU.

> If that's correct, for v3 I would suggest:
>  - in kick_all_cpus_sync(), mask out all is_idle_task() and user_mode() CPUs;
>  - add isb() for arm64 in the do_idle() path only - this path doesn't imply
>    a context switch.

As mentioned in my other reply, I don't think the ISB in do_idle()
makes sense, unless that occurs *after* we exit the extended quiescent
state.

Thanks,
Mark.

* Re: [PATCH 1/5] arm64: entry: isb in el1_irq
  2018-04-06 17:22       ` Mark Rutland
@ 2018-04-06 17:30         ` James Morse
  2018-04-06 17:34           ` Mark Rutland
  2018-04-06 17:50         ` Mark Rutland
  1 sibling, 1 reply; 15+ messages in thread
From: James Morse @ 2018-04-06 17:30 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Yury Norov, Paul E. McKenney, Will Deacon, Chris Metcalf,
	Christopher Lameter, Russell King - ARM Linux, Steven Rostedt,
	Mathieu Desnoyers, Catalin Marinas, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Alexey Klimov,
	linux-arm-kernel, linuxppc-dev, kvm-ppc, linux-mm, linux-kernel

Hi Mark,

On 06/04/18 18:22, Mark Rutland wrote:
> Digging a bit, I also think that our ct_user_exit and ct_user_enter
> usage is on dodgy ground today.

[...]

> I think similar applies to SDEI; we don't negotiate with RCU prior to
> invoking handlers, which might need RCU.

The arch code's __sdei_handler() calls nmi_enter() -> rcu_nmi_enter(), which
does a rcu_dynticks_eqs_exit().


Thanks,

James

* Re: [PATCH 1/5] arm64: entry: isb in el1_irq
  2018-04-06 17:30         ` James Morse
@ 2018-04-06 17:34           ` Mark Rutland
  0 siblings, 0 replies; 15+ messages in thread
From: Mark Rutland @ 2018-04-06 17:34 UTC (permalink / raw)
  To: James Morse
  Cc: Yury Norov, Paul E. McKenney, Will Deacon, Chris Metcalf,
	Christopher Lameter, Russell King - ARM Linux, Steven Rostedt,
	Mathieu Desnoyers, Catalin Marinas, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Alexey Klimov,
	linux-arm-kernel, linuxppc-dev, kvm-ppc, linux-mm, linux-kernel

On Fri, Apr 06, 2018 at 06:30:50PM +0100, James Morse wrote:
> Hi Mark,
> 
> On 06/04/18 18:22, Mark Rutland wrote:
> > Digging a bit, I also think that our ct_user_exit and ct_user_enter
> > usage is on dodgy ground today.
> 
> [...]
> 
> > I think similar applies to SDEI; we don't negotiate with RCU prior to
> > invoking handlers, which might need RCU.
> 
> The arch code's __sdei_handler() calls nmi_enter() -> rcu_nmi_enter(), which
> does a rcu_dynticks_eqs_exit().

Ah, sorry. I missed that!

Thanks,
Mark.

* Re: [PATCH 1/5] arm64: entry: isb in el1_irq
  2018-04-06 17:22       ` Mark Rutland
  2018-04-06 17:30         ` James Morse
@ 2018-04-06 17:50         ` Mark Rutland
  1 sibling, 0 replies; 15+ messages in thread
From: Mark Rutland @ 2018-04-06 17:50 UTC (permalink / raw)
  To: Yury Norov
  Cc: James Morse, Paul E. McKenney, Will Deacon, Chris Metcalf,
	Christopher Lameter, Russell King - ARM Linux, Steven Rostedt,
	Mathieu Desnoyers, Catalin Marinas, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Alexey Klimov,
	linux-arm-kernel, linuxppc-dev, kvm-ppc, linux-mm, linux-kernel

On Fri, Apr 06, 2018 at 06:22:11PM +0100, Mark Rutland wrote:
> Digging a bit, I also think that our ct_user_exit and ct_user_enter
> usage is on dodgy ground today.
> 
> For example, in el0_dbg we call do_debug_exception() *before* calling
> ct_user_exit. Which I believe means we'd use RCU while supposedly in an
> extended quiescent period, which would be bad.

It seems this is the case. I can trigger the following by having GDB
place a SW breakpoint:

[   51.217947] =============================
[   51.217953] WARNING: suspicious RCU usage
[   51.217961] 4.16.0 #4 Not tainted
[   51.217966] -----------------------------
[   51.217974] ./include/linux/rcupdate.h:632 rcu_read_lock() used illegally while idle!
[   51.217980]
[   51.217980] other info that might help us debug this:
[   51.217980]
[   51.217987]
[   51.217987] RCU used illegally from idle CPU!
[   51.217987] rcu_scheduler_active = 2, debug_locks = 1
[   51.217992] RCU used illegally from extended quiescent state!
[   51.217999] 1 lock held by ls/2412:
[   51.218004]  #0:  (rcu_read_lock){....}, at: [<0000000092efbdd5>] brk_handler+0x0/0x198
[   51.218041]
[   51.218041] stack backtrace:
[   51.218049] CPU: 2 PID: 2412 Comm: ls Not tainted 4.16.0 #4
[   51.218055] Hardware name: ARM Juno development board (r1) (DT)
[   51.218061] Call trace:
[   51.218070]  dump_backtrace+0x0/0x1c8
[   51.218078]  show_stack+0x14/0x20
[   51.218087]  dump_stack+0xac/0xe4
[   51.218096]  lockdep_rcu_suspicious+0xcc/0x110
[   51.218103]  brk_handler+0x144/0x198
[   51.218110]  do_debug_exception+0x9c/0x190
[   51.218116]  el0_dbg+0x14/0x20

We will need to fix this before we can fiddle with kick_all_cpus_sync().

Thanks,
Mark.
