* [PATCH tip/core/rcu 0/3] Updates to sys_membarrier for v4.14 to add registration
From: Paul E. McKenney @ 2017-10-04 21:37 UTC
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg

Hello!

This post-merge-window series fixes a problem that arose when designing
requested extensions to sys_membarrier() to allow JITs to efficiently
flush old code from instruction caches.  Several potential algorithms are
much less painful if the user registers intent to use this functionality
early on, for example, before the process spawns its second thread.
Registering at this time removes the need to interrupt each and every
thread in that process at the first expedited sys_membarrier() system
call.
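
As a concrete usage sketch (hypothetical userspace code, assuming kernel
headers that define __NR_membarrier and the new commands; error handling
is minimal):

	#include <linux/membarrier.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static int membarrier(int cmd, int flags)
	{
		return syscall(__NR_membarrier, cmd, flags);
	}

	int main(void)
	{
		/* Register intent while the process is still single-threaded. */
		if (membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0))
			return 1;

		/* ... spawn threads, JIT and patch code ... */

		/* Expedited barrier across all threads of this process. */
		if (membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0))
			return 1;
		return 0;
	}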

However, if we let the current functionality out without a registration
requirement, user programs might be written without registration, which
would make it more difficult to add the required registration after the
fact.  Hence the late-in-merge-window
request.  (Ingo, pull request on its way as well.)

1.	Provide register expedited private command.

2.	Test private expedited cmd.

3.	Document scheduler barrier requirements.

All courtesy of Mathieu Desnoyers.

							Thanx, Paul

------------------------------------------------------------------------

 MAINTAINERS                                          |    2 
 arch/powerpc/Kconfig                                 |    1 
 arch/powerpc/include/asm/membarrier.h                |   43 +++++++
 arch/powerpc/include/asm/thread_info.h               |    3 
 arch/powerpc/kernel/Makefile                         |    2 
 arch/powerpc/kernel/membarrier.c                     |   45 +++++++
 arch/powerpc/mm/mmu_context.c                        |    7 +
 arch/x86/mm/tlb.c                                    |    5 
 fs/exec.c                                            |    1 
 include/linux/mm_types.h                             |    3 
 include/linux/sched/mm.h                             |   55 +++++++++
 include/uapi/linux/membarrier.h                      |   23 ++--
 init/Kconfig                                         |    3 
 kernel/fork.c                                        |    2 
 kernel/sched/core.c                                  |   47 ++++----
 kernel/sched/membarrier.c                            |   25 +++-
 tools/testing/selftests/membarrier/membarrier_test.c |  109 ++++++++++++++++---
 17 files changed, 329 insertions(+), 47 deletions(-)

* [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
From: Paul E. McKenney @ 2017-10-04 21:37 UTC
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney, Boqun Feng, Andrew Hunter, Maged Michael,
	gromer, Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Alan Stern, Will Deacon,
	Andy Lutomirski, Ingo Molnar, Alexander Viro, Nicholas Piggin,
	linuxppc-dev, linux-arch

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

Provide a new command allowing processes to register their intent to use
the private expedited command.

This allows PowerPC to skip the full memory barrier in switch_mm(), and
only issue that barrier when scheduling into a task belonging to a
process that has registered its intent to use the private expedited
command.

Processes are now required to register before using
MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.
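
In other words, the expected userspace-visible behavior looks like this
(an illustrative sketch reusing the membarrier() wrapper from the cover
letter, with <errno.h> and <assert.h>; the selftest in patch 2/3
verifies exactly this):

	/* Before registration, the expedited command is refused. */
	assert(membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) == -1 &&
	       errno == EPERM);

	/* After registration, it succeeds. */
	assert(membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) == 0);
	assert(membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) == 0);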

Changes since v1:
- Use test_ti_thread_flag(next, ...) instead of test_thread_flag() in
  powerpc membarrier_arch_sched_in(), given that we want to specifically
  check the next thread state.
- Add missing ARCH_HAS_MEMBARRIER_HOOKS in Kconfig.
- Use task_thread_info() to pass thread_info from task to
  *_ti_thread_flag().

Changes since v2:
- Move membarrier_arch_sched_in() call to finish_task_switch().
- Check for NULL t->mm in membarrier_arch_fork().
- Use membarrier_sched_in() in generic code, which invokes the
  arch-specific membarrier_arch_sched_in(). This fixes allnoconfig
  build on PowerPC.
- Move asm/membarrier.h include under CONFIG_MEMBARRIER, fixing
  allnoconfig build on PowerPC.
- Build and runtime tested on PowerPC.

Changes since v3:
- Simply rely on copy_mm() to copy the membarrier_private_expedited mm
  field on fork.
- powerpc: test thread flag instead of reading
  membarrier_private_expedited in membarrier_arch_fork().
- powerpc: skip memory barrier in membarrier_arch_sched_in() if coming
  from kernel thread, since mmdrop() implies a full barrier.
- Set membarrier_private_expedited to 1 only after arch registration
  code, thus eliminating a race where concurrent commands could succeed
  when they should fail if issued concurrently with process
  registration.
- Use READ_ONCE() for membarrier_private_expedited field access in
  membarrier_private_expedited. Matches WRITE_ONCE() performed in
  process registration.

Changes since v4:
- Move powerpc hook from sched_in() to switch_mm(), based on feedback
  from Nicholas Piggin.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: gromer@google.com
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Alan Stern <stern@rowland.harvard.edu>
CC: Will Deacon <will.deacon@arm.com>
CC: Andy Lutomirski <luto@kernel.org>
CC: Ingo Molnar <mingo@redhat.com>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Nicholas Piggin <npiggin@gmail.com>
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-arch@vger.kernel.org
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 MAINTAINERS                            |  2 ++
 arch/powerpc/Kconfig                   |  1 +
 arch/powerpc/include/asm/membarrier.h  | 43 +++++++++++++++++++++++++++++
 arch/powerpc/include/asm/thread_info.h |  3 ++
 arch/powerpc/kernel/Makefile           |  2 ++
 arch/powerpc/kernel/membarrier.c       | 45 ++++++++++++++++++++++++++++++
 arch/powerpc/mm/mmu_context.c          |  7 +++++
 fs/exec.c                              |  1 +
 include/linux/mm_types.h               |  3 ++
 include/linux/sched/mm.h               | 50 ++++++++++++++++++++++++++++++++++
 include/uapi/linux/membarrier.h        | 23 +++++++++++-----
 init/Kconfig                           |  3 ++
 kernel/fork.c                          |  2 ++
 kernel/sched/core.c                    | 10 -------
 kernel/sched/membarrier.c              | 25 ++++++++++++++---
 15 files changed, 199 insertions(+), 21 deletions(-)
 create mode 100644 arch/powerpc/include/asm/membarrier.h
 create mode 100644 arch/powerpc/kernel/membarrier.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 65b0c88d5ee0..c5296d7f447b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8822,6 +8822,8 @@ L:	linux-kernel@vger.kernel.org
 S:	Supported
 F:	kernel/sched/membarrier.c
 F:	include/uapi/linux/membarrier.h
+F:	arch/powerpc/kernel/membarrier.c
+F:	arch/powerpc/include/asm/membarrier.h
 
 MEMORY MANAGEMENT
 L:	linux-mm@kvack.org
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 809c468edab1..6f44c5f74f71 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -138,6 +138,7 @@ config PPC
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_GCOV_PROFILE_ALL
+	select ARCH_HAS_MEMBARRIER_HOOKS
 	select ARCH_HAS_SCALED_CPUTIME		if VIRT_CPU_ACCOUNTING_NATIVE
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAS_TICK_BROADCAST		if GENERIC_CLOCKEVENTS_BROADCAST
diff --git a/arch/powerpc/include/asm/membarrier.h b/arch/powerpc/include/asm/membarrier.h
new file mode 100644
index 000000000000..61152a7a3cf9
--- /dev/null
+++ b/arch/powerpc/include/asm/membarrier.h
@@ -0,0 +1,43 @@
+#ifndef _ASM_POWERPC_MEMBARRIER_H
+#define _ASM_POWERPC_MEMBARRIER_H
+
+static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
+		struct mm_struct *next, struct task_struct *tsk)
+{
+	/*
+	 * Only need the full barrier when switching between processes.
+	 * Barrier when switching from kernel to userspace is not
+	 * required here, given that it is implied by mmdrop(). Barrier
+	 * when switching from userspace to kernel is not needed after
+	 * store to rq->curr.
+	 */
+	if (likely(!test_ti_thread_flag(task_thread_info(tsk),
+			TIF_MEMBARRIER_PRIVATE_EXPEDITED) || !prev))
+		return;
+
+	/*
+	 * The membarrier system call requires a full memory barrier
+	 * after storing to rq->curr, before going back to user-space.
+	 */
+	smp_mb();
+}
+static inline void membarrier_arch_fork(struct task_struct *t,
+		unsigned long clone_flags)
+{
+	/*
+	 * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
+	 * fork is protected by siglock. membarrier_arch_fork is called
+	 * with siglock held.
+	 */
+	if (test_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED))
+		set_ti_thread_flag(task_thread_info(t),
+				TIF_MEMBARRIER_PRIVATE_EXPEDITED);
+}
+static inline void membarrier_arch_execve(struct task_struct *t)
+{
+	clear_ti_thread_flag(task_thread_info(t),
+			TIF_MEMBARRIER_PRIVATE_EXPEDITED);
+}
+void membarrier_arch_register_private_expedited(struct task_struct *t);
+
+#endif /* _ASM_POWERPC_MEMBARRIER_H */
diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
index a941cc6fc3e9..2a208487724b 100644
--- a/arch/powerpc/include/asm/thread_info.h
+++ b/arch/powerpc/include/asm/thread_info.h
@@ -100,6 +100,7 @@ static inline struct thread_info *current_thread_info(void)
 #if defined(CONFIG_PPC64)
 #define TIF_ELF2ABI		18	/* function descriptors must die! */
 #endif
+#define TIF_MEMBARRIER_PRIVATE_EXPEDITED	19	/* membarrier */
 
 /* as above, but as bit values */
 #define _TIF_SYSCALL_TRACE	(1<<TIF_SYSCALL_TRACE)
@@ -119,6 +120,8 @@ static inline struct thread_info *current_thread_info(void)
 #define _TIF_SYSCALL_TRACEPOINT	(1<<TIF_SYSCALL_TRACEPOINT)
 #define _TIF_EMULATE_STACK_STORE	(1<<TIF_EMULATE_STACK_STORE)
 #define _TIF_NOHZ		(1<<TIF_NOHZ)
+#define _TIF_MEMBARRIER_PRIVATE_EXPEDITED \
+				(1<<TIF_MEMBARRIER_PRIVATE_EXPEDITED)
 #define _TIF_SYSCALL_DOTRACE	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 				 _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT | \
 				 _TIF_NOHZ)
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 91960f83039c..2dd4b9e3313a 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -135,6 +135,8 @@ endif
 obj-$(CONFIG_EPAPR_PARAVIRT)	+= epapr_paravirt.o epapr_hcalls.o
 obj-$(CONFIG_KVM_GUEST)		+= kvm.o kvm_emul.o
 
+obj-$(CONFIG_MEMBARRIER)	+= membarrier.o
+
 # Disable GCOV & sanitizers in odd or sensitive code
 GCOV_PROFILE_prom_init.o := n
 UBSAN_SANITIZE_prom_init.o := n
diff --git a/arch/powerpc/kernel/membarrier.c b/arch/powerpc/kernel/membarrier.c
new file mode 100644
index 000000000000..b0d79a5f5981
--- /dev/null
+++ b/arch/powerpc/kernel/membarrier.c
@@ -0,0 +1,45 @@
+/*
+ * Copyright (C) 2010-2017 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+ *
+ * membarrier system call - PowerPC architecture code
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/sched/mm.h>
+#include <linux/sched/signal.h>
+#include <linux/thread_info.h>
+#include <linux/spinlock.h>
+#include <linux/rcupdate.h>
+
+void membarrier_arch_register_private_expedited(struct task_struct *p)
+{
+	struct task_struct *t;
+
+	if (get_nr_threads(p) == 1) {
+		set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED);
+		return;
+	}
+	/*
+	 * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
+	 * fork is protected by siglock.
+	 */
+	spin_lock(&p->sighand->siglock);
+	for_each_thread(p, t)
+		set_ti_thread_flag(task_thread_info(t),
+				TIF_MEMBARRIER_PRIVATE_EXPEDITED);
+	spin_unlock(&p->sighand->siglock);
+	/*
+	 * Ensure all future scheduler executions will observe the new
+	 * thread flag state for this process.
+	 */
+	synchronize_sched();
+}
diff --git a/arch/powerpc/mm/mmu_context.c b/arch/powerpc/mm/mmu_context.c
index 0f613bc63c50..22f5c91cdc38 100644
--- a/arch/powerpc/mm/mmu_context.c
+++ b/arch/powerpc/mm/mmu_context.c
@@ -12,6 +12,7 @@
 
 #include <linux/mm.h>
 #include <linux/cpu.h>
+#include <linux/sched/mm.h>
 
 #include <asm/mmu_context.h>
 
@@ -67,6 +68,10 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		 *
 		 * On the read side the barrier is in pte_xchg(), which orders
 		 * the store to the PTE vs the load of mm_cpumask.
+		 *
+		 * This full barrier is needed by membarrier when switching
+		 * between processes after store to rq->curr, before user-space
+		 * memory accesses.
 		 */
 		smp_mb();
 
@@ -89,6 +94,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 
 	if (new_on_cpu)
 		radix_kvm_prefetch_workaround(next);
+	else
+		membarrier_arch_switch_mm(prev, next, tsk);
 
 	/*
 	 * The actual HW switching method differs between the various
diff --git a/fs/exec.c b/fs/exec.c
index ac34d9724684..b2448f2731b3 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1802,6 +1802,7 @@ static int do_execveat_common(int fd, struct filename *filename,
 	/* execve succeeded */
 	current->fs->in_exec = 0;
 	current->in_execve = 0;
+	membarrier_execve(current);
 	acct_update_integrals(current);
 	task_numa_free(current);
 	free_bprm(bprm);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 46f4ecf5479a..5e0fe8ce053b 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -445,6 +445,9 @@ struct mm_struct {
 	unsigned long flags; /* Must use atomic bitops to access the bits */
 
 	struct core_state *core_state; /* coredumping support */
+#ifdef CONFIG_MEMBARRIER
+	int membarrier_private_expedited;
+#endif
 #ifdef CONFIG_AIO
 	spinlock_t			ioctx_lock;
 	struct kioctx_table __rcu	*ioctx_table;
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 3a19c253bdb1..4af1b719c65f 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -205,4 +205,54 @@ static inline void memalloc_noreclaim_restore(unsigned int flags)
 	current->flags = (current->flags & ~PF_MEMALLOC) | flags;
 }
 
+#ifdef CONFIG_MEMBARRIER
+
+#ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS
+#include <asm/membarrier.h>
+#else
+static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
+		struct mm_struct *next, struct task_struct *tsk)
+{
+}
+static inline void membarrier_arch_fork(struct task_struct *t,
+		unsigned long clone_flags)
+{
+}
+static inline void membarrier_arch_execve(struct task_struct *t)
+{
+}
+static inline void membarrier_arch_register_private_expedited(
+		struct task_struct *p)
+{
+}
+#endif
+
+static inline void membarrier_fork(struct task_struct *t,
+		unsigned long clone_flags)
+{
+	/*
+	 * Prior copy_mm() copies the membarrier_private_expedited field
+	 * from current->mm to t->mm.
+	 */
+	membarrier_arch_fork(t, clone_flags);
+}
+static inline void membarrier_execve(struct task_struct *t)
+{
+	t->mm->membarrier_private_expedited = 0;
+	membarrier_arch_execve(t);
+}
+#else
+static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
+		struct mm_struct *next, struct task_struct *tsk)
+{
+}
+static inline void membarrier_fork(struct task_struct *t,
+		unsigned long clone_flags)
+{
+}
+static inline void membarrier_execve(struct task_struct *t)
+{
+}
+#endif
+
 #endif /* _LINUX_SCHED_MM_H */
diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
index 6d47b3249d8a..4e01ad7ffe98 100644
--- a/include/uapi/linux/membarrier.h
+++ b/include/uapi/linux/membarrier.h
@@ -52,21 +52,30 @@
  *                          (non-running threads are de facto in such a
  *                          state). This only covers threads from the
  *                          same processes as the caller thread. This
- *                          command returns 0. The "expedited" commands
- *                          complete faster than the non-expedited ones,
- *                          they never block, but have the downside of
- *                          causing extra overhead.
+ *                          command returns 0 on success. The
+ *                          "expedited" commands complete faster than
+ *                          the non-expedited ones, they never block,
+ *                          but have the downside of causing extra
+ *                          overhead. A process needs to register its
+ *                          intent to use the private expedited command
+ *                          prior to using it, otherwise this command
+ *                          returns -EPERM.
+ * @MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED:
+ *                          Register the process intent to use
+ *                          MEMBARRIER_CMD_PRIVATE_EXPEDITED. Always
+ *                          returns 0.
  *
  * Command to be passed to the membarrier system call. The commands need to
  * be a single bit each, except for MEMBARRIER_CMD_QUERY which is assigned to
  * the value 0.
  */
 enum membarrier_cmd {
-	MEMBARRIER_CMD_QUERY			= 0,
-	MEMBARRIER_CMD_SHARED			= (1 << 0),
+	MEMBARRIER_CMD_QUERY				= 0,
+	MEMBARRIER_CMD_SHARED				= (1 << 0),
 	/* reserved for MEMBARRIER_CMD_SHARED_EXPEDITED (1 << 1) */
 	/* reserved for MEMBARRIER_CMD_PRIVATE (1 << 2) */
-	MEMBARRIER_CMD_PRIVATE_EXPEDITED	= (1 << 3),
+	MEMBARRIER_CMD_PRIVATE_EXPEDITED		= (1 << 3),
+	MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED	= (1 << 4),
 };
 
 #endif /* _UAPI_LINUX_MEMBARRIER_H */
diff --git a/init/Kconfig b/init/Kconfig
index 78cb2461012e..a3dc6a66f0d1 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1400,6 +1400,9 @@ config MEMBARRIER
 
 	  If unsure, say Y.
 
+config ARCH_HAS_MEMBARRIER_HOOKS
+	bool
+
 config EMBEDDED
 	bool "Embedded system"
 	option allnoconfig_y
diff --git a/kernel/fork.c b/kernel/fork.c
index 10646182440f..bd4a93915e08 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1840,6 +1840,8 @@ static __latent_entropy struct task_struct *copy_process(
 	 */
 	copy_seccomp(p);
 
+	membarrier_fork(p, clone_flags);
+
 	/*
 	 * Process group and session signals need to be delivered to just the
 	 * parent before the fork or both the parent and the child after the
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d17c5da523a0..f29525e453c4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2644,16 +2644,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	prev_state = prev->state;
 	vtime_task_switch(prev);
 	perf_event_task_sched_in(prev, current);
-	/*
-	 * The membarrier system call requires a full memory barrier
-	 * after storing to rq->curr, before going back to user-space.
-	 *
-	 * TODO: This smp_mb__after_unlock_lock can go away if PPC end
-	 * up adding a full barrier to switch_mm(), or we should figure
-	 * out if a smp_mb__after_unlock_lock is really the proper API
-	 * to use.
-	 */
-	smp_mb__after_unlock_lock();
 	finish_lock_switch(rq, prev);
 	finish_arch_post_lock_switch();
 
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index a92fddc22747..f06949a279ca 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -26,21 +26,25 @@
  * except MEMBARRIER_CMD_QUERY.
  */
 #define MEMBARRIER_CMD_BITMASK	\
-	(MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED)
+	(MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED	\
+	| MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED)
 
 static void ipi_mb(void *info)
 {
 	smp_mb();	/* IPIs should be serializing but paranoid. */
 }
 
-static void membarrier_private_expedited(void)
+static int membarrier_private_expedited(void)
 {
 	int cpu;
 	bool fallback = false;
 	cpumask_var_t tmpmask;
 
+	if (!READ_ONCE(current->mm->membarrier_private_expedited))
+		return -EPERM;
+
 	if (num_online_cpus() == 1)
-		return;
+		return 0;
 
 	/*
 	 * Matches memory barriers around rq->curr modification in
@@ -94,6 +98,17 @@ static void membarrier_private_expedited(void)
 	 * rq->curr modification in scheduler.
 	 */
 	smp_mb();	/* exit from system call is not a mb */
+	return 0;
+}
+
+static void membarrier_register_private_expedited(void)
+{
+	struct task_struct *p = current;
+
+	if (READ_ONCE(p->mm->membarrier_private_expedited))
+		return;
+	membarrier_arch_register_private_expedited(p);
+	WRITE_ONCE(p->mm->membarrier_private_expedited, 1);
 }
 
 /**
@@ -144,7 +159,9 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
 			synchronize_sched();
 		return 0;
 	case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
-		membarrier_private_expedited();
+		return membarrier_private_expedited();
+	case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED:
+		membarrier_register_private_expedited();
 		return 0;
 	default:
 		return -EINVAL;
-- 
2.5.2

* [PATCH tip/core/rcu 2/3] membarrier: selftest: Test private expedited cmd
From: Paul E. McKenney @ 2017-10-04 21:37 UTC
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney, Boqun Feng, Andrew Hunter, Maged Michael,
	gromer, Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Alan Stern, Will Deacon,
	Andy Lutomirski, Alice Ferrazzi, Paul Elder, linux-kselftest,
	linux-arch

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

Test the new MEMBARRIER_CMD_PRIVATE_EXPEDITED and
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED commands.

Add checks expecting specific error values on system calls expected to
fail.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: gromer@google.com
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Alan Stern <stern@rowland.harvard.edu>
CC: Will Deacon <will.deacon@arm.com>
CC: Andy Lutomirski <luto@kernel.org>
CC: Alice Ferrazzi <alice.ferrazzi@gmail.com>
CC: Paul Elder <paul.elder@pitt.edu>
CC: linux-kselftest@vger.kernel.org
CC: linux-arch@vger.kernel.org
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 .../testing/selftests/membarrier/membarrier_test.c | 109 ++++++++++++++++++---
 1 file changed, 94 insertions(+), 15 deletions(-)

diff --git a/tools/testing/selftests/membarrier/membarrier_test.c b/tools/testing/selftests/membarrier/membarrier_test.c
index 21399fcf1a59..f85657374b59 100644
--- a/tools/testing/selftests/membarrier/membarrier_test.c
+++ b/tools/testing/selftests/membarrier/membarrier_test.c
@@ -15,49 +15,119 @@ static int sys_membarrier(int cmd, int flags)
 static int test_membarrier_cmd_fail(void)
 {
 	int cmd = -1, flags = 0;
+	const char *test_name = "sys membarrier invalid command";
 
 	if (sys_membarrier(cmd, flags) != -1) {
 		ksft_exit_fail_msg(
-			"sys membarrier invalid command test: command = %d, flags = %d. Should fail, but passed\n",
-			cmd, flags);
+			"%s test: command = %d, flags = %d. Should fail, but passed\n",
+			test_name, cmd, flags);
+	}
+	if (errno != EINVAL) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
+			test_name, flags, EINVAL, strerror(EINVAL),
+			errno, strerror(errno));
 	}
 
 	ksft_test_result_pass(
-		"sys membarrier invalid command test: command = %d, flags = %d. Failed as expected\n",
-		cmd, flags);
+		"%s test: command = %d, flags = %d, errno = %d. Failed as expected\n",
+		test_name, cmd, flags, errno);
 	return 0;
 }
 
 static int test_membarrier_flags_fail(void)
 {
 	int cmd = MEMBARRIER_CMD_QUERY, flags = 1;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_QUERY invalid flags";
 
 	if (sys_membarrier(cmd, flags) != -1) {
 		ksft_exit_fail_msg(
-			"sys membarrier MEMBARRIER_CMD_QUERY invalid flags test: flags = %d. Should fail, but passed\n",
-			flags);
+			"%s test: flags = %d. Should fail, but passed\n",
+			test_name, flags);
+	}
+	if (errno != EINVAL) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
+			test_name, flags, EINVAL, strerror(EINVAL),
+			errno, strerror(errno));
 	}
 
 	ksft_test_result_pass(
-		"sys membarrier MEMBARRIER_CMD_QUERY invalid flags test: flags = %d. Failed as expected\n",
-		flags);
+		"%s test: flags = %d, errno = %d. Failed as expected\n",
+		test_name, flags, errno);
 	return 0;
 }
 
-static int test_membarrier_success(void)
+static int test_membarrier_shared_success(void)
 {
 	int cmd = MEMBARRIER_CMD_SHARED, flags = 0;
-	const char *test_name = "sys membarrier MEMBARRIER_CMD_SHARED\n";
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_SHARED";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n", test_name, flags);
+	return 0;
+}
+
+static int test_membarrier_private_expedited_fail(void)
+{
+	int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED not registered failure";
+
+	if (sys_membarrier(cmd, flags) != -1) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should fail, but passed\n",
+			test_name, flags);
+	}
+	if (errno != EPERM) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
+			test_name, flags, EPERM, strerror(EPERM),
+			errno, strerror(errno));
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d, errno = %d\n",
+		test_name, flags, errno);
+	return 0;
+}
+
+static int test_membarrier_register_private_expedited_success(void)
+{
+	int cmd = MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED";
 
 	if (sys_membarrier(cmd, flags) != 0) {
 		ksft_exit_fail_msg(
-			"sys membarrier MEMBARRIER_CMD_SHARED test: flags = %d\n",
-			flags);
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
 	}
 
 	ksft_test_result_pass(
-		"sys membarrier MEMBARRIER_CMD_SHARED test: flags = %d\n",
-		flags);
+		"%s test: flags = %d\n",
+		test_name, flags);
+	return 0;
+}
+
+static int test_membarrier_private_expedited_success(void)
+{
+	int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n",
+		test_name, flags);
 	return 0;
 }
 
@@ -71,7 +141,16 @@ static int test_membarrier(void)
 	status = test_membarrier_flags_fail();
 	if (status)
 		return status;
-	status = test_membarrier_success();
+	status = test_membarrier_shared_success();
+	if (status)
+		return status;
+	status = test_membarrier_private_expedited_fail();
+	if (status)
+		return status;
+	status = test_membarrier_register_private_expedited_success();
+	if (status)
+		return status;
+	status = test_membarrier_private_expedited_success();
 	if (status)
 		return status;
 	return 0;
-- 
2.5.2

* [PATCH tip/core/rcu 3/3] membarrier: Document scheduler barrier requirements
From: Paul E. McKenney @ 2017-10-04 21:37 UTC
  To: linux-kernel
  Cc: mingo, jiangshanlai, dipankar, akpm, mathieu.desnoyers, josh,
	tglx, peterz, rostedt, dhowells, edumazet, fweisbec, oleg,
	Paul E. McKenney, Boqun Feng, Andrew Hunter, Maged Michael,
	gromer, Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Ingo Molnar, H. Peter Anvin,
	Andrea Parri, x86

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

Document the membarrier requirement on having a full memory barrier in
__schedule() after coming from user-space, before storing to rq->curr.
It is provided by smp_mb__after_spinlock() in __schedule().

Document that membarrier requires a full barrier on transition from
kernel thread to userspace thread. We currently have an implicit barrier
from atomic_dec_and_test() in mmdrop() that ensures this.

The x86 switch_mm_irqs_off() full barrier is currently provided by many
cpumask update operations as well as write_cr3(). Document that
write_cr3() provides this barrier.
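
Condensed pseudo-code of where these barriers sit on the scheduling CPU
(a rough paraphrase of the code paths named above, not actual kernel
source):

	__schedule()
		rq_lock(rq, &rf)
		smp_mb__after_spinlock()  /* full barrier after coming from
					     user-space, before storing to
					     rq->curr */
		rq->curr = next
		context_switch()
			switch_mm_irqs_off()  /* user->user switch: full
						 barrier, e.g. write_cr3()
						 on x86, smp_mb() on powerpc */
			finish_task_switch()
				mmdrop(mm)  /* kernel->user transition:
					       implicit full barrier from
					       atomic_dec_and_test() */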

Changes since v1:
- Update comments to match reality for code paths which are after
  storing to rq->curr, before returning to user-space, based on feedback
  from Andrea Parri.
Changes since v2:
- Update changelog (smp_mb__before_spinlock -> smp_mb__after_spinlock).
  Based on feedback from Andrea Parri.
Changes since v3:
- Clarify comments following feedback from Peter Zijlstra.
Changes since v4:
- Update comment regarding powerpc barrier.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: gromer@google.com
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andrea Parri <parri.andrea@gmail.com>
CC: x86@kernel.org
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 arch/x86/mm/tlb.c        |  5 +++++
 include/linux/sched/mm.h |  5 +++++
 kernel/sched/core.c      | 37 ++++++++++++++++++++++++++-----------
 3 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 49d9778376d7..38f6cbd8cd80 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -143,6 +143,11 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	}
 #endif
 
+	/*
+	 * The membarrier system call requires a full memory barrier
+	 * before returning to user-space, after storing to rq->curr.
+	 * Writing to CR3 provides that full memory barrier.
+	 */
 	if (real_prev == next) {
 		VM_BUG_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
 			  next->context.ctx_id);
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 4af1b719c65f..d5a9ab8f3836 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -38,6 +38,11 @@ static inline void mmgrab(struct mm_struct *mm)
 extern void __mmdrop(struct mm_struct *);
 static inline void mmdrop(struct mm_struct *mm)
 {
+	/*
+	 * The full memory barrier implied by atomic_dec_and_test() is
+	 * required by the membarrier system call before returning to
+	 * user-space, after storing to rq->curr.
+	 */
 	if (unlikely(atomic_dec_and_test(&mm->mm_count)))
 		__mmdrop(mm);
 }
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f29525e453c4..8a176892b4f0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2648,6 +2648,12 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	finish_arch_post_lock_switch();
 
 	fire_sched_in_preempt_notifiers(current);
+	/*
+	 * When transitioning from a kernel thread to a userspace
+	 * thread, mmdrop()'s implicit full barrier is required by the
+	 * membarrier system call, because the current active_mm can
+	 * become the current mm without going through switch_mm().
+	 */
 	if (mm)
 		mmdrop(mm);
 	if (unlikely(prev_state == TASK_DEAD)) {
@@ -2753,6 +2759,13 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	 */
 	arch_start_context_switch(prev);
 
+	/*
+	 * If mm is non-NULL, we pass through switch_mm(). If mm is
+	 * NULL, we will pass through mmdrop() in finish_task_switch().
+	 * Both of these contain the full memory barrier required by
+	 * membarrier after storing to rq->curr, before returning to
+	 * user-space.
+	 */
 	if (!mm) {
 		next->active_mm = oldmm;
 		mmgrab(oldmm);
@@ -3289,6 +3302,9 @@ static void __sched notrace __schedule(bool preempt)
 	 * Make sure that signal_pending_state()->signal_pending() below
 	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
 	 * done by the caller to avoid the race with signal_wake_up().
+	 *
+	 * The membarrier system call requires a full memory barrier
+	 * after coming from user-space, before storing to rq->curr.
 	 */
 	rq_lock(rq, &rf);
 	smp_mb__after_spinlock();
@@ -3336,17 +3352,16 @@ static void __sched notrace __schedule(bool preempt)
 		/*
 		 * The membarrier system call requires each architecture
 		 * to have a full memory barrier after updating
-		 * rq->curr, before returning to user-space. For TSO
-		 * (e.g. x86), the architecture must provide its own
-		 * barrier in switch_mm(). For weakly ordered machines
-		 * for which spin_unlock() acts as a full memory
-		 * barrier, finish_lock_switch() in common code takes
-		 * care of this barrier. For weakly ordered machines for
-		 * which spin_unlock() acts as a RELEASE barrier (only
-		 * arm64 and PowerPC), arm64 has a full barrier in
-		 * switch_to(), and PowerPC has
-		 * smp_mb__after_unlock_lock() before
-		 * finish_lock_switch().
+		 * rq->curr, before returning to user-space.
+		 *
+		 * Here are the schemes providing that barrier on the
+		 * various architectures:
+		 * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC.
+		 *   switch_mm() relies on membarrier_arch_switch_mm() on PowerPC.
+		 * - finish_lock_switch() for weakly-ordered
+		 *   architectures where spin_unlock is a full barrier,
+		 * - switch_to() for arm64 (weakly-ordered, spin_unlock
+		 *   is a RELEASE barrier).
 		 */
 		++*switch_count;
 
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 37+ messages in thread
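
The barriers documented in this patch are what make the asymmetric
fencing pattern behind membarrier sound: the rare path pays for an
expedited membarrier call so the hot path can get away with a mere
compiler barrier where it would otherwise need a full smp_mb(). A rough
sketch of that store-buffering pairing, reusing the sys_membarrier()
wrapper sketched earlier in this thread; the variables and function
names are illustrative, not code from this series:

	static int x, y;

	/* Hot path, executed by many threads: */
	static int fast_path(void)
	{
		__atomic_store_n(&x, 1, __ATOMIC_RELAXED);
		__atomic_signal_fence(__ATOMIC_SEQ_CST);  /* compiler barrier only */
		return __atomic_load_n(&y, __ATOMIC_RELAXED);
	}

	/* Cold path, executed rarely: */
	static int slow_path(void)
	{
		__atomic_store_n(&y, 1, __ATOMIC_RELAXED);
		/* Substitutes for a full barrier on *both* sides: every
		 * running thread of the process passes through a full
		 * memory barrier before this call returns. */
		sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0);
		return __atomic_load_n(&x, __ATOMIC_RELAXED);
	}

	/* If one thread runs fast_path() and another slow_path(), the
	 * outcome "both return 0" is forbidden, just as if both sides
	 * had issued full memory barriers; yet the hot path paid only
	 * a compiler fence. */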

* Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
  2017-10-04 21:37   ` Paul E. McKenney
@ 2017-10-05  4:23     ` Nicholas Piggin
  -1 siblings, 0 replies; 37+ messages in thread
From: Nicholas Piggin @ 2017-10-05  4:23 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, Boqun Feng, Andrew Hunter,
	Maged Michael, gromer, Avi Kivity, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, Alan Stern,
	Will Deacon, Andy Lutomirski, Ingo Molnar, Alexander Viro,
	linuxppc-dev, linux-arch

On Wed,  4 Oct 2017 14:37:53 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> 
> Provide a new command allowing processes to register their intent to use
> the private expedited command.
> 
> This allows PowerPC to skip the full memory barrier in switch_mm(), and
> only issue the barrier when scheduling into a task belonging to a
> process that has registered to use expedited private.
> 
> Processes are now required to register before using
> MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.
> 
> Changes since v1:
> - Use test_ti_thread_flag(next, ...) instead of test_thread_flag() in
>   powerpc membarrier_arch_sched_in(), given that we want to specifically
>   check the next thread state.
> - Add missing ARCH_HAS_MEMBARRIER_HOOKS in Kconfig.
> - Use task_thread_info() to pass thread_info from task to
>   *_ti_thread_flag().
> 
> Changes since v2:
> - Move membarrier_arch_sched_in() call to finish_task_switch().
> - Check for NULL t->mm in membarrier_arch_fork().
> - Use membarrier_sched_in() in generic code, which invokes the
>   arch-specific membarrier_arch_sched_in(). This fixes allnoconfig
>   build on PowerPC.
> - Move asm/membarrier.h include under CONFIG_MEMBARRIER, fixing
>   allnoconfig build on PowerPC.
> - Build and runtime tested on PowerPC.
> 
> Changes since v3:
> - Simply rely on copy_mm() to copy the membarrier_private_expedited mm
>   field on fork.
> - powerpc: test thread flag instead of reading
>   membarrier_private_expedited in membarrier_arch_fork().
> - powerpc: skip memory barrier in membarrier_arch_sched_in() if coming
>   from kernel thread, since mmdrop() implies a full barrier.
> - Set membarrier_private_expedited to 1 only after arch registration
>   code, thus eliminating a race where concurrent commands could succeed
>   when they should fail if issued concurrently with process
>   registration.
> - Use READ_ONCE() for membarrier_private_expedited field access in
>   membarrier_private_expedited. Matches WRITE_ONCE() performed in
>   process registration.
> 
> Changes since v4:
> - Move powerpc hook from sched_in() to switch_mm(), based on feedback
>   from Nicholas Piggin.

For now, the powerpc approach is okay by me. I plan to test
others (e.g., taking runqueue locks) on larger systems, but that can
be sent as an incremental patch at a later time.

The main thing I would like is for people to review the userspace API.


> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index 3a19c253bdb1..4af1b719c65f 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -205,4 +205,54 @@ static inline void memalloc_noreclaim_restore(unsigned int flags)
>  	current->flags = (current->flags & ~PF_MEMALLOC) | flags;
>  }
>  
> +#ifdef CONFIG_MEMBARRIER
> +
> +#ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS
> +#include <asm/membarrier.h>
> +#else
> +static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
> +		struct mm_struct *next, struct task_struct *tsk)
> +{
> +}

This is no longer required in architecture independent code, is it?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
  2017-10-04 21:37   ` Paul E. McKenney
@ 2017-10-05 12:12     ` Peter Zijlstra
  -1 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2017-10-05 12:12 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	fweisbec, oleg, Boqun Feng, Andrew Hunter, Maged Michael, gromer,
	Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Alan Stern, Will Deacon,
	Andy Lutomirski, Ingo Molnar, Alexander Viro, Nicholas Piggin,
	linuxppc-dev, linux-arch

On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote:
> diff --git a/arch/powerpc/kernel/membarrier.c b/arch/powerpc/kernel/membarrier.c
> new file mode 100644
> index 000000000000..b0d79a5f5981
> --- /dev/null
> +++ b/arch/powerpc/kernel/membarrier.c
> @@ -0,0 +1,45 @@

> +void membarrier_arch_register_private_expedited(struct task_struct *p)
> +{
> +	struct task_struct *t;
> +
> +	if (get_nr_threads(p) == 1) {
> +		set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED);
> +		return;
> +	}
> +	/*
> +	 * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
> +	 * fork is protected by siglock.
> +	 */
> +	spin_lock(&p->sighand->siglock);
> +	for_each_thread(p, t)
> +		set_ti_thread_flag(task_thread_info(t),
> +				TIF_MEMBARRIER_PRIVATE_EXPEDITED);

I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD.

> +	spin_unlock(&p->sighand->siglock);
> +	/*
> +	 * Ensure all future scheduler executions will observe the new
> +	 * thread flag state for this process.
> +	 */
> +	synchronize_sched();

This relies on the flag being read inside rq->lock, right?

> +}

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
  2017-10-05  4:23     ` Nicholas Piggin
@ 2017-10-05 12:22       ` Avi Kivity
  -1 siblings, 0 replies; 37+ messages in thread
From: Avi Kivity @ 2017-10-05 12:22 UTC (permalink / raw)
  To: Nicholas Piggin, Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, peterz, rostedt, dhowells,
	edumazet, fweisbec, oleg, Boqun Feng, Andrew Hunter,
	Maged Michael, gromer, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Alan Stern, Will Deacon,
	Andy Lutomirski, Ingo Molnar, Alexander Viro, linuxppc-dev,
	linux-arch



On 10/05/2017 07:23 AM, Nicholas Piggin wrote:
> On Wed,  4 Oct 2017 14:37:53 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
>
>> From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>
>> Provide a new command allowing processes to register their intent to use
>> the private expedited command.
>>
>> This allows PowerPC to skip the full memory barrier in switch_mm(), and
>> only issue the barrier when scheduling into a task belonging to a
>> process that has registered to use expedited private.
>>
>> Processes are now required to register before using
>> MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.
>>
>> Changes since v1:
>> - Use test_ti_thread_flag(next, ...) instead of test_thread_flag() in
>>    powerpc membarrier_arch_sched_in(), given that we want to specifically
>>    check the next thread state.
>> - Add missing ARCH_HAS_MEMBARRIER_HOOKS in Kconfig.
>> - Use task_thread_info() to pass thread_info from task to
>>    *_ti_thread_flag().
>>
>> Changes since v2:
>> - Move membarrier_arch_sched_in() call to finish_task_switch().
>> - Check for NULL t->mm in membarrier_arch_fork().
>> - Use membarrier_sched_in() in generic code, which invokes the
>>    arch-specific membarrier_arch_sched_in(). This fixes allnoconfig
>>    build on PowerPC.
>> - Move asm/membarrier.h include under CONFIG_MEMBARRIER, fixing
>>    allnoconfig build on PowerPC.
>> - Build and runtime tested on PowerPC.
>>
>> Changes since v3:
>> - Simply rely on copy_mm() to copy the membarrier_private_expedited mm
>>    field on fork.
>> - powerpc: test thread flag instead of reading
>>    membarrier_private_expedited in membarrier_arch_fork().
>> - powerpc: skip memory barrier in membarrier_arch_sched_in() if coming
>>    from kernel thread, since mmdrop() implies a full barrier.
>> - Set membarrier_private_expedited to 1 only after arch registration
>>    code, thus eliminating a race where concurrent commands could succeed
>>    when they should fail if issued concurrently with process
>>    registration.
>> - Use READ_ONCE() for membarrier_private_expedited field access in
>>    membarrier_private_expedited. Matches WRITE_ONCE() performed in
>>    process registration.
>>
>> Changes since v4:
>> - Move powerpc hook from sched_in() to switch_mm(), based on feedback
>>    from Nicholas Piggin.
> For now, the powerpc approach is okay by me. I plan to test
> others (e.g., taking runqueue locks) on larger systems, but that can
> be sent as an incremental patch at a later time.
>
> The main thing I would like is for people to review the userspace API.
>

As a future satisfied user of the expedited private membarrier syscall, 
I am happy with the change.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
  2017-10-05 12:12     ` Peter Zijlstra
@ 2017-10-05 12:24       ` Peter Zijlstra
  -1 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2017-10-05 12:24 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, mingo, jiangshanlai, dipankar, akpm,
	mathieu.desnoyers, josh, tglx, rostedt, dhowells, edumazet,
	fweisbec, oleg, Boqun Feng, Andrew Hunter, Maged Michael, gromer,
	Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Alan Stern, Will Deacon,
	Andy Lutomirski, Ingo Molnar, Alexander Viro, Nicholas Piggin,
	linuxppc-dev, linux-arch

On Thu, Oct 05, 2017 at 02:12:50PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote:
> > diff --git a/arch/powerpc/kernel/membarrier.c b/arch/powerpc/kernel/membarrier.c
> > new file mode 100644
> > index 000000000000..b0d79a5f5981
> > --- /dev/null
> > +++ b/arch/powerpc/kernel/membarrier.c
> > @@ -0,0 +1,45 @@
> 
> > +void membarrier_arch_register_private_expedited(struct task_struct *p)
> > +{
> > +	struct task_struct *t;
> > +
> > +	if (get_nr_threads(p) == 1) {
> > +		set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED);
> > +		return;
> > +	}
> > +	/*
> > +	 * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
> > +	 * fork is protected by siglock.
> > +	 */
> > +	spin_lock(&p->sighand->siglock);
> > +	for_each_thread(p, t)
> > +		set_ti_thread_flag(task_thread_info(t),
> > +				TIF_MEMBARRIER_PRIVATE_EXPEDITED);
> 
> I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD.

Also, for easier reading I would suggest putting { } around the block.

> > +	spin_unlock(&p->sighand->siglock);
> > +	/*
> > +	 * Ensure all future scheduler executions will observe the new
> > +	 * thread flag state for this process.
> > +	 */
> > +	synchronize_sched();
> 
> This relies on the flag being read inside rq->lock, right?
> 
> > +}

^ permalink raw reply	[flat|nested] 37+ messages in thread
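
In other words, the suggested form of the quoted loop would be:

	spin_lock(&p->sighand->siglock);
	for_each_thread(p, t) {
		set_ti_thread_flag(task_thread_info(t),
				TIF_MEMBARRIER_PRIVATE_EXPEDITED);
	}
	spin_unlock(&p->sighand->siglock);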

* Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
  2017-10-05  4:23     ` Nicholas Piggin
@ 2017-10-05 15:53       ` Mathieu Desnoyers
  -1 siblings, 0 replies; 37+ messages in thread
From: Mathieu Desnoyers @ 2017-10-05 15:53 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Paul E. McKenney, linux-kernel, Ingo Molnar, Lai Jiangshan,
	dipankar, Andrew Morton, Josh Triplett, Thomas Gleixner,
	Peter Zijlstra, rostedt, David Howells, Eric Dumazet, fweisbec,
	Oleg Nesterov, Boqun Feng, Andrew Hunter, maged michael, gromer,
	Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Alan Stern, Will Deacon,
	Andy Lutomirski, Ingo Molnar, Alexander Viro, linuxppc-dev,
	linux-arch

----- On Oct 5, 2017, at 12:23 AM, Nicholas Piggin npiggin@gmail.com wrote:

> On Wed,  4 Oct 2017 14:37:53 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
>> From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> 
>> Provide a new command allowing processes to register their intent to use
>> the private expedited command.
>> 
>> This allows PowerPC to skip the full memory barrier in switch_mm(), and
>> only issue the barrier when scheduling into a task belonging to a
>> process that has registered to use expedited private.
>> 
>> Processes are now required to register before using
>> MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.
>> 
>> Changes since v1:
>> - Use test_ti_thread_flag(next, ...) instead of test_thread_flag() in
>>   powerpc membarrier_arch_sched_in(), given that we want to specifically
>>   check the next thread state.
>> - Add missing ARCH_HAS_MEMBARRIER_HOOKS in Kconfig.
>> - Use task_thread_info() to pass thread_info from task to
>>   *_ti_thread_flag().
>> 
>> Changes since v2:
>> - Move membarrier_arch_sched_in() call to finish_task_switch().
>> - Check for NULL t->mm in membarrier_arch_fork().
>> - Use membarrier_sched_in() in generic code, which invokes the
>>   arch-specific membarrier_arch_sched_in(). This fixes allnoconfig
>>   build on PowerPC.
>> - Move asm/membarrier.h include under CONFIG_MEMBARRIER, fixing
>>   allnoconfig build on PowerPC.
>> - Build and runtime tested on PowerPC.
>> 
>> Changes since v3:
>> - Simply rely on copy_mm() to copy the membarrier_private_expedited mm
>>   field on fork.
>> - powerpc: test thread flag instead of reading
>>   membarrier_private_expedited in membarrier_arch_fork().
>> - powerpc: skip memory barrier in membarrier_arch_sched_in() if coming
>>   from kernel thread, since mmdrop() implies a full barrier.
>> - Set membarrier_private_expedited to 1 only after arch registration
>>   code, thus eliminating a race where concurrent commands could succeed
>>   when they should fail if issued concurrently with process
>>   registration.
>> - Use READ_ONCE() for membarrier_private_expedited field access in
>>   membarrier_private_expedited. Matches WRITE_ONCE() performed in
>>   process registration.
>> 
>> Changes since v4:
>> - Move powerpc hook from sched_in() to switch_mm(), based on feedback
>>   from Nicholas Piggin.
> 
> For now, the powerpc approach is okay by me. I plan to test
> others (e.g., taking runqueue locks) on larger systems, but that can
> be sent as an incremental patch at a later time.
> 
> The main thing I would like is for people to review the userspace API.
> 
> 
>> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
>> index 3a19c253bdb1..4af1b719c65f 100644
>> --- a/include/linux/sched/mm.h
>> +++ b/include/linux/sched/mm.h
>> @@ -205,4 +205,54 @@ static inline void memalloc_noreclaim_restore(unsigned int
>> flags)
>>  	current->flags = (current->flags & ~PF_MEMALLOC) | flags;
>>  }
>>  
>> +#ifdef CONFIG_MEMBARRIER
>> +
>> +#ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS
>> +#include <asm/membarrier.h>
>> +#else
>> +static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
>> +		struct mm_struct *next, struct task_struct *tsk)
>> +{
>> +}
> 
> This is no longer required in architecture independent code, is it?

Yes, good point! I'll remove this unused code in a follow up patch.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
  2017-10-05 12:22       ` Avi Kivity
@ 2017-10-05 15:54         ` Mathieu Desnoyers
  -1 siblings, 0 replies; 37+ messages in thread
From: Mathieu Desnoyers @ 2017-10-05 15:54 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Nicholas Piggin, Paul E. McKenney, linux-kernel, Ingo Molnar,
	Lai Jiangshan, dipankar, Andrew Morton, Josh Triplett,
	Thomas Gleixner, Peter Zijlstra, rostedt, David Howells,
	Eric Dumazet, fweisbec, Oleg Nesterov, Boqun Feng, Andrew Hunter,
	maged michael, gromer, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Alan Stern, Will Deacon,
	Andy Lutomirski, Ingo Molnar, Alexander Viro, linuxppc-dev,
	linux-arch

----- On Oct 5, 2017, at 8:22 AM, Avi Kivity avi@scylladb.com wrote:

> On 10/05/2017 07:23 AM, Nicholas Piggin wrote:
>> On Wed,  4 Oct 2017 14:37:53 -0700
>> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
>>
>>> From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>>>
>>> Provide a new command allowing processes to register their intent to use
>>> the private expedited command.
>>>
>>> This allows PowerPC to skip the full memory barrier in switch_mm(), and
>>> only issue the barrier when scheduling into a task belonging to a
>>> process that has registered to use expedited private.
>>>
>>> Processes are now required to register before using
>>> MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.
>>>
>>> Changes since v1:
>>> - Use test_ti_thread_flag(next, ...) instead of test_thread_flag() in
>>>    powerpc membarrier_arch_sched_in(), given that we want to specifically
>>>    check the next thread state.
>>> - Add missing ARCH_HAS_MEMBARRIER_HOOKS in Kconfig.
>>> - Use task_thread_info() to pass thread_info from task to
>>>    *_ti_thread_flag().
>>>
>>> Changes since v2:
>>> - Move membarrier_arch_sched_in() call to finish_task_switch().
>>> - Check for NULL t->mm in membarrier_arch_fork().
>>> - Use membarrier_sched_in() in generic code, which invokes the
>>>    arch-specific membarrier_arch_sched_in(). This fixes allnoconfig
>>>    build on PowerPC.
>>> - Move asm/membarrier.h include under CONFIG_MEMBARRIER, fixing
>>>    allnoconfig build on PowerPC.
>>> - Build and runtime tested on PowerPC.
>>>
>>> Changes since v3:
>>> - Simply rely on copy_mm() to copy the membarrier_private_expedited mm
>>>    field on fork.
>>> - powerpc: test thread flag instead of reading
>>>    membarrier_private_expedited in membarrier_arch_fork().
>>> - powerpc: skip memory barrier in membarrier_arch_sched_in() if coming
>>>    from kernel thread, since mmdrop() implies a full barrier.
>>> - Set membarrier_private_expedited to 1 only after arch registration
>>>    code, thus eliminating a race where concurrent commands could succeed
>>>    when they should fail if issued concurrently with process
>>>    registration.
>>> - Use READ_ONCE() for membarrier_private_expedited field access in
>>>    membarrier_private_expedited. Matches WRITE_ONCE() performed in
>>>    process registration.
>>>
>>> Changes since v4:
>>> - Move powerpc hook from sched_in() to switch_mm(), based on feedback
>>>    from Nicholas Piggin.
>> For now, the powerpc approach is okay by me. I plan to test
>> others (e.g., taking runqueue locks) on larger systems, but that can
>> be sent as an incremental patch at a later time.
>>
>> The main thing I would like is for people to review the userspace API.
>>
> 
> As a future satisfied user of the expedited private membarrier syscall,
> I am happy with the change.

Thanks Avi for your input on the userspace API.

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
  2017-10-05 12:12     ` Peter Zijlstra
@ 2017-10-05 16:02       ` Mathieu Desnoyers
  -1 siblings, 0 replies; 37+ messages in thread
From: Mathieu Desnoyers @ 2017-10-05 16:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, linux-kernel, Ingo Molnar, Lai Jiangshan,
	dipankar, Andrew Morton, Josh Triplett, Thomas Gleixner, rostedt,
	David Howells, Eric Dumazet, fweisbec, Oleg Nesterov, Boqun Feng,
	Andrew Hunter, maged michael, gromer, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, Alan Stern, Will Deacon, Andy Lutomirski,
	Ingo Molnar, Alexander Viro, Nicholas Piggin, linuxppc-dev,
	linux-arch

----- On Oct 5, 2017, at 8:12 AM, Peter Zijlstra peterz@infradead.org wrote:

> On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote:
>> diff --git a/arch/powerpc/kernel/membarrier.c b/arch/powerpc/kernel/membarrier.c
>> new file mode 100644
>> index 000000000000..b0d79a5f5981
>> --- /dev/null
>> +++ b/arch/powerpc/kernel/membarrier.c
>> @@ -0,0 +1,45 @@
> 
>> +void membarrier_arch_register_private_expedited(struct task_struct *p)
>> +{
>> +	struct task_struct *t;
>> +
>> +	if (get_nr_threads(p) == 1) {
>> +		set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED);
>> +		return;
>> +	}
>> +	/*
>> +	 * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
>> +	 * fork is protected by siglock.
>> +	 */
>> +	spin_lock(&p->sighand->siglock);
>> +	for_each_thread(p, t)
>> +		set_ti_thread_flag(task_thread_info(t),
>> +				TIF_MEMBARRIER_PRIVATE_EXPEDITED);
> 
> I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD.

The intent here is to hold the sighand siglock to provide mutual
exclusion against invocation of membarrier_fork(p, clone_flags)
by copy_process().

copy_process() grabs spin_lock(&current->sighand->siglock) for both
CLONE_THREAD and not CLONE_THREAD flags.

What am I missing here?

> 
>> +	spin_unlock(&p->sighand->siglock);
>> +	/*
>> +	 * Ensure all future scheduler executions will observe the new
>> +	 * thread flag state for this process.
>> +	 */
>> +	synchronize_sched();
> 
> This relies on the flag being read inside rq->lock, right?

Yes. The flag is read by membarrier_arch_switch_mm(), invoked
within switch_mm_irqs_off(), called by context_switch() before
finish_task_switch() releases the rq lock.

Is the comment clear enough, or do you have suggestions for
improvements?

Thanks,

Mathieu

> 
> > +}

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 37+ messages in thread
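
Spelled out, the pairing Mathieu describes looks roughly like this
(a sketch of the two sides, not the literal membarrier_arch_switch_mm()
body):

	/* Registration side (membarrier system call): */
	set_ti_thread_flag(task_thread_info(t),
			TIF_MEMBARRIER_PRIVATE_EXPEDITED);
	synchronize_sched();	/* waits until every CPU has left its
				 * current preemption-off region; rq->lock
				 * is held with preemption disabled, so any
				 * context switch that begins after this
				 * returns must observe the flag */

	/* Scheduler side, under rq->lock via switch_mm_irqs_off(): */
	if (test_ti_thread_flag(task_thread_info(next),
			TIF_MEMBARRIER_PRIVATE_EXPEDITED))
		smp_mb();	/* the full barrier PowerPC otherwise omits */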

* Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
  2017-10-05 12:24       ` Peter Zijlstra
@ 2017-10-05 16:05         ` Mathieu Desnoyers
  -1 siblings, 0 replies; 37+ messages in thread
From: Mathieu Desnoyers @ 2017-10-05 16:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, linux-kernel, Ingo Molnar, Lai Jiangshan,
	dipankar, Andrew Morton, Josh Triplett, Thomas Gleixner, rostedt,
	David Howells, Eric Dumazet, fweisbec, Oleg Nesterov, Boqun Feng,
	Andrew Hunter, maged michael, gromer, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, Alan Stern, Will Deacon, Andy Lutomirski,
	Ingo Molnar, Alexander Viro, Nicholas Piggin, linuxppc-dev,
	linux-arch


----- On Oct 5, 2017, at 8:24 AM, Peter Zijlstra peterz@infradead.org wrote:

> On Thu, Oct 05, 2017 at 02:12:50PM +0200, Peter Zijlstra wrote:
>> On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote:
>> > diff --git a/arch/powerpc/kernel/membarrier.c b/arch/powerpc/kernel/membarrier.c
>> > new file mode 100644
>> > index 000000000000..b0d79a5f5981
>> > --- /dev/null
>> > +++ b/arch/powerpc/kernel/membarrier.c
>> > @@ -0,0 +1,45 @@
>> 
>> > +void membarrier_arch_register_private_expedited(struct task_struct *p)
>> > +{
>> > +	struct task_struct *t;
>> > +
>> > +	if (get_nr_threads(p) == 1) {
>> > +		set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED);
>> > +		return;
>> > +	}
>> > +	/*
>> > +	 * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
>> > +	 * fork is protected by siglock.
>> > +	 */
>> > +	spin_lock(&p->sighand->siglock);
>> > +	for_each_thread(p, t)
>> > +		set_ti_thread_flag(task_thread_info(t),
>> > +				TIF_MEMBARRIER_PRIVATE_EXPEDITED);
>> 
>> I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD.
> 
> Also, for easier reading I would suggest putting { } around the block.

Will do, thanks,

Mathieu

> 
>> > +	spin_unlock(&p->sighand->siglock);
>> > +	/*
>> > +	 * Ensure all future scheduler executions will observe the new
>> > +	 * thread flag state for this process.
>> > +	 */
>> > +	synchronize_sched();
>> 
>> This relies on the flag being read inside rq->lock, right?
>> 
> > > +}

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
  2017-10-05 16:02       ` Mathieu Desnoyers
@ 2017-10-05 16:21         ` Peter Zijlstra
  -1 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2017-10-05 16:21 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Paul E. McKenney, linux-kernel, Ingo Molnar, Lai Jiangshan,
	dipankar, Andrew Morton, Josh Triplett, Thomas Gleixner, rostedt,
	David Howells, Eric Dumazet, fweisbec, Oleg Nesterov, Boqun Feng,
	Andrew Hunter, maged michael, gromer, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, Alan Stern, Will Deacon, Andy Lutomirski,
	Ingo Molnar, Alexander Viro, Nicholas Piggin, linuxppc-dev,
	linux-arch

On Thu, Oct 05, 2017 at 04:02:06PM +0000, Mathieu Desnoyers wrote:
> ----- On Oct 5, 2017, at 8:12 AM, Peter Zijlstra peterz@infradead.org wrote:
> 
> > On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote:
> >> diff --git a/arch/powerpc/kernel/membarrier.c b/arch/powerpc/kernel/membarrier.c
> >> new file mode 100644
> >> index 000000000000..b0d79a5f5981
> >> --- /dev/null
> >> +++ b/arch/powerpc/kernel/membarrier.c
> >> @@ -0,0 +1,45 @@
> > 
> >> +void membarrier_arch_register_private_expedited(struct task_struct *p)
> >> +{
> >> +	struct task_struct *t;
> >> +
> >> +	if (get_nr_threads(p) == 1) {
> >> +		set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED);
> >> +		return;
> >> +	}
> >> +	/*
> >> +	 * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
> >> +	 * fork is protected by siglock.
> >> +	 */
> >> +	spin_lock(&p->sighand->siglock);
> >> +	for_each_thread(p, t)
> >> +		set_ti_thread_flag(task_thread_info(t),
> >> +				TIF_MEMBARRIER_PRIVATE_EXPEDITED);
> > 
> > I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD.
> 
> The intent here is to hold the sighand siglock to provide mutual
> exclusion against invocation of membarrier_fork(p, clone_flags)
> by copy_process().
> 
> copy_process() grabs spin_lock(&current->sighand->siglock) for both
> CLONE_THREAD and not CLONE_THREAD flags.
> 
> What am I missing here?

If you do CLONE_VM without CLONE_THREAD you'll end up sharing the mm but
you'll not be part of thread_head, so the for_each_thread() iteration
will not find the task.

^ permalink raw reply	[flat|nested] 37+ messages in thread
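
The problematic case is easy to construct from userspace: clone(2) with
CLONE_VM but without CLONE_THREAD yields a task that shares the mm yet
sits in its own thread group. A minimal sketch using the glibc clone()
wrapper, with error handling omitted:

	#define _GNU_SOURCE
	#include <sched.h>
	#include <signal.h>
	#include <unistd.h>

	static char child_stack[64 * 1024];

	static int child_fn(void *arg)
	{
		/* Shares the parent's mm (CLONE_VM) but is not in the
		 * parent's thread group (no CLONE_THREAD), so a
		 * for_each_thread() walk of the parent never visits it. */
		pause();
		return 0;
	}

	int main(void)
	{
		/* The stack grows down on the common architectures, so
		 * pass the top of the buffer. */
		clone(child_fn, child_stack + sizeof(child_stack),
		      CLONE_VM | SIGCHLD, NULL);
		/* If this process now registered private expedited
		 * membarrier by walking only its own threads, child_fn's
		 * task would keep running without the TIF flag set. */
		pause();
		return 0;
	}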

* Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
  2017-10-05 16:21         ` Peter Zijlstra
@ 2017-10-05 21:44           ` Mathieu Desnoyers
  -1 siblings, 0 replies; 37+ messages in thread
From: Mathieu Desnoyers @ 2017-10-05 21:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, linux-kernel, Ingo Molnar, Lai Jiangshan,
	dipankar, Andrew Morton, Josh Triplett, Thomas Gleixner, rostedt,
	David Howells, Eric Dumazet, fweisbec, Oleg Nesterov, Boqun Feng,
	Andrew Hunter, maged michael, gromer, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, Alan Stern, Will Deacon, Andy Lutomirski,
	Ingo Molnar, Alexander Viro, Nicholas Piggin, linuxppc-dev,
	linux-arch

----- On Oct 5, 2017, at 12:21 PM, Peter Zijlstra peterz@infradead.org wrote:

> On Thu, Oct 05, 2017 at 04:02:06PM +0000, Mathieu Desnoyers wrote:
>> ----- On Oct 5, 2017, at 8:12 AM, Peter Zijlstra peterz@infradead.org wrote:
>> 
>> > On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote:
>> >> diff --git a/arch/powerpc/kernel/membarrier.c b/arch/powerpc/kernel/membarrier.c
>> >> new file mode 100644
>> >> index 000000000000..b0d79a5f5981
>> >> --- /dev/null
>> >> +++ b/arch/powerpc/kernel/membarrier.c
>> >> @@ -0,0 +1,45 @@
>> > 
>> >> +void membarrier_arch_register_private_expedited(struct task_struct *p)
>> >> +{
>> >> +	struct task_struct *t;
>> >> +
>> >> +	if (get_nr_threads(p) == 1) {
>> >> +		set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED);
>> >> +		return;
>> >> +	}
>> >> +	/*
>> >> +	 * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
>> >> +	 * fork is protected by siglock.
>> >> +	 */
>> >> +	spin_lock(&p->sighand->siglock);
>> >> +	for_each_thread(p, t)
>> >> +		set_ti_thread_flag(task_thread_info(t),
>> >> +				TIF_MEMBARRIER_PRIVATE_EXPEDITED);
>> > 
>> > I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD.
>> 
>> The intent here is to hold the sighand siglock to provide mutual
>> exclusion against invocation of membarrier_fork(p, clone_flags)
>> by copy_process().
>> 
>> copy_process() grabs spin_lock(&current->sighand->siglock) for both
>> CLONE_THREAD and not CLONE_THREAD flags.
>> 
>> What am I missing here ?
> 
> If you do CLONE_VM without CLONE_THREAD you'll end up sharing the mm but
> you'll not be part of thread_head, so the for_each_thread() iteration
> will not find the task.

Excellent point. Please see the follow-up RFC patch I posted taking care
of this matter.
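
For reference, the fork-side hook that siglock serializes against looks
roughly like this (helper name from the discussion above; the body is a
sketch, not the exact code from the series):

/* Called by copy_process() with current->sighand->siglock held, for
 * CLONE_THREAD and !CLONE_THREAD alike, so it cannot race with the
 * registration loop that walks the thread list under the same lock. */
static inline void membarrier_fork(struct task_struct *t,
				   unsigned long clone_flags)
{
	if (test_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED))
		set_ti_thread_flag(task_thread_info(t),
				   TIF_MEMBARRIER_PRIVATE_EXPEDITED);
}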

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
  2017-10-05 16:02       ` Mathieu Desnoyers
@ 2017-10-05 22:02         ` Andrea Parri
  -1 siblings, 0 replies; 37+ messages in thread
From: Andrea Parri @ 2017-10-05 22:02 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, Paul E. McKenney, linux-kernel, Ingo Molnar,
	Lai Jiangshan, dipankar, Andrew Morton, Josh Triplett,
	Thomas Gleixner, rostedt, David Howells, Eric Dumazet, fweisbec,
	Oleg Nesterov, Boqun Feng, Andrew Hunter, maged michael, gromer,
	Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Alan Stern, Will Deacon,
	Andy Lutomirski, Ingo Molnar, Alexander Viro, Nicholas Piggin,
	linuxppc-dev, linux-arch

On Thu, Oct 05, 2017 at 04:02:06PM +0000, Mathieu Desnoyers wrote:
> ----- On Oct 5, 2017, at 8:12 AM, Peter Zijlstra peterz@infradead.org wrote:
> 
> > On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote:
> >> diff --git a/arch/powerpc/kernel/membarrier.c b/arch/powerpc/kernel/membarrier.c
> >> new file mode 100644
> >> index 000000000000..b0d79a5f5981
> >> --- /dev/null
> >> +++ b/arch/powerpc/kernel/membarrier.c
> >> @@ -0,0 +1,45 @@
> > 
> >> +void membarrier_arch_register_private_expedited(struct task_struct *p)
> >> +{
> >> +	struct task_struct *t;
> >> +
> >> +	if (get_nr_threads(p) == 1) {
> >> +		set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED);
> >> +		return;
> >> +	}
> >> +	/*
> >> +	 * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
> >> +	 * fork is protected by siglock.
> >> +	 */
> >> +	spin_lock(&p->sighand->siglock);
> >> +	for_each_thread(p, t)
> >> +		set_ti_thread_flag(task_thread_info(t),
> >> +				TIF_MEMBARRIER_PRIVATE_EXPEDITED);
> > 
> > I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD.
> 
> The intent here is to hold the sighand siglock to provide mutual
> exclusion against invocation of membarrier_fork(p, clone_flags)
> by copy_process().
> 
> copy_process() grabs spin_lock(&current->sighand->siglock) for both
> CLONE_THREAD and not CLONE_THREAD flags.
> 
> What am I missing here ?
> 
> > 
> >> +	spin_unlock(&p->sighand->siglock);
> >> +	/*
> >> +	 * Ensure all future scheduler executions will observe the new
> >> +	 * thread flag state for this process.
> >> +	 */
> >> +	synchronize_sched();
> > 
> > This relies on the flag being read inside rq->lock, right?
> 
> Yes. The flag is read by membarrier_arch_switch_mm(), invoked
> within switch_mm_irqs_off(), called by context_switch() before
> finish_task_switch() releases the rq lock.

I fail to graps the relation between this synchronize_sched() and rq->lock.

(Besides, we made no reference to rq->lock while discussing:

  https://github.com/paulmckrcu/litmus/commit/47039df324b60ace0cf7b2403299580be730119b
  replace membarrier_arch_sched_in with switch_mm_irqs_off )

Could you elaborate?

  Andrea


> 
> Is the comment clear enough, or do you have suggestions for
> improvements ?



> 
> Thanks,
> 
> Mathieu
> 
> > 
> > > +}
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
  2017-10-05 22:02         ` Andrea Parri
  (?)
@ 2017-10-05 22:04         ` Andrea Parri
  -1 siblings, 0 replies; 37+ messages in thread
From: Andrea Parri @ 2017-10-05 22:04 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, Paul E. McKenney, Ingo Molnar, Lai Jiangshan,
	dipankar, Andrew Morton, Josh Triplett, Thomas Gleixner, rostedt,
	David Howells, Eric Dumazet, fweisbec, Oleg Nesterov, Boqun Feng,
	Andrew Hunter, maged michael, gromer, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, Alan Stern, Will Deacon, Andy Lutomirski,
	Ingo Molnar, Alexander Viro, Nicholas Piggin, linuxppc-dev

On Fri, Oct 6, 2017 at 12:02 AM, Andrea Parri <parri.andrea@gmail.com> wrote:
> On Thu, Oct 05, 2017 at 04:02:06PM +0000, Mathieu Desnoyers wrote:
>> ----- On Oct 5, 2017, at 8:12 AM, Peter Zijlstra peterz@infradead.org wrote:
>>
>> > On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote:
>> >> diff --git a/arch/powerpc/kernel/membarrier.c b/arch/powerpc/kernel/membarrier.c
>> >> new file mode 100644
>> >> index 000000000000..b0d79a5f5981
>> >> --- /dev/null
>> >> +++ b/arch/powerpc/kernel/membarrier.c
>> >> @@ -0,0 +1,45 @@
>> >
>> >> +void membarrier_arch_register_private_expedited(struct task_struct *p)
>> >> +{
>> >> +	struct task_struct *t;
>> >> +
>> >> +	if (get_nr_threads(p) == 1) {
>> >> +		set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED);
>> >> +		return;
>> >> +	}
>> >> +	/*
>> >> +	 * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
>> >> +	 * fork is protected by siglock.
>> >> +	 */
>> >> +	spin_lock(&p->sighand->siglock);
>> >> +	for_each_thread(p, t)
>> >> +		set_ti_thread_flag(task_thread_info(t),
>> >> +				TIF_MEMBARRIER_PRIVATE_EXPEDITED);
>> >
>> > I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD.
>>
>> The intent here is to hold the sighand siglock to provide mutual
>> exclusion against invocation of membarrier_fork(p, clone_flags)
>> by copy_process().
>>
>> copy_process() grabs spin_lock(&current->sighand->siglock) for both
>> CLONE_THREAD and not CLONE_THREAD flags.
>>
>> What am I missing here ?
>>
>> >
>> >> +	spin_unlock(&p->sighand->siglock);
>> >> +	/*
>> >> +	 * Ensure all future scheduler executions will observe the new
>> >> +	 * thread flag state for this process.
>> >> +	 */
>> >> +	synchronize_sched();
>> >
>> > This relies on the flag being read inside rq->lock, right?
>>
>> Yes. The flag is read by membarrier_arch_switch_mm(), invoked
>> within switch_mm_irqs_off(), called by context_switch() before
>> finish_task_switch() releases the rq lock.
>
> I fail to graps the relation between this synchronize_sched() and rq->lock.

s/graps/grasp

  Andrea


>
> (Besides, we made no reference to rq->lock while discussing:
>
>   https://github.com/paulmckrcu/litmus/commit/47039df324b60ace0cf7b2403299580be730119b
>   replace membarrier_arch_sched_in with switch_mm_irqs_off )
>
> Could you elaborate?
>
>   Andrea
>
>
>>
>> Is the comment clear enough, or do you have suggestions for
>> improvements ?
>
>
>
>>
>> Thanks,
>>
>> Mathieu
>>
>> >
>> > > +}
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
  2017-10-05 22:02         ` Andrea Parri
@ 2017-10-05 22:19           ` Mathieu Desnoyers
  -1 siblings, 0 replies; 37+ messages in thread
From: Mathieu Desnoyers @ 2017-10-05 22:19 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Peter Zijlstra, Paul E. McKenney, linux-kernel, Ingo Molnar,
	Lai Jiangshan, dipankar, Andrew Morton, Josh Triplett,
	Thomas Gleixner, rostedt, David Howells, Eric Dumazet, fweisbec,
	Oleg Nesterov, Boqun Feng, Andrew Hunter, maged michael, gromer,
	Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Alan Stern, Will Deacon,
	Andy Lutomirski, Ingo Molnar, Alexander Viro, Nicholas Piggin,
	linuxppc-dev, linux-arch

----- On Oct 5, 2017, at 6:02 PM, Andrea Parri parri.andrea@gmail.com wrote:

> On Thu, Oct 05, 2017 at 04:02:06PM +0000, Mathieu Desnoyers wrote:
>> ----- On Oct 5, 2017, at 8:12 AM, Peter Zijlstra peterz@infradead.org wrote:
>> 
>> > On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote:
>> >> diff --git a/arch/powerpc/kernel/membarrier.c b/arch/powerpc/kernel/membarrier.c
>> >> new file mode 100644
>> >> index 000000000000..b0d79a5f5981
>> >> --- /dev/null
>> >> +++ b/arch/powerpc/kernel/membarrier.c
>> >> @@ -0,0 +1,45 @@
>> > 
>> >> +void membarrier_arch_register_private_expedited(struct task_struct *p)
>> >> +{
>> >> +	struct task_struct *t;
>> >> +
>> >> +	if (get_nr_threads(p) == 1) {
>> >> +		set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED);
>> >> +		return;
>> >> +	}
>> >> +	/*
>> >> +	 * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
>> >> +	 * fork is protected by siglock.
>> >> +	 */
>> >> +	spin_lock(&p->sighand->siglock);
>> >> +	for_each_thread(p, t)
>> >> +		set_ti_thread_flag(task_thread_info(t),
>> >> +				TIF_MEMBARRIER_PRIVATE_EXPEDITED);
>> > 
>> > I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD.
>> 
>> The intent here is to hold the sighand siglock to provide mutual
>> exclusion against invocation of membarrier_fork(p, clone_flags)
>> by copy_process().
>> 
>> copy_process() grabs spin_lock(&current->sighand->siglock) for both
>> CLONE_THREAD and not CLONE_THREAD flags.
>> 
>> What am I missing here ?
>> 
>> > 
>> >> +	spin_unlock(&p->sighand->siglock);
>> >> +	/*
>> >> +	 * Ensure all future scheduler executions will observe the new
>> >> +	 * thread flag state for this process.
>> >> +	 */
>> >> +	synchronize_sched();
>> > 
>> > This relies on the flag being read inside rq->lock, right?
>> 
>> Yes. The flag is read by membarrier_arch_switch_mm(), invoked
>> within switch_mm_irqs_off(), called by context_switch() before
>> finish_task_switch() releases the rq lock.
> 
> I fail to grasp the relation between this synchronize_sched() and rq->lock.
> 
> (Besides, we made no reference to rq->lock while discussing:
> 
>  https://github.com/paulmckrcu/litmus/commit/47039df324b60ace0cf7b2403299580be730119b
>  replace membarrier_arch_sched_in with switch_mm_irqs_off )
> 
> Could you elaborate?

Hi Andrea,

AFAIU the scheduler rq->lock is held while preemption is disabled.
synchronize_sched() is used here to ensure that all pre-existing
preempt-off critical sections have completed.

So saying that we use synchronize_sched() to synchronize with rq->lock
would be stretching the truth a bit. It's actually only true because the
scheduler holding the rq->lock is surrounded by a preempt-off
critical section.
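
Expressed as a sketch (names from the patch; heavily simplified), the
pairing looks like:

/* Registration side: publish the flag, then wait out every
 * preempt-off section that might still see the old value. */
static void register_side(struct task_struct *t)
{
	set_ti_thread_flag(task_thread_info(t),
			   TIF_MEMBARRIER_PRIVATE_EXPEDITED);
	synchronize_sched();	/* all prior preempt-off sections done */
	/* Context switches starting after this observe the flag. */
}

/* Scheduler side: the flag test runs with preemption off, between
 * rq->lock acquire and release on the context-switch path. */
static void switch_side(struct task_struct *next)
{
	/* preemption disabled here (rq->lock held) */
	if (test_ti_thread_flag(task_thread_info(next),
				TIF_MEMBARRIER_PRIVATE_EXPEDITED))
		smp_mb();	/* stand-in for membarrier_arch_switch_mm() */
}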

Thanks,

Mathieu



> 
>  Andrea
> 
> 
>> 
>> Is the comment clear enough, or do you have suggestions for
>> improvements ?
> 
> 
> 
>> 
>> Thanks,
>> 
>> Mathieu
>> 
>> > 
>> > > +}
>> 
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
  2017-10-05 22:19           ` Mathieu Desnoyers
@ 2017-10-05 22:46             ` Steven Rostedt
  -1 siblings, 0 replies; 37+ messages in thread
From: Steven Rostedt @ 2017-10-05 22:46 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andrea Parri, Peter Zijlstra, Paul E. McKenney, linux-kernel,
	Ingo Molnar, Lai Jiangshan, dipankar, Andrew Morton,
	Josh Triplett, Thomas Gleixner, David Howells, Eric Dumazet,
	fweisbec, Oleg Nesterov, Boqun Feng, Andrew Hunter,
	maged michael, gromer, Avi Kivity, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, Alan Stern,
	Will Deacon, Andy Lutomirski, Ingo Molnar, Alexander Viro,
	Nicholas Piggin, linuxppc-dev, linux-arch

On Thu, 5 Oct 2017 22:19:15 +0000 (UTC)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> AFAIU the scheduler rq->lock is held while preemption is disabled.
> synchronize_sched() is used here to ensure that all pre-existing
> preempt-off critical sections have completed.
> 
> So saying that we use synchronize_sched() to synchronize with rq->lock
> would be stretching the truth a bit. It's actually only true because the
> scheduler holding the rq->lock is surrounded by a preempt-off
> critical section.

Not only is preemption disabled (which is true for all spinlocks, at
least without PREEMPT_RT), but the rq->lock must also only be taken
with interrupts disabled (for PREEMPT_RT as well), because it can also
be taken in interrupt context.
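
For instance (a hypothetical lock, not the real rq->lock; the same rule
applies):

static DEFINE_RAW_SPINLOCK(example_lock);

/* Task context: interrupts must be disabled, otherwise an interrupt
 * arriving while the lock is held could try to take it again and
 * deadlock the CPU against itself. */
static void task_context_path(void)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&example_lock, flags);
	/* ... critical section, interrupts off ... */
	raw_spin_unlock_irqrestore(&example_lock, flags);
}

/* Interrupt context (e.g. a timer handler): interrupts are already
 * off in hardirq context, so a plain lock suffices. */
static void irq_context_path(void)
{
	raw_spin_lock(&example_lock);
	/* ... */
	raw_spin_unlock(&example_lock);
}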

-- Steve

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
  2017-10-05 22:19           ` Mathieu Desnoyers
@ 2017-10-06  8:32             ` Peter Zijlstra
  -1 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2017-10-06  8:32 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andrea Parri, Paul E. McKenney, linux-kernel, Ingo Molnar,
	Lai Jiangshan, dipankar, Andrew Morton, Josh Triplett,
	Thomas Gleixner, rostedt, David Howells, Eric Dumazet, fweisbec,
	Oleg Nesterov, Boqun Feng, Andrew Hunter, maged michael, gromer,
	Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Alan Stern, Will Deacon,
	Andy Lutomirski, Ingo Molnar, Alexander Viro, Nicholas Piggin,
	linuxppc-dev, linux-arch

> AFAIU the scheduler rq->lock is held while preemption is disabled.
> synchronize_sched() is used here to ensure that all pre-existing
> preempt-off critical sections have completed.
> 
> So saying that we use synchronize_sched() to synchronize with rq->lock
> would be stretching the truth a bit. It's actually only true because the
> scheduler holding the rq->lock is surrounded by a preempt-off
> critical section.

No, rq->lock is sufficient; note that rq->lock is a raw_spinlock_t,
which implies !preempt. Yes, we also surround the rq->lock usage with a
slightly larger preempt_disable() section, but that is not in fact
required for this.
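
Roughly (a simplified sketch of the __schedule() path, not the literal
kernel/sched/core.c code), the flag read sits entirely inside the
rq->lock hold:

/* rq->lock is taken with interrupts off and held across the mm
 * switch, so the thread-flag read in switch_mm_irqs_off() runs with
 * preemption disabled. */
static void context_switch_sketch(struct rq *rq, struct task_struct *prev,
				  struct task_struct *next)
{
	/* rq->lock already held, irqs off */
	switch_mm_irqs_off(prev->active_mm, next->mm, next);
	/*   -> membarrier_arch_switch_mm() tests
	 *      TIF_MEMBARRIER_PRIVATE_EXPEDITED in here */
	switch_to(prev, next, prev);
	/* finish_task_switch() releases rq->lock after this */
}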

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command
  2017-10-06  8:32             ` Peter Zijlstra
@ 2017-10-07 15:10               ` Andrea Parri
  -1 siblings, 0 replies; 37+ messages in thread
From: Andrea Parri @ 2017-10-07 15:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, Paul E. McKenney, linux-kernel, Ingo Molnar,
	Lai Jiangshan, dipankar, Andrew Morton, Josh Triplett,
	Thomas Gleixner, rostedt, David Howells, Eric Dumazet, fweisbec,
	Oleg Nesterov, Boqun Feng, Andrew Hunter, maged michael, gromer,
	Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Alan Stern, Will Deacon,
	Andy Lutomirski, Ingo Molnar, Alexander Viro, Nicholas Piggin,
	linuxppc-dev, linux-arch

On Fri, Oct 06, 2017 at 10:32:19AM +0200, Peter Zijlstra wrote:
> > AFAIU the scheduler rq->lock is held while preemption is disabled.
> > synchronize_sched() is used here to ensure that all pre-existing
> > preempt-off critical sections have completed.
> > 
> > So saying that we use synchronize_sched() to synchronize with rq->lock
> > would be stretching the truth a bit. It's actually only true because the
> > scheduler holding the rq->lock is surrounded by a preempt-off
> > critical section.
> 
> No, rq->lock is sufficient, note that rq->lock is a raw_spinlock_t which
> implies !preempt. Yes, we also surround the rq->lock usage with a
> slightly larger preempt_disable() section but that's not in fact
> required for this.

That matches the current sources: we seemed to agree that a preempt-off
critical section is what we rely on here, and that the start of this
critical section is not marked by that raw_spin_lock.

  Andrea

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2017-10-07 15:10 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-04 21:37 [PATCH tip/core/rcu 0/3] Updates to sys_membarrier for v4.14 to add registration Paul E. McKenney
2017-10-04 21:37 ` [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command Paul E. McKenney
2017-10-05  4:23   ` Nicholas Piggin
2017-10-05 12:22     ` Avi Kivity
2017-10-05 15:54       ` Mathieu Desnoyers
2017-10-05 15:53     ` Mathieu Desnoyers
2017-10-05 12:12   ` Peter Zijlstra
2017-10-05 12:24     ` Peter Zijlstra
2017-10-05 16:05       ` Mathieu Desnoyers
2017-10-05 16:02     ` Mathieu Desnoyers
2017-10-05 16:21       ` Peter Zijlstra
2017-10-05 21:44         ` Mathieu Desnoyers
2017-10-05 22:02       ` Andrea Parri
2017-10-05 22:04         ` Andrea Parri
2017-10-05 22:19         ` Mathieu Desnoyers
2017-10-05 22:46           ` Steven Rostedt
2017-10-06  8:32           ` Peter Zijlstra
2017-10-07 15:10             ` Andrea Parri
2017-10-04 21:37 ` [PATCH tip/core/rcu 2/3] membarrier: selftest: Test private expedited cmd Paul E. McKenney
2017-10-04 21:37 ` [PATCH tip/core/rcu 3/3] membarrier: Document scheduler barrier requirements Paul E. McKenney
