* [PATCH for 4.16 00/10] membarrier updates for 4.16
@ 2018-01-15 19:10 Mathieu Desnoyers
  2018-01-15 19:10 ` [PATCH for 4.16 01/10] membarrier: selftest: Test private expedited cmd (v2) Mathieu Desnoyers
                   ` (9 more replies)
  0 siblings, 10 replies; 18+ messages in thread
From: Mathieu Desnoyers @ 2018-01-15 19:10 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, linux-api, Andy Lutomirski, Paul E . McKenney,
	Boqun Feng, Andrew Hunter, Maged Michael, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, Thomas Gleixner, H . Peter Anvin, Andrea Parri,
	Russell King, Greg Hackmann, Will Deacon, David Sehr,
	Linus Torvalds, x86, Mathieu Desnoyers

Hi Ingo, Hi Peter,

Here is the membarrier patch series I would like to submit for the
4.16 merge window. It would be appreciated if these could go through
the scheduler tree.

It's the same series as last week, rebased on top of 4.15-rc8.

Highlights:

"powerpc: membarrier: Skip memory barrier in switch_mm()" takes care of
a TODO that was left in the private expedited implementation when merged
in 4.14: an extra memory barrier was added on context switch on powerpc.
Ensure that the barrier is only performed when scheduling between
different processes, only for threads belonging to processes that have
registered their intent to use the private expedited command.

"membarrier: provide SHARED_EXPEDITED command" adds new commands to
membarrier for registration and use of membarrier across shared
memory mappings. The non-expedited command has proven to be really
too slow (taking 10ms and more to complete) for real-world use. The
expedited version completes in a matter of microseconds.
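
For illustration, a minimal user-space sketch of the intended usage,
using the raw syscall(2) and assuming the updated uapi header from this
series is installed:

#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

static int membarrier(int cmd, int flags)
{
	return syscall(__NR_membarrier, cmd, flags);
}

int main(void)
{
	/* Declare intent to receive shared expedited barriers. */
	if (membarrier(MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED, 0))
		return 1;
	/* ... later, after publishing data in shared memory ... */
	if (membarrier(MEMBARRIER_CMD_SHARED_EXPEDITED, 0))
		return 1;
	return 0;
}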

"membarrier: Provide core serializing command" provides core
serialization for JIT reclaim. We received positive feedback from
Android developers that the proposed ABI fits their use-case.
Only x86 (32- and 64-bit) and arm64 implement this command so far;
support is opt-in per architecture.
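
As a rough sketch of the JIT reclaim use-case (the helper is
hypothetical; the command names come from the patches below):

#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical JIT reclaim path. Registration with
 * MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE is assumed to
 * have been done once at process start.
 */
static void jit_reclaim(void)
{
	/* ... unmap or rewrite the executable region here ... */

	/*
	 * Ensure every running sibling thread executes a core
	 * serializing instruction before it can run the new code.
	 */
	syscall(__NR_membarrier,
		MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0);
}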

The other patches add selftests and documentation.

Thanks,

Mathieu

Mathieu Desnoyers (10):
  membarrier: selftest: Test private expedited cmd (v2)
  powerpc: membarrier: Skip memory barrier in switch_mm() (v7)
  membarrier: Document scheduler barrier requirements (v5)
  membarrier: provide SHARED_EXPEDITED command (v2)
  membarrier: selftest: Test shared expedited cmd
  membarrier: Provide core serializing command
  x86: Introduce sync_core_before_usermode (v2)
  membarrier: x86: Provide core serializing command (v3)
  membarrier: arm64: Provide core serializing command
  membarrier: selftest: Test private expedited sync core cmd

 MAINTAINERS                                        |   1 +
 arch/arm64/Kconfig                                 |   1 +
 arch/arm64/kernel/entry.S                          |   4 +
 arch/powerpc/Kconfig                               |   1 +
 arch/powerpc/include/asm/membarrier.h              |  27 +++
 arch/powerpc/mm/mmu_context.c                      |   7 +
 arch/x86/Kconfig                                   |   2 +
 arch/x86/entry/entry_32.S                          |   5 +
 arch/x86/entry/entry_64.S                          |   4 +
 arch/x86/include/asm/processor.h                   |  10 +
 arch/x86/mm/tlb.c                                  |   6 +
 include/linux/processor.h                          |   6 +
 include/linux/sched/mm.h                           |  40 +++-
 include/uapi/linux/membarrier.h                    |  66 +++++-
 init/Kconfig                                       |   9 +
 kernel/sched/core.c                                |  53 +++--
 kernel/sched/membarrier.c                          | 171 +++++++++++++--
 .../testing/selftests/membarrier/membarrier_test.c | 235 +++++++++++++++++++--
 18 files changed, 585 insertions(+), 63 deletions(-)
 create mode 100644 arch/powerpc/include/asm/membarrier.h

-- 
2.11.0


* [PATCH for 4.16 01/10] membarrier: selftest: Test private expedited cmd (v2)
  2018-01-15 19:10 [PATCH for 4.16 00/10] membarrier updates for 4.16 Mathieu Desnoyers
@ 2018-01-15 19:10 ` Mathieu Desnoyers
  2018-01-15 19:10 ` [PATCH for 4.16 02/10] powerpc: membarrier: Skip memory barrier in switch_mm() (v7) Mathieu Desnoyers
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Mathieu Desnoyers @ 2018-01-15 19:10 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, linux-api, Andy Lutomirski, Paul E . McKenney,
	Boqun Feng, Andrew Hunter, Maged Michael, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, Thomas Gleixner, H . Peter Anvin, Andrea Parri,
	Russell King, Greg Hackmann, Will Deacon, David Sehr,
	Linus Torvalds, x86, Mathieu Desnoyers, Alan Stern,
	Alice Ferrazzi, Paul Elder, linux-kselftest, linux-arch

Test the new MEMBARRIER_CMD_PRIVATE_EXPEDITED and
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED commands.

Add checks expecting specific error values on system calls expected to
fail.
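
For context, the sys_membarrier() helper exercised below is a thin
wrapper around the raw system call, along these lines:

#include <sys/syscall.h>
#include <unistd.h>

static int sys_membarrier(int cmd, int flags)
{
	return syscall(__NR_membarrier, cmd, flags);
}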

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Alan Stern <stern@rowland.harvard.edu>
CC: Will Deacon <will.deacon@arm.com>
CC: Andy Lutomirski <luto@kernel.org>
CC: Alice Ferrazzi <alice.ferrazzi@gmail.com>
CC: Paul Elder <paul.elder@pitt.edu>
CC: linux-kselftest@vger.kernel.org
CC: linux-arch@vger.kernel.org
---
Changes since v1:
- return result of ksft_exit_pass from main(), silencing compiler
  warning about missing return value.
---
 .../testing/selftests/membarrier/membarrier_test.c | 111 ++++++++++++++++++---
 1 file changed, 95 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/membarrier/membarrier_test.c b/tools/testing/selftests/membarrier/membarrier_test.c
index 9e674d9514d1..e6ee73d01fa1 100644
--- a/tools/testing/selftests/membarrier/membarrier_test.c
+++ b/tools/testing/selftests/membarrier/membarrier_test.c
@@ -16,49 +16,119 @@ static int sys_membarrier(int cmd, int flags)
 static int test_membarrier_cmd_fail(void)
 {
 	int cmd = -1, flags = 0;
+	const char *test_name = "sys membarrier invalid command";
 
 	if (sys_membarrier(cmd, flags) != -1) {
 		ksft_exit_fail_msg(
-			"sys membarrier invalid command test: command = %d, flags = %d. Should fail, but passed\n",
-			cmd, flags);
+			"%s test: command = %d, flags = %d. Should fail, but passed\n",
+			test_name, cmd, flags);
+	}
+	if (errno != EINVAL) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
+			test_name, flags, EINVAL, strerror(EINVAL),
+			errno, strerror(errno));
 	}
 
 	ksft_test_result_pass(
-		"sys membarrier invalid command test: command = %d, flags = %d. Failed as expected\n",
-		cmd, flags);
+		"%s test: command = %d, flags = %d, errno = %d. Failed as expected\n",
+		test_name, cmd, flags, errno);
 	return 0;
 }
 
 static int test_membarrier_flags_fail(void)
 {
 	int cmd = MEMBARRIER_CMD_QUERY, flags = 1;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_QUERY invalid flags";
 
 	if (sys_membarrier(cmd, flags) != -1) {
 		ksft_exit_fail_msg(
-			"sys membarrier MEMBARRIER_CMD_QUERY invalid flags test: flags = %d. Should fail, but passed\n",
-			flags);
+			"%s test: flags = %d. Should fail, but passed\n",
+			test_name, flags);
+	}
+	if (errno != EINVAL) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
+			test_name, flags, EINVAL, strerror(EINVAL),
+			errno, strerror(errno));
 	}
 
 	ksft_test_result_pass(
-		"sys membarrier MEMBARRIER_CMD_QUERY invalid flags test: flags = %d. Failed as expected\n",
-		flags);
+		"%s test: flags = %d, errno = %d. Failed as expected\n",
+		test_name, flags, errno);
 	return 0;
 }
 
-static int test_membarrier_success(void)
+static int test_membarrier_shared_success(void)
 {
 	int cmd = MEMBARRIER_CMD_SHARED, flags = 0;
-	const char *test_name = "sys membarrier MEMBARRIER_CMD_SHARED\n";
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_SHARED";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n", test_name, flags);
+	return 0;
+}
+
+static int test_membarrier_private_expedited_fail(void)
+{
+	int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED not registered failure";
+
+	if (sys_membarrier(cmd, flags) != -1) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should fail, but passed\n",
+			test_name, flags);
+	}
+	if (errno != EPERM) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
+			test_name, flags, EPERM, strerror(EPERM),
+			errno, strerror(errno));
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d, errno = %d\n",
+		test_name, flags, errno);
+	return 0;
+}
+
+static int test_membarrier_register_private_expedited_success(void)
+{
+	int cmd = MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED";
 
 	if (sys_membarrier(cmd, flags) != 0) {
 		ksft_exit_fail_msg(
-			"sys membarrier MEMBARRIER_CMD_SHARED test: flags = %d\n",
-			flags);
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
 	}
 
 	ksft_test_result_pass(
-		"sys membarrier MEMBARRIER_CMD_SHARED test: flags = %d\n",
-		flags);
+		"%s test: flags = %d\n",
+		test_name, flags);
+	return 0;
+}
+
+static int test_membarrier_private_expedited_success(void)
+{
+	int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n",
+		test_name, flags);
 	return 0;
 }
 
@@ -72,7 +142,16 @@ static int test_membarrier(void)
 	status = test_membarrier_flags_fail();
 	if (status)
 		return status;
-	status = test_membarrier_success();
+	status = test_membarrier_shared_success();
+	if (status)
+		return status;
+	status = test_membarrier_private_expedited_fail();
+	if (status)
+		return status;
+	status = test_membarrier_register_private_expedited_success();
+	if (status)
+		return status;
+	status = test_membarrier_private_expedited_success();
 	if (status)
 		return status;
 	return 0;
@@ -108,5 +187,5 @@ int main(int argc, char **argv)
 	test_membarrier_query();
 	test_membarrier();
 
-	ksft_exit_pass();
+	return ksft_exit_pass();
 }
-- 
2.11.0


* [PATCH for 4.16 02/10] powerpc: membarrier: Skip memory barrier in switch_mm() (v7)
  2018-01-15 19:10 [PATCH for 4.16 00/10] membarrier updates for 4.16 Mathieu Desnoyers
  2018-01-15 19:10 ` [PATCH for 4.16 01/10] membarrier: selftest: Test private expedited cmd (v2) Mathieu Desnoyers
@ 2018-01-15 19:10 ` Mathieu Desnoyers
  2018-01-15 19:10 ` [PATCH for 4.16 03/10] membarrier: Document scheduler barrier requirements (v5) Mathieu Desnoyers
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Mathieu Desnoyers @ 2018-01-15 19:10 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, linux-api, Andy Lutomirski, Paul E . McKenney,
	Boqun Feng, Andrew Hunter, Maged Michael, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, Thomas Gleixner, H . Peter Anvin, Andrea Parri,
	Russell King, Greg Hackmann, Will Deacon, David Sehr,
	Linus Torvalds, x86, Mathieu Desnoyers, Alan Stern,
	Alexander Viro, Nicholas Piggin, linuxppc-dev, linux-arch

Allow PowerPC to skip the full memory barrier in switch_mm(), and
only issue the barrier when scheduling into a task belonging to a
process that has registered its intent to use the private expedited
command.

Threads targeting the same VM but belonging to different thread groups
are a tricky case. This has a few consequences:

It turns out that we cannot rely on get_nr_threads(p) to count the
number of threads using a VM. We can use
(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)
instead to skip the synchronize_sched() for cases where the VM only has
a single user, and that user only has a single thread.

It also turns out that we cannot use for_each_thread() to set
thread flags in all threads using a VM, as it only iterates on the
thread group.

Therefore, test the membarrier state variable directly rather than
relying on thread flags. This means
membarrier_register_private_expedited() needs to set the
MEMBARRIER_STATE_PRIVATE_EXPEDITED flag, issue synchronize_sched(), and
only then set MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY which allows
private expedited membarrier commands to succeed.
membarrier_arch_switch_mm() now tests for the
MEMBARRIER_STATE_PRIVATE_EXPEDITED flag.

Changes since v1:
- Use test_ti_thread_flag(next, ...) instead of test_thread_flag() in
  powerpc membarrier_arch_sched_in(), given that we want to specifically
  check the next thread state.
- Add missing ARCH_HAS_MEMBARRIER_HOOKS in Kconfig.
- Use task_thread_info() to pass thread_info from task to
  *_ti_thread_flag().

Changes since v2:
- Move membarrier_arch_sched_in() call to finish_task_switch().
- Check for NULL t->mm in membarrier_arch_fork().
- Use membarrier_sched_in() in generic code, which invokes the
  arch-specific membarrier_arch_sched_in(). This fixes allnoconfig
  build on PowerPC.
- Move asm/membarrier.h include under CONFIG_MEMBARRIER, fixing
  allnoconfig build on PowerPC.
- Build and runtime tested on PowerPC.

Changes since v3:
- Simply rely on copy_mm() to copy the membarrier_private_expedited mm
  field on fork.
- powerpc: test thread flag instead of reading
  membarrier_private_expedited in membarrier_arch_fork().
- powerpc: skip memory barrier in membarrier_arch_sched_in() if coming
  from kernel thread, since mmdrop() implies a full barrier.
- Set membarrier_private_expedited to 1 only after arch registration
  code, thus eliminating a race where concurrent commands could succeed
  when they should fail if issued concurrently with process
  registration.
- Use READ_ONCE() for membarrier_private_expedited field access in
  membarrier_private_expedited. Matches WRITE_ONCE() performed in
  process registration.

Changes since v4:
- Move powerpc hook from sched_in() to switch_mm(), based on feedback
  from Nicholas Piggin.

Changes since v5:
- Rebase on v4.14-rc6.
- Fold "Fix: membarrier: Handle CLONE_VM + !CLONE_THREAD correctly on
  powerpc (v2)"

Changes since v6:
- Rename MEMBARRIER_STATE_SWITCH_MM to MEMBARRIER_STATE_PRIVATE_EXPEDITED.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Alan Stern <stern@rowland.harvard.edu>
CC: Will Deacon <will.deacon@arm.com>
CC: Andy Lutomirski <luto@kernel.org>
CC: Ingo Molnar <mingo@redhat.com>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Nicholas Piggin <npiggin@gmail.com>
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-arch@vger.kernel.org
---
 MAINTAINERS                           |  1 +
 arch/powerpc/Kconfig                  |  1 +
 arch/powerpc/include/asm/membarrier.h | 26 ++++++++++++++++++++++++++
 arch/powerpc/mm/mmu_context.c         |  7 +++++++
 include/linux/sched/mm.h              | 13 ++++++++++++-
 init/Kconfig                          |  3 +++
 kernel/sched/core.c                   | 10 ----------
 kernel/sched/membarrier.c             |  8 ++++++++
 8 files changed, 58 insertions(+), 11 deletions(-)
 create mode 100644 arch/powerpc/include/asm/membarrier.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 18994806e441..c2f0d9a48a10 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8931,6 +8931,7 @@ L:	linux-kernel@vger.kernel.org
 S:	Supported
 F:	kernel/sched/membarrier.c
 F:	include/uapi/linux/membarrier.h
+F:	arch/powerpc/include/asm/membarrier.h
 
 MEMORY MANAGEMENT
 L:	linux-mm@kvack.org
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index c51e6ce42e7a..a63adb082c0a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -140,6 +140,7 @@ config PPC
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_PMEM_API                if PPC64
+	select ARCH_HAS_MEMBARRIER_HOOKS
 	select ARCH_HAS_SCALED_CPUTIME		if VIRT_CPU_ACCOUNTING_NATIVE
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAS_TICK_BROADCAST		if GENERIC_CLOCKEVENTS_BROADCAST
diff --git a/arch/powerpc/include/asm/membarrier.h b/arch/powerpc/include/asm/membarrier.h
new file mode 100644
index 000000000000..98ff4f1fcf2b
--- /dev/null
+++ b/arch/powerpc/include/asm/membarrier.h
@@ -0,0 +1,26 @@
+#ifndef _ASM_POWERPC_MEMBARRIER_H
+#define _ASM_POWERPC_MEMBARRIER_H
+
+static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
+					     struct mm_struct *next,
+					     struct task_struct *tsk)
+{
+	/*
+	 * Only need the full barrier when switching between processes.
+	 * Barrier when switching from kernel to userspace is not
+	 * required here, given that it is implied by mmdrop(). Barrier
+	 * when switching from userspace to kernel is not needed after
+	 * store to rq->curr.
+	 */
+	if (likely(!(atomic_read(&next->membarrier_state) &
+		     MEMBARRIER_STATE_PRIVATE_EXPEDITED) || !prev))
+		return;
+
+	/*
+	 * The membarrier system call requires a full memory barrier
+	 * after storing to rq->curr, before going back to user-space.
+	 */
+	smp_mb();
+}
+
+#endif /* _ASM_POWERPC_MEMBARRIER_H */
diff --git a/arch/powerpc/mm/mmu_context.c b/arch/powerpc/mm/mmu_context.c
index d60a62bf4fc7..0ab297c4cfad 100644
--- a/arch/powerpc/mm/mmu_context.c
+++ b/arch/powerpc/mm/mmu_context.c
@@ -12,6 +12,7 @@
 
 #include <linux/mm.h>
 #include <linux/cpu.h>
+#include <linux/sched/mm.h>
 
 #include <asm/mmu_context.h>
 
@@ -58,6 +59,10 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		 *
 		 * On the read side the barrier is in pte_xchg(), which orders
 		 * the store to the PTE vs the load of mm_cpumask.
+		 *
+		 * This full barrier is needed by membarrier when switching
+		 * between processes after store to rq->curr, before user-space
+		 * memory accesses.
 		 */
 		smp_mb();
 
@@ -80,6 +85,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 
 	if (new_on_cpu)
 		radix_kvm_prefetch_workaround(next);
+	else
+		membarrier_arch_switch_mm(prev, next, tsk);
 
 	/*
 	 * The actual HW switching method differs between the various
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 3d49b91b674d..1754396795f6 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -215,14 +215,25 @@ static inline void memalloc_noreclaim_restore(unsigned int flags)
 #ifdef CONFIG_MEMBARRIER
 enum {
 	MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY	= (1U << 0),
-	MEMBARRIER_STATE_SWITCH_MM			= (1U << 1),
+	MEMBARRIER_STATE_PRIVATE_EXPEDITED		= (1U << 1),
 };
 
+#ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS
+#include <asm/membarrier.h>
+#endif
+
 static inline void membarrier_execve(struct task_struct *t)
 {
 	atomic_set(&t->mm->membarrier_state, 0);
 }
 #else
+#ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS
+static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
+					     struct mm_struct *next,
+					     struct task_struct *tsk)
+{
+}
+#endif
 static inline void membarrier_execve(struct task_struct *t)
 {
 }
diff --git a/init/Kconfig b/init/Kconfig
index a9a2e2c86671..2d118b6adee2 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1412,6 +1412,9 @@ config USERFAULTFD
 	  Enable the userfaultfd() system call that allows to intercept and
 	  handle page faults in userland.
 
+config ARCH_HAS_MEMBARRIER_HOOKS
+	bool
+
 config EMBEDDED
 	bool "Embedded system"
 	option allnoconfig_y
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 644fa2e3d993..524b705892db 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2653,16 +2653,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	prev_state = prev->state;
 	vtime_task_switch(prev);
 	perf_event_task_sched_in(prev, current);
-	/*
-	 * The membarrier system call requires a full memory barrier
-	 * after storing to rq->curr, before going back to user-space.
-	 *
-	 * TODO: This smp_mb__after_unlock_lock can go away if PPC end
-	 * up adding a full barrier to switch_mm(), or we should figure
-	 * out if a smp_mb__after_unlock_lock is really the proper API
-	 * to use.
-	 */
-	smp_mb__after_unlock_lock();
 	finish_lock_switch(rq, prev);
 	finish_arch_post_lock_switch();
 
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 9bcbacba82a8..678577267a9a 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -118,6 +118,14 @@ static void membarrier_register_private_expedited(void)
 	if (atomic_read(&mm->membarrier_state)
 			& MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY)
 		return;
+	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED, &mm->membarrier_state);
+	if (!(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)) {
+		/*
+		 * Ensure all future scheduler executions will observe the
+		 * new thread flag state for this process.
+		 */
+		synchronize_sched();
+	}
 	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY,
 			&mm->membarrier_state);
 }
-- 
2.11.0


* [PATCH for 4.16 03/10] membarrier: Document scheduler barrier requirements (v5)
  2018-01-15 19:10 [PATCH for 4.16 00/10] membarrier updates for 4.16 Mathieu Desnoyers
  2018-01-15 19:10 ` [PATCH for 4.16 01/10] membarrier: selftest: Test private expedited cmd (v2) Mathieu Desnoyers
  2018-01-15 19:10 ` [PATCH for 4.16 02/10] powerpc: membarrier: Skip memory barrier in switch_mm() (v7) Mathieu Desnoyers
@ 2018-01-15 19:10 ` Mathieu Desnoyers
  2018-01-15 19:10 ` [PATCH for 4.16 04/10] membarrier: provide SHARED_EXPEDITED command (v2) Mathieu Desnoyers
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Mathieu Desnoyers @ 2018-01-15 19:10 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, linux-api, Andy Lutomirski, Paul E . McKenney,
	Boqun Feng, Andrew Hunter, Maged Michael, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, Thomas Gleixner, H . Peter Anvin, Andrea Parri,
	Russell King, Greg Hackmann, Will Deacon, David Sehr,
	Linus Torvalds, x86, Mathieu Desnoyers

Document the membarrier requirement on having a full memory barrier in
__schedule() after coming from user-space, before storing to rq->curr.
It is provided by smp_mb__after_spinlock() in __schedule().

Document that membarrier requires a full barrier on transition from
kernel thread to userspace thread. We currently have an implicit barrier
from atomic_dec_and_test() in mmdrop() that ensures this.

The x86 switch_mm_irqs_off() full barrier is currently provided by many
cpumask update operations as well as write_cr3(). Document that
write_cr3() provides this barrier.
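
To make the requirement concrete, here is the kind of user-space
pairing membarrier enables (a minimal sketch, not taken from this
series, using the pre-existing MEMBARRIER_CMD_SHARED command): the
frequently-executed fast path uses only a compiler barrier, and the
rare slow path issues membarrier() to promote those compiler barriers
to full memory barriers on all running threads. This is only sound if
the scheduler provides the full barriers documented here around the
rq->curr update.

#include <linux/membarrier.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

static _Atomic int flag_a, flag_b;

/* Called often, by many threads. */
static int fast_path(void)
{
	atomic_store_explicit(&flag_a, 1, memory_order_relaxed);
	atomic_signal_fence(memory_order_seq_cst); /* compiler barrier only */
	return atomic_load_explicit(&flag_b, memory_order_relaxed);
}

/* Called rarely. */
static int slow_path(void)
{
	atomic_store_explicit(&flag_b, 1, memory_order_relaxed);
	/* Promote the remote compiler barriers to full memory barriers. */
	syscall(__NR_membarrier, MEMBARRIER_CMD_SHARED, 0);
	return atomic_load_explicit(&flag_a, memory_order_relaxed);
}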

Changes since v1:
- Update comments to match reality for code paths which are after
  storing to rq->curr, before returning to user-space, based on feedback
  from Andrea Parri.
Changes since v2:
- Update changelog (smp_mb__before_spinlock -> smp_mb__after_spinlock).
  Based on feedback from Andrea Parri.
Changes since v3:
- Clarify comments following feedback from Peter Zijlstra.
Changes since v4:
- Update comment regarding powerpc barrier.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andrea Parri <parri.andrea@gmail.com>
CC: x86@kernel.org
---
 arch/x86/mm/tlb.c        |  5 +++++
 include/linux/sched/mm.h |  5 +++++
 kernel/sched/core.c      | 37 ++++++++++++++++++++++++++-----------
 3 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index a1561957dccb..c28cd5592b0d 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -200,6 +200,11 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 #endif
 	this_cpu_write(cpu_tlbstate.is_lazy, false);
 
+	/*
+	 * The membarrier system call requires a full memory barrier
+	 * before returning to user-space, after storing to rq->curr.
+	 * Writing to CR3 provides that full memory barrier.
+	 */
 	if (real_prev == next) {
 		VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
 			   next->context.ctx_id);
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 1754396795f6..28aef7051d73 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -39,6 +39,11 @@ static inline void mmgrab(struct mm_struct *mm)
 extern void __mmdrop(struct mm_struct *);
 static inline void mmdrop(struct mm_struct *mm)
 {
+	/*
+	 * The implicit full barrier implied by atomic_dec_and_test is
+	 * required by the membarrier system call before returning to
+	 * user-space, after storing to rq->curr.
+	 */
 	if (unlikely(atomic_dec_and_test(&mm->mm_count)))
 		__mmdrop(mm);
 }
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 524b705892db..62f269980e29 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2657,6 +2657,12 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	finish_arch_post_lock_switch();
 
 	fire_sched_in_preempt_notifiers(current);
+	/*
+	 * When transitioning from a kernel thread to a userspace
+	 * thread, mmdrop()'s implicit full barrier is required by the
+	 * membarrier system call, because the current active_mm can
+	 * become the current mm without going through switch_mm().
+	 */
 	if (mm)
 		mmdrop(mm);
 	if (unlikely(prev_state == TASK_DEAD)) {
@@ -2762,6 +2768,13 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	 */
 	arch_start_context_switch(prev);
 
+	/*
+	 * If mm is non-NULL, we pass through switch_mm(). If mm is
+	 * NULL, we will pass through mmdrop() in finish_task_switch().
+	 * Both of these contain the full memory barrier required by
+	 * membarrier after storing to rq->curr, before returning to
+	 * user-space.
+	 */
 	if (!mm) {
 		next->active_mm = oldmm;
 		mmgrab(oldmm);
@@ -3298,6 +3311,9 @@ static void __sched notrace __schedule(bool preempt)
 	 * Make sure that signal_pending_state()->signal_pending() below
 	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
 	 * done by the caller to avoid the race with signal_wake_up().
+	 *
+	 * The membarrier system call requires a full memory barrier
+	 * after coming from user-space, before storing to rq->curr.
 	 */
 	rq_lock(rq, &rf);
 	smp_mb__after_spinlock();
@@ -3345,17 +3361,16 @@ static void __sched notrace __schedule(bool preempt)
 		/*
 		 * The membarrier system call requires each architecture
 		 * to have a full memory barrier after updating
-		 * rq->curr, before returning to user-space. For TSO
-		 * (e.g. x86), the architecture must provide its own
-		 * barrier in switch_mm(). For weakly ordered machines
-		 * for which spin_unlock() acts as a full memory
-		 * barrier, finish_lock_switch() in common code takes
-		 * care of this barrier. For weakly ordered machines for
-		 * which spin_unlock() acts as a RELEASE barrier (only
-		 * arm64 and PowerPC), arm64 has a full barrier in
-		 * switch_to(), and PowerPC has
-		 * smp_mb__after_unlock_lock() before
-		 * finish_lock_switch().
+		 * rq->curr, before returning to user-space.
+		 *
+		 * Here are the schemes providing that barrier on the
+		 * various architectures:
+		 * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC.
+		 *   switch_mm() relies on membarrier_arch_switch_mm() on PowerPC.
+		 * - finish_lock_switch() for weakly-ordered
+		 *   architectures where spin_unlock is a full barrier,
+		 * - switch_to() for arm64 (weakly-ordered, spin_unlock
+		 *   is a RELEASE barrier),
 		 */
 		++*switch_count;
 
-- 
2.11.0


* [PATCH for 4.16 04/10] membarrier: provide SHARED_EXPEDITED command (v2)
  2018-01-15 19:10 [PATCH for 4.16 00/10] membarrier updates for 4.16 Mathieu Desnoyers
                   ` (2 preceding siblings ...)
  2018-01-15 19:10 ` [PATCH for 4.16 03/10] membarrier: Document scheduler barrier requirements (v5) Mathieu Desnoyers
@ 2018-01-15 19:10 ` Mathieu Desnoyers
  2018-01-16 18:20   ` Thomas Gleixner
  2018-01-15 19:10 ` [PATCH for 4.16 05/10] membarrier: selftest: Test shared expedited cmd Mathieu Desnoyers
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: Mathieu Desnoyers @ 2018-01-15 19:10 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, linux-api, Andy Lutomirski, Paul E . McKenney,
	Boqun Feng, Andrew Hunter, Maged Michael, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, Thomas Gleixner, H . Peter Anvin, Andrea Parri,
	Russell King, Greg Hackmann, Will Deacon, David Sehr,
	Linus Torvalds, x86, Mathieu Desnoyers

Allow expedited membarrier to be used for data shared between processes
(shared memory).

Processes wishing to receive the barriers register with
MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED. Those that want to issue a
barrier invoke MEMBARRIER_CMD_SHARED_EXPEDITED.

This allows an extremely simple kernel-level implementation: we have
almost everything we need from the PRIVATE_EXPEDITED barrier code. All
we need to do is add a flag in the mm_struct that is used to check
whether we need to send the IPI to the current thread of each CPU.

There is a slight downside to this approach compared to targeting
specific shared memory users: when performing a membarrier operation,
all registered "shared" receivers will get the barrier, even if they
don't share a memory mapping with the "sender" issuing
MEMBARRIER_CMD_SHARED_EXPEDITED.

This registration approach seems to fit the requirement of not
disturbing processes that deeply care about real-time latency: they
simply should not register with MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andrea Parri <parri.andrea@gmail.com>
CC: x86@kernel.org
---
Changes since v1:
- Add missing preempt disable around smp_call_function_many().
---
 arch/powerpc/include/asm/membarrier.h |   3 +-
 include/linux/sched/mm.h              |   6 +-
 include/uapi/linux/membarrier.h       |  34 ++++++++--
 kernel/sched/membarrier.c             | 114 ++++++++++++++++++++++++++++++++--
 4 files changed, 143 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/membarrier.h b/arch/powerpc/include/asm/membarrier.h
index 98ff4f1fcf2b..20cd79ed3fc6 100644
--- a/arch/powerpc/include/asm/membarrier.h
+++ b/arch/powerpc/include/asm/membarrier.h
@@ -13,7 +13,8 @@ static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
 	 * store to rq->curr.
 	 */
 	if (likely(!(atomic_read(&next->membarrier_state) &
-		     MEMBARRIER_STATE_PRIVATE_EXPEDITED) || !prev))
+		     (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
+		      MEMBARRIER_STATE_SHARED_EXPEDITED)) || !prev))
 		return;
 
 	/*
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 28aef7051d73..cbeecd287589 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -219,8 +219,10 @@ static inline void memalloc_noreclaim_restore(unsigned int flags)
 
 #ifdef CONFIG_MEMBARRIER
 enum {
-	MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY	= (1U << 0),
-	MEMBARRIER_STATE_PRIVATE_EXPEDITED		= (1U << 1),
+	MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY		= (1U << 0),
+	MEMBARRIER_STATE_PRIVATE_EXPEDITED			= (1U << 1),
+	MEMBARRIER_STATE_SHARED_EXPEDITED_READY			= (1U << 2),
+	MEMBARRIER_STATE_SHARED_EXPEDITED			= (1U << 3),
 };
 
 #ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS
diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
index 4e01ad7ffe98..2de01e595d3b 100644
--- a/include/uapi/linux/membarrier.h
+++ b/include/uapi/linux/membarrier.h
@@ -40,6 +40,28 @@
  *                          (non-running threads are de facto in such a
  *                          state). This covers threads from all processes
  *                          running on the system. This command returns 0.
+ * @MEMBARRIER_CMD_SHARED_EXPEDITED:
+ *                          Execute a memory barrier on all running threads
+ *                          part of a process which previously registered
+ *                          with MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED.
+ *                          Upon return from system call, the caller thread
+ *                          is ensured that all running threads have passed
+ *                          through a state where all memory accesses to
+ *                          user-space addresses match program order between
+ *                          entry to and return from the system call
+ *                          (non-running threads are de facto in such a
+ *                          state). This only covers threads from processes
+ *                          which registered with
+ *                          MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED.
+ *                          This command returns 0. Given that
+ *                          registration is about the intent to receive
+ *                          the barriers, it is valid to invoke
+ *                          MEMBARRIER_CMD_SHARED_EXPEDITED from a
+ *                          non-registered process.
+ * @MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED:
+ *                          Register the process intent to receive
+ *                          MEMBARRIER_CMD_SHARED_EXPEDITED memory
+ *                          barriers. Always returns 0.
  * @MEMBARRIER_CMD_PRIVATE_EXPEDITED:
  *                          Execute a memory barrier on each running
  *                          thread belonging to the same process as the current
@@ -70,12 +92,12 @@
  * the value 0.
  */
 enum membarrier_cmd {
-	MEMBARRIER_CMD_QUERY				= 0,
-	MEMBARRIER_CMD_SHARED				= (1 << 0),
-	/* reserved for MEMBARRIER_CMD_SHARED_EXPEDITED (1 << 1) */
-	/* reserved for MEMBARRIER_CMD_PRIVATE (1 << 2) */
-	MEMBARRIER_CMD_PRIVATE_EXPEDITED		= (1 << 3),
-	MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED	= (1 << 4),
+	MEMBARRIER_CMD_QUERY					= 0,
+	MEMBARRIER_CMD_SHARED					= (1 << 0),
+	MEMBARRIER_CMD_SHARED_EXPEDITED				= (1 << 1),
+	MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED		= (1 << 2),
+	MEMBARRIER_CMD_PRIVATE_EXPEDITED			= (1 << 3),
+	MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED		= (1 << 4),
 };
 
 #endif /* _UAPI_LINUX_MEMBARRIER_H */
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 678577267a9a..14bc8bf9e736 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -27,7 +27,9 @@
  * except MEMBARRIER_CMD_QUERY.
  */
 #define MEMBARRIER_CMD_BITMASK	\
-	(MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED	\
+	(MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_SHARED_EXPEDITED \
+	| MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED \
+	| MEMBARRIER_CMD_PRIVATE_EXPEDITED	\
 	| MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED)
 
 static void ipi_mb(void *info)
@@ -35,6 +37,73 @@ static void ipi_mb(void *info)
 	smp_mb();	/* IPIs should be serializing but paranoid. */
 }
 
+static int membarrier_shared_expedited(void)
+{
+	int cpu;
+	bool fallback = false;
+	cpumask_var_t tmpmask;
+
+	if (num_online_cpus() == 1)
+		return 0;
+
+	/*
+	 * Matches memory barriers around rq->curr modification in
+	 * scheduler.
+	 */
+	smp_mb();	/* system call entry is not a mb. */
+
+	/*
+	 * Expedited membarrier commands guarantee that they won't
+	 * block, hence the GFP_NOWAIT allocation flag and fallback
+	 * implementation.
+	 */
+	if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
+		/* Fallback for OOM. */
+		fallback = true;
+	}
+
+	cpus_read_lock();
+	for_each_online_cpu(cpu) {
+		struct task_struct *p;
+
+		/*
+		 * Skipping the current CPU is OK even though we can be
+		 * migrated at any point. The current CPU, at the point
+		 * where we read raw_smp_processor_id(), is ensured to
+		 * be in program order with respect to the caller
+		 * thread. Therefore, we can skip this CPU from the
+		 * iteration.
+		 */
+		if (cpu == raw_smp_processor_id())
+			continue;
+		rcu_read_lock();
+		p = task_rcu_dereference(&cpu_rq(cpu)->curr);
+		if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
+				   MEMBARRIER_STATE_SHARED_EXPEDITED)) {
+			if (!fallback)
+				__cpumask_set_cpu(cpu, tmpmask);
+			else
+				smp_call_function_single(cpu, ipi_mb, NULL, 1);
+		}
+		rcu_read_unlock();
+	}
+	if (!fallback) {
+		preempt_disable();
+		smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
+		preempt_enable();
+		free_cpumask_var(tmpmask);
+	}
+	cpus_read_unlock();
+
+	/*
+	 * Memory barrier on the caller thread _after_ we finished
+	 * waiting for the last IPI. Matches memory barriers around
+	 * rq->curr modification in scheduler.
+	 */
+	smp_mb();	/* exit from system call is not a mb */
+	return 0;
+}
+
 static int membarrier_private_expedited(void)
 {
 	int cpu;
@@ -105,7 +174,38 @@ static int membarrier_private_expedited(void)
 	return 0;
 }
 
-static void membarrier_register_private_expedited(void)
+static int membarrier_register_shared_expedited(void)
+{
+	struct task_struct *p = current;
+	struct mm_struct *mm = p->mm;
+
+	if (atomic_read(&mm->membarrier_state) &
+	    MEMBARRIER_STATE_SHARED_EXPEDITED_READY)
+		return 0;
+	atomic_or(MEMBARRIER_STATE_SHARED_EXPEDITED, &mm->membarrier_state);
+	if (atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1) {
+		/*
+		 * For single mm user, single threaded process, we can
+		 * simply issue a memory barrier after setting
+		 * MEMBARRIER_STATE_SHARED_EXPEDITED to guarantee that
+		 * no memory access following registration is reordered
+		 * before registration.
+		 */
+		smp_mb();
+	} else {
+		/*
+		 * For multi-mm user threads, we need to ensure all
+		 * future scheduler executions will observe the new
+		 * thread flag state for this mm.
+		 */
+		synchronize_sched();
+	}
+	atomic_or(MEMBARRIER_STATE_SHARED_EXPEDITED_READY,
+		  &mm->membarrier_state);
+	return 0;
+}
+
+static int membarrier_register_private_expedited(void)
 {
 	struct task_struct *p = current;
 	struct mm_struct *mm = p->mm;
@@ -117,7 +217,7 @@ static void membarrier_register_private_expedited(void)
 	 */
 	if (atomic_read(&mm->membarrier_state)
 			& MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY)
-		return;
+		return 0;
 	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED, &mm->membarrier_state);
 	if (!(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)) {
 		/*
@@ -128,6 +228,7 @@ static void membarrier_register_private_expedited(void)
 	}
 	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY,
 			&mm->membarrier_state);
+	return 0;
 }
 
 /**
@@ -177,11 +278,14 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
 		if (num_online_cpus() > 1)
 			synchronize_sched();
 		return 0;
+	case MEMBARRIER_CMD_SHARED_EXPEDITED:
+		return membarrier_shared_expedited();
+	case MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED:
+		return membarrier_register_shared_expedited();
 	case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
 		return membarrier_private_expedited();
 	case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED:
-		membarrier_register_private_expedited();
-		return 0;
+		return membarrier_register_private_expedited();
 	default:
 		return -EINVAL;
 	}
-- 
2.11.0


* [PATCH for 4.16 05/10] membarrier: selftest: Test shared expedited cmd
  2018-01-15 19:10 [PATCH for 4.16 00/10] membarrier updates for 4.16 Mathieu Desnoyers
                   ` (3 preceding siblings ...)
  2018-01-15 19:10 ` [PATCH for 4.16 04/10] membarrier: provide SHARED_EXPEDITED command (v2) Mathieu Desnoyers
@ 2018-01-15 19:10 ` Mathieu Desnoyers
  2018-01-15 19:11 ` [PATCH for 4.16 06/10] membarrier: Provide core serializing command Mathieu Desnoyers
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Mathieu Desnoyers @ 2018-01-15 19:10 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, linux-api, Andy Lutomirski, Paul E . McKenney,
	Boqun Feng, Andrew Hunter, Maged Michael, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, Thomas Gleixner, H . Peter Anvin, Andrea Parri,
	Russell King, Greg Hackmann, Will Deacon, David Sehr,
	Linus Torvalds, x86, Mathieu Desnoyers, Greg Kroah-Hartman,
	Alan Stern, Alice Ferrazzi, Paul Elder, linux-kselftest,
	linux-arch

Test the new MEMBARRIER_CMD_SHARED_EXPEDITED and
MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED commands.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Alan Stern <stern@rowland.harvard.edu>
CC: Will Deacon <will.deacon@arm.com>
CC: Andy Lutomirski <luto@kernel.org>
CC: Alice Ferrazzi <alice.ferrazzi@gmail.com>
CC: Paul Elder <paul.elder@pitt.edu>
CC: linux-kselftest@vger.kernel.org
CC: linux-arch@vger.kernel.org
---
 .../testing/selftests/membarrier/membarrier_test.c | 51 +++++++++++++++++++++-
 1 file changed, 50 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/membarrier/membarrier_test.c b/tools/testing/selftests/membarrier/membarrier_test.c
index e6ee73d01fa1..bb9c58072c5c 100644
--- a/tools/testing/selftests/membarrier/membarrier_test.c
+++ b/tools/testing/selftests/membarrier/membarrier_test.c
@@ -132,6 +132,40 @@ static int test_membarrier_private_expedited_success(void)
 	return 0;
 }
 
+static int test_membarrier_register_shared_expedited_success(void)
+{
+	int cmd = MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n",
+		test_name, flags);
+	return 0;
+}
+
+static int test_membarrier_shared_expedited_success(void)
+{
+	int cmd = MEMBARRIER_CMD_SHARED_EXPEDITED, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_SHARED_EXPEDITED";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n",
+		test_name, flags);
+	return 0;
+}
+
 static int test_membarrier(void)
 {
 	int status;
@@ -154,6 +188,19 @@ static int test_membarrier(void)
 	status = test_membarrier_private_expedited_success();
 	if (status)
 		return status;
+	/*
+	 * It is valid to send a shared membarrier from a non-registered
+	 * process.
+	 */
+	status = test_membarrier_shared_expedited_success();
+	if (status)
+		return status;
+	status = test_membarrier_register_shared_expedited_success();
+	if (status)
+		return status;
+	status = test_membarrier_shared_expedited_success();
+	if (status)
+		return status;
 	return 0;
 }
 
@@ -173,8 +220,10 @@ static int test_membarrier_query(void)
 		}
 		ksft_exit_fail_msg("sys_membarrier() failed\n");
 	}
-	if (!(ret & MEMBARRIER_CMD_SHARED))
+	if (!(ret & MEMBARRIER_CMD_SHARED)) {
+		ksft_test_result_fail("sys_membarrier() CMD_SHARED query failed\n");
 		ksft_exit_fail_msg("sys_membarrier is not supported.\n");
+	}
 
 	ksft_test_result_pass("sys_membarrier available\n");
 	return 0;
-- 
2.11.0


* [PATCH for 4.16 06/10] membarrier: Provide core serializing command
  2018-01-15 19:10 [PATCH for 4.16 00/10] membarrier updates for 4.16 Mathieu Desnoyers
                   ` (4 preceding siblings ...)
  2018-01-15 19:10 ` [PATCH for 4.16 05/10] membarrier: selftest: Test shared expedited cmd Mathieu Desnoyers
@ 2018-01-15 19:11 ` Mathieu Desnoyers
  2018-01-15 19:11 ` [PATCH for 4.16 07/10] x86: Introduce sync_core_before_usermode (v2) Mathieu Desnoyers
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Mathieu Desnoyers @ 2018-01-15 19:11 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, linux-api, Andy Lutomirski, Paul E . McKenney,
	Boqun Feng, Andrew Hunter, Maged Michael, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, Thomas Gleixner, H . Peter Anvin, Andrea Parri,
	Russell King, Greg Hackmann, Will Deacon, David Sehr,
	Linus Torvalds, x86, Mathieu Desnoyers, linux-arch

Provide a core serializing membarrier command to support memory reclaim
by JITs.

Each architecture needs to explicitly opt into that support by
documenting in its architecture code how it provides the core
serializing instructions required when returning from the membarrier
IPI, and after the scheduler has updated the curr->mm pointer (before
going back to user-space). It should then select
ARCH_HAS_MEMBARRIER_SYNC_CORE to enable support for that command on
that architecture.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@kernel.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andrea Parri <parri.andrea@gmail.com>
CC: Russell King <linux@armlinux.org.uk>
CC: Greg Hackmann <ghackmann@google.com>
CC: Will Deacon <will.deacon@arm.com>
CC: David Sehr <sehr@google.com>
CC: linux-arch@vger.kernel.org
---
 include/linux/sched/mm.h        |  6 +++++
 include/uapi/linux/membarrier.h | 32 +++++++++++++++++++++++++-
 init/Kconfig                    |  3 +++
 kernel/sched/membarrier.c       | 50 +++++++++++++++++++++++++++++++----------
 4 files changed, 78 insertions(+), 13 deletions(-)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index cbeecd287589..3ff217a071ca 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -223,6 +223,12 @@ enum {
 	MEMBARRIER_STATE_PRIVATE_EXPEDITED			= (1U << 1),
 	MEMBARRIER_STATE_SHARED_EXPEDITED_READY			= (1U << 2),
 	MEMBARRIER_STATE_SHARED_EXPEDITED			= (1U << 3),
+	MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY	= (1U << 4),
+	MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE		= (1U << 5),
+};
+
+enum {
+	MEMBARRIER_FLAG_SYNC_CORE	= (1U << 0),
 };
 
 #ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS
diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
index 2de01e595d3b..99a66577bd85 100644
--- a/include/uapi/linux/membarrier.h
+++ b/include/uapi/linux/membarrier.h
@@ -73,7 +73,7 @@
  *                          to and return from the system call
  *                          (non-running threads are de facto in such a
  *                          state). This only covers threads from the
- *                          same processes as the caller thread. This
+ *                          same process as the caller thread. This
  *                          command returns 0 on success. The
  *                          "expedited" commands complete faster than
  *                          the non-expedited ones, they never block,
@@ -86,6 +86,34 @@
  *                          Register the process intent to use
  *                          MEMBARRIER_CMD_PRIVATE_EXPEDITED. Always
  *                          returns 0.
+ * @MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE:
+ *                          In addition to provide memory ordering
+ *                          guarantees described in
+ *                          MEMBARRIER_CMD_PRIVATE_EXPEDITED, ensure
+ *                          the caller thread, upon return from system
+ *                          call, that all its running threads siblings
+ *                          have executed a core serializing
+ *                          instruction. (architectures are required to
+ *                          guarantee that non-running threads issue
+ *                          core serializing instructions before they
+ *                          resume user-space execution). This only
+ *                          covers threads from the same process as the
+ *                          caller thread. This command returns 0 on
+ *                          success. The "expedited" commands complete
+ *                          faster than the non-expedited ones, they
+ *                          never block, but have the downside of
+ *                          causing extra overhead. If this command is
+ *                          not implemented by an architecture, -EINVAL
+ *                          is returned. A process needs to register its
+ *                          intent to use the private expedited sync
+ *                          core command prior to using it, otherwise
+ *                          this command returns -EPERM.
+ * @MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE:
+ *                          Register the process intent to use
+ *                          MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE.
+ *                          If this command is not implemented by an
+ *                          architecture, -EINVAL is returned.
+ *                          Returns 0 on success.
  *
  * Command to be passed to the membarrier system call. The commands need to
  * be a single bit each, except for MEMBARRIER_CMD_QUERY which is assigned to
@@ -98,6 +126,8 @@ enum membarrier_cmd {
 	MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED		= (1 << 2),
 	MEMBARRIER_CMD_PRIVATE_EXPEDITED			= (1 << 3),
 	MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED		= (1 << 4),
+	MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE		= (1 << 5),
+	MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE	= (1 << 6),
 };
 
 #endif /* _UAPI_LINUX_MEMBARRIER_H */
diff --git a/init/Kconfig b/init/Kconfig
index 2d118b6adee2..953c57dda8b9 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1415,6 +1415,9 @@ config USERFAULTFD
 config ARCH_HAS_MEMBARRIER_HOOKS
 	bool
 
+config ARCH_HAS_MEMBARRIER_SYNC_CORE
+	bool
+
 config EMBEDDED
 	bool "Embedded system"
 	option allnoconfig_y
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 14bc8bf9e736..fcd2306c2367 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -26,11 +26,20 @@
  * Bitmask made from a "or" of all commands within enum membarrier_cmd,
  * except MEMBARRIER_CMD_QUERY.
  */
+#ifdef CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE
+#define MEMBARRIER_PRIVATE_EXPEDITED_SYNC_CORE_BITMASK	\
+	(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE \
+	| MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE)
+#else
+#define MEMBARRIER_PRIVATE_EXPEDITED_SYNC_CORE_BITMASK	0
+#endif
+
 #define MEMBARRIER_CMD_BITMASK	\
 	(MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_SHARED_EXPEDITED \
 	| MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED \
 	| MEMBARRIER_CMD_PRIVATE_EXPEDITED	\
-	| MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED)
+	| MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED	\
+	| MEMBARRIER_PRIVATE_EXPEDITED_SYNC_CORE_BITMASK)
 
 static void ipi_mb(void *info)
 {
@@ -104,15 +113,23 @@ static int membarrier_shared_expedited(void)
 	return 0;
 }
 
-static int membarrier_private_expedited(void)
+static int membarrier_private_expedited(int flags)
 {
 	int cpu;
 	bool fallback = false;
 	cpumask_var_t tmpmask;
 
-	if (!(atomic_read(&current->mm->membarrier_state)
-			& MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY))
-		return -EPERM;
+	if (flags & MEMBARRIER_FLAG_SYNC_CORE) {
+		if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
+			return -EINVAL;
+		if (!(atomic_read(&current->mm->membarrier_state) &
+		      MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
+			return -EPERM;
+	} else {
+		if (!(atomic_read(&current->mm->membarrier_state) &
+		      MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY))
+			return -EPERM;
+	}
 
 	if (num_online_cpus() == 1)
 		return 0;
@@ -205,18 +222,24 @@ static int membarrier_register_shared_expedited(void)
 	return 0;
 }
 
-static int membarrier_register_private_expedited(void)
+static int membarrier_register_private_expedited(int flags)
 {
 	struct task_struct *p = current;
 	struct mm_struct *mm = p->mm;
+	int state = MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY;
+
+	if (flags & MEMBARRIER_FLAG_SYNC_CORE) {
+		if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
+			return -EINVAL;
+		state = MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY;
+	}
 
 	/*
 	 * We need to consider threads belonging to different thread
 	 * groups, which use the same mm. (CLONE_VM but not
 	 * CLONE_THREAD).
 	 */
-	if (atomic_read(&mm->membarrier_state)
-			& MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY)
+	if (atomic_read(&mm->membarrier_state) & state)
 		return 0;
 	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED, &mm->membarrier_state);
 	if (!(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)) {
@@ -226,8 +249,7 @@ static int membarrier_register_private_expedited(void)
 		 */
 		synchronize_sched();
 	}
-	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY,
-			&mm->membarrier_state);
+	atomic_or(state, &mm->membarrier_state);
 	return 0;
 }
 
@@ -283,9 +305,13 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
 	case MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED:
 		return membarrier_register_shared_expedited();
 	case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
-		return membarrier_private_expedited();
+		return membarrier_private_expedited(0);
 	case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED:
-		return membarrier_register_private_expedited();
+		return membarrier_register_private_expedited(0);
+	case MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE:
+		return membarrier_private_expedited(MEMBARRIER_FLAG_SYNC_CORE);
+	case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE:
+		return membarrier_register_private_expedited(MEMBARRIER_FLAG_SYNC_CORE);
 	default:
 		return -EINVAL;
 	}
-- 
2.11.0


* [PATCH for 4.16 07/10] x86: Introduce sync_core_before_usermode (v2)
  2018-01-15 19:10 [PATCH for 4.16 00/10] membarrier updates for 4.16 Mathieu Desnoyers
                   ` (5 preceding siblings ...)
  2018-01-15 19:11 ` [PATCH for 4.16 06/10] membarrier: Provide core serializing command Mathieu Desnoyers
@ 2018-01-15 19:11 ` Mathieu Desnoyers
  2018-01-16 18:28   ` Thomas Gleixner
  2018-01-15 19:11 ` [PATCH for 4.16 08/10] membarrier: x86: Provide core serializing command (v3) Mathieu Desnoyers
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: Mathieu Desnoyers @ 2018-01-15 19:11 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, linux-api, Andy Lutomirski, Paul E . McKenney,
	Boqun Feng, Andrew Hunter, Maged Michael, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, Thomas Gleixner, H . Peter Anvin, Andrea Parri,
	Russell King, Greg Hackmann, Will Deacon, David Sehr,
	Linus Torvalds, x86, Mathieu Desnoyers, linux-arch

Introduce an architecture function that ensures the current CPU
issues a core serializing instruction before returning to usermode.

This is needed for the membarrier "sync_core" command.

Architectures defining the sync_core_before_usermode() static inline
need to select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE.
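
As a purely illustrative sketch (the real x86 wiring is in the diff
below; "myarch" and myarch_serialize() are made-up names, not part of
this patch), an architecture whose return-to-user path is not already
core serializing would opt in roughly like this:

  # arch/myarch/Kconfig
  config MYARCH
          select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE

  /* arch/myarch/include/asm/processor.h */
  static inline void sync_core_before_usermode(void)
  {
          /* Issue this architecture's core serializing instruction. */
          myarch_serialize();
  }

Architectures that do not select the option fall back to the empty
static inline added to linux/processor.h by this patch.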

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@kernel.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andrea Parri <parri.andrea@gmail.com>
CC: Russell King <linux@armlinux.org.uk>
CC: Greg Hackmann <ghackmann@google.com>
CC: Will Deacon <will.deacon@arm.com>
CC: David Sehr <sehr@google.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: x86@kernel.org
CC: linux-arch@vger.kernel.org
---
Changes since v1:
- Fix prototype of sync_core_before_usermode in generic code (missing
  return type).
- Add linux/processor.h include to sched/core.c.
- Add ARCH_HAS_SYNC_CORE_BEFORE_USERMODE to init/Kconfig.
- Fix linux/processor.h ifdef to target
  CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE rather than
  ARCH_HAS_SYNC_CORE_BEFORE_USERMODE.
---
 arch/x86/Kconfig                 |  1 +
 arch/x86/include/asm/processor.h | 10 ++++++++++
 include/linux/processor.h        |  6 ++++++
 init/Kconfig                     |  3 +++
 4 files changed, 20 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 20da391b5f32..0b44c8dd0e95 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -61,6 +61,7 @@ config X86
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAS_STRICT_KERNEL_RWX
 	select ARCH_HAS_STRICT_MODULE_RWX
+	select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARCH_HAS_ZONE_DEVICE		if X86_64
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index d3a67fba200a..3257d34dbb40 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -722,6 +722,16 @@ static inline void sync_core(void)
 #endif
 }
 
+/*
+ * Ensure that a core serializing instruction is issued before returning
+ * to user-mode. x86 implements return to user-space through sysexit,
+ * sysretl, and sysretq, which are not core serializing.
+ */
+static inline void sync_core_before_usermode(void)
+{
+	sync_core();
+}
+
 extern void select_idle_routine(const struct cpuinfo_x86 *c);
 extern void amd_e400_c1e_apic_setup(void);
 
diff --git a/include/linux/processor.h b/include/linux/processor.h
index dbc952eec869..866de5326d34 100644
--- a/include/linux/processor.h
+++ b/include/linux/processor.h
@@ -68,4 +68,10 @@ do {								\
 
 #endif
 
+#ifndef CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
+static inline void sync_core_before_usermode(void)
+{
+}
+#endif
+
 #endif /* _LINUX_PROCESSOR_H */
diff --git a/init/Kconfig b/init/Kconfig
index 953c57dda8b9..30b65febeb23 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1921,3 +1921,6 @@ config ASN1
 	  functions to call on what tags.
 
 source "kernel/Kconfig.locks"
+
+config ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
+	bool
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH for 4.16 08/10] membarrier: x86: Provide core serializing command (v3)
  2018-01-15 19:10 [PATCH for 4.16 00/10] membarrier updates for 4.16 Mathieu Desnoyers
                   ` (6 preceding siblings ...)
  2018-01-15 19:11 ` [PATCH for 4.16 07/10] x86: Introduce sync_core_before_usermode (v2) Mathieu Desnoyers
@ 2018-01-15 19:11 ` Mathieu Desnoyers
  2018-01-16 18:29   ` Thomas Gleixner
  2018-01-15 19:11 ` [PATCH for 4.16 09/10] membarrier: arm64: Provide core serializing command Mathieu Desnoyers
  2018-01-15 19:11 ` [PATCH for 4.16 10/10] membarrier: selftest: Test private expedited sync core cmd Mathieu Desnoyers
  9 siblings, 1 reply; 18+ messages in thread
From: Mathieu Desnoyers @ 2018-01-15 19:11 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, linux-api, Andy Lutomirski, Paul E . McKenney,
	Boqun Feng, Andrew Hunter, Maged Michael, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, Thomas Gleixner, H . Peter Anvin, Andrea Parri,
	Russell King, Greg Hackmann, Will Deacon, David Sehr,
	Linus Torvalds, x86, Mathieu Desnoyers, linux-arch

There are two places where core serialization is needed by membarrier:

1) When returning from the membarrier IPI,
2) After scheduler updates curr to a thread with a different mm, before
   going back to user-space, since the curr->mm is used by membarrier to
   check whether it needs to send an IPI to that CPU.

x86-32 uses iret to return from interrupts, and both iret and sysexit to
go back to user-space. The iret instruction is core serializing, but
sysexit is not.

x86-64 uses iret to return from interrupts, which takes care of the IPI.
However, it can return to user-space through either sysretl (compat
code), sysretq, or iret. Given that sysret{l,q} are not core serializing,
we rely instead on the write_cr3() performed by switch_mm() to provide
core serialization after changing the current mm, and handle the special
case of kthread -> uthread (which temporarily keeps the current mm in
active_mm) by adding a sync_core() in that specific case.

Use the new sync_core_before_usermode() to guarantee this.
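
For context, the membarrier side that makes point (2) matter boils down
to this check (simplified from the existing membarrier_private_expedited()
loop, not a verbatim copy):

  p = task_rcu_dereference(&cpu_rq(cpu)->curr);
  if (p && p->mm == current->mm)
          __cpumask_set_cpu(cpu, tmpmask);  /* this CPU will get the IPI */

If the scheduler updates rq->curr to a thread of current->mm
concurrently with that read, the CPU can be skipped, so that thread must
go through a core serializing instruction after the rq->curr update and
before it returns to user-space.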

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@kernel.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andrea Parri <parri.andrea@gmail.com>
CC: Russell King <linux@armlinux.org.uk>
CC: Greg Hackmann <ghackmann@google.com>
CC: Will Deacon <will.deacon@arm.com>
CC: David Sehr <sehr@google.com>
CC: x86@kernel.org
CC: linux-arch@vger.kernel.org

---
Changes since v1:
- Use the newly introduced sync_core_before_usermode(). Move all state
  handling to generic code.
- Add linux/processor.h include to include/linux/sched/mm.h.

Changes since v2:
- Fix use-after-free in membarrier_mm_sync_core_before_usermode.
---
 arch/x86/Kconfig          |  1 +
 arch/x86/entry/entry_32.S |  5 +++++
 arch/x86/entry/entry_64.S |  4 ++++
 arch/x86/mm/tlb.c         |  7 ++++---
 include/linux/sched/mm.h  | 12 ++++++++++++
 kernel/sched/core.c       |  6 +++++-
 kernel/sched/membarrier.c |  3 +++
 7 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0b44c8dd0e95..b5324f2e3162 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -54,6 +54,7 @@ config X86
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_KCOV			if X86_64
+	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_PMEM_API		if X86_64
 	select ARCH_HAS_REFCOUNT
 	select ARCH_HAS_UACCESS_FLUSHCACHE	if X86_64
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index a1f28a54f23a..0c89cef690cf 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -554,6 +554,11 @@ restore_all:
 .Lrestore_nocheck:
 	RESTORE_REGS 4				# skip orig_eax/error_code
 .Lirq_return:
+	/*
+	 * ARCH_HAS_MEMBARRIER_SYNC_CORE relies on iret core serialization
+	 * when returning from the IPI handler and when returning from
+	 * the scheduler to user-space.
+	 */
 	INTERRUPT_RETURN
 
 .section .fixup, "ax"
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 4f8e1d35a97c..8a32390240f1 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -792,6 +792,10 @@ GLOBAL(restore_regs_and_return_to_kernel)
 	POP_EXTRA_REGS
 	POP_C_REGS
 	addq	$8, %rsp	/* skip regs->orig_ax */
+	/*
+	 * ARCH_HAS_MEMBARRIER_SYNC_CORE relies on iret core serialization
+	 * when returning from the IPI handler.
+	 */
 	INTERRUPT_RETURN
 
 ENTRY(native_iret)
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index c28cd5592b0d..df4e21371c89 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -201,9 +201,10 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	this_cpu_write(cpu_tlbstate.is_lazy, false);
 
 	/*
-	 * The membarrier system call requires a full memory barrier
-	 * before returning to user-space, after storing to rq->curr.
-	 * Writing to CR3 provides that full memory barrier.
+	 * The membarrier system call requires a full memory barrier and
+	 * core serialization before returning to user-space, after
+	 * storing to rq->curr. Writing to CR3 provides that full
+	 * memory barrier and core serializing instruction.
 	 */
 	if (real_prev == next) {
 		VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 3ff217a071ca..fcd2cdc482c1 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -7,6 +7,7 @@
 #include <linux/sched.h>
 #include <linux/mm_types.h>
 #include <linux/gfp.h>
+#include <linux/processor.h>
 
 /*
  * Routines for handling mm_structs
@@ -235,6 +236,14 @@ enum {
 #include <asm/membarrier.h>
 #endif
 
+static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
+{
+	if (likely(!(atomic_read(&mm->membarrier_state) &
+		     MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE)))
+		return;
+	sync_core_before_usermode();
+}
+
 static inline void membarrier_execve(struct task_struct *t)
 {
 	atomic_set(&t->mm->membarrier_state, 0);
@@ -250,6 +259,9 @@ static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
 static inline void membarrier_execve(struct task_struct *t)
 {
 }
+static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
+{
+}
 #endif
 
 #endif /* _LINUX_SCHED_MM_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 62f269980e29..f86cbba038b9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2662,9 +2662,13 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	 * thread, mmdrop()'s implicit full barrier is required by the
 	 * membarrier system call, because the current active_mm can
 	 * become the current mm without going through switch_mm().
+	 * membarrier also requires a core serializing instruction
+	 * before going back to user-space after storing to rq->curr.
 	 */
-	if (mm)
+	if (mm) {
+		membarrier_mm_sync_core_before_usermode(mm);
 		mmdrop(mm);
+	}
 	if (unlikely(prev_state == TASK_DEAD)) {
 		if (prev->sched_class->task_dead)
 			prev->sched_class->task_dead(prev);
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index fcd2306c2367..e4f7b6dfb07b 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -242,6 +242,9 @@ static int membarrier_register_private_expedited(int flags)
 	if (atomic_read(&mm->membarrier_state) & state)
 		return 0;
 	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED, &mm->membarrier_state);
+	if (flags & MEMBARRIER_FLAG_SYNC_CORE)
+		atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE,
+			  &mm->membarrier_state);
 	if (!(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)) {
 		/*
 		 * Ensure all future scheduler executions will observe the
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH for 4.16 09/10] membarrier: arm64: Provide core serializing command
  2018-01-15 19:10 [PATCH for 4.16 00/10] membarrier updates for 4.16 Mathieu Desnoyers
                   ` (7 preceding siblings ...)
  2018-01-15 19:11 ` [PATCH for 4.16 08/10] membarrier: x86: Provide core serializing command (v3) Mathieu Desnoyers
@ 2018-01-15 19:11 ` Mathieu Desnoyers
  2018-01-15 19:11 ` [PATCH for 4.16 10/10] membarrier: selftest: Test private expedited sync core cmd Mathieu Desnoyers
  9 siblings, 0 replies; 18+ messages in thread
From: Mathieu Desnoyers @ 2018-01-15 19:11 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, linux-api, Andy Lutomirski, Paul E . McKenney,
	Boqun Feng, Andrew Hunter, Maged Michael, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, Thomas Gleixner, H . Peter Anvin, Andrea Parri,
	Russell King, Greg Hackmann, Will Deacon, David Sehr,
	Linus Torvalds, x86, Mathieu Desnoyers, linux-arch

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@kernel.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andrea Parri <parri.andrea@gmail.com>
CC: Russell King <linux@armlinux.org.uk>
CC: Greg Hackmann <ghackmann@google.com>
CC: Will Deacon <will.deacon@arm.com>
CC: David Sehr <sehr@google.com>
CC: x86@kernel.org
CC: linux-arch@vger.kernel.org
---
 arch/arm64/Kconfig        | 1 +
 arch/arm64/kernel/entry.S | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c9a7e9e1414f..5b0c06d8dbbe 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -16,6 +16,7 @@ config ARM64
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA
 	select ARCH_HAS_KCOV
+	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_SET_MEMORY
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAS_STRICT_KERNEL_RWX
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 6d14b8f29b5f..5edde1c2e93e 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -302,6 +302,10 @@ alternative_else_nop_endif
 	ldp	x28, x29, [sp, #16 * 14]
 	ldr	lr, [sp, #S_LR]
 	add	sp, sp, #S_FRAME_SIZE		// restore sp
+	/*
+	 * ARCH_HAS_MEMBARRIER_SYNC_CORE relies on eret context synchronization
+	 * when returning from the IPI handler, and when returning to user-space.
+	 */
 	eret					// return to kernel
 	.endm
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH for 4.16 10/10] membarrier: selftest: Test private expedited sync core cmd
  2018-01-15 19:10 [PATCH for 4.16 00/10] membarrier updates for 4.16 Mathieu Desnoyers
                   ` (8 preceding siblings ...)
  2018-01-15 19:11 ` [PATCH for 4.16 09/10] membarrier: arm64: Provide core serializing command Mathieu Desnoyers
@ 2018-01-15 19:11 ` Mathieu Desnoyers
  9 siblings, 0 replies; 18+ messages in thread
From: Mathieu Desnoyers @ 2018-01-15 19:11 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, linux-api, Andy Lutomirski, Paul E . McKenney,
	Boqun Feng, Andrew Hunter, Maged Michael, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, Thomas Gleixner, H . Peter Anvin, Andrea Parri,
	Russell King, Greg Hackmann, Will Deacon, David Sehr,
	Linus Torvalds, x86, Mathieu Desnoyers, Greg Kroah-Hartman,
	Alan Stern, Alice Ferrazzi, Paul Elder, linux-kselftest,
	linux-arch

Test the new MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE and
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE commands.
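
For readers wondering how these commands are meant to be used outside
the selftest, here is a minimal user-space sketch for a process that
reclaims JIT'd code (illustrative only: error handling is trimmed and
the sys_membarrier() wrapper is assumed, since glibc did not provide
one at the time):

  #include <linux/membarrier.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  static int sys_membarrier(int cmd, int flags)
  {
          return syscall(__NR_membarrier, cmd, flags);
  }

  /* Once at startup, before the first code-reclaim cycle. */
  static void jit_init(void)
  {
          sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0);
  }

  /* Before reusing memory other threads may still be executing from. */
  static void jit_reclaim_barrier(void)
  {
          sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0);
  }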

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Alan Stern <stern@rowland.harvard.edu>
CC: Will Deacon <will.deacon@arm.com>
CC: Andy Lutomirski <luto@kernel.org>
CC: Alice Ferrazzi <alice.ferrazzi@gmail.com>
CC: Paul Elder <paul.elder@pitt.edu>
CC: linux-kselftest@vger.kernel.org
CC: linux-arch@vger.kernel.org
---
 .../testing/selftests/membarrier/membarrier_test.c | 73 ++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/tools/testing/selftests/membarrier/membarrier_test.c b/tools/testing/selftests/membarrier/membarrier_test.c
index bb9c58072c5c..d9ab8b6ee52e 100644
--- a/tools/testing/selftests/membarrier/membarrier_test.c
+++ b/tools/testing/selftests/membarrier/membarrier_test.c
@@ -132,6 +132,63 @@ static int test_membarrier_private_expedited_success(void)
 	return 0;
 }
 
+static int test_membarrier_private_expedited_sync_core_fail(void)
+{
+	int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE not registered failure";
+
+	if (sys_membarrier(cmd, flags) != -1) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should fail, but passed\n",
+			test_name, flags);
+	}
+	if (errno != EPERM) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
+			test_name, flags, EPERM, strerror(EPERM),
+			errno, strerror(errno));
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d, errno = %d\n",
+		test_name, flags, errno);
+	return 0;
+}
+
+static int test_membarrier_register_private_expedited_sync_core_success(void)
+{
+	int cmd = MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n",
+		test_name, flags);
+	return 0;
+}
+
+static int test_membarrier_private_expedited_sync_core_success(void)
+{
+	int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n",
+		test_name, flags);
+	return 0;
+}
+
 static int test_membarrier_register_shared_expedited_success(void)
 {
 	int cmd = MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED, flags = 0;
@@ -188,6 +245,22 @@ static int test_membarrier(void)
 	status = test_membarrier_private_expedited_success();
 	if (status)
 		return status;
+	status = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);
+	if (status < 0) {
+		ksft_test_result_fail("sys_membarrier() failed\n");
+		return status;
+	}
+	if (status & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE) {
+		status = test_membarrier_private_expedited_sync_core_fail();
+		if (status)
+			return status;
+		status = test_membarrier_register_private_expedited_sync_core_success();
+		if (status)
+			return status;
+		status = test_membarrier_private_expedited_sync_core_success();
+		if (status)
+			return status;
+	}
 	/*
 	 * It is valid to send a shared membarrier from a non-registered
 	 * process.
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH for 4.16 04/10] membarrier: provide SHARED_EXPEDITED command (v2)
  2018-01-15 19:10 ` [PATCH for 4.16 04/10] membarrier: provide SHARED_EXPEDITED command (v2) Mathieu Desnoyers
@ 2018-01-16 18:20   ` Thomas Gleixner
  2018-01-16 19:02     ` Mathieu Desnoyers
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2018-01-16 18:20 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, linux-api,
	Andy Lutomirski, Paul E . McKenney, Boqun Feng, Andrew Hunter,
	Maged Michael, Avi Kivity, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, H . Peter Anvin,
	Andrea Parri, Russell King, Greg Hackmann, Will Deacon,
	David Sehr, Linus Torvalds, x86

On Mon, 15 Jan 2018, Mathieu Desnoyers wrote:
> +static int membarrier_shared_expedited(void)
> +{
> +	int cpu;
> +	bool fallback = false;
> +	cpumask_var_t tmpmask;
> +
> +	if (num_online_cpus() == 1)
> +		return 0;
> +
> +	/*
> +	 * Matches memory barriers around rq->curr modification in
> +	 * scheduler.
> +	 */
> +	smp_mb();	/* system call entry is not a mb. */
> +
> +	/*
> +	 * Expedited membarrier commands guarantee that they won't
> +	 * block, hence the GFP_NOWAIT allocation flag and fallback
> +	 * implementation.
> +	 */
> +	if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
> +		/* Fallback for OOM. */
> +		fallback = true;
> +	}
> +
> +	cpus_read_lock();
> +	for_each_online_cpu(cpu) {
> +		struct task_struct *p;
> +
> +		/*
> +		 * Skipping the current CPU is OK even though we can be
> +		 * migrated at any point. The current CPU, at the point
> +		 * where we read raw_smp_processor_id(), is ensured to
> +		 * be in program order with respect to the caller
> +		 * thread. Therefore, we can skip this CPU from the
> +		 * iteration.
> +		 */
> +		if (cpu == raw_smp_processor_id())
> +			continue;
> +		rcu_read_lock();
> +		p = task_rcu_dereference(&cpu_rq(cpu)->curr);
> +		if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
> +				   MEMBARRIER_STATE_SHARED_EXPEDITED)) {

This does not make sense vs. the documentation:

> + * @MEMBARRIER_CMD_SHARED_EXPEDITED:
> + *                          Execute a memory barrier on all running threads
> + *                          part of a process which previously registered
> + *                          with MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED.

This should say:

> + *                          Execute a memory barrier on all running threads
> + *                          of all processes which previously registered
> + *                          with MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED.

And I really have to ask whether this should be named _GLOBAL_ instead of
_SHARED_.

Hmm?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH for 4.16 07/10] x86: Introduce sync_core_before_usermode (v2)
  2018-01-15 19:11 ` [PATCH for 4.16 07/10] x86: Introduce sync_core_before_usermode (v2) Mathieu Desnoyers
@ 2018-01-16 18:28   ` Thomas Gleixner
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2018-01-16 18:28 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, linux-api,
	Andy Lutomirski, Paul E . McKenney, Boqun Feng, Andrew Hunter,
	Maged Michael, Avi Kivity, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, H . Peter Anvin,
	Andrea Parri, Russell King, Greg Hackmann, Will Deacon,
	David Sehr, Linus Torvalds, x86, linux-arch

On Mon, 15 Jan 2018, Mathieu Desnoyers wrote:

> Introduce an architecture function that ensures the current CPU
> issues a core serializing instruction before returning to usermode.
> 
> This is needed for the membarrier "sync_core" command.
> 
> Architectures defining the sync_core_before_usermode() static inline
> need to select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH for 4.16 08/10] membarrier: x86: Provide core serializing command (v3)
  2018-01-15 19:11 ` [PATCH for 4.16 08/10] membarrier: x86: Provide core serializing command (v3) Mathieu Desnoyers
@ 2018-01-16 18:29   ` Thomas Gleixner
  2018-01-16 19:22     ` Mathieu Desnoyers
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2018-01-16 18:29 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, linux-api,
	Andy Lutomirski, Paul E . McKenney, Boqun Feng, Andrew Hunter,
	Maged Michael, Avi Kivity, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, H . Peter Anvin,
	Andrea Parri, Russell King, Greg Hackmann, Will Deacon,
	David Sehr, Linus Torvalds, x86, linux-arch

On Mon, 15 Jan 2018, Mathieu Desnoyers wrote:

> There are two places where core serialization is needed by membarrier:
> 
> 1) When returning from the membarrier IPI,
> 2) After scheduler updates curr to a thread with a different mm, before
>    going back to user-space, since the curr->mm is used by membarrier to
>    check whether it needs to send an IPI to that CPU.

This wants to be split into x86 and core changes. Ideally you make the core
changes before the previous patch and add the empty inline into
linux/processor.h....

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH for 4.16 04/10] membarrier: provide SHARED_EXPEDITED command (v2)
  2018-01-16 18:20   ` Thomas Gleixner
@ 2018-01-16 19:02     ` Mathieu Desnoyers
  2018-01-16 19:04       ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Mathieu Desnoyers @ 2018-01-16 19:02 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, linux-api,
	Andy Lutomirski, Paul E. McKenney, Boqun Feng, Andrew Hunter,
	maged michael, Avi Kivity, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, H. Peter Anvin,
	Andrea Parri, Russell King, ARM Linux, Greg Hackmann,
	Will Deacon, David Sehr, Linus Torvalds, x86

----- On Jan 16, 2018, at 1:20 PM, Thomas Gleixner tglx@linutronix.de wrote:

> On Mon, 15 Jan 2018, Mathieu Desnoyers wrote:
>> +static int membarrier_shared_expedited(void)
>> +{
>> +	int cpu;
>> +	bool fallback = false;
>> +	cpumask_var_t tmpmask;
>> +
>> +	if (num_online_cpus() == 1)
>> +		return 0;
>> +
>> +	/*
>> +	 * Matches memory barriers around rq->curr modification in
>> +	 * scheduler.
>> +	 */
>> +	smp_mb();	/* system call entry is not a mb. */
>> +
>> +	/*
>> +	 * Expedited membarrier commands guarantee that they won't
>> +	 * block, hence the GFP_NOWAIT allocation flag and fallback
>> +	 * implementation.
>> +	 */
>> +	if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
>> +		/* Fallback for OOM. */
>> +		fallback = true;
>> +	}
>> +
>> +	cpus_read_lock();
>> +	for_each_online_cpu(cpu) {
>> +		struct task_struct *p;
>> +
>> +		/*
>> +		 * Skipping the current CPU is OK even though we can be
>> +		 * migrated at any point. The current CPU, at the point
>> +		 * where we read raw_smp_processor_id(), is ensured to
>> +		 * be in program order with respect to the caller
>> +		 * thread. Therefore, we can skip this CPU from the
>> +		 * iteration.
>> +		 */
>> +		if (cpu == raw_smp_processor_id())
>> +			continue;
>> +		rcu_read_lock();
>> +		p = task_rcu_dereference(&cpu_rq(cpu)->curr);
>> +		if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
>> +				   MEMBARRIER_STATE_SHARED_EXPEDITED)) {
> 
> This does not make sense vs. the documentation:
> 
>> + * @MEMBARRIER_CMD_SHARED_EXPEDITED:
>> + *                          Execute a memory barrier on all running threads
>> + *                          part of a process which previously registered
>> + *                          with MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED.
> 
> This should say:
> 
>> + *                          Execute a memory barrier on all running threads
>> + *                          of all processes which previously registered
>> + *                          with MEMBARRIER_CMD_REGISTER_SHARED_EXPEDITED.

Good point, will fix.

> 
> And I really have to ask whether this should be named _GLOBAL_ instead of
> _SHARED_.
> 
> Hmm?

I agree with you that this behavior fits a "global" definition better
than a "shared" one, especially given that it does not target a specific
shared memory mapping. The main issue I have is due to the pre-existing
MEMBARRIER_CMD_SHARED introduced in Linux 4.3. That one should also have
been called "MEMBARRIER_CMD_GLOBAL" based on the current line of thought.

Do you envision a way to transition forward to a new "MEMBARRIER_CMD_GLOBAL" for
the currently existing MEMBARRIER_CMD_SHARED?

Perhaps with a duplicated enum entry ?

enum membarrier_cmd {
        MEMBARRIER_CMD_QUERY                                    = 0,
        MEMBARRIER_CMD_SHARED                                   = (1 << 0), /* use MEMBARRIER_CMD_GLOBAL instead */
        MEMBARRIER_CMD_GLOBAL                                   = (1 << 0),
[...]
};
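
Purely as an illustration (not something this series needs to ship),
user-space wanting to use the new name while still building against
older uapi headers could carry a small shim like:

#ifndef MEMBARRIER_CMD_GLOBAL
#define MEMBARRIER_CMD_GLOBAL MEMBARRIER_CMD_SHARED
#endif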

Thanks,

Mathieu

> 
> Thanks,
> 
> 	tglx

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH for 4.16 04/10] membarrier: provide SHARED_EXPEDITED command (v2)
  2018-01-16 19:02     ` Mathieu Desnoyers
@ 2018-01-16 19:04       ` Thomas Gleixner
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2018-01-16 19:04 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, linux-api,
	Andy Lutomirski, Paul E. McKenney, Boqun Feng, Andrew Hunter,
	maged michael, Avi Kivity, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, H. Peter Anvin,
	Andrea Parri, Russell King, ARM Linux, Greg Hackmann,
	Will Deacon, David Sehr, Linus Torvalds, x86

On Tue, 16 Jan 2018, Mathieu Desnoyers wrote:
> ----- On Jan 16, 2018, at 1:20 PM, Thomas Gleixner tglx@linutronix.de wrote:
> > And I really have to ask whether this should be named _GLOBAL_ instead of
> > _SHARED_.
> > 
> > Hmm?
> 
> I agree with you that this behavior fits better a "global" definition
> than a "shared" one, especially given that it does not target a specific
> shared memory mapping. The main issue I have is due to the pre-existing
> MEMBARRIER_CMD_SHARED introduced in Linux 4.3. That one should also have
> been called "MEMBARRIER_CMD_GLOBAL" based on the current line of thoughts.
> 
> Do you envision a way to transition forward to a new "MEMBARRIER_CMD_GLOBAL" for
> the currently existing MEMBARRIER_CMD_SHARED ?
> 
> Perhaps with a duplicated enum entry ?
> 
> enum membarrier_cmd {
>         MEMBARRIER_CMD_QUERY                                    = 0,
>         MEMBARRIER_CMD_SHARED                                   = (1 << 0), /* use MEMBARRIER_CMD_GLOBAL instead */
>         MEMBARRIER_CMD_GLOBAL                                   = (1 << 0),

That should work. Though I doubt that you can ever get rid of CMD_SHARED,
at least the code is clearer that way.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH for 4.16 08/10] membarrier: x86: Provide core serializing command (v3)
  2018-01-16 18:29   ` Thomas Gleixner
@ 2018-01-16 19:22     ` Mathieu Desnoyers
  2018-01-16 20:41       ` Mathieu Desnoyers
  0 siblings, 1 reply; 18+ messages in thread
From: Mathieu Desnoyers @ 2018-01-16 19:22 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, linux-api,
	Andy Lutomirski, Paul E. McKenney, Boqun Feng, Andrew Hunter,
	maged michael, Avi Kivity, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, H. Peter Anvin,
	Andrea Parri, Russell King, ARM Linux, Greg Hackmann,
	Will Deacon, David Sehr, Linus Torvalds, x86, linux-arch

----- On Jan 16, 2018, at 1:29 PM, Thomas Gleixner tglx@linutronix.de wrote:

> On Mon, 15 Jan 2018, Mathieu Desnoyers wrote:
> 
>> There are two places where core serialization is needed by membarrier:
>> 
>> 1) When returning from the membarrier IPI,
>> 2) After scheduler updates curr to a thread with a different mm, before
>>    going back to user-space, since the curr->mm is used by membarrier to
>>    check whether it needs to send an IPI to that CPU.
> 
> This wants to be split into x86 and core changes. Ideally you make the core
> changes before the previous patch and add the empty inline into
> linux/processor.h....

Good point, done. The first commit introducing the new command now also
introduces the generic stuff moved from the x86 patches.

Thanks,

Mathieu

> 
> Thanks,
> 
> 	tglx

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH for 4.16 08/10] membarrier: x86: Provide core serializing command (v3)
  2018-01-16 19:22     ` Mathieu Desnoyers
@ 2018-01-16 20:41       ` Mathieu Desnoyers
  0 siblings, 0 replies; 18+ messages in thread
From: Mathieu Desnoyers @ 2018-01-16 20:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, linux-api,
	Andy Lutomirski, Paul E. McKenney, Boqun Feng, Andrew Hunter,
	maged michael, Avi Kivity, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, H. Peter Anvin,
	Andrea Parri, Russell King, ARM Linux, Greg Hackmann,
	Will Deacon, David Sehr, Linus Torvalds, x86, linux-arch

----- On Jan 16, 2018, at 2:22 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> ----- On Jan 16, 2018, at 1:29 PM, Thomas Gleixner tglx@linutronix.de wrote:
> 
>> On Mon, 15 Jan 2018, Mathieu Desnoyers wrote:
>> 
>>> There are two places where core serialization is needed by membarrier:
>>> 
>>> 1) When returning from the membarrier IPI,
>>> 2) After scheduler updates curr to a thread with a different mm, before
>>>    going back to user-space, since the curr->mm is used by membarrier to
>>>    check whether it needs to send an IPI to that CPU.
>> 
>> This wants to be split into x86 and core changes. Ideally you make the core
>> changes before the previous patch and add the empty inline into
>> linux/processor.h....
> 
> Good point, done. The first commit introducing the new command now also
> introduces the generic stuff moved from the x86 patches.

Scratch this: it's cleaner if I add a separate generic patch to introduce
just the empty inline into linux/processor.h and the
ARCH_HAS_SYNC_CORE_BEFORE_USERMODE in init/Kconfig.

Thanks,

Mathieu


> 
> Thanks,
> 
> Mathieu
> 
>> 
>> Thanks,
>> 
>> 	tglx
> 
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 18+ messages in thread
