* [PATCH for 4.16 00/11] membarrier updates for 4.16
@ 2018-01-29 20:20 Mathieu Desnoyers
  2018-01-29 20:20 ` [PATCH for 4.16 v2 01/11] membarrier: selftest: Test private expedited cmd Mathieu Desnoyers
                   ` (8 more replies)
  0 siblings, 9 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2018-01-29 20:20 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner
  Cc: linux-kernel, linux-api, Andy Lutomirski, Paul E . McKenney,
	Boqun Feng, Andrew Hunter, Maged Michael, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, H . Peter Anvin, Andrea Parri, Russell King,
	Greg Hackmann, Will Deacon, David Sehr, Linus Torvalds, x86,
	Mathieu Desnoyers

Hi Ingo, Peter, Thomas,

Here is the updated membarrier patch series following Acked-by from
Peter.

It would be appreciated if these could go through the scheduler tree
for the 4.16 merge window.

Highlights:

"powerpc: membarrier: Skip memory barrier in switch_mm()" takes care of
a TODO that was left in the private expedited implementation when merged
in 4.14: an extra memory barrier was added on context switch on powerpc.
Ensure that the barrier is only performed when scheduling between
different processes, only for threads belonging to processes that have
registered their intent to use the private expedited command.

"membarrier: provide GLOBAL_EXPEDITED command" adds new commands to
membarrier for registration and use of membarrier across processes
communicating through shared memory mappings. The non-expedited command
has proven to be really too slow (taking 10ms and more to complete) for
real-world use. The expedited version completes in a matter of
microseconds. This patch renames the pre-existing MEMBARRIER_CMD_SHARED
to MEMBARRIER_CMD_GLOBAL for consistency, keeping the old enum label
as an alias for backward compatibility.
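
For reference, here is a minimal user-space sketch (not part of the
series) of how processes communicating through shared memory would use
the new commands; it assumes a uapi header providing the
MEMBARRIER_CMD_*GLOBAL_EXPEDITED enum values and a kernel built with
CONFIG_MEMBARRIER:

#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdio.h>

static int sys_membarrier(int cmd, int flags)
{
	return syscall(__NR_membarrier, cmd, flags);
}

int main(void)
{
	/* Receiver side: declare intent to receive expedited barriers. */
	if (sys_membarrier(MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED, 0))
		perror("register global expedited");

	/*
	 * Sender side (may be a different, non-registered process):
	 * IPI every CPU running a registered process; completes in
	 * microseconds rather than milliseconds.
	 */
	if (sys_membarrier(MEMBARRIER_CMD_GLOBAL_EXPEDITED, 0))
		perror("global expedited");
	return 0;
}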

"membarrier: Provide core serializing command" provides core
serialization for JIT reclaim. We received positive feedback from
Android developers that the proposed ABI fits their use-case.
Only x86 32/64 and arm64 implement this command so far. This is
opt-in per architecture.
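
As an illustration of the intended JIT use-case, here is a sketch only
(not part of the series); the MEMBARRIER_CMD_*PRIVATE_EXPEDITED_SYNC_CORE
names are the ones this series is expected to expose and should be
treated as an assumption here:

#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>

static int sys_membarrier(int cmd, int flags)
{
	return syscall(__NR_membarrier, cmd, flags);
}

/* Called once at JIT start-up. */
void jit_init(void)
{
	sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0);
}

/*
 * Reuse a code buffer: rewrite it, make sure every thread of this
 * process core-serializes, then publish the new entry point.
 */
void jit_publish(void (**entry)(void), void *buf, const void *code, size_t len)
{
	memcpy(buf, code, len);
	sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0);
	__atomic_store_n(entry, (void (*)(void))buf, __ATOMIC_RELEASE);
}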

The other patches add selftests and documentation.

Thanks,

Mathieu


Mathieu Desnoyers (11):
  membarrier: selftest: Test private expedited cmd (v2)
  powerpc: membarrier: Skip memory barrier in switch_mm() (v7)
  membarrier: Document scheduler barrier requirements (v5)
  membarrier: provide GLOBAL_EXPEDITED command (v3)
  membarrier: selftest: Test global expedited cmd (v2)
  Introduce sync_core_before_usermode (v2)
  x86: Implement sync_core_before_usermode (v3)
  membarrier: Provide core serializing command (v3)
  membarrier: x86: Provide core serializing command (v4)
  membarrier: arm64: Provide core serializing command
  membarrier: selftest: Test private expedited sync core cmd

 MAINTAINERS                                        |   1 +
 arch/arm64/Kconfig                                 |   1 +
 arch/arm64/kernel/entry.S                          |   4 +
 arch/powerpc/Kconfig                               |   1 +
 arch/powerpc/include/asm/membarrier.h              |  27 +++
 arch/powerpc/mm/mmu_context.c                      |   7 +
 arch/x86/Kconfig                                   |   2 +
 arch/x86/entry/entry_32.S                          |   5 +
 arch/x86/entry/entry_64.S                          |   4 +
 arch/x86/include/asm/sync_core.h                   |  28 +++
 arch/x86/mm/tlb.c                                  |   6 +
 include/linux/sched/mm.h                           |  40 +++-
 include/linux/sync_core.h                          |  21 ++
 include/uapi/linux/membarrier.h                    |  74 ++++++-
 init/Kconfig                                       |   9 +
 kernel/sched/core.c                                |  57 +++--
 kernel/sched/membarrier.c                          | 177 +++++++++++++--
 .../testing/selftests/membarrier/membarrier_test.c | 237 +++++++++++++++++++--
 18 files changed, 633 insertions(+), 68 deletions(-)
 create mode 100644 arch/powerpc/include/asm/membarrier.h
 create mode 100644 arch/x86/include/asm/sync_core.h
 create mode 100644 include/linux/sync_core.h

-- 
2.11.0


* [PATCH for 4.16 v2 01/11] membarrier: selftest: Test private expedited cmd
  2018-01-29 20:20 [PATCH for 4.16 00/11] membarrier updates for 4.16 Mathieu Desnoyers
@ 2018-01-29 20:20 ` Mathieu Desnoyers
  2018-01-29 20:20 ` [PATCH for 4.16 v7 02/11] powerpc: membarrier: Skip memory barrier in switch_mm() Mathieu Desnoyers
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2018-01-29 20:20 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner
  Cc: linux-kernel, linux-api, Andy Lutomirski, Paul E . McKenney,
	Boqun Feng, Andrew Hunter, Maged Michael, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, H . Peter Anvin, Andrea Parri, Russell King,
	Greg Hackmann, Will Deacon, David Sehr, Linus Torvalds, x86,
	Mathieu Desnoyers

Test the new MEMBARRIER_CMD_PRIVATE_EXPEDITED and
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED commands.

Add checks expecting specific error values on system calls expected to
fail.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Alan Stern <stern@rowland.harvard.edu>
CC: Will Deacon <will.deacon@arm.com>
CC: Andy Lutomirski <luto@kernel.org>
CC: Alice Ferrazzi <alice.ferrazzi@gmail.com>
CC: Paul Elder <paul.elder@pitt.edu>
CC: linux-kselftest@vger.kernel.org
CC: linux-arch@vger.kernel.org
---
Changes since v1:
- return result of ksft_exit_pass from main(), silencing compiler
  warning about missing return value.
---
 .../testing/selftests/membarrier/membarrier_test.c | 111 ++++++++++++++++++---
 1 file changed, 95 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/membarrier/membarrier_test.c b/tools/testing/selftests/membarrier/membarrier_test.c
index 9e674d9514d1..e6ee73d01fa1 100644
--- a/tools/testing/selftests/membarrier/membarrier_test.c
+++ b/tools/testing/selftests/membarrier/membarrier_test.c
@@ -16,49 +16,119 @@ static int sys_membarrier(int cmd, int flags)
 static int test_membarrier_cmd_fail(void)
 {
 	int cmd = -1, flags = 0;
+	const char *test_name = "sys membarrier invalid command";
 
 	if (sys_membarrier(cmd, flags) != -1) {
 		ksft_exit_fail_msg(
-			"sys membarrier invalid command test: command = %d, flags = %d. Should fail, but passed\n",
-			cmd, flags);
+			"%s test: command = %d, flags = %d. Should fail, but passed\n",
+			test_name, cmd, flags);
+	}
+	if (errno != EINVAL) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
+			test_name, flags, EINVAL, strerror(EINVAL),
+			errno, strerror(errno));
 	}
 
 	ksft_test_result_pass(
-		"sys membarrier invalid command test: command = %d, flags = %d. Failed as expected\n",
-		cmd, flags);
+		"%s test: command = %d, flags = %d, errno = %d. Failed as expected\n",
+		test_name, cmd, flags, errno);
 	return 0;
 }
 
 static int test_membarrier_flags_fail(void)
 {
 	int cmd = MEMBARRIER_CMD_QUERY, flags = 1;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_QUERY invalid flags";
 
 	if (sys_membarrier(cmd, flags) != -1) {
 		ksft_exit_fail_msg(
-			"sys membarrier MEMBARRIER_CMD_QUERY invalid flags test: flags = %d. Should fail, but passed\n",
-			flags);
+			"%s test: flags = %d. Should fail, but passed\n",
+			test_name, flags);
+	}
+	if (errno != EINVAL) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
+			test_name, flags, EINVAL, strerror(EINVAL),
+			errno, strerror(errno));
 	}
 
 	ksft_test_result_pass(
-		"sys membarrier MEMBARRIER_CMD_QUERY invalid flags test: flags = %d. Failed as expected\n",
-		flags);
+		"%s test: flags = %d, errno = %d. Failed as expected\n",
+		test_name, flags, errno);
 	return 0;
 }
 
-static int test_membarrier_success(void)
+static int test_membarrier_shared_success(void)
 {
 	int cmd = MEMBARRIER_CMD_SHARED, flags = 0;
-	const char *test_name = "sys membarrier MEMBARRIER_CMD_SHARED\n";
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_SHARED";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n", test_name, flags);
+	return 0;
+}
+
+static int test_membarrier_private_expedited_fail(void)
+{
+	int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED not registered failure";
+
+	if (sys_membarrier(cmd, flags) != -1) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should fail, but passed\n",
+			test_name, flags);
+	}
+	if (errno != EPERM) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
+			test_name, flags, EPERM, strerror(EPERM),
+			errno, strerror(errno));
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d, errno = %d\n",
+		test_name, flags, errno);
+	return 0;
+}
+
+static int test_membarrier_register_private_expedited_success(void)
+{
+	int cmd = MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED";
 
 	if (sys_membarrier(cmd, flags) != 0) {
 		ksft_exit_fail_msg(
-			"sys membarrier MEMBARRIER_CMD_SHARED test: flags = %d\n",
-			flags);
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
 	}
 
 	ksft_test_result_pass(
-		"sys membarrier MEMBARRIER_CMD_SHARED test: flags = %d\n",
-		flags);
+		"%s test: flags = %d\n",
+		test_name, flags);
+	return 0;
+}
+
+static int test_membarrier_private_expedited_success(void)
+{
+	int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n",
+		test_name, flags);
 	return 0;
 }
 
@@ -72,7 +142,16 @@ static int test_membarrier(void)
 	status = test_membarrier_flags_fail();
 	if (status)
 		return status;
-	status = test_membarrier_success();
+	status = test_membarrier_shared_success();
+	if (status)
+		return status;
+	status = test_membarrier_private_expedited_fail();
+	if (status)
+		return status;
+	status = test_membarrier_register_private_expedited_success();
+	if (status)
+		return status;
+	status = test_membarrier_private_expedited_success();
 	if (status)
 		return status;
 	return 0;
@@ -108,5 +187,5 @@ int main(int argc, char **argv)
 	test_membarrier_query();
 	test_membarrier();
 
-	ksft_exit_pass();
+	return ksft_exit_pass();
 }
-- 
2.11.0


* [PATCH for 4.16 v7 02/11] powerpc: membarrier: Skip memory barrier in switch_mm()
  2018-01-29 20:20 [PATCH for 4.16 00/11] membarrier updates for 4.16 Mathieu Desnoyers
  2018-01-29 20:20 ` [PATCH for 4.16 v2 01/11] membarrier: selftest: Test private expedited cmd Mathieu Desnoyers
@ 2018-01-29 20:20 ` Mathieu Desnoyers
  2018-02-05 20:22   ` Ingo Molnar
  2021-06-18 17:13   ` Christophe Leroy
  2018-01-29 20:20 ` [PATCH for 4.16 v5 03/11] membarrier: Document scheduler barrier requirements Mathieu Desnoyers
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2018-01-29 20:20 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner
  Cc: Maged Michael, Dave Watson, Will Deacon, Russell King,
	David Sehr, Paul Mackerras, H . Peter Anvin, linux-arch, x86,
	Andrew Hunter, Greg Hackmann, Alan Stern, Paul E . McKenney,
	Andrea Parri, Avi Kivity, Boqun Feng, linuxppc-dev,
	Nicholas Piggin, Mathieu Desnoyers, Alexander Viro,
	Andy Lutomirski, linux-api, linux-kernel, Linus Torvalds

Allow PowerPC to skip the full memory barrier in switch_mm(), and
only issue the barrier when scheduling into a task belonging to a
process that has registered to use the private expedited command.

Threads targeting the same VM but which belong to different thread
groups are a tricky case. This has a few consequences:

It turns out that we cannot rely on get_nr_threads(p) to count the
number of threads using a VM. We can use
(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)
instead to skip the synchronize_sched() for cases where the VM only has
a single user, and that user only has a single thread.

It also turns out that we cannot use for_each_thread() to set
thread flags in all threads using a VM, as it only iterates on the
thread group.

Therefore, test the membarrier state variable directly rather than
relying on thread flags. This means
membarrier_register_private_expedited() needs to set the
MEMBARRIER_STATE_PRIVATE_EXPEDITED flag, issue synchronize_sched(), and
only then set MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY which allows
private expedited membarrier commands to succeed.
membarrier_arch_switch_mm() now tests for the
MEMBARRIER_STATE_PRIVATE_EXPEDITED flag.
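
A user-space sketch of the tricky case above (not part of the patch): a
task created with CLONE_VM but without CLONE_THREAD shares the mm yet
sits in its own thread group, so mm_users is 2 while get_nr_threads()
and for_each_thread() on the caller only ever see one thread:

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

static int vm_sibling(void *arg)
{
	/* Shares the parent's mm, but is not in its thread group. */
	return 0;
}

int main(void)
{
	const size_t stack_sz = 1024 * 1024;
	char *stack = malloc(stack_sz);
	int pid;

	if (!stack)
		return 1;
	/* Stack grows down on common architectures: pass its top. */
	pid = clone(vm_sibling, stack + stack_sz, CLONE_VM | SIGCHLD, NULL);
	if (pid < 0) {
		perror("clone");
		return 1;
	}
	waitpid(pid, NULL, 0);
	free(stack);
	return 0;
}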

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Alan Stern <stern@rowland.harvard.edu>
CC: Will Deacon <will.deacon@arm.com>
CC: Andy Lutomirski <luto@kernel.org>
CC: Ingo Molnar <mingo@redhat.com>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: Nicholas Piggin <npiggin@gmail.com>
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-arch@vger.kernel.org
---
Changes since v1:
- Use test_ti_thread_flag(next, ...) instead of test_thread_flag() in
  powerpc membarrier_arch_sched_in(), given that we want to specifically
  check the next thread state.
- Add missing ARCH_HAS_MEMBARRIER_HOOKS in Kconfig.
- Use task_thread_info() to pass thread_info from task to
  *_ti_thread_flag().

Changes since v2:
- Move membarrier_arch_sched_in() call to finish_task_switch().
- Check for NULL t->mm in membarrier_arch_fork().
- Use membarrier_sched_in() in generic code, which invokes the
  arch-specific membarrier_arch_sched_in(). This fixes allnoconfig
  build on PowerPC.
- Move asm/membarrier.h include under CONFIG_MEMBARRIER, fixing
  allnoconfig build on PowerPC.
- Build and runtime tested on PowerPC.

Changes since v3:
- Simply rely on copy_mm() to copy the membarrier_private_expedited mm
  field on fork.
- powerpc: test thread flag instead of reading
  membarrier_private_expedited in membarrier_arch_fork().
- powerpc: skip memory barrier in membarrier_arch_sched_in() if coming
  from kernel thread, since mmdrop() implies a full barrier.
- Set membarrier_private_expedited to 1 only after arch registration
  code, thus eliminating a race where concurrent commands could succeed
  when they should fail if issued concurrently with process
  registration.
- Use READ_ONCE() for membarrier_private_expedited field access in
  membarrier_private_expedited. Matches WRITE_ONCE() performed in
  process registration.

Changes since v4:
- Move powerpc hook from sched_in() to switch_mm(), based on feedback
  from Nicholas Piggin.

Changes since v5:
- Rebase on v4.14-rc6.
- Fold "Fix: membarrier: Handle CLONE_VM + !CLONE_THREAD correctly on
  powerpc (v2)"

Changes since v6:
- Rename MEMBARRIER_STATE_SWITCH_MM to MEMBARRIER_STATE_PRIVATE_EXPEDITED.
---
 MAINTAINERS                           |  1 +
 arch/powerpc/Kconfig                  |  1 +
 arch/powerpc/include/asm/membarrier.h | 26 ++++++++++++++++++++++++++
 arch/powerpc/mm/mmu_context.c         |  7 +++++++
 include/linux/sched/mm.h              | 13 ++++++++++++-
 init/Kconfig                          |  3 +++
 kernel/sched/core.c                   | 10 ----------
 kernel/sched/membarrier.c             |  8 ++++++++
 8 files changed, 58 insertions(+), 11 deletions(-)
 create mode 100644 arch/powerpc/include/asm/membarrier.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 845fc25812f1..34c1ecd5a5d1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8929,6 +8929,7 @@ L:	linux-kernel@vger.kernel.org
 S:	Supported
 F:	kernel/sched/membarrier.c
 F:	include/uapi/linux/membarrier.h
+F:	arch/powerpc/include/asm/membarrier.h
 
 MEMORY MANAGEMENT
 L:	linux-mm@kvack.org
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 2ed525a44734..09b02180b8a0 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -140,6 +140,7 @@ config PPC
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_PMEM_API                if PPC64
+	select ARCH_HAS_MEMBARRIER_HOOKS
 	select ARCH_HAS_SCALED_CPUTIME		if VIRT_CPU_ACCOUNTING_NATIVE
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAS_TICK_BROADCAST		if GENERIC_CLOCKEVENTS_BROADCAST
diff --git a/arch/powerpc/include/asm/membarrier.h b/arch/powerpc/include/asm/membarrier.h
new file mode 100644
index 000000000000..98ff4f1fcf2b
--- /dev/null
+++ b/arch/powerpc/include/asm/membarrier.h
@@ -0,0 +1,26 @@
+#ifndef _ASM_POWERPC_MEMBARRIER_H
+#define _ASM_POWERPC_MEMBARRIER_H
+
+static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
+					     struct mm_struct *next,
+					     struct task_struct *tsk)
+{
+	/*
+	 * Only need the full barrier when switching between processes.
+	 * Barrier when switching from kernel to userspace is not
+	 * required here, given that it is implied by mmdrop(). Barrier
+	 * when switching from userspace to kernel is not needed after
+	 * store to rq->curr.
+	 */
+	if (likely(!(atomic_read(&next->membarrier_state) &
+		     MEMBARRIER_STATE_PRIVATE_EXPEDITED) || !prev))
+		return;
+
+	/*
+	 * The membarrier system call requires a full memory barrier
+	 * after storing to rq->curr, before going back to user-space.
+	 */
+	smp_mb();
+}
+
+#endif /* _ASM_POWERPC_MEMBARRIER_H */
diff --git a/arch/powerpc/mm/mmu_context.c b/arch/powerpc/mm/mmu_context.c
index d60a62bf4fc7..0ab297c4cfad 100644
--- a/arch/powerpc/mm/mmu_context.c
+++ b/arch/powerpc/mm/mmu_context.c
@@ -12,6 +12,7 @@
 
 #include <linux/mm.h>
 #include <linux/cpu.h>
+#include <linux/sched/mm.h>
 
 #include <asm/mmu_context.h>
 
@@ -58,6 +59,10 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		 *
 		 * On the read side the barrier is in pte_xchg(), which orders
 		 * the store to the PTE vs the load of mm_cpumask.
+		 *
+		 * This full barrier is needed by membarrier when switching
+		 * between processes after store to rq->curr, before user-space
+		 * memory accesses.
 		 */
 		smp_mb();
 
@@ -80,6 +85,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 
 	if (new_on_cpu)
 		radix_kvm_prefetch_workaround(next);
+	else
+		membarrier_arch_switch_mm(prev, next, tsk);
 
 	/*
 	 * The actual HW switching method differs between the various
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 3d49b91b674d..1754396795f6 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -215,14 +215,25 @@ static inline void memalloc_noreclaim_restore(unsigned int flags)
 #ifdef CONFIG_MEMBARRIER
 enum {
 	MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY	= (1U << 0),
-	MEMBARRIER_STATE_SWITCH_MM			= (1U << 1),
+	MEMBARRIER_STATE_PRIVATE_EXPEDITED		= (1U << 1),
 };
 
+#ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS
+#include <asm/membarrier.h>
+#endif
+
 static inline void membarrier_execve(struct task_struct *t)
 {
 	atomic_set(&t->mm->membarrier_state, 0);
 }
 #else
+#ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS
+static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
+					     struct mm_struct *next,
+					     struct task_struct *tsk)
+{
+}
+#endif
 static inline void membarrier_execve(struct task_struct *t)
 {
 }
diff --git a/init/Kconfig b/init/Kconfig
index a9a2e2c86671..2d118b6adee2 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1412,6 +1412,9 @@ config USERFAULTFD
 	  Enable the userfaultfd() system call that allows to intercept and
 	  handle page faults in userland.
 
+config ARCH_HAS_MEMBARRIER_HOOKS
+	bool
+
 config EMBEDDED
 	bool "Embedded system"
 	option allnoconfig_y
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a7bf32aabfda..c7e06dfa804b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2653,16 +2653,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	prev_state = prev->state;
 	vtime_task_switch(prev);
 	perf_event_task_sched_in(prev, current);
-	/*
-	 * The membarrier system call requires a full memory barrier
-	 * after storing to rq->curr, before going back to user-space.
-	 *
-	 * TODO: This smp_mb__after_unlock_lock can go away if PPC end
-	 * up adding a full barrier to switch_mm(), or we should figure
-	 * out if a smp_mb__after_unlock_lock is really the proper API
-	 * to use.
-	 */
-	smp_mb__after_unlock_lock();
 	finish_lock_switch(rq, prev);
 	finish_arch_post_lock_switch();
 
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 9bcbacba82a8..678577267a9a 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -118,6 +118,14 @@ static void membarrier_register_private_expedited(void)
 	if (atomic_read(&mm->membarrier_state)
 			& MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY)
 		return;
+	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED, &mm->membarrier_state);
+	if (!(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)) {
+		/*
+		 * Ensure all future scheduler executions will observe the
+		 * new thread flag state for this process.
+		 */
+		synchronize_sched();
+	}
 	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY,
 			&mm->membarrier_state);
 }
-- 
2.11.0


* [PATCH for 4.16 v5 03/11] membarrier: Document scheduler barrier requirements
  2018-01-29 20:20 [PATCH for 4.16 00/11] membarrier updates for 4.16 Mathieu Desnoyers
  2018-01-29 20:20 ` [PATCH for 4.16 v2 01/11] membarrier: selftest: Test private expedited cmd Mathieu Desnoyers
  2018-01-29 20:20 ` [PATCH for 4.16 v7 02/11] powerpc: membarrier: Skip memory barrier in switch_mm() Mathieu Desnoyers
@ 2018-01-29 20:20 ` Mathieu Desnoyers
  2018-01-29 20:20 ` [PATCH for 4.16 v2 05/11] membarrier: selftest: Test global expedited cmd Mathieu Desnoyers
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2018-01-29 20:20 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner
  Cc: linux-kernel, linux-api, Andy Lutomirski, Paul E . McKenney,
	Boqun Feng, Andrew Hunter, Maged Michael, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, H . Peter Anvin, Andrea Parri, Russell King,
	Greg Hackmann, Will Deacon, David Sehr, Linus Torvalds, x86,
	Mathieu Desnoyers

Document the membarrier requirement on having a full memory barrier in
__schedule() after coming from user-space, before storing to rq->curr.
It is provided by smp_mb__after_spinlock() in __schedule().

Document that membarrier requires a full barrier on transition from
kernel thread to userspace thread. We currently have an implicit barrier
from atomic_dec_and_test() in mmdrop() that ensures this.

The x86 switch_mm_irqs_off() full barrier is currently provided by many
cpumask update operations as well as write_cr3(). Document that
write_cr3() provides this barrier.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andrea Parri <parri.andrea@gmail.com>
CC: x86@kernel.org
---
Changes since v1:
- Update comments to match reality for code paths which are after
  storing to rq->curr, before returning to user-space, based on feedback
  from Andrea Parri.
Changes since v2:
- Update changelog (smp_mb__before_spinlock -> smp_mb__after_spinlock).
  Based on feedback from Andrea Parri.
Changes since v3:
- Clarify comments following feedback from Peter Zijlstra.
Changes since v4:
- Update comment regarding powerpc barrier.
---
 arch/x86/mm/tlb.c        |  5 +++++
 include/linux/sched/mm.h |  5 +++++
 kernel/sched/core.c      | 37 ++++++++++++++++++++++++++-----------
 3 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 5bfe61a5e8e3..9fa7d2e0e15e 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -228,6 +228,11 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 #endif
 	this_cpu_write(cpu_tlbstate.is_lazy, false);
 
+	/*
+	 * The membarrier system call requires a full memory barrier
+	 * before returning to user-space, after storing to rq->curr.
+	 * Writing to CR3 provides that full memory barrier.
+	 */
 	if (real_prev == next) {
 		VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
 			   next->context.ctx_id);
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 1754396795f6..28aef7051d73 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -39,6 +39,11 @@ static inline void mmgrab(struct mm_struct *mm)
 extern void __mmdrop(struct mm_struct *);
 static inline void mmdrop(struct mm_struct *mm)
 {
+	/*
+	 * The implicit full barrier implied by atomic_dec_and_test is
+	 * required by the membarrier system call before returning to
+	 * user-space, after storing to rq->curr.
+	 */
 	if (unlikely(atomic_dec_and_test(&mm->mm_count)))
 		__mmdrop(mm);
 }
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c7e06dfa804b..f38c4c7e256a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2657,6 +2657,12 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	finish_arch_post_lock_switch();
 
 	fire_sched_in_preempt_notifiers(current);
+	/*
+	 * When transitioning from a kernel thread to a userspace
+	 * thread, mmdrop()'s implicit full barrier is required by the
+	 * membarrier system call, because the current active_mm can
+	 * become the current mm without going through switch_mm().
+	 */
 	if (mm)
 		mmdrop(mm);
 	if (unlikely(prev_state == TASK_DEAD)) {
@@ -2762,6 +2768,13 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	 */
 	arch_start_context_switch(prev);
 
+	/*
+	 * If mm is non-NULL, we pass through switch_mm(). If mm is
+	 * NULL, we will pass through mmdrop() in finish_task_switch().
+	 * Both of these contain the full memory barrier required by
+	 * membarrier after storing to rq->curr, before returning to
+	 * user-space.
+	 */
 	if (!mm) {
 		next->active_mm = oldmm;
 		mmgrab(oldmm);
@@ -3298,6 +3311,9 @@ static void __sched notrace __schedule(bool preempt)
 	 * Make sure that signal_pending_state()->signal_pending() below
 	 * can't be reordered with __set_current_state(TASK_INTERRUPTIBLE)
 	 * done by the caller to avoid the race with signal_wake_up().
+	 *
+	 * The membarrier system call requires a full memory barrier
+	 * after coming from user-space, before storing to rq->curr.
 	 */
 	rq_lock(rq, &rf);
 	smp_mb__after_spinlock();
@@ -3345,17 +3361,16 @@ static void __sched notrace __schedule(bool preempt)
 		/*
 		 * The membarrier system call requires each architecture
 		 * to have a full memory barrier after updating
-		 * rq->curr, before returning to user-space. For TSO
-		 * (e.g. x86), the architecture must provide its own
-		 * barrier in switch_mm(). For weakly ordered machines
-		 * for which spin_unlock() acts as a full memory
-		 * barrier, finish_lock_switch() in common code takes
-		 * care of this barrier. For weakly ordered machines for
-		 * which spin_unlock() acts as a RELEASE barrier (only
-		 * arm64 and PowerPC), arm64 has a full barrier in
-		 * switch_to(), and PowerPC has
-		 * smp_mb__after_unlock_lock() before
-		 * finish_lock_switch().
+		 * rq->curr, before returning to user-space.
+		 *
+		 * Here are the schemes providing that barrier on the
+		 * various architectures:
+		 * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC.
+		 *   switch_mm() rely on membarrier_arch_switch_mm() on PowerPC.
+		 * - finish_lock_switch() for weakly-ordered
+		 *   architectures where spin_unlock is a full barrier,
+		 * - switch_to() for arm64 (weakly-ordered, spin_unlock
+		 *   is a RELEASE barrier),
 		 */
 		++*switch_count;
 
-- 
2.11.0


* [PATCH for 4.16 v3 04/11] membarrier: provide GLOBAL_EXPEDITED command
       [not found] ` <20180129202020.8515-1-mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
@ 2018-01-29 20:20   ` Mathieu Desnoyers
  2018-01-29 20:20   ` [PATCH for 4.16 v3 08/11] membarrier: Provide core serializing command Mathieu Desnoyers
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2018-01-29 20:20 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner
  Cc: linux-kernel, linux-api, Andy Lutomirski, Paul E . McKenney,
	Boqun Feng, Andrew Hunter, Maged Michael, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, H . Peter Anvin, Andrea Parri, Russell King,
	Greg Hackmann, Will Deacon, David Sehr, Linus Torvalds, x86,
	Mathieu Desnoyers

Allow expedited membarrier to be used for data shared between processes
through shared memory.

Processes wishing to receive the membarriers register with
MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED. Those which want to issue
membarrier invoke MEMBARRIER_CMD_GLOBAL_EXPEDITED.

This allows an extremely simple kernel-level implementation: we have almost
everything we need with the PRIVATE_EXPEDITED barrier code. All we need
to do is to add a flag in the mm_struct that will be used to check
whether we need to send the IPI to the current thread of each CPU.

There is a slight downside to this approach compared to targeting
specific shared memory users: when performing a membarrier operation,
all registered "global" receivers will get the barrier, even if they
don't share a memory mapping with the sender issuing
MEMBARRIER_CMD_GLOBAL_EXPEDITED.

This registration approach seems to fit the requirement of not
disturbing processes that deeply care about real-time: they
simply should not register with MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED.

In order to align the membarrier command names, the "MEMBARRIER_CMD_SHARED"
command is renamed to "MEMBARRIER_CMD_GLOBAL", keeping an alias of
MEMBARRIER_CMD_SHARED to MEMBARRIER_CMD_GLOBAL for UAPI header backward
compatibility.
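
A sketch (not part of the patch) of why the rename stays backward
compatible for user-space, assuming the updated uapi header:

#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

int have_global_expedited(void)
{
	int mask = syscall(__NR_membarrier, MEMBARRIER_CMD_QUERY, 0);

	if (mask < 0)
		return 0;
	/* Code compiled against the old name keeps working. */
	_Static_assert(MEMBARRIER_CMD_SHARED == MEMBARRIER_CMD_GLOBAL,
		       "alias preserved");
	return !!(mask & MEMBARRIER_CMD_GLOBAL_EXPEDITED);
}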

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andrea Parri <parri.andrea@gmail.com>
CC: x86@kernel.org
---
Changes since v1:
- Add missing preempt disable around smp_call_function_many().
Changes since v2:
- UAPI documentation fix based on Thomas Gleixner's feedback.
- Rename SHARED to GLOBAL.
---
 arch/powerpc/include/asm/membarrier.h |   3 +-
 include/linux/sched/mm.h              |   6 +-
 include/uapi/linux/membarrier.h       |  42 ++++++++++--
 kernel/sched/membarrier.c             | 120 +++++++++++++++++++++++++++++++---
 4 files changed, 153 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/membarrier.h b/arch/powerpc/include/asm/membarrier.h
index 98ff4f1fcf2b..6e20bb5c74ea 100644
--- a/arch/powerpc/include/asm/membarrier.h
+++ b/arch/powerpc/include/asm/membarrier.h
@@ -13,7 +13,8 @@ static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
 	 * store to rq->curr.
 	 */
 	if (likely(!(atomic_read(&next->membarrier_state) &
-		     MEMBARRIER_STATE_PRIVATE_EXPEDITED) || !prev))
+		     (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
+		      MEMBARRIER_STATE_GLOBAL_EXPEDITED)) || !prev))
 		return;
 
 	/*
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 28aef7051d73..c7ad7a70cfef 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -219,8 +219,10 @@ static inline void memalloc_noreclaim_restore(unsigned int flags)
 
 #ifdef CONFIG_MEMBARRIER
 enum {
-	MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY	= (1U << 0),
-	MEMBARRIER_STATE_PRIVATE_EXPEDITED		= (1U << 1),
+	MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY		= (1U << 0),
+	MEMBARRIER_STATE_PRIVATE_EXPEDITED			= (1U << 1),
+	MEMBARRIER_STATE_GLOBAL_EXPEDITED_READY			= (1U << 2),
+	MEMBARRIER_STATE_GLOBAL_EXPEDITED			= (1U << 3),
 };
 
 #ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS
diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
index 4e01ad7ffe98..d252506e1b5e 100644
--- a/include/uapi/linux/membarrier.h
+++ b/include/uapi/linux/membarrier.h
@@ -31,7 +31,7 @@
  * enum membarrier_cmd - membarrier system call command
  * @MEMBARRIER_CMD_QUERY:   Query the set of supported commands. It returns
  *                          a bitmask of valid commands.
- * @MEMBARRIER_CMD_SHARED:  Execute a memory barrier on all running threads.
+ * @MEMBARRIER_CMD_GLOBAL:  Execute a memory barrier on all running threads.
  *                          Upon return from system call, the caller thread
  *                          is ensured that all running threads have passed
  *                          through a state where all memory accesses to
@@ -40,6 +40,28 @@
  *                          (non-running threads are de facto in such a
  *                          state). This covers threads from all processes
  *                          running on the system. This command returns 0.
+ * @MEMBARRIER_CMD_GLOBAL_EXPEDITED:
+ *                          Execute a memory barrier on all running threads
+ *                          of all processes which previously registered
+ *                          with MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED.
+ *                          Upon return from system call, the caller thread
+ *                          is ensured that all running threads have passed
+ *                          through a state where all memory accesses to
+ *                          user-space addresses match program order between
+ *                          entry to and return from the system call
+ *                          (non-running threads are de facto in such a
+ *                          state). This only covers threads from processes
+ *                          which registered with
+ *                          MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED.
+ *                          This command returns 0. Given that
+ *                          registration is about the intent to receive
+ *                          the barriers, it is valid to invoke
+ *                          MEMBARRIER_CMD_GLOBAL_EXPEDITED from a
+ *                          non-registered process.
+ * @MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED:
+ *                          Register the process intent to receive
+ *                          MEMBARRIER_CMD_GLOBAL_EXPEDITED memory
+ *                          barriers. Always returns 0.
  * @MEMBARRIER_CMD_PRIVATE_EXPEDITED:
  *                          Execute a memory barrier on each running
  *                          thread belonging to the same process as the current
@@ -64,18 +86,24 @@
  *                          Register the process intent to use
  *                          MEMBARRIER_CMD_PRIVATE_EXPEDITED. Always
  *                          returns 0.
+ * @MEMBARRIER_CMD_SHARED:
+ *                          Alias to MEMBARRIER_CMD_GLOBAL. Provided for
+ *                          header backward compatibility.
  *
  * Command to be passed to the membarrier system call. The commands need to
  * be a single bit each, except for MEMBARRIER_CMD_QUERY which is assigned to
  * the value 0.
  */
 enum membarrier_cmd {
-	MEMBARRIER_CMD_QUERY				= 0,
-	MEMBARRIER_CMD_SHARED				= (1 << 0),
-	/* reserved for MEMBARRIER_CMD_SHARED_EXPEDITED (1 << 1) */
-	/* reserved for MEMBARRIER_CMD_PRIVATE (1 << 2) */
-	MEMBARRIER_CMD_PRIVATE_EXPEDITED		= (1 << 3),
-	MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED	= (1 << 4),
+	MEMBARRIER_CMD_QUERY					= 0,
+	MEMBARRIER_CMD_GLOBAL					= (1 << 0),
+	MEMBARRIER_CMD_GLOBAL_EXPEDITED				= (1 << 1),
+	MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED		= (1 << 2),
+	MEMBARRIER_CMD_PRIVATE_EXPEDITED			= (1 << 3),
+	MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED		= (1 << 4),
+
+	/* Alias for header backward compatibility. */
+	MEMBARRIER_CMD_SHARED			= MEMBARRIER_CMD_GLOBAL,
 };
 
 #endif /* _UAPI_LINUX_MEMBARRIER_H */
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 678577267a9a..d2087d5f9837 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -27,7 +27,9 @@
  * except MEMBARRIER_CMD_QUERY.
  */
 #define MEMBARRIER_CMD_BITMASK	\
-	(MEMBARRIER_CMD_SHARED | MEMBARRIER_CMD_PRIVATE_EXPEDITED	\
+	(MEMBARRIER_CMD_GLOBAL | MEMBARRIER_CMD_GLOBAL_EXPEDITED \
+	| MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED \
+	| MEMBARRIER_CMD_PRIVATE_EXPEDITED	\
 	| MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED)
 
 static void ipi_mb(void *info)
@@ -35,6 +37,73 @@ static void ipi_mb(void *info)
 	smp_mb();	/* IPIs should be serializing but paranoid. */
 }
 
+static int membarrier_global_expedited(void)
+{
+	int cpu;
+	bool fallback = false;
+	cpumask_var_t tmpmask;
+
+	if (num_online_cpus() == 1)
+		return 0;
+
+	/*
+	 * Matches memory barriers around rq->curr modification in
+	 * scheduler.
+	 */
+	smp_mb();	/* system call entry is not a mb. */
+
+	/*
+	 * Expedited membarrier commands guarantee that they won't
+	 * block, hence the GFP_NOWAIT allocation flag and fallback
+	 * implementation.
+	 */
+	if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
+		/* Fallback for OOM. */
+		fallback = true;
+	}
+
+	cpus_read_lock();
+	for_each_online_cpu(cpu) {
+		struct task_struct *p;
+
+		/*
+		 * Skipping the current CPU is OK even though we can be
+		 * migrated at any point. The current CPU, at the point
+		 * where we read raw_smp_processor_id(), is ensured to
+		 * be in program order with respect to the caller
+		 * thread. Therefore, we can skip this CPU from the
+		 * iteration.
+		 */
+		if (cpu == raw_smp_processor_id())
+			continue;
+		rcu_read_lock();
+		p = task_rcu_dereference(&cpu_rq(cpu)->curr);
+		if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
+				   MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
+			if (!fallback)
+				__cpumask_set_cpu(cpu, tmpmask);
+			else
+				smp_call_function_single(cpu, ipi_mb, NULL, 1);
+		}
+		rcu_read_unlock();
+	}
+	if (!fallback) {
+		preempt_disable();
+		smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
+		preempt_enable();
+		free_cpumask_var(tmpmask);
+	}
+	cpus_read_unlock();
+
+	/*
+	 * Memory barrier on the caller thread _after_ we finished
+	 * waiting for the last IPI. Matches memory barriers around
+	 * rq->curr modification in scheduler.
+	 */
+	smp_mb();	/* exit from system call is not a mb */
+	return 0;
+}
+
 static int membarrier_private_expedited(void)
 {
 	int cpu;
@@ -105,7 +174,38 @@ static int membarrier_private_expedited(void)
 	return 0;
 }
 
-static void membarrier_register_private_expedited(void)
+static int membarrier_register_global_expedited(void)
+{
+	struct task_struct *p = current;
+	struct mm_struct *mm = p->mm;
+
+	if (atomic_read(&mm->membarrier_state) &
+	    MEMBARRIER_STATE_GLOBAL_EXPEDITED_READY)
+		return 0;
+	atomic_or(MEMBARRIER_STATE_GLOBAL_EXPEDITED, &mm->membarrier_state);
+	if (atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1) {
+		/*
+		 * For single mm user, single threaded process, we can
+		 * simply issue a memory barrier after setting
+		 * MEMBARRIER_STATE_GLOBAL_EXPEDITED to guarantee that
+		 * no memory access following registration is reordered
+		 * before registration.
+		 */
+		smp_mb();
+	} else {
+		/*
+		 * For multi-mm user threads, we need to ensure all
+		 * future scheduler executions will observe the new
+		 * thread flag state for this mm.
+		 */
+		synchronize_sched();
+	}
+	atomic_or(MEMBARRIER_STATE_GLOBAL_EXPEDITED_READY,
+		  &mm->membarrier_state);
+	return 0;
+}
+
+static int membarrier_register_private_expedited(void)
 {
 	struct task_struct *p = current;
 	struct mm_struct *mm = p->mm;
@@ -117,7 +217,7 @@ static void membarrier_register_private_expedited(void)
 	 */
 	if (atomic_read(&mm->membarrier_state)
 			& MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY)
-		return;
+		return 0;
 	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED, &mm->membarrier_state);
 	if (!(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)) {
 		/*
@@ -128,6 +228,7 @@ static void membarrier_register_private_expedited(void)
 	}
 	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY,
 			&mm->membarrier_state);
+	return 0;
 }
 
 /**
@@ -167,21 +268,24 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
 		int cmd_mask = MEMBARRIER_CMD_BITMASK;
 
 		if (tick_nohz_full_enabled())
-			cmd_mask &= ~MEMBARRIER_CMD_SHARED;
+			cmd_mask &= ~MEMBARRIER_CMD_GLOBAL;
 		return cmd_mask;
 	}
-	case MEMBARRIER_CMD_SHARED:
-		/* MEMBARRIER_CMD_SHARED is not compatible with nohz_full. */
+	case MEMBARRIER_CMD_GLOBAL:
+		/* MEMBARRIER_CMD_GLOBAL is not compatible with nohz_full. */
 		if (tick_nohz_full_enabled())
 			return -EINVAL;
 		if (num_online_cpus() > 1)
 			synchronize_sched();
 		return 0;
+	case MEMBARRIER_CMD_GLOBAL_EXPEDITED:
+		return membarrier_global_expedited();
+	case MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED:
+		return membarrier_register_global_expedited();
 	case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
 		return membarrier_private_expedited();
 	case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED:
-		membarrier_register_private_expedited();
-		return 0;
+		return membarrier_register_private_expedited();
 	default:
 		return -EINVAL;
 	}
-- 
2.11.0


* [PATCH for 4.16 v2 05/11] membarrier: selftest: Test global expedited cmd
  2018-01-29 20:20 [PATCH for 4.16 00/11] membarrier updates for 4.16 Mathieu Desnoyers
                   ` (2 preceding siblings ...)
  2018-01-29 20:20 ` [PATCH for 4.16 v5 03/11] membarrier: Document scheduler barrier requirements Mathieu Desnoyers
@ 2018-01-29 20:20 ` Mathieu Desnoyers
  2018-01-29 20:20 ` [PATCH for 4.16 v2 06/11] Introduce sync_core_before_usermode Mathieu Desnoyers
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2018-01-29 20:20 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner
  Cc: linux-kernel, linux-api, Andy Lutomirski, Paul E . McKenney,
	Boqun Feng, Andrew Hunter, Maged Michael, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, H . Peter Anvin, Andrea Parri, Russell King,
	Greg Hackmann, Will Deacon, David Sehr, Linus Torvalds, x86,
	Mathieu Desnoyers

Test the new MEMBARRIER_CMD_GLOBAL_EXPEDITED and
MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED commands.

Adapt to the MEMBARRIER_CMD_SHARED -> MEMBARRIER_CMD_GLOBAL rename.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Shuah Khan <shuahkh@osg.samsung.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Alan Stern <stern@rowland.harvard.edu>
CC: Will Deacon <will.deacon@arm.com>
CC: Andy Lutomirski <luto@kernel.org>
CC: Alice Ferrazzi <alice.ferrazzi@gmail.com>
CC: Paul Elder <paul.elder@pitt.edu>
CC: linux-kselftest@vger.kernel.org
CC: linux-arch@vger.kernel.org
---
Changes since v1:
- Rename SHARED to GLOBAL.
---
 .../testing/selftests/membarrier/membarrier_test.c | 59 ++++++++++++++++++++--
 1 file changed, 54 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/membarrier/membarrier_test.c b/tools/testing/selftests/membarrier/membarrier_test.c
index e6ee73d01fa1..eec39a0e1f77 100644
--- a/tools/testing/selftests/membarrier/membarrier_test.c
+++ b/tools/testing/selftests/membarrier/membarrier_test.c
@@ -59,10 +59,10 @@ static int test_membarrier_flags_fail(void)
 	return 0;
 }
 
-static int test_membarrier_shared_success(void)
+static int test_membarrier_global_success(void)
 {
-	int cmd = MEMBARRIER_CMD_SHARED, flags = 0;
-	const char *test_name = "sys membarrier MEMBARRIER_CMD_SHARED";
+	int cmd = MEMBARRIER_CMD_GLOBAL, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_GLOBAL";
 
 	if (sys_membarrier(cmd, flags) != 0) {
 		ksft_exit_fail_msg(
@@ -132,6 +132,40 @@ static int test_membarrier_private_expedited_success(void)
 	return 0;
 }
 
+static int test_membarrier_register_global_expedited_success(void)
+{
+	int cmd = MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n",
+		test_name, flags);
+	return 0;
+}
+
+static int test_membarrier_global_expedited_success(void)
+{
+	int cmd = MEMBARRIER_CMD_GLOBAL_EXPEDITED, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_GLOBAL_EXPEDITED";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n",
+		test_name, flags);
+	return 0;
+}
+
 static int test_membarrier(void)
 {
 	int status;
@@ -142,7 +176,7 @@ static int test_membarrier(void)
 	status = test_membarrier_flags_fail();
 	if (status)
 		return status;
-	status = test_membarrier_shared_success();
+	status = test_membarrier_global_success();
 	if (status)
 		return status;
 	status = test_membarrier_private_expedited_fail();
@@ -154,6 +188,19 @@ static int test_membarrier(void)
 	status = test_membarrier_private_expedited_success();
 	if (status)
 		return status;
+	/*
+	 * It is valid to send a global membarrier from a non-registered
+	 * process.
+	 */
+	status = test_membarrier_global_expedited_success();
+	if (status)
+		return status;
+	status = test_membarrier_register_global_expedited_success();
+	if (status)
+		return status;
+	status = test_membarrier_global_expedited_success();
+	if (status)
+		return status;
 	return 0;
 }
 
@@ -173,8 +220,10 @@ static int test_membarrier_query(void)
 		}
 		ksft_exit_fail_msg("sys_membarrier() failed\n");
 	}
-	if (!(ret & MEMBARRIER_CMD_SHARED))
+	if (!(ret & MEMBARRIER_CMD_GLOBAL)) {
+		ksft_test_result_fail("sys_membarrier() CMD_GLOBAL query failed\n");
 		ksft_exit_fail_msg("sys_membarrier is not supported.\n");
+	}
 
 	ksft_test_result_pass("sys_membarrier available\n");
 	return 0;
-- 
2.11.0


* [PATCH for 4.16 v2 06/11] Introduce sync_core_before_usermode
  2018-01-29 20:20 [PATCH for 4.16 00/11] membarrier updates for 4.16 Mathieu Desnoyers
                   ` (3 preceding siblings ...)
  2018-01-29 20:20 ` [PATCH for 4.16 v2 05/11] membarrier: selftest: Test global expedited cmd Mathieu Desnoyers
@ 2018-01-29 20:20 ` Mathieu Desnoyers
  2018-01-29 20:20 ` [PATCH for 4.16 v3 07/11] x86: Implement sync_core_before_usermode Mathieu Desnoyers
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2018-01-29 20:20 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner
  Cc: linux-kernel, linux-api, Andy Lutomirski, Paul E . McKenney,
	Boqun Feng, Andrew Hunter, Maged Michael, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, H . Peter Anvin, Andrea Parri, Russell King,
	Greg Hackmann, Will Deacon, David Sehr, Linus Torvalds, x86,
	Mathieu Desnoyers

Introduce an architecture function that ensures the current CPU
issues a core serializing instruction before returning to usermode.

This is needed for the membarrier "sync_core" command.

Architectures defining the sync_core_before_usermode() static inline
need to select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE.
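
Purely for illustration (the real x86 version follows in a later patch
of this series): an architecture "foo" whose return-to-user path is not
core serializing would select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE and
provide something along these lines in its asm/sync_core.h, where
foo_serializing_insn() is a stand-in for whatever instruction that
architecture uses:

/* Hypothetical arch/foo/include/asm/sync_core.h -- sketch only. */
#ifndef _ASM_FOO_SYNC_CORE_H
#define _ASM_FOO_SYNC_CORE_H

static inline void sync_core_before_usermode(void)
{
	foo_serializing_insn();	/* hypothetical core serializing instruction */
}

#endif /* _ASM_FOO_SYNC_CORE_H */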

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Andy Lutomirski <luto@kernel.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andrea Parri <parri.andrea@gmail.com>
CC: Russell King <linux@armlinux.org.uk>
CC: Greg Hackmann <ghackmann@google.com>
CC: Will Deacon <will.deacon@arm.com>
CC: David Sehr <sehr@google.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Arnd Bergmann <arnd@arndb.de>
CC: x86@kernel.org
CC: linux-arch@vger.kernel.org
---
Changes since v1:
- Introduce include/linux/sync_core.h
---
 include/linux/sync_core.h | 21 +++++++++++++++++++++
 init/Kconfig              |  3 +++
 2 files changed, 24 insertions(+)
 create mode 100644 include/linux/sync_core.h

diff --git a/include/linux/sync_core.h b/include/linux/sync_core.h
new file mode 100644
index 000000000000..013da4b8b327
--- /dev/null
+++ b/include/linux/sync_core.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_SYNC_CORE_H
+#define _LINUX_SYNC_CORE_H
+
+#ifdef CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
+#include <asm/sync_core.h>
+#else
+/*
+ * This is a dummy sync_core_before_usermode() implementation that can be used
+ * on all architectures which return to user-space through core serializing
+ * instructions.
+ * If your architecture returns to user-space through non-core-serializing
+ * instructions, you need to write your own functions.
+ */
+static inline void sync_core_before_usermode(void)
+{
+}
+#endif
+
+#endif /* _LINUX_SYNC_CORE_H */
+
diff --git a/init/Kconfig b/init/Kconfig
index 2d118b6adee2..30208da2221f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1918,3 +1918,6 @@ config ASN1
 	  functions to call on what tags.
 
 source "kernel/Kconfig.locks"
+
+config ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
+	bool
-- 
2.11.0


* [PATCH for 4.16 v3 07/11] x86: Implement sync_core_before_usermode
  2018-01-29 20:20 [PATCH for 4.16 00/11] membarrier updates for 4.16 Mathieu Desnoyers
                   ` (4 preceding siblings ...)
  2018-01-29 20:20 ` [PATCH for 4.16 v2 06/11] Introduce sync_core_before_usermode Mathieu Desnoyers
@ 2018-01-29 20:20 ` Mathieu Desnoyers
       [not found] ` <20180129202020.8515-1-mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2018-01-29 20:20 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner
  Cc: linux-kernel, linux-api, Andy Lutomirski, Paul E . McKenney,
	Boqun Feng, Andrew Hunter, Maged Michael, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, H . Peter Anvin, Andrea Parri, Russell King,
	Greg Hackmann, Will Deacon, David Sehr, Linus Torvalds, x86,
	Mathieu Desnoyers

Ensure that a core serializing instruction is issued before returning to
user-mode. x86 implements return to user-space through sysexit, sysretl,
and sysretq, which are not core serializing.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Andy Lutomirski <luto@kernel.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andrea Parri <parri.andrea@gmail.com>
CC: Russell King <linux@armlinux.org.uk>
CC: Greg Hackmann <ghackmann@google.com>
CC: Will Deacon <will.deacon@arm.com>
CC: David Sehr <sehr@google.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Arnd Bergmann <arnd@arndb.de>
CC: x86@kernel.org
CC: linux-arch@vger.kernel.org
---
Changes since v1:
- Fix prototype of sync_core_before_usermode in generic code (missing
  return type).
- Add linux/processor.h include to sched/core.c.
- Add ARCH_HAS_SYNC_CORE_BEFORE_USERMODE to init/Kconfig.
- Fix linux/processor.h ifdef to target
  CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE rather than
  ARCH_HAS_SYNC_CORE_BEFORE_USERMODE.
- Move empty static inline in processor.h to generic patch.

Changes since v2:
- Introduce arch/x86/include/asm/sync_core.h
- Don't sync_core when KPTI is enabled, and when invoked from irq and nmi
  context.
- Note: v2 was reviewed by Thomas Gleixner, but changes were introduced
  since.
---
 arch/x86/Kconfig                 |  1 +
 arch/x86/include/asm/sync_core.h | 28 ++++++++++++++++++++++++++++
 2 files changed, 29 insertions(+)
 create mode 100644 arch/x86/include/asm/sync_core.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 20da391b5f32..0b44c8dd0e95 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -61,6 +61,7 @@ config X86
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAS_STRICT_KERNEL_RWX
 	select ARCH_HAS_STRICT_MODULE_RWX
+	select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARCH_HAS_ZONE_DEVICE		if X86_64
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG
diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
new file mode 100644
index 000000000000..c67caafd3381
--- /dev/null
+++ b/arch/x86/include/asm/sync_core.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_SYNC_CORE_H
+#define _ASM_X86_SYNC_CORE_H
+
+#include <linux/preempt.h>
+#include <asm/processor.h>
+#include <asm/cpufeature.h>
+
+/*
+ * Ensure that a core serializing instruction is issued before returning
+ * to user-mode. x86 implements return to user-space through sysexit,
+ * sysrel, and sysretq, which are not core serializing.
+ */
+static inline void sync_core_before_usermode(void)
+{
+	/* With PTI, we unconditionally serialize before running user code. */
+	if (static_cpu_has(X86_FEATURE_PTI))
+		return;
+	/*
+	 * Return from interrupt and NMI is done through iret, which is core
+	 * serializing.
+	 */
+	if (in_irq() || in_nmi())
+		return;
+	sync_core();
+}
+
+#endif /* _ASM_X86_SYNC_CORE_H */
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH for 4.16 v3 08/11] membarrier: Provide core serializing command
       [not found] ` <20180129202020.8515-1-mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
  2018-01-29 20:20   ` [PATCH for 4.16 v3 04/11] membarrier: provide GLOBAL_EXPEDITED command Mathieu Desnoyers
@ 2018-01-29 20:20   ` Mathieu Desnoyers
  2018-01-29 20:20   ` [PATCH for 4.16 11/11] membarrier: selftest: Test private expedited sync core cmd Mathieu Desnoyers
  2018-02-05 21:21   ` [PATCH for 4.16 00/11] membarrier updates for 4.16 Ingo Molnar
  3 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2018-01-29 20:20 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Andy Lutomirski,
	Paul E . McKenney, Boqun Feng, Andrew Hunter, Maged Michael,
	Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, H . Peter Anvin, Andrea Parri,
	Russell King, Greg Hackmann, Will Deacon, David Sehr,
	Linus Torvalds, x86-DgEjT+Ai2ygdnm+yROfE0A, Mathieu Desnoyers

Provide core serializing membarrier command to support memory reclaim
by JIT.

Each architecture needs to explicitly opt into that support by
documenting in their architecture code how they provide the core
serializing instructions required when returning from the membarrier
IPI, and after the scheduler has updated the curr->mm pointer (before
going back to user-space). They should then select
ARCH_HAS_MEMBARRIER_SYNC_CORE to enable support for that command on
their architecture.

Architectures selecting this feature need to either document that
they issue core serializing instructions when returning to user-space,
or implement their architecture-specific sync_core_before_usermode().
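
For illustration only (this is not part of the patch), a user-space JIT
reclaiming code memory would be expected to use the two new commands
roughly as in the sketch below; the unpublish_old_code() and
rewrite_code_pages() helpers are hypothetical placeholders for the JIT's
own code management:

#include <linux/membarrier.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical placeholders for the JIT's own code management. */
extern void unpublish_old_code(void);
extern void rewrite_code_pages(void);

static int membarrier(int cmd, int flags)
{
	return syscall(__NR_membarrier, cmd, flags);
}

static int jit_init(void)
{
	/*
	 * Opt in once per process; fails with errno set to EINVAL when
	 * the architecture does not implement the sync core command.
	 */
	return membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0);
}

static void jit_reclaim_cycle(void)
{
	unpublish_old_code();
	/*
	 * Once this returns, every running thread of the process has
	 * executed a full memory barrier and a core serializing
	 * instruction, so no thread can still be running a stale copy
	 * of the unpublished code.
	 */
	if (membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0))
		abort();
	rewrite_code_pages();
}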

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
Acked-by: Peter Zijlstra (Intel) <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
CC: Andy Lutomirski <luto-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
CC: Paul E. McKenney <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
CC: Boqun Feng <boqun.feng-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: Andrew Hunter <ahh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
CC: Maged Michael <maged.michael-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: Avi Kivity <avi-VrcmuVmyx1hWk0Htik3J/w@public.gmane.org>
CC: Benjamin Herrenschmidt <benh-XVmvHMARGAS8U2dJNN8I7kB+6BGkLq7r@public.gmane.org>
CC: Paul Mackerras <paulus-eUNUBHrolfbYtjvyW6yDsg@public.gmane.org>
CC: Michael Ellerman <mpe-Gsx/Oe8HsFggBc27wqDAHg@public.gmane.org>
CC: Dave Watson <davejwatson-b10kYP2dOMg@public.gmane.org>
CC: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
CC: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
CC: "H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
CC: Andrea Parri <parri.andrea-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: Russell King <linux-I+IVW8TIWO2tmTQ+vhA3Yw@public.gmane.org>
CC: Greg Hackmann <ghackmann-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
CC: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
CC: David Sehr <sehr-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
CC: linux-arch-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
---
Changes since v1:
- Include linux/sync_core.h

Changes since v2:
- Rework a comment and changelog based on Peter Zijlstra's feedback.
---
 include/linux/sched/mm.h        | 18 ++++++++++++++
 include/uapi/linux/membarrier.h | 32 ++++++++++++++++++++++++-
 init/Kconfig                    |  3 +++
 kernel/sched/core.c             | 18 ++++++++++----
 kernel/sched/membarrier.c       | 53 +++++++++++++++++++++++++++++++----------
 5 files changed, 106 insertions(+), 18 deletions(-)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index c7ad7a70cfef..a7840f0f8832 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -7,6 +7,7 @@
 #include <linux/sched.h>
 #include <linux/mm_types.h>
 #include <linux/gfp.h>
+#include <linux/sync_core.h>
 
 /*
  * Routines for handling mm_structs
@@ -223,12 +224,26 @@ enum {
 	MEMBARRIER_STATE_PRIVATE_EXPEDITED			= (1U << 1),
 	MEMBARRIER_STATE_GLOBAL_EXPEDITED_READY			= (1U << 2),
 	MEMBARRIER_STATE_GLOBAL_EXPEDITED			= (1U << 3),
+	MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY	= (1U << 4),
+	MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE		= (1U << 5),
+};
+
+enum {
+	MEMBARRIER_FLAG_SYNC_CORE	= (1U << 0),
 };
 
 #ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS
 #include <asm/membarrier.h>
 #endif
 
+static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
+{
+	if (likely(!(atomic_read(&mm->membarrier_state) &
+		     MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE)))
+		return;
+	sync_core_before_usermode();
+}
+
 static inline void membarrier_execve(struct task_struct *t)
 {
 	atomic_set(&t->mm->membarrier_state, 0);
@@ -244,6 +259,9 @@ static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
 static inline void membarrier_execve(struct task_struct *t)
 {
 }
+static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
+{
+}
 #endif
 
 #endif /* _LINUX_SCHED_MM_H */
diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
index d252506e1b5e..5891d7614c8c 100644
--- a/include/uapi/linux/membarrier.h
+++ b/include/uapi/linux/membarrier.h
@@ -73,7 +73,7 @@
  *                          to and return from the system call
  *                          (non-running threads are de facto in such a
  *                          state). This only covers threads from the
- *                          same processes as the caller thread. This
+ *                          same process as the caller thread. This
  *                          command returns 0 on success. The
  *                          "expedited" commands complete faster than
  *                          the non-expedited ones, they never block,
@@ -86,6 +86,34 @@
  *                          Register the process intent to use
  *                          MEMBARRIER_CMD_PRIVATE_EXPEDITED. Always
  *                          returns 0.
+ * @MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE:
+ *                          In addition to provide memory ordering
+ *                          guarantees described in
+ *                          MEMBARRIER_CMD_PRIVATE_EXPEDITED, ensure
+ *                          the caller thread, upon return from system
+ *                          call, that all its running threads siblings
+ *                          have executed a core serializing
+ *                          instruction. (architectures are required to
+ *                          guarantee that non-running threads issue
+ *                          core serializing instructions before they
+ *                          resume user-space execution). This only
+ *                          covers threads from the same process as the
+ *                          caller thread. This command returns 0 on
+ *                          success. The "expedited" commands complete
+ *                          faster than the non-expedited ones, they
+ *                          never block, but have the downside of
+ *                          causing extra overhead. If this command is
+ *                          not implemented by an architecture, -EINVAL
+ *                          is returned. A process needs to register its
+ *                          intent to use the private expedited sync
+ *                          core command prior to using it, otherwise
+ *                          this command returns -EPERM.
+ * @MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE:
+ *                          Register the process intent to use
+ *                          MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE.
+ *                          If this command is not implemented by an
+ *                          architecture, -EINVAL is returned.
+ *                          Returns 0 on success.
  * @MEMBARRIER_CMD_SHARED:
  *                          Alias to MEMBARRIER_CMD_GLOBAL. Provided for
  *                          header backward compatibility.
@@ -101,6 +129,8 @@ enum membarrier_cmd {
 	MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED		= (1 << 2),
 	MEMBARRIER_CMD_PRIVATE_EXPEDITED			= (1 << 3),
 	MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED		= (1 << 4),
+	MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE		= (1 << 5),
+	MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE	= (1 << 6),
 
 	/* Alias for header backward compatibility. */
 	MEMBARRIER_CMD_SHARED			= MEMBARRIER_CMD_GLOBAL,
diff --git a/init/Kconfig b/init/Kconfig
index 30208da2221f..30b65febeb23 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1415,6 +1415,9 @@ config USERFAULTFD
 config ARCH_HAS_MEMBARRIER_HOOKS
 	bool
 
+config ARCH_HAS_MEMBARRIER_SYNC_CORE
+	bool
+
 config EMBEDDED
 	bool "Embedded system"
 	option allnoconfig_y
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f38c4c7e256a..48c3c1b83354 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2658,13 +2658,21 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 
 	fire_sched_in_preempt_notifiers(current);
 	/*
-	 * When transitioning from a kernel thread to a userspace
-	 * thread, mmdrop()'s implicit full barrier is required by the
-	 * membarrier system call, because the current active_mm can
-	 * become the current mm without going through switch_mm().
+	 * When switching through a kernel thread, the loop in
+	 * membarrier_{private,global}_expedited() may have observed that
+	 * kernel thread and not issued an IPI. It is therefore possible to
+	 * schedule between user->kernel->user threads without passing though
+	 * switch_mm(). Membarrier requires a barrier after storing to
+	 * rq->curr, before returning to userspace, so provide them here:
+	 *
+	 * - a full memory barrier for {PRIVATE,GLOBAL}_EXPEDITED, implicitly
+	 *   provided by mmdrop(),
+	 * - a sync_core for SYNC_CORE.
 	 */
-	if (mm)
+	if (mm) {
+		membarrier_mm_sync_core_before_usermode(mm);
 		mmdrop(mm);
+	}
 	if (unlikely(prev_state == TASK_DEAD)) {
 		if (prev->sched_class->task_dead)
 			prev->sched_class->task_dead(prev);
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index d2087d5f9837..5d0762633639 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -26,11 +26,20 @@
  * Bitmask made from a "or" of all commands within enum membarrier_cmd,
  * except MEMBARRIER_CMD_QUERY.
  */
+#ifdef CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE
+#define MEMBARRIER_PRIVATE_EXPEDITED_SYNC_CORE_BITMASK	\
+	(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE \
+	| MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE)
+#else
+#define MEMBARRIER_PRIVATE_EXPEDITED_SYNC_CORE_BITMASK	0
+#endif
+
 #define MEMBARRIER_CMD_BITMASK	\
 	(MEMBARRIER_CMD_GLOBAL | MEMBARRIER_CMD_GLOBAL_EXPEDITED \
 	| MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED \
 	| MEMBARRIER_CMD_PRIVATE_EXPEDITED	\
-	| MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED)
+	| MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED	\
+	| MEMBARRIER_PRIVATE_EXPEDITED_SYNC_CORE_BITMASK)
 
 static void ipi_mb(void *info)
 {
@@ -104,15 +113,23 @@ static int membarrier_global_expedited(void)
 	return 0;
 }
 
-static int membarrier_private_expedited(void)
+static int membarrier_private_expedited(int flags)
 {
 	int cpu;
 	bool fallback = false;
 	cpumask_var_t tmpmask;
 
-	if (!(atomic_read(&current->mm->membarrier_state)
-			& MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY))
-		return -EPERM;
+	if (flags & MEMBARRIER_FLAG_SYNC_CORE) {
+		if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
+			return -EINVAL;
+		if (!(atomic_read(&current->mm->membarrier_state) &
+		      MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
+			return -EPERM;
+	} else {
+		if (!(atomic_read(&current->mm->membarrier_state) &
+		      MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY))
+			return -EPERM;
+	}
 
 	if (num_online_cpus() == 1)
 		return 0;
@@ -205,20 +222,29 @@ static int membarrier_register_global_expedited(void)
 	return 0;
 }
 
-static int membarrier_register_private_expedited(void)
+static int membarrier_register_private_expedited(int flags)
 {
 	struct task_struct *p = current;
 	struct mm_struct *mm = p->mm;
+	int state = MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY;
+
+	if (flags & MEMBARRIER_FLAG_SYNC_CORE) {
+		if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
+			return -EINVAL;
+		state = MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY;
+	}
 
 	/*
 	 * We need to consider threads belonging to different thread
 	 * groups, which use the same mm. (CLONE_VM but not
 	 * CLONE_THREAD).
 	 */
-	if (atomic_read(&mm->membarrier_state)
-			& MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY)
+	if (atomic_read(&mm->membarrier_state) & state)
 		return 0;
 	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED, &mm->membarrier_state);
+	if (flags & MEMBARRIER_FLAG_SYNC_CORE)
+		atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE,
+			  &mm->membarrier_state);
 	if (!(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)) {
 		/*
 		 * Ensure all future scheduler executions will observe the
@@ -226,8 +252,7 @@ static int membarrier_register_private_expedited(void)
 		 */
 		synchronize_sched();
 	}
-	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY,
-			&mm->membarrier_state);
+	atomic_or(state, &mm->membarrier_state);
 	return 0;
 }
 
@@ -283,9 +308,13 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
 	case MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED:
 		return membarrier_register_global_expedited();
 	case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
-		return membarrier_private_expedited();
+		return membarrier_private_expedited(0);
 	case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED:
-		return membarrier_register_private_expedited();
+		return membarrier_register_private_expedited(0);
+	case MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE:
+		return membarrier_private_expedited(MEMBARRIER_FLAG_SYNC_CORE);
+	case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE:
+		return membarrier_register_private_expedited(MEMBARRIER_FLAG_SYNC_CORE);
 	default:
 		return -EINVAL;
 	}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH for 4.16 v4 09/11] membarrier: x86: Provide core serializing command
  2018-01-29 20:20 [PATCH for 4.16 00/11] membarrier updates for 4.16 Mathieu Desnoyers
                   ` (6 preceding siblings ...)
       [not found] ` <20180129202020.8515-1-mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
@ 2018-01-29 20:20 ` Mathieu Desnoyers
  2018-01-29 20:20 ` [PATCH for 4.16 10/11] membarrier: arm64: " Mathieu Desnoyers
  8 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2018-01-29 20:20 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner
  Cc: linux-kernel, linux-api, Andy Lutomirski, Paul E . McKenney,
	Boqun Feng, Andrew Hunter, Maged Michael, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, H . Peter Anvin, Andrea Parri, Russell King,
	Greg Hackmann, Will Deacon, David Sehr, Linus Torvalds, x86,
	Mathieu Desnoyers

There are two places where core serialization is needed by membarrier:

1) When returning from the membarrier IPI,
2) After scheduler updates curr to a thread with a different mm, before
   going back to user-space, since the curr->mm is used by membarrier to
   check whether it needs to send an IPI to that CPU.

x86-32 uses iret as return from interrupt, and both iret and sysexit to go
back to user-space. The iret instruction is core serializing, but not
sysexit.

x86-64 uses iret as return from interrupt, which takes care of the IPI.
However, it can return to user-space through either sysretl (compat
code), sysretq, or iret. Given that sysret{l,q} is not core serializing,
we rely instead on write_cr3() performed by switch_mm() to provide core
serialization after changing the current mm, and deal with the special
case of kthread -> uthread (temporarily keeping current mm into
active_mm) by adding a sync_core() in that specific case.

Use the new sync_core_before_usermode() to guarantee this.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
CC: Andy Lutomirski <luto@kernel.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andrea Parri <parri.andrea@gmail.com>
CC: Russell King <linux@armlinux.org.uk>
CC: Greg Hackmann <ghackmann@google.com>
CC: Will Deacon <will.deacon@arm.com>
CC: David Sehr <sehr@google.com>
CC: x86@kernel.org
CC: linux-arch@vger.kernel.org
---
Changes since v1:
- Use the newly introduced sync_core_before_usermode(). Move all state
  handling to generic code.
- Add linux/processor.h include to include/linux/sched/mm.h.
Changes since v2:
- Fix use-after-free in membarrier_mm_sync_core_before_usermode.
Changes since v3:
- Move generic code into separate patch.
---
 arch/x86/Kconfig          | 1 +
 arch/x86/entry/entry_32.S | 5 +++++
 arch/x86/entry/entry_64.S | 4 ++++
 arch/x86/mm/tlb.c         | 7 ++++---
 4 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0b44c8dd0e95..b5324f2e3162 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -54,6 +54,7 @@ config X86
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_KCOV			if X86_64
+	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_PMEM_API		if X86_64
 	select ARCH_HAS_REFCOUNT
 	select ARCH_HAS_UACCESS_FLUSHCACHE	if X86_64
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 60c4c342316c..267d747a867f 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -565,6 +565,11 @@ restore_all:
 .Lrestore_nocheck:
 	RESTORE_REGS 4				# skip orig_eax/error_code
 .Lirq_return:
+	/*
+	 * ARCH_HAS_MEMBARRIER_SYNC_CORE rely on iret core serialization
+	 * when returning from IPI handler and when returning from
+	 * scheduler to user-space.
+	 */
 	INTERRUPT_RETURN
 
 .section .fixup, "ax"
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index ff6f8022612c..52e20d37cdd3 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -803,6 +803,10 @@ GLOBAL(restore_regs_and_return_to_kernel)
 	POP_EXTRA_REGS
 	POP_C_REGS
 	addq	$8, %rsp	/* skip regs->orig_ax */
+	/*
+	 * ARCH_HAS_MEMBARRIER_SYNC_CORE rely on iret core serialization
+	 * when returning from IPI handler.
+	 */
 	INTERRUPT_RETURN
 
 ENTRY(native_iret)
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 9fa7d2e0e15e..9b34121c8f05 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -229,9 +229,10 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	this_cpu_write(cpu_tlbstate.is_lazy, false);
 
 	/*
-	 * The membarrier system call requires a full memory barrier
-	 * before returning to user-space, after storing to rq->curr.
-	 * Writing to CR3 provides that full memory barrier.
+	 * The membarrier system call requires a full memory barrier and
+	 * core serialization before returning to user-space, after
+	 * storing to rq->curr. Writing to CR3 provides that full
+	 * memory barrier and core serializing instruction.
 	 */
 	if (real_prev == next) {
 		VM_WARN_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH for 4.16 10/11] membarrier: arm64: Provide core serializing command
  2018-01-29 20:20 [PATCH for 4.16 00/11] membarrier updates for 4.16 Mathieu Desnoyers
                   ` (7 preceding siblings ...)
  2018-01-29 20:20 ` [PATCH for 4.16 v4 09/11] membarrier: x86: Provide core serializing command Mathieu Desnoyers
@ 2018-01-29 20:20 ` Mathieu Desnoyers
  8 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2018-01-29 20:20 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner
  Cc: linux-kernel, linux-api, Andy Lutomirski, Paul E . McKenney,
	Boqun Feng, Andrew Hunter, Maged Michael, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, H . Peter Anvin, Andrea Parri, Russell King,
	Greg Hackmann, Will Deacon, David Sehr, Linus Torvalds, x86,
	Mathieu Desnoyers

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
CC: Andy Lutomirski <luto@kernel.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andrea Parri <parri.andrea@gmail.com>
CC: Russell King <linux@armlinux.org.uk>
CC: Greg Hackmann <ghackmann@google.com>
CC: Will Deacon <will.deacon@arm.com>
CC: David Sehr <sehr@google.com>
CC: x86@kernel.org
CC: linux-arch@vger.kernel.org
---
 arch/arm64/Kconfig        | 1 +
 arch/arm64/kernel/entry.S | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c9a7e9e1414f..5b0c06d8dbbe 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -16,6 +16,7 @@ config ARM64
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA
 	select ARCH_HAS_KCOV
+	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_SET_MEMORY
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAS_STRICT_KERNEL_RWX
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 6d14b8f29b5f..5edde1c2e93e 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -302,6 +302,10 @@ alternative_else_nop_endif
 	ldp	x28, x29, [sp, #16 * 14]
 	ldr	lr, [sp, #S_LR]
 	add	sp, sp, #S_FRAME_SIZE		// restore sp
+	/*
+	 * ARCH_HAS_MEMBARRIER_SYNC_CORE rely on eret context synchronization
+	 * when returning from IPI handler, and when returning to user-space.
+	 */
 	eret					// return to kernel
 	.endm
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH for 4.16 11/11] membarrier: selftest: Test private expedited sync core cmd
       [not found] ` <20180129202020.8515-1-mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
  2018-01-29 20:20   ` [PATCH for 4.16 v3 04/11] membarrier: provide GLOBAL_EXPEDITED command Mathieu Desnoyers
  2018-01-29 20:20   ` [PATCH for 4.16 v3 08/11] membarrier: Provide core serializing command Mathieu Desnoyers
@ 2018-01-29 20:20   ` Mathieu Desnoyers
  2018-02-05 21:21   ` [PATCH for 4.16 00/11] membarrier updates for 4.16 Ingo Molnar
  3 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2018-01-29 20:20 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Andy Lutomirski,
	Paul E . McKenney, Boqun Feng, Andrew Hunter, Maged Michael,
	Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, H . Peter Anvin, Andrea Parri,
	Russell King, Greg Hackmann, Will Deacon, David Sehr,
	Linus Torvalds, x86-DgEjT+Ai2ygdnm+yROfE0A, Mathieu Desnoyers

Test the new MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE and
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE commands.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
Acked-by: Shuah Khan <shuahkh-JPH+aEBZ4P+UEJcrhfAQsw@public.gmane.org>
Acked-by: Peter Zijlstra (Intel) <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
CC: Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
CC: Paul E. McKenney <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
CC: Boqun Feng <boqun.feng-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: Andrew Hunter <ahh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
CC: Maged Michael <maged.michael-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: Avi Kivity <avi-VrcmuVmyx1hWk0Htik3J/w@public.gmane.org>
CC: Benjamin Herrenschmidt <benh-XVmvHMARGAS8U2dJNN8I7kB+6BGkLq7r@public.gmane.org>
CC: Paul Mackerras <paulus-eUNUBHrolfbYtjvyW6yDsg@public.gmane.org>
CC: Michael Ellerman <mpe-Gsx/Oe8HsFggBc27wqDAHg@public.gmane.org>
CC: Dave Watson <davejwatson-b10kYP2dOMg@public.gmane.org>
CC: Alan Stern <stern-nwvwT67g6+6dFdvTe/nMLpVzexx5G7lz@public.gmane.org>
CC: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
CC: Andy Lutomirski <luto-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
CC: Alice Ferrazzi <alice.ferrazzi-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: Paul Elder <paul.elder-fYq5UfK3d1k@public.gmane.org>
CC: linux-kselftest-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
CC: linux-arch-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
---
 .../testing/selftests/membarrier/membarrier_test.c | 73 ++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/tools/testing/selftests/membarrier/membarrier_test.c b/tools/testing/selftests/membarrier/membarrier_test.c
index eec39a0e1f77..22bffd55a523 100644
--- a/tools/testing/selftests/membarrier/membarrier_test.c
+++ b/tools/testing/selftests/membarrier/membarrier_test.c
@@ -132,6 +132,63 @@ static int test_membarrier_private_expedited_success(void)
 	return 0;
 }
 
+static int test_membarrier_private_expedited_sync_core_fail(void)
+{
+	int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE not registered failure";
+
+	if (sys_membarrier(cmd, flags) != -1) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should fail, but passed\n",
+			test_name, flags);
+	}
+	if (errno != EPERM) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
+			test_name, flags, EPERM, strerror(EPERM),
+			errno, strerror(errno));
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d, errno = %d\n",
+		test_name, flags, errno);
+	return 0;
+}
+
+static int test_membarrier_register_private_expedited_sync_core_success(void)
+{
+	int cmd = MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n",
+		test_name, flags);
+	return 0;
+}
+
+static int test_membarrier_private_expedited_sync_core_success(void)
+{
+	int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n",
+		test_name, flags);
+	return 0;
+}
+
 static int test_membarrier_register_global_expedited_success(void)
 {
 	int cmd = MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED, flags = 0;
@@ -188,6 +245,22 @@ static int test_membarrier(void)
 	status = test_membarrier_private_expedited_success();
 	if (status)
 		return status;
+	status = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);
+	if (status < 0) {
+		ksft_test_result_fail("sys_membarrier() failed\n");
+		return status;
+	}
+	if (status & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE) {
+		status = test_membarrier_private_expedited_sync_core_fail();
+		if (status)
+			return status;
+		status = test_membarrier_register_private_expedited_sync_core_success();
+		if (status)
+			return status;
+		status = test_membarrier_private_expedited_sync_core_success();
+		if (status)
+			return status;
+	}
 	/*
 	 * It is valid to send a global membarrier from a non-registered
 	 * process.
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH for 4.16 v7 02/11] powerpc: membarrier: Skip memory barrier in switch_mm()
  2018-01-29 20:20 ` [PATCH for 4.16 v7 02/11] powerpc: membarrier: Skip memory barrier in switch_mm() Mathieu Desnoyers
@ 2018-02-05 20:22   ` Ingo Molnar
       [not found]     ` <20180205202242.6ilyjh35byycf3ez-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  2021-06-18 17:13   ` Christophe Leroy
  1 sibling, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2018-02-05 20:22 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel,
	linux-api, Andy Lutomirski, Paul E . McKenney, Boqun Feng,
	Andrew Hunter, Maged Michael, Avi Kivity, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, H . Peter Anvin,
	Andrea Parri, Russell King, Greg Hackmann, Will Deacon, David


* Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

>  
> +config ARCH_HAS_MEMBARRIER_HOOKS
> +	bool

Yeah, so I have renamed this to ARCH_HAS_MEMBARRIER_CALLBACKS, and propagated it 
through the rest of the patches. "Callback" is the canonical name, and I also 
cringe every time I see 'hook'.

Please let me know if there are any big objections against this minor cleanup.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for 4.16 v7 02/11] powerpc: membarrier: Skip memory barrier in switch_mm()
       [not found]     ` <20180205202242.6ilyjh35byycf3ez-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-02-05 20:32       ` Mathieu Desnoyers
  0 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2018-02-05 20:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, linux-kernel,
	linux-api, Andy Lutomirski, Paul E. McKenney, Boqun Feng,
	Andrew Hunter, maged michael, Avi Kivity, Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman, Dave Watson, H. Peter Anvin,
	Andrea Parri, Russell King, ARM Linux, Greg Hackmann, Will

----- On Feb 5, 2018, at 3:22 PM, Ingo Molnar mingo-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org wrote:

> * Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org> wrote:
> 
>>  
>> +config ARCH_HAS_MEMBARRIER_HOOKS
>> +	bool
> 
> Yeah, so I have renamed this to ARCH_HAS_MEMBARRIER_CALLBACKS, and propagated it
> through the rest of the patches. "Callback" is the canonical name, and I also
> cringe every time I see 'hook'.
> 
> Please let me know if there are any big objections against this minor cleanup.

No objection at all,

Thanks!

Mathieu

> 
> Thanks,
> 
> 	Ingo

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for 4.16 00/11] membarrier updates for 4.16
       [not found] ` <20180129202020.8515-1-mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2018-01-29 20:20   ` [PATCH for 4.16 11/11] membarrier: selftest: Test private expedited sync core cmd Mathieu Desnoyers
@ 2018-02-05 21:21   ` Ingo Molnar
  3 siblings, 0 replies; 23+ messages in thread
From: Ingo Molnar @ 2018-02-05 21:21 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Andy Lutomirski,
	Paul E . McKenney, Boqun Feng, Andrew Hunter, Maged Michael,
	Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, H . Peter Anvin, Andrea Parri,
	Russell King, Greg Hackmann, Will Deacon, David


Hi,

Find below the interdiff of the (minor) edits I've done to the series, before 
applying them to tip:sched/urgent.

I've also tidied up some of the changelogs.

Nothing earth-shattering.

Thanks,

	Ingo

=========>

 arch/powerpc/Kconfig      | 2 +-
 arch/x86/entry/entry_32.S | 2 +-
 arch/x86/entry/entry_64.S | 2 +-
 include/linux/sched/mm.h  | 6 +++---
 init/Kconfig              | 2 +-
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 09b02180b8a0..a2380de50878 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -140,7 +140,7 @@ config PPC
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_PMEM_API                if PPC64
-	select ARCH_HAS_MEMBARRIER_HOOKS
+	select ARCH_HAS_MEMBARRIER_CALLBACKS
 	select ARCH_HAS_SCALED_CPUTIME		if VIRT_CPU_ACCOUNTING_NATIVE
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAS_TICK_BROADCAST		if GENERIC_CLOCKEVENTS_BROADCAST
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index d8ba23d0d77a..abee6d2b9311 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -567,7 +567,7 @@ ENTRY(entry_INT80_32)
 	RESTORE_REGS 4				# skip orig_eax/error_code
 .Lirq_return:
 	/*
-	 * ARCH_HAS_MEMBARRIER_SYNC_CORE rely on iret core serialization
+	 * ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization
 	 * when returning from IPI handler and when returning from
 	 * scheduler to user-space.
 	 */
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 49852d9920da..5816858c8820 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -805,7 +805,7 @@ GLOBAL(restore_regs_and_return_to_kernel)
 	POP_C_REGS
 	addq	$8, %rsp	/* skip regs->orig_ax */
 	/*
-	 * ARCH_HAS_MEMBARRIER_SYNC_CORE rely on iret core serialization
+	 * ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization
 	 * when returning from IPI handler.
 	 */
 	INTERRUPT_RETURN
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index a7840f0f8832..03a169087a18 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -41,7 +41,7 @@ extern void __mmdrop(struct mm_struct *);
 static inline void mmdrop(struct mm_struct *mm)
 {
 	/*
-	 * The implicit full barrier implied by atomic_dec_and_test is
+	 * The implicit full barrier implied by atomic_dec_and_test() is
 	 * required by the membarrier system call before returning to
 	 * user-space, after storing to rq->curr.
 	 */
@@ -232,7 +232,7 @@ enum {
 	MEMBARRIER_FLAG_SYNC_CORE	= (1U << 0),
 };
 
-#ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS
+#ifdef CONFIG_ARCH_HAS_MEMBARRIER_CALLBACKS
 #include <asm/membarrier.h>
 #endif
 
@@ -249,7 +249,7 @@ static inline void membarrier_execve(struct task_struct *t)
 	atomic_set(&t->mm->membarrier_state, 0);
 }
 #else
-#ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS
+#ifdef CONFIG_ARCH_HAS_MEMBARRIER_CALLBACKS
 static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
 					     struct mm_struct *next,
 					     struct task_struct *tsk)
diff --git a/init/Kconfig b/init/Kconfig
index 30b65febeb23..e37f4b2a6445 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1412,7 +1412,7 @@ config USERFAULTFD
 	  Enable the userfaultfd() system call that allows to intercept and
 	  handle page faults in userland.
 
-config ARCH_HAS_MEMBARRIER_HOOKS
+config ARCH_HAS_MEMBARRIER_CALLBACKS
 	bool
 
 config ARCH_HAS_MEMBARRIER_SYNC_CORE

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH for 4.16 v7 02/11] powerpc: membarrier: Skip memory barrier in switch_mm()
  2018-01-29 20:20 ` [PATCH for 4.16 v7 02/11] powerpc: membarrier: Skip memory barrier in switch_mm() Mathieu Desnoyers
  2018-02-05 20:22   ` Ingo Molnar
@ 2021-06-18 17:13   ` Christophe Leroy
  2021-06-18 17:26     ` Mathieu Desnoyers
  1 sibling, 1 reply; 23+ messages in thread
From: Christophe Leroy @ 2021-06-18 17:13 UTC (permalink / raw)
  To: Mathieu Desnoyers, Ingo Molnar, Peter Zijlstra, Thomas Gleixner
  Cc: Maged Michael, Dave Watson, Will Deacon, Russell King,
	David Sehr, Paul Mackerras, H . Peter Anvin, linux-arch, x86,
	Andrew Hunter, Greg Hackmann, Alan Stern, Paul E . McKenney,
	Andrea Parri, Avi Kivity, Boqun Feng, linuxppc-dev,
	Nicholas Piggin, Alexander Viro, Andy Lutomirski, linux-api,
	linux-kernel, Linus Torvalds



Le 29/01/2018 à 21:20, Mathieu Desnoyers a écrit :
> Allow PowerPC to skip the full memory barrier in switch_mm(), and
> only issue the barrier when scheduling into a task belonging to a
> process that has registered to use expedited private.
> 
> Threads targeting the same VM but which belong to different thread
> groups is a tricky case. It has a few consequences:
> 
> It turns out that we cannot rely on get_nr_threads(p) to count the
> number of threads using a VM. We can use
> (atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)
> instead to skip the synchronize_sched() for cases where the VM only has
> a single user, and that user only has a single thread.
> 
> It also turns out that we cannot use for_each_thread() to set
> thread flags in all threads using a VM, as it only iterates on the
> thread group.
> 
> Therefore, test the membarrier state variable directly rather than
> relying on thread flags. This means
> membarrier_register_private_expedited() needs to set the
> MEMBARRIER_STATE_PRIVATE_EXPEDITED flag, issue synchronize_sched(), and
> only then set MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY which allows
> private expedited membarrier commands to succeed.
> membarrier_arch_switch_mm() now tests for the
> MEMBARRIER_STATE_PRIVATE_EXPEDITED flag.

Looking at switch_mm_irqs_off(), I found it more complex than expected, and this patch is the
reason for that complexity.

Before the patch (ie in kernel 4.14), we have:

00000000 <switch_mm_irqs_off>:
    0:	81 24 01 c8 	lwz     r9,456(r4)
    4:	71 29 00 01 	andi.   r9,r9,1
    8:	40 82 00 1c 	bne     24 <switch_mm_irqs_off+0x24>
    c:	39 24 01 c8 	addi    r9,r4,456
   10:	39 40 00 01 	li      r10,1
   14:	7d 00 48 28 	lwarx   r8,0,r9
   18:	7d 08 53 78 	or      r8,r8,r10
   1c:	7d 00 49 2d 	stwcx.  r8,0,r9
   20:	40 c2 ff f4 	bne-    14 <switch_mm_irqs_off+0x14>
   24:	7c 04 18 40 	cmplw   r4,r3
   28:	81 24 00 24 	lwz     r9,36(r4)
   2c:	91 25 04 4c 	stw     r9,1100(r5)
   30:	4d 82 00 20 	beqlr
   34:	48 00 00 00 	b       34 <switch_mm_irqs_off+0x34>
			34: R_PPC_REL24	switch_mmu_context


After the patch (ie in 5.13-rc6), that now is:

00000000 <switch_mm_irqs_off>:
    0:	81 24 02 18 	lwz     r9,536(r4)
    4:	71 29 00 01 	andi.   r9,r9,1
    8:	41 82 00 24 	beq     2c <switch_mm_irqs_off+0x2c>
    c:	7c 04 18 40 	cmplw   r4,r3
   10:	81 24 00 24 	lwz     r9,36(r4)
   14:	91 25 04 d0 	stw     r9,1232(r5)
   18:	4d 82 00 20 	beqlr
   1c:	81 24 00 28 	lwz     r9,40(r4)
   20:	71 29 00 0a 	andi.   r9,r9,10
   24:	40 82 00 34 	bne     58 <switch_mm_irqs_off+0x58>
   28:	48 00 00 00 	b       28 <switch_mm_irqs_off+0x28>
			28: R_PPC_REL24	switch_mmu_context
   2c:	39 24 02 18 	addi    r9,r4,536
   30:	39 40 00 01 	li      r10,1
   34:	7d 00 48 28 	lwarx   r8,0,r9
   38:	7d 08 53 78 	or      r8,r8,r10
   3c:	7d 00 49 2d 	stwcx.  r8,0,r9
   40:	40 a2 ff f4 	bne     34 <switch_mm_irqs_off+0x34>
   44:	7c 04 18 40 	cmplw   r4,r3
   48:	81 24 00 24 	lwz     r9,36(r4)
   4c:	91 25 04 d0 	stw     r9,1232(r5)
   50:	4d 82 00 20 	beqlr
   54:	48 00 00 00 	b       54 <switch_mm_irqs_off+0x54>
			54: R_PPC_REL24	switch_mmu_context
   58:	2c 03 00 00 	cmpwi   r3,0
   5c:	41 82 ff cc 	beq     28 <switch_mm_irqs_off+0x28>
   60:	48 00 00 00 	b       60 <switch_mm_irqs_off+0x60>
			60: R_PPC_REL24	switch_mmu_context


In particular, the comparison of 'prev' to 0 is pointless, as both cases end up just branching to
'switch_mmu_context'.

I don't understand all that complexity to just replace a simple 'smp_mb__after_unlock_lock()'.

#define smp_mb__after_unlock_lock()	smp_mb()
#define smp_mb()	barrier()
# define barrier() __asm__ __volatile__("": : :"memory")


Am I missing some subtlety?

Thanks
Christophe

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for 4.16 v7 02/11] powerpc: membarrier: Skip memory barrier in switch_mm()
  2021-06-18 17:13   ` Christophe Leroy
@ 2021-06-18 17:26     ` Mathieu Desnoyers
  2021-06-19  9:35       ` Christophe Leroy
  0 siblings, 1 reply; 23+ messages in thread
From: Mathieu Desnoyers @ 2021-06-18 17:26 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, maged michael,
	Dave Watson, Will Deacon, Russell King, ARM Linux, David Sehr,
	Paul Mackerras, H. Peter Anvin, linux-arch, x86, Andrew Hunter,
	Greg Hackmann, Alan Stern, Paul, Andrea Parri, Avi Kivity,
	Boqun Feng, linuxppc-dev, Nicholas Piggin, Alexander Viro,
	Andy Lutomirski, linux-api, linux-kernel, Linus Torvalds

----- On Jun 18, 2021, at 1:13 PM, Christophe Leroy christophe.leroy@csgroup.eu wrote:
[...]
> 
> I don't understand all that complexity to just replace a simple
> 'smp_mb__after_unlock_lock()'.
> 
> #define smp_mb__after_unlock_lock()	smp_mb()
> #define smp_mb()	barrier()
> # define barrier() __asm__ __volatile__("": : :"memory")
> 
> 
> Am I missing some subtlety?

On powerpc CONFIG_SMP, smp_mb() is actually defined as:

#define smp_mb()        __smp_mb()
#define __smp_mb()      mb()
#define mb()   __asm__ __volatile__ ("sync" : : : "memory")

So the original motivation here was to skip a "sync" instruction whenever
switching between threads which are part of the same process. But based on
recent discussions, I suspect my implementation may be inaccurately doing
so though.
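
For reference, the conditional barrier added on the powerpc side looks
roughly like the sketch below (paraphrased from the v7 patch, so the
exact guard may differ from what was merged); the "sync" is only issued
when switching to a process that registered for private expedited
membarrier:

static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
					     struct mm_struct *next,
					     struct task_struct *tsk)
{
	/*
	 * Paraphrased sketch: only issue the full barrier ("sync") when
	 * switching to a process that registered its intent to use
	 * private expedited membarrier. The kernel->user and
	 * user->kernel transitions are covered elsewhere (mmdrop(),
	 * store to rq->curr).
	 */
	if (likely(!(atomic_read(&next->membarrier_state) &
		     MEMBARRIER_STATE_PRIVATE_EXPEDITED) || !prev))
		return;

	smp_mb();
}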

Thanks,

Mathieu


> 
> Thanks
> Christophe

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for 4.16 v7 02/11] powerpc: membarrier: Skip memory barrier in switch_mm()
  2021-06-18 17:26     ` Mathieu Desnoyers
@ 2021-06-19  9:35       ` Christophe Leroy
  2021-06-19 15:02         ` Segher Boessenkool
  0 siblings, 1 reply; 23+ messages in thread
From: Christophe Leroy @ 2021-06-19  9:35 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Ingo Molnar, Peter Zijlstra, Thomas Gleixner, maged michael,
	Dave Watson, Will Deacon, Russell King, ARM Linux, David Sehr,
	Paul Mackerras, H. Peter Anvin, linux-arch, x86, Andrew Hunter,
	Greg Hackmann, Alan Stern, Paul, Andrea Parri, Avi Kivity,
	Boqun Feng, linuxppc-dev, Nicholas Piggin, Alexander Viro,
	Andy Lutomirski, linux-api, linux-kernel, Linus Torvalds



Le 18/06/2021 à 19:26, Mathieu Desnoyers a écrit :
> ----- On Jun 18, 2021, at 1:13 PM, Christophe Leroy christophe.leroy@csgroup.eu wrote:
> [...]
>>
>> I don't understand all that complexity to just replace a simple
>> 'smp_mb__after_unlock_lock()'.
>>
>> #define smp_mb__after_unlock_lock()	smp_mb()
>> #define smp_mb()	barrier()
>> # define barrier() __asm__ __volatile__("": : :"memory")
>>
>>
>> Am I missing some subtlety?
> 
> On powerpc CONFIG_SMP, smp_mb() is actually defined as:
> 
> #define smp_mb()        __smp_mb()
> #define __smp_mb()      mb()
> #define mb()   __asm__ __volatile__ ("sync" : : : "memory")
> 
> So the original motivation here was to skip a "sync" instruction whenever
> switching between threads which are part of the same process. But based on
> recent discussions, I suspect my implementation may be inaccurately doing
> so though.
> 

I see.

Then, if you think a 'sync' is a concern, shouldn't we try and remove the forest of 'sync' in the 
I/O accessors ?

I can't really understand why we need all those 'sync' and 'isync' and 'twi' around the accesses 
whereas I/O memory is usually mapped as 'Guarded' so memory access ordering is already guaranteed.

I'm sure we'll save a lot with that.

Christophe

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for 4.16 v7 02/11] powerpc: membarrier: Skip memory barrier in switch_mm()
  2021-06-19  9:35       ` Christophe Leroy
@ 2021-06-19 15:02         ` Segher Boessenkool
  2021-06-21 14:11           ` Christophe Leroy
  0 siblings, 1 reply; 23+ messages in thread
From: Segher Boessenkool @ 2021-06-19 15:02 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Mathieu Desnoyers, maged michael, Peter Zijlstra, Dave Watson,
	Will Deacon, Andrew Hunter, David Sehr, Paul Mackerras,
	H. Peter Anvin, linux-arch, x86, Russell King, ARM Linux,
	Greg Hackmann, Linus Torvalds, Ingo Molnar, Alan Stern, Paul,
	Andrea Parri, Avi Kivity, Boqun Feng, Nicholas Piggin,
	Alexander Viro, Andy Lutomirski, Thomas Gleixner, linux-api,
	linux-kernel, linuxppc-dev

On Sat, Jun 19, 2021 at 11:35:34AM +0200, Christophe Leroy wrote:
> 
> 
> Le 18/06/2021 à 19:26, Mathieu Desnoyers a écrit :
> >----- On Jun 18, 2021, at 1:13 PM, Christophe Leroy 
> >christophe.leroy@csgroup.eu wrote:
> >[...]
> >>
> >>I don't understand all that complexity to just replace a simple
> >>'smp_mb__after_unlock_lock()'.
> >>
> >>#define smp_mb__after_unlock_lock()	smp_mb()
> >>#define smp_mb()	barrier()
> >># define barrier() __asm__ __volatile__("": : :"memory")
> >>
> >>
> >>Am I missing some subtlety?
> >
> >On powerpc CONFIG_SMP, smp_mb() is actually defined as:
> >
> >#define smp_mb()        __smp_mb()
> >#define __smp_mb()      mb()
> >#define mb()   __asm__ __volatile__ ("sync" : : : "memory")
> >
> >So the original motivation here was to skip a "sync" instruction whenever
> >switching between threads which are part of the same process. But based on
> >recent discussions, I suspect my implementation may be inaccurately doing
> >so though.
> >
> 
> I see.
> 
> Then, if you think a 'sync' is a concern, shouldn't we try and remove the 
> forest of 'sync' in the I/O accessors ?
> 
> I can't really understand why we need all those 'sync' and 'isync' and 
> 'twi' around the accesses whereas I/O memory is usually mapped as 'Guarded' 
> so memory access ordering is already guaranteed.
> 
> I'm sure we'll save a lot with that.

The point of the twi in the I/O accessors was to make things easier to
debug if the accesses fail: for the twi insn to complete the load will
have to have completed as well.  On a correctly working system you never
should need this (until something fails ;-) )

Without the twi you might need to enforce ordering in some cases still.
The twi is a very heavy hammer, but some of that that gives us is no
doubt actually needed.


Segher

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for 4.16 v7 02/11] powerpc: membarrier: Skip memory barrier in switch_mm()
  2021-06-19 15:02         ` Segher Boessenkool
@ 2021-06-21 14:11           ` Christophe Leroy
  2021-06-22  0:15             ` Segher Boessenkool
  0 siblings, 1 reply; 23+ messages in thread
From: Christophe Leroy @ 2021-06-21 14:11 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Mathieu Desnoyers, maged michael, Peter Zijlstra, Dave Watson,
	Will Deacon, Andrew Hunter, David Sehr, Paul Mackerras,
	H. Peter Anvin, linux-arch, x86, Russell King, ARM Linux,
	Greg Hackmann, Linus Torvalds, Ingo Molnar, Alan Stern, Paul,
	Andrea Parri, Avi Kivity, Boqun Feng, Nicholas Piggin,
	Alexander Viro, Andy Lutomirski, Thomas Gleixner, linux-api,
	linux-kernel, linuxppc-dev



Le 19/06/2021 à 17:02, Segher Boessenkool a écrit :
> On Sat, Jun 19, 2021 at 11:35:34AM +0200, Christophe Leroy wrote:
>>
>>
>> Le 18/06/2021 à 19:26, Mathieu Desnoyers a écrit :
>>> ----- On Jun 18, 2021, at 1:13 PM, Christophe Leroy
>>> christophe.leroy@csgroup.eu wrote:
>>> [...]
>>>>
>>>> I don't understand all that complexity to just replace a simple
>>>> 'smp_mb__after_unlock_lock()'.
>>>>
>>>> #define smp_mb__after_unlock_lock()	smp_mb()
>>>> #define smp_mb()	barrier()
>>>> # define barrier() __asm__ __volatile__("": : :"memory")
>>>>
>>>>
>>>> Am I missing some subtlety?
>>>
>>> On powerpc CONFIG_SMP, smp_mb() is actually defined as:
>>>
>>> #define smp_mb()        __smp_mb()
>>> #define __smp_mb()      mb()
>>> #define mb()   __asm__ __volatile__ ("sync" : : : "memory")
>>>
>>> So the original motivation here was to skip a "sync" instruction whenever
>>> switching between threads which are part of the same process. But based on
>>> recent discussions, I suspect my implementation may be inaccurately doing
>>> so though.
>>>
>>
>> I see.
>>
>> Then, if you think a 'sync' is a concern, shouldn't we try and remove the
>> forest of 'sync' in the I/O accessors ?
>>
>> I can't really understand why we need all those 'sync' and 'isync' and
>> 'twi' around the accesses whereas I/O memory is usually mapped as 'Guarded'
>> so memory access ordering is already guaranteed.
>>
>> I'm sure we'll save a lot with that.
> 
> The point of the twi in the I/O accessors was to make things easier to
> debug if the accesses fail: for the twi insn to complete the load will
> have to have completed as well.  On a correctly working system you never
> should need this (until something fails ;-) )
> 
> Without the twi you might need to enforce ordering in some cases still.
> The twi is a very heavy hammer, but some of that that gives us is no
> doubt actually needed.
> 

Well, I've always been quite perplexed about that. According to the documentation of the 8xx, if a bus
error or something happens on an I/O access, the exception will be accounted to the instruction
which does the access. But based on the following function, I understand that some versions of
powerpc do generate the trap on the instruction which was being executed at the time the I/O access
failed, not the instruction that does the access itself?

/*
  * I/O accesses can cause machine checks on powermacs.
  * Check if the NIP corresponds to the address of a sync
  * instruction for which there is an entry in the exception
  * table.
  *  -- paulus.
  */
static inline int check_io_access(struct pt_regs *regs)
{
#ifdef CONFIG_PPC32
	unsigned long msr = regs->msr;
	const struct exception_table_entry *entry;
	unsigned int *nip = (unsigned int *)regs->nip;

	if (((msr & 0xffff0000) == 0 || (msr & (0x80000 | 0x40000)))
	    && (entry = search_exception_tables(regs->nip)) != NULL) {
		/*
		 * Check that it's a sync instruction, or somewhere
		 * in the twi; isync; nop sequence that inb/inw/inl uses.
		 * As the address is in the exception table
		 * we should be able to read the instr there.
		 * For the debug message, we look at the preceding
		 * load or store.
		 */
		if (*nip == PPC_INST_NOP)
			nip -= 2;
		else if (*nip == PPC_INST_ISYNC)
			--nip;
		if (*nip == PPC_INST_SYNC || (*nip >> 26) == OP_TRAP) {
			unsigned int rb;

			--nip;
			rb = (*nip >> 11) & 0x1f;
			printk(KERN_DEBUG "%s bad port %lx at %p\n",
			       (*nip & 0x100)? "OUT to": "IN from",
			       regs->gpr[rb] - _IO_BASE, nip);
			regs->msr |= MSR_RI;
			regs->nip = extable_fixup(entry);
			return 1;
		}
	}
#endif /* CONFIG_PPC32 */
	return 0;
}

Am I right ?

It is not only the twi which bothers me in the I/O accessors but also the sync/isync and stuff.

A write typically is

	sync
	stw

A read is

	sync
	lwz
	twi
	isync

Taking into account that HW ordering is guaranteed by the fact that __iomem is guarded, isn't the
'memory' clobber enough as a barrier?

Thanks
Christophe

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH for 4.16 v7 02/11] powerpc: membarrier: Skip memory barrier in switch_mm()
  2021-06-21 14:11           ` Christophe Leroy
@ 2021-06-22  0:15             ` Segher Boessenkool
  0 siblings, 0 replies; 23+ messages in thread
From: Segher Boessenkool @ 2021-06-22  0:15 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Mathieu Desnoyers, maged michael, Peter Zijlstra, Dave Watson,
	Will Deacon, Andrew Hunter, David Sehr, Paul Mackerras,
	H. Peter Anvin, linux-arch, x86, Russell King, ARM Linux,
	Greg Hackmann, Linus Torvalds, Ingo Molnar, Alan Stern, Paul,
	Andrea Parri, Avi Kivity, Boqun Feng, Nicholas Piggin,
	Alexander Viro, Andy Lutomirski, Thomas Gleixner, linux-api,
	linux-kernel, linuxppc-dev

Hi!

On Mon, Jun 21, 2021 at 04:11:04PM +0200, Christophe Leroy wrote:
> Le 19/06/2021 à 17:02, Segher Boessenkool a écrit :
> >The point of the twi in the I/O accessors was to make things easier to
> >debug if the accesses fail: for the twi insn to complete the load will
> >have to have completed as well.  On a correctly working system you never
> >should need this (until something fails ;-) )
> >
> >Without the twi you might need to enforce ordering in some cases still.
> >The twi is a very heavy hammer, but some of that that gives us is no
> >doubt actually needed.
> 
> Well, I've always been quite perplexed about that. According to the
> documentation of the 8xx, if a bus error or something happens on an I/O
> access, the exception will be accounted to the instruction which does the
> access. But based on the following function, I understand that some versions
> of powerpc do generate the trap on the instruction which was being executed
> at the time the I/O access failed, not the instruction that does the access
> itself?

Trap instructions are never speculated (this may not be architectural,
but it is true on all existing implementations).  So the instructions
after the twi;isync will not execute until the twi itself has finished,
and that cannot happen before the preceding load has (because it uses
the loaded register).

Now, some I/O accesses can cause machine checks.  Machine checks are
asynchronous and can be hard to correlate to specific load insns, and
worse, you may not even have the address loaded from in architected
registers anymore.  Since I/O accesses often take *long*, tens or even
hundreds of cycles is not unusual, this can be a challenge.

To recover from machine checks you typically need special debug hardware
and/or software.  For the Apple machines those are not so easy to come
by.  This "twi after loads" thing made it pretty easy to figure out
where your code was going wrong.

And it isn't as slow as it may sound: typically you really need to have
the result of the load before you can go on do useful work anyway, and
loads from I/O are slow non-posted things.

> /*
>  * I/O accesses can cause machine checks on powermacs.
>  * Check if the NIP corresponds to the address of a sync
>  * instruction for which there is an entry in the exception
>  * table.
>  *  -- paulus.
>  */

I suspect this is from before the twi thing was added?

> It is not only the twi which bothers me in the I/O accessors but also the
> sync/isync and stuff.
> 
> A write typically is
> 
> 	sync
> 	stw
> 
> A read is
> 
> 	sync
> 	lwz
> 	twi
> 	isync
> 
> Taking into account that HW ordering is guaranteed by the fact that __iomem
> is guarded,

Yes.  But machine checks are asynchronous :-)

> isn't the 'memory' clobber enough as a barrier?

A "memory" clobber isn't a barrier of any kind.  "Compiler barriers" do
not exist.

The only thing such a clobber does is it tells the compiler that this
inline asm can access some memory, and we do not say at what address.
So the compiler cannot reorder this asm with other memory accesses.  It
has no other effects, no magical effects, and it is not comparable to
actual barrier instructions (that actually tell the hardware that some
certain ordering is required).
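
As a toy illustration (simplified, with made-up helper names), compare a
raw load that only has the clobber with the I/O-style sequence discussed
above:

#include <stdint.h>

/*
 * Both helpers have a "memory" clobber, so the compiler will not move
 * other memory accesses across them, but only the second one emits
 * instructions that order the hardware access itself.
 */
static inline uint32_t load_clobber_only(const volatile uint32_t *addr)
{
	uint32_t val;

	/* No barrier instruction at all: the CPU may still reorder. */
	asm volatile("lwz %0,0(%1)" : "=r" (val) : "b" (addr) : "memory");
	return val;
}

static inline uint32_t load_io_style(const volatile uint32_t *addr)
{
	uint32_t val;

	/*
	 * sync orders prior accesses; twi+isync stall execution until the
	 * load has actually completed (the twi never traps here, but it
	 * depends on the loaded value).
	 */
	asm volatile("sync; lwz %0,0(%1); twi 0,%0,0; isync"
		     : "=r" (val) : "b" (addr) : "memory");
	return val;
}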

"Compiler barrier" is a harmful misnomer: language shapes thoughts,
using misleading names causes misguided thoughts.

Anyway :-)

The isync is simply to make sure the code after it does not start before
the code before it has completed.  The sync before it, I am not sure about.


Segher

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH for 4.16 10/11] membarrier: arm64: Provide core serializing command
  2018-01-17 16:54 [PATCH for 4.16 00/11] membarrier updates for 4.16 Mathieu Desnoyers
@ 2018-01-17 16:54 ` Mathieu Desnoyers
  0 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2018-01-17 16:54 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner
  Cc: linux-kernel, linux-api, Andy Lutomirski, Paul E . McKenney,
	Boqun Feng, Andrew Hunter, Maged Michael, Avi Kivity,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Dave Watson, H . Peter Anvin, Andrea Parri, Russell King,
	Greg Hackmann, Will Deacon, David Sehr, Linus Torvalds, x86,
	Mathieu Desnoyers

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andy Lutomirski <luto@kernel.org>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Andrew Hunter <ahh@google.com>
CC: Maged Michael <maged.michael@gmail.com>
CC: Avi Kivity <avi@scylladb.com>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Dave Watson <davejwatson@fb.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andrea Parri <parri.andrea@gmail.com>
CC: Russell King <linux@armlinux.org.uk>
CC: Greg Hackmann <ghackmann@google.com>
CC: Will Deacon <will.deacon@arm.com>
CC: David Sehr <sehr@google.com>
CC: x86@kernel.org
CC: linux-arch@vger.kernel.org
---
 arch/arm64/Kconfig        | 1 +
 arch/arm64/kernel/entry.S | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c9a7e9e1414f..5b0c06d8dbbe 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -16,6 +16,7 @@ config ARM64
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA
 	select ARCH_HAS_KCOV
+	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_SET_MEMORY
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAS_STRICT_KERNEL_RWX
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 6d14b8f29b5f..5edde1c2e93e 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -302,6 +302,10 @@ alternative_else_nop_endif
 	ldp	x28, x29, [sp, #16 * 14]
 	ldr	lr, [sp, #S_LR]
 	add	sp, sp, #S_FRAME_SIZE		// restore sp
+	/*
+	 * ARCH_HAS_MEMBARRIER_SYNC_CORE relies on eret context synchronization
+	 * when returning from an IPI handler and when returning to user-space.
+	 */
 	eret					// return to kernel
 	.endm
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread
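
For context, user-space would consume the command enabled by this patch
roughly as follows. This is an illustrative sketch only: the helper and
function names are made up, error handling is omitted, and it assumes the
MEMBARRIER_CMD_* values introduced by this series' uapi changes.

	#include <linux/membarrier.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static int membarrier(int cmd, int flags)
	{
		return syscall(__NR_membarrier, cmd, flags);
	}

	/* Once at startup: register intent to use the sync-core command. */
	void jit_init(void)
	{
		membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0);
	}

	/* After rewriting code that other threads may be executing: ensure every
	 * thread of this process issues a core serializing instruction (on arm64,
	 * the eret back to user-space) before it can run the new code. */
	void jit_code_updated(void)
	{
		membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0);
	}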

* [PATCH for 4.16 10/11] membarrier: arm64: Provide core serializing command
       [not found] ` <20180110154625.4319-1-mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
@ 2018-01-10 15:46   ` Mathieu Desnoyers
  0 siblings, 0 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2018-01-10 15:46 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, Andy Lutomirski,
	Paul E . McKenney, Boqun Feng, Andrew Hunter, Maged Michael,
	Avi Kivity, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Dave Watson, Thomas Gleixner, H . Peter Anvin,
	Andrea Parri, Russell King, Greg Hackmann, Will Deacon,
	David Sehr, Linus Torvalds, x86

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
CC: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
CC: Andy Lutomirski <luto-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
CC: Paul E. McKenney <paulmck-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
CC: Boqun Feng <boqun.feng-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: Andrew Hunter <ahh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
CC: Maged Michael <maged.michael-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: Avi Kivity <avi-VrcmuVmyx1hWk0Htik3J/w@public.gmane.org>
CC: Benjamin Herrenschmidt <benh-XVmvHMARGAS8U2dJNN8I7kB+6BGkLq7r@public.gmane.org>
CC: Paul Mackerras <paulus-eUNUBHrolfbYtjvyW6yDsg@public.gmane.org>
CC: Michael Ellerman <mpe-Gsx/Oe8HsFggBc27wqDAHg@public.gmane.org>
CC: Dave Watson <davejwatson-b10kYP2dOMg@public.gmane.org>
CC: Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
CC: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
CC: "H. Peter Anvin" <hpa-YMNOUZJC4hwAvxtiuMwx3w@public.gmane.org>
CC: Andrea Parri <parri.andrea-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: Russell King <linux-I+IVW8TIWO2tmTQ+vhA3Yw@public.gmane.org>
CC: Greg Hackmann <ghackmann-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
CC: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
CC: David Sehr <sehr-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
CC: x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
CC: linux-arch-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
---
 arch/arm64/Kconfig        | 1 +
 arch/arm64/kernel/entry.S | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c9a7e9e1414f..5b0c06d8dbbe 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -16,6 +16,7 @@ config ARM64
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA
 	select ARCH_HAS_KCOV
+	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_SET_MEMORY
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAS_STRICT_KERNEL_RWX
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 6d14b8f29b5f..5edde1c2e93e 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -302,6 +302,10 @@ alternative_else_nop_endif
 	ldp	x28, x29, [sp, #16 * 14]
 	ldr	lr, [sp, #S_LR]
 	add	sp, sp, #S_FRAME_SIZE		// restore sp
+	/*
+	 * ARCH_HAS_MEMBARRIER_SYNC_CORE relies on eret context synchronization
+	 * when returning from an IPI handler and when returning to user-space.
+	 */
 	eret					// return to kernel
 	.endm
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

end of thread

Thread overview: 23+ messages
2018-01-29 20:20 [PATCH for 4.16 00/11] membarrier updates for 4.16 Mathieu Desnoyers
2018-01-29 20:20 ` [PATCH for 4.16 v2 01/11] membarrier: selftest: Test private expedited cmd Mathieu Desnoyers
2018-01-29 20:20 ` [PATCH for 4.16 v7 02/11] powerpc: membarrier: Skip memory barrier in switch_mm() Mathieu Desnoyers
2018-02-05 20:22   ` Ingo Molnar
     [not found]     ` <20180205202242.6ilyjh35byycf3ez-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-02-05 20:32       ` Mathieu Desnoyers
2021-06-18 17:13   ` Christophe Leroy
2021-06-18 17:26     ` Mathieu Desnoyers
2021-06-19  9:35       ` Christophe Leroy
2021-06-19 15:02         ` Segher Boessenkool
2021-06-21 14:11           ` Christophe Leroy
2021-06-22  0:15             ` Segher Boessenkool
2018-01-29 20:20 ` [PATCH for 4.16 v5 03/11] membarrier: Document scheduler barrier requirements Mathieu Desnoyers
2018-01-29 20:20 ` [PATCH for 4.16 v2 05/11] membarrier: selftest: Test global expedited cmd Mathieu Desnoyers
2018-01-29 20:20 ` [PATCH for 4.16 v2 06/11] Introduce sync_core_before_usermode Mathieu Desnoyers
2018-01-29 20:20 ` [PATCH for 4.16 v3 07/11] x86: Implement sync_core_before_usermode Mathieu Desnoyers
     [not found] ` <20180129202020.8515-1-mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
2018-01-29 20:20   ` [PATCH for 4.16 v3 04/11] membarrier: provide GLOBAL_EXPEDITED command Mathieu Desnoyers
2018-01-29 20:20   ` [PATCH for 4.16 v3 08/11] membarrier: Provide core serializing command Mathieu Desnoyers
2018-01-29 20:20   ` [PATCH for 4.16 11/11] membarrier: selftest: Test private expedited sync core cmd Mathieu Desnoyers
2018-02-05 21:21   ` [PATCH for 4.16 00/11] membarrier updates for 4.16 Ingo Molnar
2018-01-29 20:20 ` [PATCH for 4.16 v4 09/11] membarrier: x86: Provide core serializing command Mathieu Desnoyers
2018-01-29 20:20 ` [PATCH for 4.16 10/11] membarrier: arm64: " Mathieu Desnoyers
  -- strict thread matches above, loose matches on Subject: below --
2018-01-17 16:54 [PATCH for 4.16 00/11] membarrier updates for 4.16 Mathieu Desnoyers
2018-01-17 16:54 ` [PATCH for 4.16 10/11] membarrier: arm64: Provide core serializing command Mathieu Desnoyers
2018-01-10 15:46 [PATCH for 4.16 00/11] membarrier updates for 4.16 Mathieu Desnoyers
     [not found] ` <20180110154625.4319-1-mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w@public.gmane.org>
2018-01-10 15:46   ` [PATCH for 4.16 10/11] membarrier: arm64: Provide core serializing command Mathieu Desnoyers
