* [RFC PATCH for 5.4 0/7] Membarrier fixes and cleanups
@ 2019-09-19 17:36 Mathieu Desnoyers
  2019-09-19 17:36 ` [RFC PATCH for 5.4 1/7] Fix: sched/membarrier: Private expedited registration check Mathieu Desnoyers
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: Mathieu Desnoyers @ 2019-09-19 17:36 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E. McKenney
  Cc: Ingo Molnar, linux-kernel, Oleg Nesterov, Eric W. Biederman,
	Linus Torvalds, Russell King - ARM Linux admin, Chris Metcalf,
	Christoph Lameter, Kirill Tkhai, Mike Galbraith, Thomas Gleixner,
	Mathieu Desnoyers

Hi,

This series of fixes and cleanups is initially motivated by a report
of a race in membarrier, which can load p->mm->membarrier_state after
the mm has been freed (use-after-free).
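
For illustration, the racy pattern being fixed (removed by patch 4 in
this series) looks roughly like the following sketch of the old
membarrier_global_expedited() iteration, not the exact code:

	p = task_rcu_dereference(&cpu_rq(cpu)->curr);
	if (p && p->mm &&	/* p->mm may be freed concurrently */
	    (atomic_read(&p->mm->membarrier_state) &
	     MEMBARRIER_STATE_GLOBAL_EXPEDITED))
		__cpumask_set_cpu(cpu, tmpmask);

RCU protects the task_struct dereference here, but not the mm it
points to.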

Thanks,

Mathieu

Mathieu Desnoyers (7):
  Fix: sched/membarrier: Private expedited registration check
  Cleanup: sched/membarrier: Remove redundant check
  Cleanup: sched/membarrier: Only sync_core before usermode for same mm
  Fix: sched/membarrier: p->mm->membarrier_state racy load (v4)
  selftests: sched/membarrier: Add multi-threaded test
  sched/membarrier: Skip IPIs when mm->mm_users == 1
  sched/membarrier: Return -ENOMEM to userspace on memory allocation
    failure

 fs/exec.c                                     |   2 +-
 include/linux/mm_types.h                      |  14 +-
 include/linux/sched/mm.h                      |  10 +-
 kernel/sched/core.c                           |   4 +-
 kernel/sched/membarrier.c                     | 236 +++++++++++-------
 kernel/sched/sched.h                          |  34 +++
 tools/testing/selftests/membarrier/.gitignore |   3 +-
 tools/testing/selftests/membarrier/Makefile   |   5 +-
 ...mbarrier_test.c => membarrier_test_impl.h} |  40 +--
 .../membarrier/membarrier_test_multi_thread.c |  73 ++++++
 .../membarrier_test_single_thread.c           |  24 ++
 11 files changed, 329 insertions(+), 116 deletions(-)
 rename tools/testing/selftests/membarrier/{membarrier_test.c => membarrier_test_impl.h} (95%)
 create mode 100644 tools/testing/selftests/membarrier/membarrier_test_multi_thread.c
 create mode 100644 tools/testing/selftests/membarrier/membarrier_test_single_thread.c

-- 
2.17.1


* [RFC PATCH for 5.4 1/7] Fix: sched/membarrier: Private expedited registration check
  2019-09-19 17:36 [RFC PATCH for 5.4 0/7] Membarrier fixes and cleanups Mathieu Desnoyers
@ 2019-09-19 17:36 ` Mathieu Desnoyers
  2019-09-27  8:10   ` [tip: sched/urgent] sched/membarrier: Fix private " tip-bot2 for Mathieu Desnoyers
  2019-09-19 17:37 ` [RFC PATCH for 5.4 2/7] Cleanup: sched/membarrier: Remove redundant check Mathieu Desnoyers
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Mathieu Desnoyers @ 2019-09-19 17:36 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E. McKenney
  Cc: Ingo Molnar, linux-kernel, Oleg Nesterov, Eric W. Biederman,
	Linus Torvalds, Russell King - ARM Linux admin, Chris Metcalf,
	Christoph Lameter, Kirill Tkhai, Mike Galbraith, Thomas Gleixner,
	Mathieu Desnoyers

Fix a logic flaw in the way membarrier_register_private_expedited()
handles the ready-state checks for private expedited sync_core and
private expedited registrations.

If a private expedited membarrier registration is performed first, and
a private expedited sync_core registration is performed afterwards, the
ready-state check wrongly causes the second registration to be skipped.
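
The flaw: a nonzero bitwise AND against a multi-bit ready-state mask
does not imply that all bits of the mask are set. A minimal sketch,
assuming the sync_core registration builds its mask as the OR of both
READY bits:

	int state = MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY |
		    MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY;
	/* Only the plain private expedited registration was done so far: */
	int mm_state = MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY;

	if (mm_state & state)			/* nonzero: old check wrongly skips */
		return 0;
	if ((mm_state & state) == state)	/* false: fixed check registers */
		return 0;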

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/membarrier.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index aa8d75804108..5110d91b1b0e 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -226,7 +226,7 @@ static int membarrier_register_private_expedited(int flags)
 	 * groups, which use the same mm. (CLONE_VM but not
 	 * CLONE_THREAD).
 	 */
-	if (atomic_read(&mm->membarrier_state) & state)
+	if ((atomic_read(&mm->membarrier_state) & state) == state)
 		return 0;
 	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED, &mm->membarrier_state);
 	if (flags & MEMBARRIER_FLAG_SYNC_CORE)
-- 
2.17.1


* [RFC PATCH for 5.4 2/7] Cleanup: sched/membarrier: Remove redundant check
  2019-09-19 17:36 [RFC PATCH for 5.4 0/7] Membarrier fixes and cleanups Mathieu Desnoyers
  2019-09-19 17:36 ` [RFC PATCH for 5.4 1/7] Fix: sched/membarrier: Private expedited registration check Mathieu Desnoyers
@ 2019-09-19 17:37 ` Mathieu Desnoyers
  2019-09-27  8:10   ` [tip: sched/urgent] " tip-bot2 for Mathieu Desnoyers
  2019-09-19 17:37 ` [RFC PATCH for 5.4 3/7] Cleanup: sched/membarrier: Only sync_core before usermode for same mm Mathieu Desnoyers
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Mathieu Desnoyers @ 2019-09-19 17:37 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E. McKenney
  Cc: Ingo Molnar, linux-kernel, Oleg Nesterov, Eric W. Biederman,
	Linus Torvalds, Russell King - ARM Linux admin, Chris Metcalf,
	Christoph Lameter, Kirill Tkhai, Mike Galbraith, Thomas Gleixner,
	Mathieu Desnoyers

Checking that the number of threads is 1 is redundant with checking
mm_users == 1: every additional thread sharing the mm holds a reference
on it, so mm_users == 1 already implies a single-threaded process.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/membarrier.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 5110d91b1b0e..7e0a0d6535f3 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -186,7 +186,7 @@ static int membarrier_register_global_expedited(void)
 	    MEMBARRIER_STATE_GLOBAL_EXPEDITED_READY)
 		return 0;
 	atomic_or(MEMBARRIER_STATE_GLOBAL_EXPEDITED, &mm->membarrier_state);
-	if (atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1) {
+	if (atomic_read(&mm->mm_users) == 1) {
 		/*
 		 * For single mm user, single threaded process, we can
 		 * simply issue a memory barrier after setting
@@ -232,7 +232,7 @@ static int membarrier_register_private_expedited(int flags)
 	if (flags & MEMBARRIER_FLAG_SYNC_CORE)
 		atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE,
 			  &mm->membarrier_state);
-	if (!(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)) {
+	if (atomic_read(&mm->mm_users) != 1) {
 		/*
 		 * Ensure all future scheduler executions will observe the
 		 * new thread flag state for this process.
-- 
2.17.1


* [RFC PATCH for 5.4 3/7] Cleanup: sched/membarrier: Only sync_core before usermode for same mm
  2019-09-19 17:36 [RFC PATCH for 5.4 0/7] Membarrier fixes and cleanups Mathieu Desnoyers
  2019-09-19 17:36 ` [RFC PATCH for 5.4 1/7] Fix: sched/membarrier: Private expedited registration check Mathieu Desnoyers
  2019-09-19 17:37 ` [RFC PATCH for 5.4 2/7] Cleanup: sched/membarrier: Remove redundant check Mathieu Desnoyers
@ 2019-09-19 17:37 ` Mathieu Desnoyers
  2019-09-27  8:10   ` [tip: sched/urgent] sched/membarrier: Call sync_core only " tip-bot2 for Mathieu Desnoyers
  2019-09-19 17:37 ` [RFC PATCH for 5.4 4/7] Fix: sched/membarrier: p->mm->membarrier_state racy load (v4) Mathieu Desnoyers
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Mathieu Desnoyers @ 2019-09-19 17:37 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E. McKenney
  Cc: Ingo Molnar, linux-kernel, Oleg Nesterov, Eric W. Biederman,
	Linus Torvalds, Russell King - ARM Linux admin, Chris Metcalf,
	Christoph Lameter, Kirill Tkhai, Mike Galbraith, Thomas Gleixner,
	Mathieu Desnoyers

When the prev and next tasks' mms differ, switch_mm() provides the core
serialization guarantees before returning to usermode. The only case
where explicit core serialization is needed is when the scheduler keeps
the same mm for prev and next, e.g. when switching between threads of
the same process.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
---
 include/linux/sched/mm.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 4a7944078cc3..8557ec664213 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -362,6 +362,8 @@ enum {
 
 static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
 {
+	if (current->mm != mm)
+		return;
 	if (likely(!(atomic_read(&mm->membarrier_state) &
 		     MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE)))
 		return;
-- 
2.17.1


* [RFC PATCH for 5.4 4/7] Fix: sched/membarrier: p->mm->membarrier_state racy load (v4)
  2019-09-19 17:36 [RFC PATCH for 5.4 0/7] Membarrier fixes and cleanups Mathieu Desnoyers
                   ` (2 preceding siblings ...)
  2019-09-19 17:37 ` [RFC PATCH for 5.4 3/7] Cleanup: sched/membarrier: Only sync_core before usermode for same mm Mathieu Desnoyers
@ 2019-09-19 17:37 ` Mathieu Desnoyers
  2019-09-27  8:10   ` [tip: sched/urgent] sched/membarrier: Fix p->mm->membarrier_state racy load tip-bot2 for Mathieu Desnoyers
  2019-09-19 17:37 ` [RFC PATCH for 5.4 5/7] selftests: sched/membarrier: Add multi-threaded test Mathieu Desnoyers
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Mathieu Desnoyers @ 2019-09-19 17:37 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E. McKenney
  Cc: Ingo Molnar, linux-kernel, Oleg Nesterov, Eric W. Biederman,
	Linus Torvalds, Russell King - ARM Linux admin, Chris Metcalf,
	Christoph Lameter, Kirill Tkhai, Mike Galbraith, Thomas Gleixner,
	Mathieu Desnoyers

The membarrier_state field is located within the mm_struct, which
is not guaranteed to still exist when the membarrier system call
iterates over the runqueues without holding the runqueue locks.

Copy the membarrier_state from the mm_struct into the scheduler
runqueue when the scheduler switches between mms.

When registering membarrier for an mm, after setting the registration
bit in the mm's membarrier state, issue a synchronize_rcu() to ensure
the scheduler observes the change. To handle the case where a runqueue
keeps running the target mm without switching to another mm, iterate
over each runqueue and issue an IPI to copy the membarrier_state from
the mm_struct into each runqueue whose current task uses the mm whose
state has just been modified.

Move the mm membarrier_state field closer to pgd in mm_struct in order
to use a cache line already touched by switch_mm().

The membarrier_execve() hook, now renamed membarrier_exec_mmap(), needs
to clear the runqueue's membarrier state in addition to clearing the mm
membarrier state, so move its implementation into the scheduler
membarrier code so it can access the runqueue structure.

Add a memory barrier in membarrier_exec_mmap() prior to clearing the
membarrier state, ensuring that memory accesses executed prior to exec
are not reordered with the stores clearing the membarrier state.

As suggested by Linus, move all RCU read-side locks in membarrier.c
outside of the for-each-CPU loops.
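
Condensed, the registration-side update protocol implemented by
sync_runqueues_membarrier_state() in this patch is roughly (a sketch;
the real code uses task_rcu_dereference(), checks it for NULL, and has
a single-user fast path):

	atomic_or(set_state, &mm->membarrier_state);
	synchronize_rcu();	/* future schedules observe the new state */
	rcu_read_lock();
	for_each_online_cpu(cpu)
		if (cpu_rq(cpu)->curr->mm == mm)
			__cpumask_set_cpu(cpu, tmpmask);
	rcu_read_unlock();
	smp_call_function_many(tmpmask, ipi_sync_rq_state, mm, 1);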

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
---
Changes since v1:
- Take care of Peter Zijlstra's feedback, moving callsites closer
  to switch_mm() (scheduler) and activate_mm() (execve).
- Add memory barrier in membarrier_exec_mmap() prior to clearing
  the membarrier state.

Changes since v2:
- Move the call to membarrier_exec_mmap() before setting tsk->mm
  to its new value, so the memory barrier is done before this
  store.

Changes since v3:
- Use GFP_KERNEL for registration cpumask allocation.
- Use this_cpu_write() to update the runqueue's membarrier_state.
- Return -ENOMEM to userspace if cpumask allocation fails. Remove
  fallback code.
---
 fs/exec.c                 |   2 +-
 include/linux/mm_types.h  |  14 ++-
 include/linux/sched/mm.h  |   8 +-
 kernel/sched/core.c       |   4 +-
 kernel/sched/membarrier.c | 176 ++++++++++++++++++++++++++++----------
 kernel/sched/sched.h      |  34 ++++++++
 6 files changed, 184 insertions(+), 54 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index f7f6a140856a..555e93c7dec8 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1033,6 +1033,7 @@ static int exec_mmap(struct mm_struct *mm)
 	}
 	task_lock(tsk);
 	active_mm = tsk->active_mm;
+	membarrier_exec_mmap(mm);
 	tsk->mm = mm;
 	tsk->active_mm = mm;
 	activate_mm(active_mm, mm);
@@ -1825,7 +1826,6 @@ static int __do_execve_file(int fd, struct filename *filename,
 	/* execve succeeded */
 	current->fs->in_exec = 0;
 	current->in_execve = 0;
-	membarrier_execve(current);
 	rseq_execve(current);
 	acct_update_integrals(current);
 	task_numa_free(current, false);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6a7a1083b6fb..ec9bd3a6c827 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -383,6 +383,16 @@ struct mm_struct {
 		unsigned long highest_vm_end;	/* highest vma end address */
 		pgd_t * pgd;
 
+#ifdef CONFIG_MEMBARRIER
+		/**
+		 * @membarrier_state: Flags controlling membarrier behavior.
+		 *
+		 * This field is close to @pgd to hopefully fit in the same
+		 * cache-line, which needs to be touched by switch_mm().
+		 */
+		atomic_t membarrier_state;
+#endif
+
 		/**
 		 * @mm_users: The number of users including userspace.
 		 *
@@ -452,9 +462,7 @@ struct mm_struct {
 		unsigned long flags; /* Must use atomic bitops to access */
 
 		struct core_state *core_state; /* coredumping support */
-#ifdef CONFIG_MEMBARRIER
-		atomic_t membarrier_state;
-#endif
+
 #ifdef CONFIG_AIO
 		spinlock_t			ioctx_lock;
 		struct kioctx_table __rcu	*ioctx_table;
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 8557ec664213..e6770012db18 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -370,10 +370,8 @@ static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
 	sync_core_before_usermode();
 }
 
-static inline void membarrier_execve(struct task_struct *t)
-{
-	atomic_set(&t->mm->membarrier_state, 0);
-}
+extern void membarrier_exec_mmap(struct mm_struct *mm);
+
 #else
 #ifdef CONFIG_ARCH_HAS_MEMBARRIER_CALLBACKS
 static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
@@ -382,7 +380,7 @@ static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
 {
 }
 #endif
-static inline void membarrier_execve(struct task_struct *t)
+static inline void membarrier_exec_mmap(struct mm_struct *mm)
 {
 }
 static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index df9f1fe5689b..e25d0b97a242 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3238,8 +3238,10 @@ context_switch(struct rq *rq, struct task_struct *prev,
 		next->active_mm = oldmm;
 		mmgrab(oldmm);
 		enter_lazy_tlb(oldmm, next);
-	} else
+	} else {
+		membarrier_switch_mm(rq, oldmm, mm);
 		switch_mm_irqs_off(oldmm, mm, next);
+	}
 
 	if (!prev->mm) {
 		prev->active_mm = NULL;
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 7e0a0d6535f3..fce06a2e1d89 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -30,6 +30,39 @@ static void ipi_mb(void *info)
 	smp_mb();	/* IPIs should be serializing but paranoid. */
 }
 
+static void ipi_sync_rq_state(void *info)
+{
+	struct mm_struct *mm = (struct mm_struct *) info;
+
+	if (current->mm != mm)
+		return;
+	this_cpu_write(runqueues.membarrier_state,
+		       atomic_read(&mm->membarrier_state));
+	/*
+	 * Issue a memory barrier after setting
+	 * MEMBARRIER_STATE_GLOBAL_EXPEDITED in the current runqueue to
+	 * guarantee that no memory access following registration is reordered
+	 * before registration.
+	 */
+	smp_mb();
+}
+
+void membarrier_exec_mmap(struct mm_struct *mm)
+{
+	/*
+	 * Issue a memory barrier before clearing membarrier_state to
+	 * guarantee that no memory access prior to exec is reordered after
+	 * clearing this state.
+	 */
+	smp_mb();
+	atomic_set(&mm->membarrier_state, 0);
+	/*
+	 * Keep the runqueue membarrier_state in sync with this mm
+	 * membarrier_state.
+	 */
+	this_cpu_write(runqueues.membarrier_state, 0);
+}
+
 static int membarrier_global_expedited(void)
 {
 	int cpu;
@@ -56,6 +89,7 @@ static int membarrier_global_expedited(void)
 	}
 
 	cpus_read_lock();
+	rcu_read_lock();
 	for_each_online_cpu(cpu) {
 		struct task_struct *p;
 
@@ -70,17 +104,25 @@ static int membarrier_global_expedited(void)
 		if (cpu == raw_smp_processor_id())
 			continue;
 
-		rcu_read_lock();
+		if (!(READ_ONCE(cpu_rq(cpu)->membarrier_state) &
+		    MEMBARRIER_STATE_GLOBAL_EXPEDITED))
+			continue;
+
+		/*
+		 * Skip the CPU if it runs a kernel thread. The scheduler
+		 * leaves the prior task mm in place as an optimization when
+		 * scheduling a kthread.
+		 */
 		p = task_rcu_dereference(&cpu_rq(cpu)->curr);
-		if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
-				   MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
-			if (!fallback)
-				__cpumask_set_cpu(cpu, tmpmask);
-			else
-				smp_call_function_single(cpu, ipi_mb, NULL, 1);
-		}
-		rcu_read_unlock();
+		if (p->flags & PF_KTHREAD)
+			continue;
+
+		if (!fallback)
+			__cpumask_set_cpu(cpu, tmpmask);
+		else
+			smp_call_function_single(cpu, ipi_mb, NULL, 1);
 	}
+	rcu_read_unlock();
 	if (!fallback) {
 		preempt_disable();
 		smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
@@ -136,6 +178,7 @@ static int membarrier_private_expedited(int flags)
 	}
 
 	cpus_read_lock();
+	rcu_read_lock();
 	for_each_online_cpu(cpu) {
 		struct task_struct *p;
 
@@ -149,7 +192,6 @@ static int membarrier_private_expedited(int flags)
 		 */
 		if (cpu == raw_smp_processor_id())
 			continue;
-		rcu_read_lock();
 		p = task_rcu_dereference(&cpu_rq(cpu)->curr);
 		if (p && p->mm == current->mm) {
 			if (!fallback)
@@ -157,8 +199,8 @@ static int membarrier_private_expedited(int flags)
 			else
 				smp_call_function_single(cpu, ipi_mb, NULL, 1);
 		}
-		rcu_read_unlock();
 	}
+	rcu_read_unlock();
 	if (!fallback) {
 		preempt_disable();
 		smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
@@ -177,32 +219,78 @@ static int membarrier_private_expedited(int flags)
 	return 0;
 }
 
+static int sync_runqueues_membarrier_state(struct mm_struct *mm)
+{
+	int membarrier_state = atomic_read(&mm->membarrier_state);
+	cpumask_var_t tmpmask;
+	int cpu;
+
+	if (atomic_read(&mm->mm_users) == 1 || num_online_cpus() == 1) {
+		this_cpu_write(runqueues.membarrier_state, membarrier_state);
+
+		/*
+		 * For single mm user, we can simply issue a memory barrier
+		 * after setting MEMBARRIER_STATE_GLOBAL_EXPEDITED in the
+		 * mm and in the current runqueue to guarantee that no memory
+		 * access following registration is reordered before
+		 * registration.
+		 */
+		smp_mb();
+		return 0;
+	}
+
+	if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
+		return -ENOMEM;
+
+	/*
+	 * For mm with multiple users, we need to ensure all future
+	 * scheduler executions will observe @mm's new membarrier
+	 * state.
+	 */
+	synchronize_rcu();
+
+	/*
+	 * For each cpu runqueue, if the task's mm matches @mm, ensure that
+	 * all @mm's membarrier state set bits are also set in the runqueue's
+	 * membarrier state. This ensures that a runqueue scheduling
+	 * between threads which are users of @mm has its membarrier state
+	 * updated.
+	 */
+	cpus_read_lock();
+	rcu_read_lock();
+	for_each_online_cpu(cpu) {
+		struct rq *rq = cpu_rq(cpu);
+		struct task_struct *p;
+
+		p = task_rcu_dereference(&rq->curr);
+		if (p && p->mm == mm)
+			__cpumask_set_cpu(cpu, tmpmask);
+	}
+	rcu_read_unlock();
+
+	preempt_disable();
+	smp_call_function_many(tmpmask, ipi_sync_rq_state, mm, 1);
+	preempt_enable();
+
+	free_cpumask_var(tmpmask);
+	cpus_read_unlock();
+
+	return 0;
+}
+
 static int membarrier_register_global_expedited(void)
 {
 	struct task_struct *p = current;
 	struct mm_struct *mm = p->mm;
+	int ret;
 
 	if (atomic_read(&mm->membarrier_state) &
 	    MEMBARRIER_STATE_GLOBAL_EXPEDITED_READY)
 		return 0;
 	atomic_or(MEMBARRIER_STATE_GLOBAL_EXPEDITED, &mm->membarrier_state);
-	if (atomic_read(&mm->mm_users) == 1) {
-		/*
-		 * For single mm user, single threaded process, we can
-		 * simply issue a memory barrier after setting
-		 * MEMBARRIER_STATE_GLOBAL_EXPEDITED to guarantee that
-		 * no memory access following registration is reordered
-		 * before registration.
-		 */
-		smp_mb();
-	} else {
-		/*
-		 * For multi-mm user threads, we need to ensure all
-		 * future scheduler executions will observe the new
-		 * thread flag state for this mm.
-		 */
-		synchronize_rcu();
-	}
+	ret = sync_runqueues_membarrier_state(mm);
+	if (ret)
+		return ret;
 	atomic_or(MEMBARRIER_STATE_GLOBAL_EXPEDITED_READY,
 		  &mm->membarrier_state);
 
@@ -213,12 +301,15 @@ static int membarrier_register_private_expedited(int flags)
 {
 	struct task_struct *p = current;
 	struct mm_struct *mm = p->mm;
-	int state = MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY;
+	int ready_state = MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY,
+	    set_state = MEMBARRIER_STATE_PRIVATE_EXPEDITED,
+	    ret;
 
 	if (flags & MEMBARRIER_FLAG_SYNC_CORE) {
 		if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
 			return -EINVAL;
-		state = MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY;
+		ready_state =
+			MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY;
 	}
 
 	/*
@@ -226,20 +317,15 @@ static int membarrier_register_private_expedited(int flags)
 	 * groups, which use the same mm. (CLONE_VM but not
 	 * CLONE_THREAD).
 	 */
-	if ((atomic_read(&mm->membarrier_state) & state) == state)
+	if ((atomic_read(&mm->membarrier_state) & ready_state) == ready_state)
 		return 0;
-	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED, &mm->membarrier_state);
 	if (flags & MEMBARRIER_FLAG_SYNC_CORE)
-		atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE,
-			  &mm->membarrier_state);
-	if (atomic_read(&mm->mm_users) != 1) {
-		/*
-		 * Ensure all future scheduler executions will observe the
-		 * new thread flag state for this process.
-		 */
-		synchronize_rcu();
-	}
-	atomic_or(state, &mm->membarrier_state);
+		set_state |= MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE;
+	atomic_or(set_state, &mm->membarrier_state);
+	ret = sync_runqueues_membarrier_state(mm);
+	if (ret)
+		return ret;
+	atomic_or(ready_state, &mm->membarrier_state);
 
 	return 0;
 }
@@ -253,8 +339,10 @@ static int membarrier_register_private_expedited(int flags)
  * command specified does not exist, not available on the running
  * kernel, or if the command argument is invalid, this system call
  * returns -EINVAL. For a given command, with flags argument set to 0,
- * this system call is guaranteed to always return the same value until
- * reboot.
+ * if this system call returns -ENOSYS or -EINVAL, it is guaranteed to
+ * always return the same value until reboot. In addition, it can return
+ * -ENOMEM if there is not enough memory available to perform the system
+ * call.
  *
  * All memory accesses performed in program order from each targeted thread
  * is guaranteed to be ordered with respect to sys_membarrier(). If we use
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 802b1f3405f2..85cb30b1b6b7 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -907,6 +907,10 @@ struct rq {
 
 	atomic_t		nr_iowait;
 
+#ifdef CONFIG_MEMBARRIER
+	int membarrier_state;
+#endif
+
 #ifdef CONFIG_SMP
 	struct root_domain		*rd;
 	struct sched_domain __rcu	*sd;
@@ -2423,3 +2427,33 @@ static inline bool sched_energy_enabled(void)
 static inline bool sched_energy_enabled(void) { return false; }
 
 #endif /* CONFIG_ENERGY_MODEL && CONFIG_CPU_FREQ_GOV_SCHEDUTIL */
+
+#ifdef CONFIG_MEMBARRIER
+/*
+ * The scheduler provides memory barriers required by membarrier between:
+ * - prior user-space memory accesses and store to rq->membarrier_state,
+ * - store to rq->membarrier_state and following user-space memory accesses.
+ * In the same way it provides those guarantees around store to rq->curr.
+ */
+static inline void membarrier_switch_mm(struct rq *rq,
+					struct mm_struct *prev_mm,
+					struct mm_struct *next_mm)
+{
+	int membarrier_state;
+
+	if (prev_mm == next_mm)
+		return;
+
+	membarrier_state = atomic_read(&next_mm->membarrier_state);
+	if (READ_ONCE(rq->membarrier_state) == membarrier_state)
+		return;
+
+	WRITE_ONCE(rq->membarrier_state, membarrier_state);
+}
+#else
+static inline void membarrier_switch_mm(struct rq *rq,
+					struct mm_struct *prev_mm,
+					struct mm_struct *next_mm)
+{
+}
+#endif
-- 
2.17.1


* [RFC PATCH for 5.4 5/7] selftests: sched/membarrier: Add multi-threaded test
  2019-09-19 17:36 [RFC PATCH for 5.4 0/7] Membarrier fixes and cleanups Mathieu Desnoyers
                   ` (3 preceding siblings ...)
  2019-09-19 17:37 ` [RFC PATCH for 5.4 4/7] Fix: sched/membarrier: p->mm->membarrier_state racy load (v4) Mathieu Desnoyers
@ 2019-09-19 17:37 ` Mathieu Desnoyers
  2019-09-27  8:10   ` [tip: sched/urgent] selftests, " tip-bot2 for Mathieu Desnoyers
  2019-09-19 17:37 ` [RFC PATCH for 5.4 6/7] sched/membarrier: Skip IPIs when mm->mm_users == 1 Mathieu Desnoyers
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: Mathieu Desnoyers @ 2019-09-19 17:37 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E. McKenney
  Cc: Ingo Molnar, linux-kernel, Oleg Nesterov, Eric W. Biederman,
	Linus Torvalds, Russell King - ARM Linux admin, Chris Metcalf,
	Christoph Lameter, Kirill Tkhai, Mike Galbraith, Thomas Gleixner,
	Mathieu Desnoyers, Shuah Khan

membarrier commands take very different code paths depending on whether
they are issued from a single-threaded or a multi-threaded process.
Exercise both scenarios in the kernel selftests to increase coverage.
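
Both resulting binaries hook into the usual kselftest flow and can be
run with, e.g.:

	make -C tools/testing/selftests TARGETS=membarrier run_tests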

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
---
 tools/testing/selftests/membarrier/.gitignore |  3 +-
 tools/testing/selftests/membarrier/Makefile   |  5 +-
 ...mbarrier_test.c => membarrier_test_impl.h} | 40 +++++-----
 .../membarrier/membarrier_test_multi_thread.c | 73 +++++++++++++++++++
 .../membarrier_test_single_thread.c           | 24 ++++++
 5 files changed, 124 insertions(+), 21 deletions(-)
 rename tools/testing/selftests/membarrier/{membarrier_test.c => membarrier_test_impl.h} (95%)
 create mode 100644 tools/testing/selftests/membarrier/membarrier_test_multi_thread.c
 create mode 100644 tools/testing/selftests/membarrier/membarrier_test_single_thread.c

diff --git a/tools/testing/selftests/membarrier/.gitignore b/tools/testing/selftests/membarrier/.gitignore
index 020c44f49a9e..f2f7ec0a99b4 100644
--- a/tools/testing/selftests/membarrier/.gitignore
+++ b/tools/testing/selftests/membarrier/.gitignore
@@ -1 +1,2 @@
-membarrier_test
+membarrier_test_multi_thread
+membarrier_test_single_thread
diff --git a/tools/testing/selftests/membarrier/Makefile b/tools/testing/selftests/membarrier/Makefile
index 97e3bdf3d1e9..34d1c81a2324 100644
--- a/tools/testing/selftests/membarrier/Makefile
+++ b/tools/testing/selftests/membarrier/Makefile
@@ -1,7 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0-only
 CFLAGS += -g -I../../../../usr/include/
+LDLIBS += -lpthread
 
-TEST_GEN_PROGS := membarrier_test
+TEST_GEN_PROGS := membarrier_test_single_thread \
+		membarrier_test_multi_thread
 
 include ../lib.mk
-
diff --git a/tools/testing/selftests/membarrier/membarrier_test.c b/tools/testing/selftests/membarrier/membarrier_test_impl.h
similarity index 95%
rename from tools/testing/selftests/membarrier/membarrier_test.c
rename to tools/testing/selftests/membarrier/membarrier_test_impl.h
index 70b4ddbf126b..186be69f0a59 100644
--- a/tools/testing/selftests/membarrier/membarrier_test.c
+++ b/tools/testing/selftests/membarrier/membarrier_test_impl.h
@@ -1,10 +1,11 @@
-// SPDX-License-Identifier: GPL-2.0
+/* SPDX-License-Identifier: GPL-2.0 */
 #define _GNU_SOURCE
 #include <linux/membarrier.h>
 #include <syscall.h>
 #include <stdio.h>
 #include <errno.h>
 #include <string.h>
+#include <pthread.h>
 
 #include "../kselftest.h"
 
@@ -223,7 +224,7 @@ static int test_membarrier_global_expedited_success(void)
 	return 0;
 }
 
-static int test_membarrier(void)
+static int test_membarrier_fail(void)
 {
 	int status;
 
@@ -233,10 +234,27 @@ static int test_membarrier(void)
 	status = test_membarrier_flags_fail();
 	if (status)
 		return status;
-	status = test_membarrier_global_success();
+	status = test_membarrier_private_expedited_fail();
 	if (status)
 		return status;
-	status = test_membarrier_private_expedited_fail();
+	status = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);
+	if (status < 0) {
+		ksft_test_result_fail("sys_membarrier() failed\n");
+		return status;
+	}
+	if (status & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE) {
+		status = test_membarrier_private_expedited_sync_core_fail();
+		if (status)
+			return status;
+	}
+	return 0;
+}
+
+static int test_membarrier_success(void)
+{
+	int status;
+
+	status = test_membarrier_global_success();
 	if (status)
 		return status;
 	status = test_membarrier_register_private_expedited_success();
@@ -251,9 +269,6 @@ static int test_membarrier(void)
 		return status;
 	}
 	if (status & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE) {
-		status = test_membarrier_private_expedited_sync_core_fail();
-		if (status)
-			return status;
 		status = test_membarrier_register_private_expedited_sync_core_success();
 		if (status)
 			return status;
@@ -300,14 +315,3 @@ static int test_membarrier_query(void)
 	ksft_test_result_pass("sys_membarrier available\n");
 	return 0;
 }
-
-int main(int argc, char **argv)
-{
-	ksft_print_header();
-	ksft_set_plan(13);
-
-	test_membarrier_query();
-	test_membarrier();
-
-	return ksft_exit_pass();
-}
diff --git a/tools/testing/selftests/membarrier/membarrier_test_multi_thread.c b/tools/testing/selftests/membarrier/membarrier_test_multi_thread.c
new file mode 100644
index 000000000000..ac5613e5b0eb
--- /dev/null
+++ b/tools/testing/selftests/membarrier/membarrier_test_multi_thread.c
@@ -0,0 +1,73 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <linux/membarrier.h>
+#include <syscall.h>
+#include <stdio.h>
+#include <errno.h>
+#include <string.h>
+#include <pthread.h>
+
+#include "membarrier_test_impl.h"
+
+static int thread_ready, thread_quit;
+static pthread_mutex_t test_membarrier_thread_mutex =
+	PTHREAD_MUTEX_INITIALIZER;
+static pthread_cond_t test_membarrier_thread_cond =
+	PTHREAD_COND_INITIALIZER;
+
+void *test_membarrier_thread(void *arg)
+{
+	pthread_mutex_lock(&test_membarrier_thread_mutex);
+	thread_ready = 1;
+	pthread_cond_broadcast(&test_membarrier_thread_cond);
+	pthread_mutex_unlock(&test_membarrier_thread_mutex);
+
+	pthread_mutex_lock(&test_membarrier_thread_mutex);
+	while (!thread_quit)
+		pthread_cond_wait(&test_membarrier_thread_cond,
+				  &test_membarrier_thread_mutex);
+	pthread_mutex_unlock(&test_membarrier_thread_mutex);
+
+	return NULL;
+}
+
+static int test_mt_membarrier(void)
+{
+	int i;
+	pthread_t test_thread;
+
+	pthread_create(&test_thread, NULL,
+		       test_membarrier_thread, NULL);
+
+	pthread_mutex_lock(&test_membarrier_thread_mutex);
+	while (!thread_ready)
+		pthread_cond_wait(&test_membarrier_thread_cond,
+				  &test_membarrier_thread_mutex);
+	pthread_mutex_unlock(&test_membarrier_thread_mutex);
+
+	test_membarrier_fail();
+
+	test_membarrier_success();
+
+	pthread_mutex_lock(&test_membarrier_thread_mutex);
+	thread_quit = 1;
+	pthread_cond_broadcast(&test_membarrier_thread_cond);
+	pthread_mutex_unlock(&test_membarrier_thread_mutex);
+
+	pthread_join(test_thread, NULL);
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	ksft_print_header();
+	ksft_set_plan(13);
+
+	test_membarrier_query();
+
+	/* Multi-threaded */
+	test_mt_membarrier();
+
+	return ksft_exit_pass();
+}
diff --git a/tools/testing/selftests/membarrier/membarrier_test_single_thread.c b/tools/testing/selftests/membarrier/membarrier_test_single_thread.c
new file mode 100644
index 000000000000..c1c963902854
--- /dev/null
+++ b/tools/testing/selftests/membarrier/membarrier_test_single_thread.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <linux/membarrier.h>
+#include <syscall.h>
+#include <stdio.h>
+#include <errno.h>
+#include <string.h>
+#include <pthread.h>
+
+#include "membarrier_test_impl.h"
+
+int main(int argc, char **argv)
+{
+	ksft_print_header();
+	ksft_set_plan(13);
+
+	test_membarrier_query();
+
+	test_membarrier_fail();
+
+	test_membarrier_success();
+
+	return ksft_exit_pass();
+}
-- 
2.17.1


* [RFC PATCH for 5.4 6/7] sched/membarrier: Skip IPIs when mm->mm_users == 1
  2019-09-19 17:36 [RFC PATCH for 5.4 0/7] Membarrier fixes and cleanups Mathieu Desnoyers
                   ` (4 preceding siblings ...)
  2019-09-19 17:37 ` [RFC PATCH for 5.4 5/7] selftests: sched/membarrier: Add multi-threaded test Mathieu Desnoyers
@ 2019-09-19 17:37 ` Mathieu Desnoyers
  2019-09-27  8:10   ` [tip: sched/urgent] " tip-bot2 for Mathieu Desnoyers
  2019-09-19 17:37 ` [RFC PATCH for 5.4 7/7] sched/membarrier: Return -ENOMEM to userspace on memory allocation failure Mathieu Desnoyers
  2019-09-23  9:06 ` [RFC PATCH for 5.4 0/7] Membarrier fixes and cleanups Peter Zijlstra
  7 siblings, 1 reply; 22+ messages in thread
From: Mathieu Desnoyers @ 2019-09-19 17:37 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E. McKenney
  Cc: Ingo Molnar, linux-kernel, Oleg Nesterov, Eric W. Biederman,
	Linus Torvalds, Russell King - ARM Linux admin, Chris Metcalf,
	Christoph Lameter, Kirill Tkhai, Mike Galbraith, Thomas Gleixner,
	Mathieu Desnoyers

If there is only a single mm_user for the mm, the private expedited
membarrier command can skip the IPIs, because only a single thread
is using the mm.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/membarrier.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index fce06a2e1d89..8afbdf92be0a 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -145,20 +145,21 @@ static int membarrier_private_expedited(int flags)
 	int cpu;
 	bool fallback = false;
 	cpumask_var_t tmpmask;
+	struct mm_struct *mm = current->mm;
 
 	if (flags & MEMBARRIER_FLAG_SYNC_CORE) {
 		if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
 			return -EINVAL;
-		if (!(atomic_read(&current->mm->membarrier_state) &
+		if (!(atomic_read(&mm->membarrier_state) &
 		      MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
 			return -EPERM;
 	} else {
-		if (!(atomic_read(&current->mm->membarrier_state) &
+		if (!(atomic_read(&mm->membarrier_state) &
 		      MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY))
 			return -EPERM;
 	}
 
-	if (num_online_cpus() == 1)
+	if (atomic_read(&mm->mm_users) == 1 || num_online_cpus() == 1)
 		return 0;
 
 	/*
@@ -193,7 +194,7 @@ static int membarrier_private_expedited(int flags)
 		if (cpu == raw_smp_processor_id())
 			continue;
 		p = task_rcu_dereference(&cpu_rq(cpu)->curr);
-		if (p && p->mm == current->mm) {
+		if (p && p->mm == mm) {
 			if (!fallback)
 				__cpumask_set_cpu(cpu, tmpmask);
 			else
-- 
2.17.1


* [RFC PATCH for 5.4 7/7] sched/membarrier: Return -ENOMEM to userspace on memory allocation failure
  2019-09-19 17:36 [RFC PATCH for 5.4 0/7] Membarrier fixes and cleanups Mathieu Desnoyers
                   ` (5 preceding siblings ...)
  2019-09-19 17:37 ` [RFC PATCH for 5.4 6/7] sched/membarrier: Skip IPIs when mm->mm_users == 1 Mathieu Desnoyers
@ 2019-09-19 17:37 ` Mathieu Desnoyers
  2019-09-27  8:10   ` [tip: sched/urgent] " tip-bot2 for Mathieu Desnoyers
  2019-09-23  9:06 ` [RFC PATCH for 5.4 0/7] Membarrier fixes and cleanups Peter Zijlstra
  7 siblings, 1 reply; 22+ messages in thread
From: Mathieu Desnoyers @ 2019-09-19 17:37 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E. McKenney
  Cc: Ingo Molnar, linux-kernel, Oleg Nesterov, Eric W. Biederman,
	Linus Torvalds, Russell King - ARM Linux admin, Chris Metcalf,
	Christoph Lameter, Kirill Tkhai, Mike Galbraith, Thomas Gleixner,
	Mathieu Desnoyers

Remove the IPI fallback code that membarrier used to handle very
infrequent cpumask memory allocation failures. Use GFP_KERNEL rather
than GFP_NOWAIT, and relax the blocking guarantees for the expedited
membarrier system call commands, allowing them to block while waiting
for memory to become available.

In addition, -ENOMEM can now be returned to user-space if the cpumask
memory allocation fails.
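
User-space callers should therefore be prepared to handle -ENOMEM from
the expedited commands. A hypothetical caller sketch:

	#include <unistd.h>
	#include <sys/syscall.h>
	#include <linux/membarrier.h>
	#include <errno.h>

	static int do_membarrier_private(void)
	{
		int ret = syscall(__NR_membarrier,
				  MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0);

		if (ret < 0 && errno == ENOMEM)
			return -ENOMEM;	/* transient: retry later */
		return ret;
	}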

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/membarrier.c | 61 ++++++++++++---------------------------
 1 file changed, 19 insertions(+), 42 deletions(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 8afbdf92be0a..1420c656e8f0 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -66,7 +66,6 @@ void membarrier_exec_mmap(struct mm_struct *mm)
 static int membarrier_global_expedited(void)
 {
 	int cpu;
-	bool fallback = false;
 	cpumask_var_t tmpmask;
 
 	if (num_online_cpus() == 1)
@@ -78,15 +77,8 @@ static int membarrier_global_expedited(void)
 	 */
 	smp_mb();	/* system call entry is not a mb. */
 
-	/*
-	 * Expedited membarrier commands guarantee that they won't
-	 * block, hence the GFP_NOWAIT allocation flag and fallback
-	 * implementation.
-	 */
-	if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
-		/* Fallback for OOM. */
-		fallback = true;
-	}
+	if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
+		return -ENOMEM;
 
 	cpus_read_lock();
 	rcu_read_lock();
@@ -117,18 +109,15 @@ static int membarrier_global_expedited(void)
 		if (p->flags & PF_KTHREAD)
 			continue;
 
-		if (!fallback)
-			__cpumask_set_cpu(cpu, tmpmask);
-		else
-			smp_call_function_single(cpu, ipi_mb, NULL, 1);
+		__cpumask_set_cpu(cpu, tmpmask);
 	}
 	rcu_read_unlock();
-	if (!fallback) {
-		preempt_disable();
-		smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
-		preempt_enable();
-		free_cpumask_var(tmpmask);
-	}
+
+	preempt_disable();
+	smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
+	preempt_enable();
+
+	free_cpumask_var(tmpmask);
 	cpus_read_unlock();
 
 	/*
@@ -143,7 +132,6 @@ static int membarrier_global_expedited(void)
 static int membarrier_private_expedited(int flags)
 {
 	int cpu;
-	bool fallback = false;
 	cpumask_var_t tmpmask;
 	struct mm_struct *mm = current->mm;
 
@@ -168,15 +156,8 @@ static int membarrier_private_expedited(int flags)
 	 */
 	smp_mb();	/* system call entry is not a mb. */
 
-	/*
-	 * Expedited membarrier commands guarantee that they won't
-	 * block, hence the GFP_NOWAIT allocation flag and fallback
-	 * implementation.
-	 */
-	if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
-		/* Fallback for OOM. */
-		fallback = true;
-	}
+	if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
+		return -ENOMEM;
 
 	cpus_read_lock();
 	rcu_read_lock();
@@ -194,20 +175,16 @@ static int membarrier_private_expedited(int flags)
 		if (cpu == raw_smp_processor_id())
 			continue;
 		p = task_rcu_dereference(&cpu_rq(cpu)->curr);
-		if (p && p->mm == mm) {
-			if (!fallback)
-				__cpumask_set_cpu(cpu, tmpmask);
-			else
-				smp_call_function_single(cpu, ipi_mb, NULL, 1);
-		}
+		if (p && p->mm == mm)
+			__cpumask_set_cpu(cpu, tmpmask);
 	}
 	rcu_read_unlock();
-	if (!fallback) {
-		preempt_disable();
-		smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
-		preempt_enable();
-		free_cpumask_var(tmpmask);
-	}
+
+	preempt_disable();
+	smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
+	preempt_enable();
+
+	free_cpumask_var(tmpmask);
 	cpus_read_unlock();
 
 	/*
-- 
2.17.1


* Re: [RFC PATCH for 5.4 0/7] Membarrier fixes and cleanups
  2019-09-19 17:36 [RFC PATCH for 5.4 0/7] Membarrier fixes and cleanups Mathieu Desnoyers
                   ` (6 preceding siblings ...)
  2019-09-19 17:37 ` [RFC PATCH for 5.4 7/7] sched/membarrier: Return -ENOMEM to userspace on memory allocation failure Mathieu Desnoyers
@ 2019-09-23  9:06 ` Peter Zijlstra
  2019-09-23 14:55   ` Mathieu Desnoyers
  7 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2019-09-23  9:06 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Paul E. McKenney, Ingo Molnar, linux-kernel, Oleg Nesterov,
	Eric W. Biederman, Linus Torvalds,
	Russell King - ARM Linux admin, Chris Metcalf, Christoph Lameter,
	Kirill Tkhai, Mike Galbraith, Thomas Gleixner

On Thu, Sep 19, 2019 at 01:36:58PM -0400, Mathieu Desnoyers wrote:
> Hi,
> 
> This series of fixes and cleanups is initially motivated by a report
> of a race in membarrier, which can load p->mm->membarrier_state after
> the mm has been freed (use-after-free).
> 

The lot looks good to me; what do you want done with them (them being
RFC and all) ?

* Re: [RFC PATCH for 5.4 0/7] Membarrier fixes and cleanups
  2019-09-23  9:06 ` [RFC PATCH for 5.4 0/7] Membarrier fixes and cleanups Peter Zijlstra
@ 2019-09-23 14:55   ` Mathieu Desnoyers
  2019-09-25  8:07     ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Mathieu Desnoyers @ 2019-09-23 14:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulmck, Ingo Molnar, linux-kernel, Oleg Nesterov,
	Eric W. Biederman, Linus Torvalds, Russell King, ARM Linux,
	Chris Metcalf, Chris Lameter, Kirill Tkhai, Mike Galbraith,
	Thomas Gleixner

----- On Sep 23, 2019, at 5:06 AM, Peter Zijlstra peterz@infradead.org wrote:

> On Thu, Sep 19, 2019 at 01:36:58PM -0400, Mathieu Desnoyers wrote:
>> Hi,
>> 
>> This series of fixes and cleanups is initially motivated by a report
>> of a race in membarrier, which can load p->mm->membarrier_state after
>> the mm has been freed (use-after-free).
>> 
> 
> The lot looks good to me; what do you want done with them (them being
> RFC and all) ?

I can either re-send them without the RFC tag, or you can pick them directly
through the scheduler tree.

As you prefer,

Thanks!

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

* Re: [RFC PATCH for 5.4 0/7] Membarrier fixes and cleanups
  2019-09-23 14:55   ` Mathieu Desnoyers
@ 2019-09-25  8:07     ` Peter Zijlstra
  2019-09-25 15:50       ` Mathieu Desnoyers
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2019-09-25  8:07 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: paulmck, Ingo Molnar, linux-kernel, Oleg Nesterov,
	Eric W. Biederman, Linus Torvalds, Russell King, ARM Linux,
	Chris Metcalf, Chris Lameter, Kirill Tkhai, Mike Galbraith,
	Thomas Gleixner

On Mon, Sep 23, 2019 at 10:55:32AM -0400, Mathieu Desnoyers wrote:
> ----- On Sep 23, 2019, at 5:06 AM, Peter Zijlstra peterz@infradead.org wrote:
> 
> > On Thu, Sep 19, 2019 at 01:36:58PM -0400, Mathieu Desnoyers wrote:
> >> Hi,
> >> 
> >> This series of fixes and cleanups is initially motivated by a report
> >> of a race in membarrier, which can load p->mm->membarrier_state after
> >> the mm has been freed (use-after-free).
> >> 
> > 
> > The lot looks good to me; what do you want done with them (them being
> > RFC and all) ?
> 
> I can either re-send them without the RFC tag, or you can pick them directly
> through the scheduler tree.

I've picked them up (and fixed them up, they didn't apply to tip) and
merged them with Eric's task_rcu_dereference() patches.

I'll push it out in a bit.

* Re: [RFC PATCH for 5.4 0/7] Membarrier fixes and cleanups
  2019-09-25  8:07     ` Peter Zijlstra
@ 2019-09-25 15:50       ` Mathieu Desnoyers
  0 siblings, 0 replies; 22+ messages in thread
From: Mathieu Desnoyers @ 2019-09-25 15:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: paulmck, Ingo Molnar, linux-kernel, Oleg Nesterov,
	Eric W. Biederman, Linus Torvalds, Russell King, ARM Linux,
	Chris Metcalf, Chris Lameter, Kirill Tkhai, Mike Galbraith,
	Thomas Gleixner

----- On Sep 25, 2019, at 4:07 AM, Peter Zijlstra peterz@infradead.org wrote:

> On Mon, Sep 23, 2019 at 10:55:32AM -0400, Mathieu Desnoyers wrote:
>> ----- On Sep 23, 2019, at 5:06 AM, Peter Zijlstra peterz@infradead.org wrote:
>> 
>> > On Thu, Sep 19, 2019 at 01:36:58PM -0400, Mathieu Desnoyers wrote:
>> >> Hi,
>> >> 
>> >> This series of fixes and cleanups is initially motivated by a report
>> >> of a race in membarrier, which can load p->mm->membarrier_state after
>> >> the mm has been freed (use-after-free).
>> >> 
>> > 
>> > The lot looks good to me; what do you want done with them (them being
>> > RFC and all) ?
>> 
>> I can either re-send them without the RFC tag, or you can pick them directly
>> through the scheduler tree.
> 
> I've picked them up (and fixed them up, they didn't apply to tip) and
> merge them with Eric's task_rcu_dereference() patches.
> 
> I'll push it out in a bit.

Thanks Peter!

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

* [tip: sched/urgent] sched/membarrier: Return -ENOMEM to userspace on memory allocation failure
  2019-09-19 17:37 ` [RFC PATCH for 5.4 7/7] sched/membarrier: Return -ENOMEM to userspace on memory allocation failure Mathieu Desnoyers
@ 2019-09-27  8:10   ` tip-bot2 for Mathieu Desnoyers
  0 siblings, 0 replies; 22+ messages in thread
From: tip-bot2 for Mathieu Desnoyers @ 2019-09-27  8:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Mathieu Desnoyers, Peter Zijlstra (Intel),
	Chris Metcalf, Christoph Lameter, Eric W. Biederman,
	Kirill Tkhai, Linus Torvalds, Mike Galbraith, Oleg Nesterov,
	Paul E. McKenney, Russell King - ARM Linux admin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     c172e0a3e8e65a4c6fffec5bc4d6de08d6f894f7
Gitweb:        https://git.kernel.org/tip/c172e0a3e8e65a4c6fffec5bc4d6de08d6f894f7
Author:        Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
AuthorDate:    Thu, 19 Sep 2019 13:37:05 -04:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 25 Sep 2019 17:42:31 +02:00

sched/membarrier: Return -ENOMEM to userspace on memory allocation failure

Remove the IPI fallback code that membarrier used to handle very
infrequent cpumask memory allocation failures. Use GFP_KERNEL rather
than GFP_NOWAIT, and relax the blocking guarantees for the expedited
membarrier system call commands, allowing them to block while waiting
for memory to become available.

In addition, -ENOMEM can now be returned to user-space if the cpumask
memory allocation fails.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190919173705.2181-8-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/membarrier.c | 63 ++++++++++++--------------------------
 1 file changed, 20 insertions(+), 43 deletions(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index fced54a..a39bed2 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -66,7 +66,6 @@ void membarrier_exec_mmap(struct mm_struct *mm)
 static int membarrier_global_expedited(void)
 {
 	int cpu;
-	bool fallback = false;
 	cpumask_var_t tmpmask;
 
 	if (num_online_cpus() == 1)
@@ -78,15 +77,8 @@ static int membarrier_global_expedited(void)
 	 */
 	smp_mb();	/* system call entry is not a mb. */
 
-	/*
-	 * Expedited membarrier commands guarantee that they won't
-	 * block, hence the GFP_NOWAIT allocation flag and fallback
-	 * implementation.
-	 */
-	if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
-		/* Fallback for OOM. */
-		fallback = true;
-	}
+	if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
+		return -ENOMEM;
 
 	cpus_read_lock();
 	rcu_read_lock();
@@ -117,18 +109,15 @@ static int membarrier_global_expedited(void)
 		if (p->flags & PF_KTHREAD)
 			continue;
 
-		if (!fallback)
-			__cpumask_set_cpu(cpu, tmpmask);
-		else
-			smp_call_function_single(cpu, ipi_mb, NULL, 1);
+		__cpumask_set_cpu(cpu, tmpmask);
 	}
 	rcu_read_unlock();
-	if (!fallback) {
-		preempt_disable();
-		smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
-		preempt_enable();
-		free_cpumask_var(tmpmask);
-	}
+
+	preempt_disable();
+	smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
+	preempt_enable();
+
+	free_cpumask_var(tmpmask);
 	cpus_read_unlock();
 
 	/*
@@ -143,7 +132,6 @@ static int membarrier_global_expedited(void)
 static int membarrier_private_expedited(int flags)
 {
 	int cpu;
-	bool fallback = false;
 	cpumask_var_t tmpmask;
 	struct mm_struct *mm = current->mm;
 
@@ -168,15 +156,8 @@ static int membarrier_private_expedited(int flags)
 	 */
 	smp_mb();	/* system call entry is not a mb. */
 
-	/*
-	 * Expedited membarrier commands guarantee that they won't
-	 * block, hence the GFP_NOWAIT allocation flag and fallback
-	 * implementation.
-	 */
-	if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
-		/* Fallback for OOM. */
-		fallback = true;
-	}
+	if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
+		return -ENOMEM;
 
 	cpus_read_lock();
 	rcu_read_lock();
@@ -195,20 +176,16 @@ static int membarrier_private_expedited(int flags)
 			continue;
 		rcu_read_lock();
 		p = rcu_dereference(cpu_rq(cpu)->curr);
-		if (p && p->mm == mm) {
-			if (!fallback)
-				__cpumask_set_cpu(cpu, tmpmask);
-			else
-				smp_call_function_single(cpu, ipi_mb, NULL, 1);
-		}
+		if (p && p->mm == mm)
+			__cpumask_set_cpu(cpu, tmpmask);
 	}
 	rcu_read_unlock();
-	if (!fallback) {
-		preempt_disable();
-		smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
-		preempt_enable();
-		free_cpumask_var(tmpmask);
-	}
+
+	preempt_disable();
+	smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
+	preempt_enable();
+
+	free_cpumask_var(tmpmask);
 	cpus_read_unlock();
 
 	/*
@@ -264,7 +241,7 @@ static int sync_runqueues_membarrier_state(struct mm_struct *mm)
 		struct rq *rq = cpu_rq(cpu);
 		struct task_struct *p;
 
-		p = rcu_dereference(&rq->curr);
+		p = rcu_dereference(rq->curr);
 		if (p && p->mm == mm)
 			__cpumask_set_cpu(cpu, tmpmask);
 	}

* [tip: sched/urgent] sched/membarrier: Skip IPIs when mm->mm_users == 1
  2019-09-19 17:37 ` [RFC PATCH for 5.4 6/7] sched/membarrier: Skip IPIs when mm->mm_users == 1 Mathieu Desnoyers
@ 2019-09-27  8:10   ` tip-bot2 for Mathieu Desnoyers
  0 siblings, 0 replies; 22+ messages in thread
From: tip-bot2 for Mathieu Desnoyers @ 2019-09-27  8:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Mathieu Desnoyers, Peter Zijlstra (Intel),
	Chris Metcalf, Christoph Lameter, Eric W. Biederman,
	Kirill Tkhai, Linus Torvalds, Mike Galbraith, Oleg Nesterov,
	Paul E. McKenney, Russell King - ARM Linux admin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     c6d68c1c4a4d6611fc0f8145d764226571d737ca
Gitweb:        https://git.kernel.org/tip/c6d68c1c4a4d6611fc0f8145d764226571d737ca
Author:        Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
AuthorDate:    Thu, 19 Sep 2019 13:37:04 -04:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 25 Sep 2019 17:42:31 +02:00

sched/membarrier: Skip IPIs when mm->mm_users == 1

If there is only a single mm_user for the mm, the private expedited
membarrier command can skip the IPIs, because only a single thread
is using the mm.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190919173705.2181-7-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/membarrier.c |  9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 070cf43..fced54a 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -145,20 +145,21 @@ static int membarrier_private_expedited(int flags)
 	int cpu;
 	bool fallback = false;
 	cpumask_var_t tmpmask;
+	struct mm_struct *mm = current->mm;
 
 	if (flags & MEMBARRIER_FLAG_SYNC_CORE) {
 		if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
 			return -EINVAL;
-		if (!(atomic_read(&current->mm->membarrier_state) &
+		if (!(atomic_read(&mm->membarrier_state) &
 		      MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
 			return -EPERM;
 	} else {
-		if (!(atomic_read(&current->mm->membarrier_state) &
+		if (!(atomic_read(&mm->membarrier_state) &
 		      MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY))
 			return -EPERM;
 	}
 
-	if (num_online_cpus() == 1)
+	if (atomic_read(&mm->mm_users) == 1 || num_online_cpus() == 1)
 		return 0;
 
 	/*
@@ -194,7 +195,7 @@ static int membarrier_private_expedited(int flags)
 			continue;
 		rcu_read_lock();
 		p = rcu_dereference(cpu_rq(cpu)->curr);
-		if (p && p->mm == current->mm) {
+		if (p && p->mm == mm) {
 			if (!fallback)
 				__cpumask_set_cpu(cpu, tmpmask);
 			else

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [tip: sched/urgent] sched/membarrier: Remove redundant check
  2019-09-19 17:37 ` [RFC PATCH for 5.4 2/7] Cleanup: sched/membarrier: Remove redundant check Mathieu Desnoyers
@ 2019-09-27  8:10   ` tip-bot2 for Mathieu Desnoyers
  0 siblings, 0 replies; 22+ messages in thread
From: tip-bot2 for Mathieu Desnoyers @ 2019-09-27  8:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Oleg Nesterov, Mathieu Desnoyers, Peter Zijlstra (Intel),
	Chris Metcalf, Christoph Lameter, Eric W. Biederman,
	Kirill Tkhai, Linus Torvalds, Mike Galbraith, Paul E. McKenney,
	Russell King - ARM Linux admin, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     09554009c0cad4cb2223dd943c813c9257c6883a
Gitweb:        https://git.kernel.org/tip/09554009c0cad4cb2223dd943c813c9257c6883a
Author:        Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
AuthorDate:    Thu, 19 Sep 2019 13:37:00 -04:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 25 Sep 2019 17:42:30 +02:00

sched/membarrier: Remove redundant check

Checking that the number of threads is 1 is redundant with checking
mm_users == 1: each thread sharing the mm holds a reference on it, so
mm_users == 1 already implies a single-threaded process.

No change in functionality intended.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190919173705.2181-3-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/membarrier.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index d48b95f..7ccbd0e 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -186,7 +186,7 @@ static int membarrier_register_global_expedited(void)
 	    MEMBARRIER_STATE_GLOBAL_EXPEDITED_READY)
 		return 0;
 	atomic_or(MEMBARRIER_STATE_GLOBAL_EXPEDITED, &mm->membarrier_state);
-	if (atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1) {
+	if (atomic_read(&mm->mm_users) == 1) {
 		/*
 		 * For single mm user, single threaded process, we can
 		 * simply issue a memory barrier after setting
@@ -232,7 +232,7 @@ static int membarrier_register_private_expedited(int flags)
 	if (flags & MEMBARRIER_FLAG_SYNC_CORE)
 		atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE,
 			  &mm->membarrier_state);
-	if (!(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)) {
+	if (atomic_read(&mm->mm_users) != 1) {
 		/*
 		 * Ensure all future scheduler executions will observe the
 		 * new thread flag state for this process.

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [tip: sched/urgent] sched/membarrier: Fix private expedited registration check
  2019-09-19 17:36 ` [RFC PATCH for 5.4 1/7] Fix: sched/membarrier: Private expedited registration check Mathieu Desnoyers
@ 2019-09-27  8:10   ` tip-bot2 for Mathieu Desnoyers
  0 siblings, 0 replies; 22+ messages in thread
From: tip-bot2 for Mathieu Desnoyers @ 2019-09-27  8:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Mathieu Desnoyers, Peter Zijlstra (Intel),
	Chris Metcalf, Christoph Lameter, Eric W. Biederman,
	Kirill Tkhai, Linus Torvalds, Mike Galbraith, Oleg Nesterov,
	Paul E. McKenney, Russell King - ARM Linux admin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     fc0d77387cb5ae883fd774fc559e056a8dde024c
Gitweb:        https://git.kernel.org/tip/fc0d77387cb5ae883fd774fc559e056a8dde024c
Author:        Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
AuthorDate:    Thu, 19 Sep 2019 13:36:59 -04:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 25 Sep 2019 17:42:30 +02:00

sched/membarrier: Fix private expedited registration check

Fix a logic flaw in the way membarrier_register_private_expedited()
handles ready state checks for private expedited sync core and private
expedited registrations.

If a private expedited membarrier registration is first performed, and
then a private expedited sync_core registration is performed, the ready
state check will skip the second registration when it really should not.
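
Concretely, the affected sequence is the following (an illustrative
userspace sketch, error handling omitted):

	syscall(__NR_membarrier,
		MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0);
	/*
	 * With the old "& state" test, the ready bit set by the first
	 * registration was enough to make this second registration
	 * return 0 early, skipping the sync_core registration work.
	 */
	syscall(__NR_membarrier,
		MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0);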

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190919173705.2181-2-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/membarrier.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index b14250a..d48b95f 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -226,7 +226,7 @@ static int membarrier_register_private_expedited(int flags)
 	 * groups, which use the same mm. (CLONE_VM but not
 	 * CLONE_THREAD).
 	 */
-	if (atomic_read(&mm->membarrier_state) & state)
+	if ((atomic_read(&mm->membarrier_state) & state) == state)
 		return 0;
 	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED, &mm->membarrier_state);
 	if (flags & MEMBARRIER_FLAG_SYNC_CORE)

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [tip: sched/urgent] sched/membarrier: Fix p->mm->membarrier_state racy load
  2019-09-19 17:37 ` [RFC PATCH for 5.4 4/7] Fix: sched/membarrier: p->mm->membarrier_state racy load (v4) Mathieu Desnoyers
@ 2019-09-27  8:10   ` tip-bot2 for Mathieu Desnoyers
  2019-10-01  8:44     ` Ingo Molnar
  0 siblings, 1 reply; 22+ messages in thread
From: tip-bot2 for Mathieu Desnoyers @ 2019-09-27  8:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Linus Torvalds, Mathieu Desnoyers, Peter Zijlstra (Intel),
	Chris Metcalf, Christoph Lameter, Eric W. Biederman,
	Kirill Tkhai, Mike Galbraith, Oleg Nesterov, Paul E. McKenney,
	Russell King - ARM Linux admin, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     227a4aadc75ba22fcb6c4e1c078817b8cbaae4ce
Gitweb:        https://git.kernel.org/tip/227a4aadc75ba22fcb6c4e1c078817b8cbaae4ce
Author:        Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
AuthorDate:    Thu, 19 Sep 2019 13:37:02 -04:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 25 Sep 2019 17:42:30 +02:00

sched/membarrier: Fix p->mm->membarrier_state racy load

The membarrier_state field is located within the mm_struct, which
is not guaranteed to still exist when the membarrier system call
iterates over the runqueues without holding the runqueue locks.

Copy the membarrier_state from the mm_struct into the scheduler
runqueue each time the scheduler switches to a different mm.

When registering membarrier for an mm, after setting the registration
bit in the mm's membarrier state, issue a synchronize_rcu() to ensure
the scheduler observes the change. To handle the case where a runqueue
keeps executing the target mm without switching to another mm, iterate
over each runqueue and issue an IPI to copy the membarrier_state from
the mm_struct into each runqueue whose current task is using the mm
whose state has just been modified.

Move the mm membarrier_state field closer to pgd in mm_struct, to use
a cache line already touched by the scheduler's switch_mm().

The membarrier_execve() hook (now membarrier_exec_mmap()) needs to
clear the runqueue's membarrier state in addition to clearing the mm's
membarrier state, so move its implementation into the scheduler
membarrier code so it can access the runqueue structure.

Add a memory barrier in membarrier_exec_mmap() prior to clearing the
membarrier state, ensuring that memory accesses executed prior to exec
are not reordered with the stores clearing the membarrier state.

As suggested by Linus, move all membarrier.c RCU read-side locks
outside of the for_each_online_cpu() loops.
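
For reference, the racy pattern being removed looked roughly like this
(reconstructed from the hunks below; illustration only):

	p = rcu_dereference(cpu_rq(cpu)->curr);
	if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
			   MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
		/*
		 * RCU protects p's task_struct here, but nothing holds
		 * a reference on p->mm: the mm can be freed
		 * concurrently, so the atomic_read() above may touch
		 * freed memory.
		 */
	}

With this change, the membarrier_state snapshot lives in the runqueue
itself, which remains valid for the whole iteration.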

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190919173705.2181-5-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 fs/exec.c                 |   2 +-
 include/linux/mm_types.h  |  14 ++-
 include/linux/sched/mm.h  |   8 +--
 kernel/sched/core.c       |   4 +-
 kernel/sched/membarrier.c | 175 +++++++++++++++++++++++++++----------
 kernel/sched/sched.h      |  34 +++++++-
 6 files changed, 183 insertions(+), 54 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index f7f6a14..555e93c 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1033,6 +1033,7 @@ static int exec_mmap(struct mm_struct *mm)
 	}
 	task_lock(tsk);
 	active_mm = tsk->active_mm;
+	membarrier_exec_mmap(mm);
 	tsk->mm = mm;
 	tsk->active_mm = mm;
 	activate_mm(active_mm, mm);
@@ -1825,7 +1826,6 @@ static int __do_execve_file(int fd, struct filename *filename,
 	/* execve succeeded */
 	current->fs->in_exec = 0;
 	current->in_execve = 0;
-	membarrier_execve(current);
 	rseq_execve(current);
 	acct_update_integrals(current);
 	task_numa_free(current, false);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6a7a108..ec9bd3a 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -383,6 +383,16 @@ struct mm_struct {
 		unsigned long highest_vm_end;	/* highest vma end address */
 		pgd_t * pgd;
 
+#ifdef CONFIG_MEMBARRIER
+		/**
+		 * @membarrier_state: Flags controlling membarrier behavior.
+		 *
+		 * This field is close to @pgd to hopefully fit in the same
+		 * cache-line, which needs to be touched by switch_mm().
+		 */
+		atomic_t membarrier_state;
+#endif
+
 		/**
 		 * @mm_users: The number of users including userspace.
 		 *
@@ -452,9 +462,7 @@ struct mm_struct {
 		unsigned long flags; /* Must use atomic bitops to access */
 
 		struct core_state *core_state; /* coredumping support */
-#ifdef CONFIG_MEMBARRIER
-		atomic_t membarrier_state;
-#endif
+
 #ifdef CONFIG_AIO
 		spinlock_t			ioctx_lock;
 		struct kioctx_table __rcu	*ioctx_table;
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 8557ec6..e677001 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -370,10 +370,8 @@ static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
 	sync_core_before_usermode();
 }
 
-static inline void membarrier_execve(struct task_struct *t)
-{
-	atomic_set(&t->mm->membarrier_state, 0);
-}
+extern void membarrier_exec_mmap(struct mm_struct *mm);
+
 #else
 #ifdef CONFIG_ARCH_HAS_MEMBARRIER_CALLBACKS
 static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
@@ -382,7 +380,7 @@ static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
 {
 }
 #endif
-static inline void membarrier_execve(struct task_struct *t)
+static inline void membarrier_exec_mmap(struct mm_struct *mm)
 {
 }
 static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 84c7116..2d9a394 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3358,15 +3358,15 @@ context_switch(struct rq *rq, struct task_struct *prev,
 		else
 			prev->active_mm = NULL;
 	} else {                                        // to user
+		membarrier_switch_mm(rq, prev->active_mm, next->mm);
 		/*
 		 * sys_membarrier() requires an smp_mb() between setting
-		 * rq->curr and returning to userspace.
+		 * rq->curr / membarrier_switch_mm() and returning to userspace.
 		 *
 		 * The below provides this either through switch_mm(), or in
 		 * case 'prev->active_mm == next->mm' through
 		 * finish_task_switch()'s mmdrop().
 		 */
-
 		switch_mm_irqs_off(prev->active_mm, next->mm, next);
 
 		if (!prev->mm) {                        // from kernel
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 7ccbd0e..070cf43 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -30,6 +30,39 @@ static void ipi_mb(void *info)
 	smp_mb();	/* IPIs should be serializing but paranoid. */
 }
 
+static void ipi_sync_rq_state(void *info)
+{
+	struct mm_struct *mm = (struct mm_struct *) info;
+
+	if (current->mm != mm)
+		return;
+	this_cpu_write(runqueues.membarrier_state,
+		       atomic_read(&mm->membarrier_state));
+	/*
+	 * Issue a memory barrier after setting
+	 * MEMBARRIER_STATE_GLOBAL_EXPEDITED in the current runqueue to
+	 * guarantee that no memory access following registration is reordered
+	 * before registration.
+	 */
+	smp_mb();
+}
+
+void membarrier_exec_mmap(struct mm_struct *mm)
+{
+	/*
+	 * Issue a memory barrier before clearing membarrier_state to
+	 * guarantee that no memory access prior to exec is reordered after
+	 * clearing this state.
+	 */
+	smp_mb();
+	atomic_set(&mm->membarrier_state, 0);
+	/*
+	 * Keep the runqueue membarrier_state in sync with this mm
+	 * membarrier_state.
+	 */
+	this_cpu_write(runqueues.membarrier_state, 0);
+}
+
 static int membarrier_global_expedited(void)
 {
 	int cpu;
@@ -56,6 +89,7 @@ static int membarrier_global_expedited(void)
 	}
 
 	cpus_read_lock();
+	rcu_read_lock();
 	for_each_online_cpu(cpu) {
 		struct task_struct *p;
 
@@ -70,17 +104,25 @@ static int membarrier_global_expedited(void)
 		if (cpu == raw_smp_processor_id())
 			continue;
 
-		rcu_read_lock();
+		if (!(READ_ONCE(cpu_rq(cpu)->membarrier_state) &
+		    MEMBARRIER_STATE_GLOBAL_EXPEDITED))
+			continue;
+
+		/*
+		 * Skip the CPU if it runs a kernel thread. The scheduler
+		 * leaves the prior task mm in place as an optimization when
+		 * scheduling a kthread.
+		 */
 		p = rcu_dereference(cpu_rq(cpu)->curr);
-		if (p && p->mm && (atomic_read(&p->mm->membarrier_state) &
-				   MEMBARRIER_STATE_GLOBAL_EXPEDITED)) {
-			if (!fallback)
-				__cpumask_set_cpu(cpu, tmpmask);
-			else
-				smp_call_function_single(cpu, ipi_mb, NULL, 1);
-		}
-		rcu_read_unlock();
+		if (p->flags & PF_KTHREAD)
+			continue;
+
+		if (!fallback)
+			__cpumask_set_cpu(cpu, tmpmask);
+		else
+			smp_call_function_single(cpu, ipi_mb, NULL, 1);
 	}
+	rcu_read_unlock();
 	if (!fallback) {
 		preempt_disable();
 		smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
@@ -136,6 +178,7 @@ static int membarrier_private_expedited(int flags)
 	}
 
 	cpus_read_lock();
+	rcu_read_lock();
 	for_each_online_cpu(cpu) {
 		struct task_struct *p;
 
@@ -157,8 +200,8 @@ static int membarrier_private_expedited(int flags)
 			else
 				smp_call_function_single(cpu, ipi_mb, NULL, 1);
 		}
-		rcu_read_unlock();
 	}
+	rcu_read_unlock();
 	if (!fallback) {
 		preempt_disable();
 		smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
@@ -177,32 +220,78 @@ static int membarrier_private_expedited(int flags)
 	return 0;
 }
 
+static int sync_runqueues_membarrier_state(struct mm_struct *mm)
+{
+	int membarrier_state = atomic_read(&mm->membarrier_state);
+	cpumask_var_t tmpmask;
+	int cpu;
+
+	if (atomic_read(&mm->mm_users) == 1 || num_online_cpus() == 1) {
+		this_cpu_write(runqueues.membarrier_state, membarrier_state);
+
+		/*
+		 * For single mm user, we can simply issue a memory barrier
+		 * after setting MEMBARRIER_STATE_GLOBAL_EXPEDITED in the
+		 * mm and in the current runqueue to guarantee that no memory
+		 * access following registration is reordered before
+		 * registration.
+		 */
+		smp_mb();
+		return 0;
+	}
+
+	if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
+		return -ENOMEM;
+
+	/*
+	 * For mm with multiple users, we need to ensure all future
+	 * scheduler executions will observe @mm's new membarrier
+	 * state.
+	 */
+	synchronize_rcu();
+
+	/*
+	 * For each cpu runqueue, if the task's mm matches @mm, ensure that all
+	 * @mm's membarrier state set bits are also set in the runqueue's
+	 * membarrier state. This ensures that a runqueue scheduling
+	 * between threads which are users of @mm has its membarrier state
+	 * updated.
+	 */
+	cpus_read_lock();
+	rcu_read_lock();
+	for_each_online_cpu(cpu) {
+		struct rq *rq = cpu_rq(cpu);
+		struct task_struct *p;
+
+		p = rcu_dereference(&rq->curr);
+		if (p && p->mm == mm)
+			__cpumask_set_cpu(cpu, tmpmask);
+	}
+	rcu_read_unlock();
+
+	preempt_disable();
+	smp_call_function_many(tmpmask, ipi_sync_rq_state, mm, 1);
+	preempt_enable();
+
+	free_cpumask_var(tmpmask);
+	cpus_read_unlock();
+
+	return 0;
+}
+
 static int membarrier_register_global_expedited(void)
 {
 	struct task_struct *p = current;
 	struct mm_struct *mm = p->mm;
+	int ret;
 
 	if (atomic_read(&mm->membarrier_state) &
 	    MEMBARRIER_STATE_GLOBAL_EXPEDITED_READY)
 		return 0;
 	atomic_or(MEMBARRIER_STATE_GLOBAL_EXPEDITED, &mm->membarrier_state);
-	if (atomic_read(&mm->mm_users) == 1) {
-		/*
-		 * For single mm user, single threaded process, we can
-		 * simply issue a memory barrier after setting
-		 * MEMBARRIER_STATE_GLOBAL_EXPEDITED to guarantee that
-		 * no memory access following registration is reordered
-		 * before registration.
-		 */
-		smp_mb();
-	} else {
-		/*
-		 * For multi-mm user threads, we need to ensure all
-		 * future scheduler executions will observe the new
-		 * thread flag state for this mm.
-		 */
-		synchronize_rcu();
-	}
+	ret = sync_runqueues_membarrier_state(mm);
+	if (ret)
+		return ret;
 	atomic_or(MEMBARRIER_STATE_GLOBAL_EXPEDITED_READY,
 		  &mm->membarrier_state);
 
@@ -213,12 +302,15 @@ static int membarrier_register_private_expedited(int flags)
 {
 	struct task_struct *p = current;
 	struct mm_struct *mm = p->mm;
-	int state = MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY;
+	int ready_state = MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY,
+	    set_state = MEMBARRIER_STATE_PRIVATE_EXPEDITED,
+	    ret;
 
 	if (flags & MEMBARRIER_FLAG_SYNC_CORE) {
 		if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
 			return -EINVAL;
-		state = MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY;
+		ready_state =
+			MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY;
 	}
 
 	/*
@@ -226,20 +318,15 @@ static int membarrier_register_private_expedited(int flags)
 	 * groups, which use the same mm. (CLONE_VM but not
 	 * CLONE_THREAD).
 	 */
-	if ((atomic_read(&mm->membarrier_state) & state) == state)
+	if ((atomic_read(&mm->membarrier_state) & ready_state) == ready_state)
 		return 0;
-	atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED, &mm->membarrier_state);
 	if (flags & MEMBARRIER_FLAG_SYNC_CORE)
-		atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE,
-			  &mm->membarrier_state);
-	if (atomic_read(&mm->mm_users) != 1) {
-		/*
-		 * Ensure all future scheduler executions will observe the
-		 * new thread flag state for this process.
-		 */
-		synchronize_rcu();
-	}
-	atomic_or(state, &mm->membarrier_state);
+		set_state |= MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE;
+	atomic_or(set_state, &mm->membarrier_state);
+	ret = sync_runqueues_membarrier_state(mm);
+	if (ret)
+		return ret;
+	atomic_or(ready_state, &mm->membarrier_state);
 
 	return 0;
 }
@@ -253,8 +340,10 @@ static int membarrier_register_private_expedited(int flags)
  * command specified does not exist, not available on the running
  * kernel, or if the command argument is invalid, this system call
  * returns -EINVAL. For a given command, with flags argument set to 0,
- * this system call is guaranteed to always return the same value until
- * reboot.
+ * if this system call returns -ENOSYS or -EINVAL, it is guaranteed to
+ * always return the same value until reboot. In addition, it can return
+ * -ENOMEM if there is not enough memory available to perform the system
+ * call.
  *
  * All memory accesses performed in program order from each targeted thread
  * is guaranteed to be ordered with respect to sys_membarrier(). If we use
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b3cb895..0db2c1b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -911,6 +911,10 @@ struct rq {
 
 	atomic_t		nr_iowait;
 
+#ifdef CONFIG_MEMBARRIER
+	int membarrier_state;
+#endif
+
 #ifdef CONFIG_SMP
 	struct root_domain		*rd;
 	struct sched_domain __rcu	*sd;
@@ -2438,3 +2442,33 @@ static inline bool sched_energy_enabled(void)
 static inline bool sched_energy_enabled(void) { return false; }
 
 #endif /* CONFIG_ENERGY_MODEL && CONFIG_CPU_FREQ_GOV_SCHEDUTIL */
+
+#ifdef CONFIG_MEMBARRIER
+/*
+ * The scheduler provides memory barriers required by membarrier between:
+ * - prior user-space memory accesses and store to rq->membarrier_state,
+ * - store to rq->membarrier_state and following user-space memory accesses.
+ * In the same way it provides those guarantees around store to rq->curr.
+ */
+static inline void membarrier_switch_mm(struct rq *rq,
+					struct mm_struct *prev_mm,
+					struct mm_struct *next_mm)
+{
+	int membarrier_state;
+
+	if (prev_mm == next_mm)
+		return;
+
+	membarrier_state = atomic_read(&next_mm->membarrier_state);
+	if (READ_ONCE(rq->membarrier_state) == membarrier_state)
+		return;
+
+	WRITE_ONCE(rq->membarrier_state, membarrier_state);
+}
+#else
+static inline void membarrier_switch_mm(struct rq *rq,
+					struct mm_struct *prev_mm,
+					struct mm_struct *next_mm)
+{
+}
+#endif

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [tip: sched/urgent] sched/membarrier: Call sync_core only before usermode for same mm
  2019-09-19 17:37 ` [RFC PATCH for 5.4 3/7] Cleanup: sched/membarrier: Only sync_core before usermode for same mm Mathieu Desnoyers
@ 2019-09-27  8:10   ` tip-bot2 for Mathieu Desnoyers
  0 siblings, 0 replies; 22+ messages in thread
From: tip-bot2 for Mathieu Desnoyers @ 2019-09-27  8:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Oleg Nesterov, Mathieu Desnoyers, Peter Zijlstra (Intel),
	Chris Metcalf, Christoph Lameter, Eric W. Biederman,
	Kirill Tkhai, Linus Torvalds, Mike Galbraith, Paul E. McKenney,
	Russell King - ARM Linux admin, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     2840cf02fae627860156737e83326df354ee4ec6
Gitweb:        https://git.kernel.org/tip/2840cf02fae627860156737e83326df354ee4ec6
Author:        Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
AuthorDate:    Thu, 19 Sep 2019 13:37:01 -04:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 25 Sep 2019 17:42:30 +02:00

sched/membarrier: Call sync_core only before usermode for same mm

When the prev and next tasks' mm differ, switch_mm() provides the core
serializing guarantees before returning to usermode. The only case
where an explicit core serialization is needed is when the scheduler
keeps the same mm for prev and next.
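
A rough sketch of the resulting case analysis (an illustration of the
reasoning, not the actual call site):

	if (prev->mm != next->mm) {
		/* switch_mm() already core-serializes before usermode. */
	} else {
		/* Same mm for prev and next: serialize explicitly. */
		sync_core_before_usermode();
	}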

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190919173705.2181-4-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/sched/mm.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 4a79440..8557ec6 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -362,6 +362,8 @@ enum {
 
 static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm)
 {
+	if (current->mm != mm)
+		return;
 	if (likely(!(atomic_read(&mm->membarrier_state) &
 		     MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE)))
 		return;

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [tip: sched/urgent] selftests, sched/membarrier: Add multi-threaded test
  2019-09-19 17:37 ` [RFC PATCH for 5.4 5/7] selftests: sched/membarrier: Add multi-threaded test Mathieu Desnoyers
@ 2019-09-27  8:10   ` tip-bot2 for Mathieu Desnoyers
  0 siblings, 0 replies; 22+ messages in thread
From: tip-bot2 for Mathieu Desnoyers @ 2019-09-27  8:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Mathieu Desnoyers, Peter Zijlstra (Intel),
	Chris Metcalf, Christoph Lameter, Eric W. Biederman,
	Kirill Tkhai, Linus Torvalds, Mike Galbraith, Oleg Nesterov,
	Paul E. McKenney, Russell King - ARM Linux admin, Shuah Khan,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     19a4ff534bb09686f53800564cb977bad2177c00
Gitweb:        https://git.kernel.org/tip/19a4ff534bb09686f53800564cb977bad2177c00
Author:        Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
AuthorDate:    Thu, 19 Sep 2019 13:37:03 -04:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 25 Sep 2019 17:42:31 +02:00

selftests, sched/membarrier: Add multi-threaded test

membarrier commands cover very different code paths depending on
whether they are issued from a single-threaded or a multi-threaded
process. Therefore, exercise both scenarios in the kernel selftests to
increase the coverage of this selftest.
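
The split produces two binaries, membarrier_test_single_thread and
membarrier_test_multi_thread, sharing a common membarrier_test_impl.h.
Assuming the standard kselftest flow, they can be run with e.g.
"make -C tools/testing/selftests TARGETS=membarrier run_tests".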

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190919173705.2181-6-mathieu.desnoyers@efficios.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 tools/testing/selftests/membarrier/.gitignore                      |   3 +-
 tools/testing/selftests/membarrier/Makefile                        |   5 +-
 tools/testing/selftests/membarrier/membarrier_test.c               | 313 +---------------------------------------------------------------------
 tools/testing/selftests/membarrier/membarrier_test_impl.h          | 317 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 tools/testing/selftests/membarrier/membarrier_test_multi_thread.c  |  73 ++++++++++++++++-
 tools/testing/selftests/membarrier/membarrier_test_single_thread.c |  24 +++++-
 6 files changed, 419 insertions(+), 316 deletions(-)
 delete mode 100644 tools/testing/selftests/membarrier/membarrier_test.c
 create mode 100644 tools/testing/selftests/membarrier/membarrier_test_impl.h
 create mode 100644 tools/testing/selftests/membarrier/membarrier_test_multi_thread.c
 create mode 100644 tools/testing/selftests/membarrier/membarrier_test_single_thread.c

diff --git a/tools/testing/selftests/membarrier/.gitignore b/tools/testing/selftests/membarrier/.gitignore
index 020c44f..f2f7ec0 100644
--- a/tools/testing/selftests/membarrier/.gitignore
+++ b/tools/testing/selftests/membarrier/.gitignore
@@ -1 +1,2 @@
-membarrier_test
+membarrier_test_multi_thread
+membarrier_test_single_thread
diff --git a/tools/testing/selftests/membarrier/Makefile b/tools/testing/selftests/membarrier/Makefile
index 97e3bdf..34d1c81 100644
--- a/tools/testing/selftests/membarrier/Makefile
+++ b/tools/testing/selftests/membarrier/Makefile
@@ -1,7 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0-only
 CFLAGS += -g -I../../../../usr/include/
+LDLIBS += -lpthread
 
-TEST_GEN_PROGS := membarrier_test
+TEST_GEN_PROGS := membarrier_test_single_thread \
+		membarrier_test_multi_thread
 
 include ../lib.mk
-
diff --git a/tools/testing/selftests/membarrier/membarrier_test.c b/tools/testing/selftests/membarrier/membarrier_test.c
deleted file mode 100644
index 70b4ddb..0000000
--- a/tools/testing/selftests/membarrier/membarrier_test.c
+++ /dev/null
@@ -1,313 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#define _GNU_SOURCE
-#include <linux/membarrier.h>
-#include <syscall.h>
-#include <stdio.h>
-#include <errno.h>
-#include <string.h>
-
-#include "../kselftest.h"
-
-static int sys_membarrier(int cmd, int flags)
-{
-	return syscall(__NR_membarrier, cmd, flags);
-}
-
-static int test_membarrier_cmd_fail(void)
-{
-	int cmd = -1, flags = 0;
-	const char *test_name = "sys membarrier invalid command";
-
-	if (sys_membarrier(cmd, flags) != -1) {
-		ksft_exit_fail_msg(
-			"%s test: command = %d, flags = %d. Should fail, but passed\n",
-			test_name, cmd, flags);
-	}
-	if (errno != EINVAL) {
-		ksft_exit_fail_msg(
-			"%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
-			test_name, flags, EINVAL, strerror(EINVAL),
-			errno, strerror(errno));
-	}
-
-	ksft_test_result_pass(
-		"%s test: command = %d, flags = %d, errno = %d. Failed as expected\n",
-		test_name, cmd, flags, errno);
-	return 0;
-}
-
-static int test_membarrier_flags_fail(void)
-{
-	int cmd = MEMBARRIER_CMD_QUERY, flags = 1;
-	const char *test_name = "sys membarrier MEMBARRIER_CMD_QUERY invalid flags";
-
-	if (sys_membarrier(cmd, flags) != -1) {
-		ksft_exit_fail_msg(
-			"%s test: flags = %d. Should fail, but passed\n",
-			test_name, flags);
-	}
-	if (errno != EINVAL) {
-		ksft_exit_fail_msg(
-			"%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
-			test_name, flags, EINVAL, strerror(EINVAL),
-			errno, strerror(errno));
-	}
-
-	ksft_test_result_pass(
-		"%s test: flags = %d, errno = %d. Failed as expected\n",
-		test_name, flags, errno);
-	return 0;
-}
-
-static int test_membarrier_global_success(void)
-{
-	int cmd = MEMBARRIER_CMD_GLOBAL, flags = 0;
-	const char *test_name = "sys membarrier MEMBARRIER_CMD_GLOBAL";
-
-	if (sys_membarrier(cmd, flags) != 0) {
-		ksft_exit_fail_msg(
-			"%s test: flags = %d, errno = %d\n",
-			test_name, flags, errno);
-	}
-
-	ksft_test_result_pass(
-		"%s test: flags = %d\n", test_name, flags);
-	return 0;
-}
-
-static int test_membarrier_private_expedited_fail(void)
-{
-	int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0;
-	const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED not registered failure";
-
-	if (sys_membarrier(cmd, flags) != -1) {
-		ksft_exit_fail_msg(
-			"%s test: flags = %d. Should fail, but passed\n",
-			test_name, flags);
-	}
-	if (errno != EPERM) {
-		ksft_exit_fail_msg(
-			"%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
-			test_name, flags, EPERM, strerror(EPERM),
-			errno, strerror(errno));
-	}
-
-	ksft_test_result_pass(
-		"%s test: flags = %d, errno = %d\n",
-		test_name, flags, errno);
-	return 0;
-}
-
-static int test_membarrier_register_private_expedited_success(void)
-{
-	int cmd = MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, flags = 0;
-	const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED";
-
-	if (sys_membarrier(cmd, flags) != 0) {
-		ksft_exit_fail_msg(
-			"%s test: flags = %d, errno = %d\n",
-			test_name, flags, errno);
-	}
-
-	ksft_test_result_pass(
-		"%s test: flags = %d\n",
-		test_name, flags);
-	return 0;
-}
-
-static int test_membarrier_private_expedited_success(void)
-{
-	int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0;
-	const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED";
-
-	if (sys_membarrier(cmd, flags) != 0) {
-		ksft_exit_fail_msg(
-			"%s test: flags = %d, errno = %d\n",
-			test_name, flags, errno);
-	}
-
-	ksft_test_result_pass(
-		"%s test: flags = %d\n",
-		test_name, flags);
-	return 0;
-}
-
-static int test_membarrier_private_expedited_sync_core_fail(void)
-{
-	int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, flags = 0;
-	const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE not registered failure";
-
-	if (sys_membarrier(cmd, flags) != -1) {
-		ksft_exit_fail_msg(
-			"%s test: flags = %d. Should fail, but passed\n",
-			test_name, flags);
-	}
-	if (errno != EPERM) {
-		ksft_exit_fail_msg(
-			"%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
-			test_name, flags, EPERM, strerror(EPERM),
-			errno, strerror(errno));
-	}
-
-	ksft_test_result_pass(
-		"%s test: flags = %d, errno = %d\n",
-		test_name, flags, errno);
-	return 0;
-}
-
-static int test_membarrier_register_private_expedited_sync_core_success(void)
-{
-	int cmd = MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, flags = 0;
-	const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE";
-
-	if (sys_membarrier(cmd, flags) != 0) {
-		ksft_exit_fail_msg(
-			"%s test: flags = %d, errno = %d\n",
-			test_name, flags, errno);
-	}
-
-	ksft_test_result_pass(
-		"%s test: flags = %d\n",
-		test_name, flags);
-	return 0;
-}
-
-static int test_membarrier_private_expedited_sync_core_success(void)
-{
-	int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0;
-	const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE";
-
-	if (sys_membarrier(cmd, flags) != 0) {
-		ksft_exit_fail_msg(
-			"%s test: flags = %d, errno = %d\n",
-			test_name, flags, errno);
-	}
-
-	ksft_test_result_pass(
-		"%s test: flags = %d\n",
-		test_name, flags);
-	return 0;
-}
-
-static int test_membarrier_register_global_expedited_success(void)
-{
-	int cmd = MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED, flags = 0;
-	const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED";
-
-	if (sys_membarrier(cmd, flags) != 0) {
-		ksft_exit_fail_msg(
-			"%s test: flags = %d, errno = %d\n",
-			test_name, flags, errno);
-	}
-
-	ksft_test_result_pass(
-		"%s test: flags = %d\n",
-		test_name, flags);
-	return 0;
-}
-
-static int test_membarrier_global_expedited_success(void)
-{
-	int cmd = MEMBARRIER_CMD_GLOBAL_EXPEDITED, flags = 0;
-	const char *test_name = "sys membarrier MEMBARRIER_CMD_GLOBAL_EXPEDITED";
-
-	if (sys_membarrier(cmd, flags) != 0) {
-		ksft_exit_fail_msg(
-			"%s test: flags = %d, errno = %d\n",
-			test_name, flags, errno);
-	}
-
-	ksft_test_result_pass(
-		"%s test: flags = %d\n",
-		test_name, flags);
-	return 0;
-}
-
-static int test_membarrier(void)
-{
-	int status;
-
-	status = test_membarrier_cmd_fail();
-	if (status)
-		return status;
-	status = test_membarrier_flags_fail();
-	if (status)
-		return status;
-	status = test_membarrier_global_success();
-	if (status)
-		return status;
-	status = test_membarrier_private_expedited_fail();
-	if (status)
-		return status;
-	status = test_membarrier_register_private_expedited_success();
-	if (status)
-		return status;
-	status = test_membarrier_private_expedited_success();
-	if (status)
-		return status;
-	status = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);
-	if (status < 0) {
-		ksft_test_result_fail("sys_membarrier() failed\n");
-		return status;
-	}
-	if (status & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE) {
-		status = test_membarrier_private_expedited_sync_core_fail();
-		if (status)
-			return status;
-		status = test_membarrier_register_private_expedited_sync_core_success();
-		if (status)
-			return status;
-		status = test_membarrier_private_expedited_sync_core_success();
-		if (status)
-			return status;
-	}
-	/*
-	 * It is valid to send a global membarrier from a non-registered
-	 * process.
-	 */
-	status = test_membarrier_global_expedited_success();
-	if (status)
-		return status;
-	status = test_membarrier_register_global_expedited_success();
-	if (status)
-		return status;
-	status = test_membarrier_global_expedited_success();
-	if (status)
-		return status;
-	return 0;
-}
-
-static int test_membarrier_query(void)
-{
-	int flags = 0, ret;
-
-	ret = sys_membarrier(MEMBARRIER_CMD_QUERY, flags);
-	if (ret < 0) {
-		if (errno == ENOSYS) {
-			/*
-			 * It is valid to build a kernel with
-			 * CONFIG_MEMBARRIER=n. However, this skips the tests.
-			 */
-			ksft_exit_skip(
-				"sys membarrier (CONFIG_MEMBARRIER) is disabled.\n");
-		}
-		ksft_exit_fail_msg("sys_membarrier() failed\n");
-	}
-	if (!(ret & MEMBARRIER_CMD_GLOBAL))
-		ksft_exit_skip(
-			"sys_membarrier unsupported: CMD_GLOBAL not found.\n");
-
-	ksft_test_result_pass("sys_membarrier available\n");
-	return 0;
-}
-
-int main(int argc, char **argv)
-{
-	ksft_print_header();
-	ksft_set_plan(13);
-
-	test_membarrier_query();
-	test_membarrier();
-
-	return ksft_exit_pass();
-}
diff --git a/tools/testing/selftests/membarrier/membarrier_test_impl.h b/tools/testing/selftests/membarrier/membarrier_test_impl.h
new file mode 100644
index 0000000..186be69
--- /dev/null
+++ b/tools/testing/selftests/membarrier/membarrier_test_impl.h
@@ -0,0 +1,317 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#define _GNU_SOURCE
+#include <linux/membarrier.h>
+#include <syscall.h>
+#include <stdio.h>
+#include <errno.h>
+#include <string.h>
+#include <pthread.h>
+
+#include "../kselftest.h"
+
+static int sys_membarrier(int cmd, int flags)
+{
+	return syscall(__NR_membarrier, cmd, flags);
+}
+
+static int test_membarrier_cmd_fail(void)
+{
+	int cmd = -1, flags = 0;
+	const char *test_name = "sys membarrier invalid command";
+
+	if (sys_membarrier(cmd, flags) != -1) {
+		ksft_exit_fail_msg(
+			"%s test: command = %d, flags = %d. Should fail, but passed\n",
+			test_name, cmd, flags);
+	}
+	if (errno != EINVAL) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
+			test_name, flags, EINVAL, strerror(EINVAL),
+			errno, strerror(errno));
+	}
+
+	ksft_test_result_pass(
+		"%s test: command = %d, flags = %d, errno = %d. Failed as expected\n",
+		test_name, cmd, flags, errno);
+	return 0;
+}
+
+static int test_membarrier_flags_fail(void)
+{
+	int cmd = MEMBARRIER_CMD_QUERY, flags = 1;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_QUERY invalid flags";
+
+	if (sys_membarrier(cmd, flags) != -1) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should fail, but passed\n",
+			test_name, flags);
+	}
+	if (errno != EINVAL) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
+			test_name, flags, EINVAL, strerror(EINVAL),
+			errno, strerror(errno));
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d, errno = %d. Failed as expected\n",
+		test_name, flags, errno);
+	return 0;
+}
+
+static int test_membarrier_global_success(void)
+{
+	int cmd = MEMBARRIER_CMD_GLOBAL, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_GLOBAL";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n", test_name, flags);
+	return 0;
+}
+
+static int test_membarrier_private_expedited_fail(void)
+{
+	int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED not registered failure";
+
+	if (sys_membarrier(cmd, flags) != -1) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should fail, but passed\n",
+			test_name, flags);
+	}
+	if (errno != EPERM) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
+			test_name, flags, EPERM, strerror(EPERM),
+			errno, strerror(errno));
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d, errno = %d\n",
+		test_name, flags, errno);
+	return 0;
+}
+
+static int test_membarrier_register_private_expedited_success(void)
+{
+	int cmd = MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n",
+		test_name, flags);
+	return 0;
+}
+
+static int test_membarrier_private_expedited_success(void)
+{
+	int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n",
+		test_name, flags);
+	return 0;
+}
+
+static int test_membarrier_private_expedited_sync_core_fail(void)
+{
+	int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE not registered failure";
+
+	if (sys_membarrier(cmd, flags) != -1) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should fail, but passed\n",
+			test_name, flags);
+	}
+	if (errno != EPERM) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d. Should return (%d: \"%s\"), but returned (%d: \"%s\").\n",
+			test_name, flags, EPERM, strerror(EPERM),
+			errno, strerror(errno));
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d, errno = %d\n",
+		test_name, flags, errno);
+	return 0;
+}
+
+static int test_membarrier_register_private_expedited_sync_core_success(void)
+{
+	int cmd = MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n",
+		test_name, flags);
+	return 0;
+}
+
+static int test_membarrier_private_expedited_sync_core_success(void)
+{
+	int cmd = MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n",
+		test_name, flags);
+	return 0;
+}
+
+static int test_membarrier_register_global_expedited_success(void)
+{
+	int cmd = MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n",
+		test_name, flags);
+	return 0;
+}
+
+static int test_membarrier_global_expedited_success(void)
+{
+	int cmd = MEMBARRIER_CMD_GLOBAL_EXPEDITED, flags = 0;
+	const char *test_name = "sys membarrier MEMBARRIER_CMD_GLOBAL_EXPEDITED";
+
+	if (sys_membarrier(cmd, flags) != 0) {
+		ksft_exit_fail_msg(
+			"%s test: flags = %d, errno = %d\n",
+			test_name, flags, errno);
+	}
+
+	ksft_test_result_pass(
+		"%s test: flags = %d\n",
+		test_name, flags);
+	return 0;
+}
+
+static int test_membarrier_fail(void)
+{
+	int status;
+
+	status = test_membarrier_cmd_fail();
+	if (status)
+		return status;
+	status = test_membarrier_flags_fail();
+	if (status)
+		return status;
+	status = test_membarrier_private_expedited_fail();
+	if (status)
+		return status;
+	status = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);
+	if (status < 0) {
+		ksft_test_result_fail("sys_membarrier() failed\n");
+		return status;
+	}
+	if (status & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE) {
+		status = test_membarrier_private_expedited_sync_core_fail();
+		if (status)
+			return status;
+	}
+	return 0;
+}
+
+static int test_membarrier_success(void)
+{
+	int status;
+
+	status = test_membarrier_global_success();
+	if (status)
+		return status;
+	status = test_membarrier_register_private_expedited_success();
+	if (status)
+		return status;
+	status = test_membarrier_private_expedited_success();
+	if (status)
+		return status;
+	status = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);
+	if (status < 0) {
+		ksft_test_result_fail("sys_membarrier() failed\n");
+		return status;
+	}
+	if (status & MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE) {
+		status = test_membarrier_register_private_expedited_sync_core_success();
+		if (status)
+			return status;
+		status = test_membarrier_private_expedited_sync_core_success();
+		if (status)
+			return status;
+	}
+	/*
+	 * It is valid to send a global membarrier from a non-registered
+	 * process.
+	 */
+	status = test_membarrier_global_expedited_success();
+	if (status)
+		return status;
+	status = test_membarrier_register_global_expedited_success();
+	if (status)
+		return status;
+	status = test_membarrier_global_expedited_success();
+	if (status)
+		return status;
+	return 0;
+}
+
+static int test_membarrier_query(void)
+{
+	int flags = 0, ret;
+
+	ret = sys_membarrier(MEMBARRIER_CMD_QUERY, flags);
+	if (ret < 0) {
+		if (errno == ENOSYS) {
+			/*
+			 * It is valid to build a kernel with
+			 * CONFIG_MEMBARRIER=n. However, this skips the tests.
+			 */
+			ksft_exit_skip(
+				"sys membarrier (CONFIG_MEMBARRIER) is disabled.\n");
+		}
+		ksft_exit_fail_msg("sys_membarrier() failed\n");
+	}
+	if (!(ret & MEMBARRIER_CMD_GLOBAL))
+		ksft_exit_skip(
+			"sys_membarrier unsupported: CMD_GLOBAL not found.\n");
+
+	ksft_test_result_pass("sys_membarrier available\n");
+	return 0;
+}
diff --git a/tools/testing/selftests/membarrier/membarrier_test_multi_thread.c b/tools/testing/selftests/membarrier/membarrier_test_multi_thread.c
new file mode 100644
index 0000000..ac5613e
--- /dev/null
+++ b/tools/testing/selftests/membarrier/membarrier_test_multi_thread.c
@@ -0,0 +1,73 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <linux/membarrier.h>
+#include <syscall.h>
+#include <stdio.h>
+#include <errno.h>
+#include <string.h>
+#include <pthread.h>
+
+#include "membarrier_test_impl.h"
+
+static int thread_ready, thread_quit;
+static pthread_mutex_t test_membarrier_thread_mutex =
+	PTHREAD_MUTEX_INITIALIZER;
+static pthread_cond_t test_membarrier_thread_cond =
+	PTHREAD_COND_INITIALIZER;
+
+void *test_membarrier_thread(void *arg)
+{
+	pthread_mutex_lock(&test_membarrier_thread_mutex);
+	thread_ready = 1;
+	pthread_cond_broadcast(&test_membarrier_thread_cond);
+	pthread_mutex_unlock(&test_membarrier_thread_mutex);
+
+	pthread_mutex_lock(&test_membarrier_thread_mutex);
+	while (!thread_quit)
+		pthread_cond_wait(&test_membarrier_thread_cond,
+				  &test_membarrier_thread_mutex);
+	pthread_mutex_unlock(&test_membarrier_thread_mutex);
+
+	return NULL;
+}
+
+static int test_mt_membarrier(void)
+{
+	int i;
+	pthread_t test_thread;
+
+	pthread_create(&test_thread, NULL,
+		       test_membarrier_thread, NULL);
+
+	pthread_mutex_lock(&test_membarrier_thread_mutex);
+	while (!thread_ready)
+		pthread_cond_wait(&test_membarrier_thread_cond,
+				  &test_membarrier_thread_mutex);
+	pthread_mutex_unlock(&test_membarrier_thread_mutex);
+
+	test_membarrier_fail();
+
+	test_membarrier_success();
+
+	pthread_mutex_lock(&test_membarrier_thread_mutex);
+	thread_quit = 1;
+	pthread_cond_broadcast(&test_membarrier_thread_cond);
+	pthread_mutex_unlock(&test_membarrier_thread_mutex);
+
+	pthread_join(test_thread, NULL);
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	ksft_print_header();
+	ksft_set_plan(13);
+
+	test_membarrier_query();
+
+	/* Multi-threaded */
+	test_mt_membarrier();
+
+	return ksft_exit_pass();
+}
diff --git a/tools/testing/selftests/membarrier/membarrier_test_single_thread.c b/tools/testing/selftests/membarrier/membarrier_test_single_thread.c
new file mode 100644
index 0000000..c1c9639
--- /dev/null
+++ b/tools/testing/selftests/membarrier/membarrier_test_single_thread.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <linux/membarrier.h>
+#include <syscall.h>
+#include <stdio.h>
+#include <errno.h>
+#include <string.h>
+#include <pthread.h>
+
+#include "membarrier_test_impl.h"
+
+int main(int argc, char **argv)
+{
+	ksft_print_header();
+	ksft_set_plan(13);
+
+	test_membarrier_query();
+
+	test_membarrier_fail();
+
+	test_membarrier_success();
+
+	return ksft_exit_pass();
+}

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [tip: sched/urgent] sched/membarrier: Fix p->mm->membarrier_state racy load
  2019-09-27  8:10   ` [tip: sched/urgent] sched/membarrier: Fix p->mm->membarrier_state racy load tip-bot2 for Mathieu Desnoyers
@ 2019-10-01  8:44     ` Ingo Molnar
  2019-10-01  8:50       ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2019-10-01  8:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-tip-commits, Linus Torvalds, Mathieu Desnoyers,
	Peter Zijlstra (Intel),
	Chris Metcalf, Christoph Lameter, Eric W. Biederman,
	Kirill Tkhai, Mike Galbraith, Oleg Nesterov, Paul E. McKenney,
	Russell King - ARM Linux admin, Thomas Gleixner, Borislav Petkov


* tip-bot2 for Mathieu Desnoyers <tip-bot2@linutronix.de> wrote:

> The following commit has been merged into the sched/urgent branch of tip:
> 
> Commit-ID:     227a4aadc75ba22fcb6c4e1c078817b8cbaae4ce
> Gitweb:        https://git.kernel.org/tip/227a4aadc75ba22fcb6c4e1c078817b8cbaae4ce
> Author:        Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> AuthorDate:    Thu, 19 Sep 2019 13:37:02 -04:00
> Committer:     Ingo Molnar <mingo@kernel.org>
> CommitterDate: Wed, 25 Sep 2019 17:42:30 +02:00
> 
> sched/membarrier: Fix p->mm->membarrier_state racy load

> +	rcu_read_unlock();
>  	if (!fallback) {
>  		preempt_disable();
>  		smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
> @@ -136,6 +178,7 @@ static int membarrier_private_expedited(int flags)
>  	}
>  
>  	cpus_read_lock();
> +	rcu_read_lock();
>  	for_each_online_cpu(cpu) {
>  		struct task_struct *p;
>  
> @@ -157,8 +200,8 @@ static int membarrier_private_expedited(int flags)
>  			else
>  				smp_call_function_single(cpu, ipi_mb, NULL, 1);
>  		}
> -		rcu_read_unlock();
>  	}
> +	rcu_read_unlock();

I noticed this too late, but the locking in this part is now bogus:

	rcu_read_lock();
	for_each_online_cpu(cpu) {
		struct task_struct *p;

		/*
		 * Skipping the current CPU is OK even though we can be
		 * migrated at any point. The current CPU, at the point
		 * where we read raw_smp_processor_id(), is ensured to
		 * be in program order with respect to the caller
		 * thread. Therefore, we can skip this CPU from the
		 * iteration.
		 */
		if (cpu == raw_smp_processor_id())
			continue;
		rcu_read_lock();
		p = rcu_dereference(cpu_rq(cpu)->curr);
		if (p && p->mm == mm)
			__cpumask_set_cpu(cpu, tmpmask);
	}
	rcu_read_unlock();

Note the double rcu_read_lock() ....

This bug is now upstream, so requires an urgent fix, as it should be 
trivial to trigger with pretty much any membarrier user.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [tip: sched/urgent] sched/membarrier: Fix p->mm->membarrier_state racy load
  2019-10-01  8:44     ` Ingo Molnar
@ 2019-10-01  8:50       ` Peter Zijlstra
  2019-10-01 19:34         ` [tip: sched/urgent] membarrier: Fix RCU locking bug caused by faulty merge tip-bot2 for Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2019-10-01  8:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-tip-commits, Linus Torvalds,
	Mathieu Desnoyers, Chris Metcalf, Christoph Lameter,
	Eric W. Biederman, Kirill Tkhai, Mike Galbraith, Oleg Nesterov,
	Paul E. McKenney, Russell King - ARM Linux admin,
	Thomas Gleixner, Borislav Petkov

On Tue, Oct 01, 2019 at 10:44:05AM +0200, Ingo Molnar wrote:
> 
> * tip-bot2 for Mathieu Desnoyers <tip-bot2@linutronix.de> wrote:
> 
> > The following commit has been merged into the sched/urgent branch of tip:
> > 
> > Commit-ID:     227a4aadc75ba22fcb6c4e1c078817b8cbaae4ce
> > Gitweb:        https://git.kernel.org/tip/227a4aadc75ba22fcb6c4e1c078817b8cbaae4ce
> > Author:        Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > AuthorDate:    Thu, 19 Sep 2019 13:37:02 -04:00
> > Committer:     Ingo Molnar <mingo@kernel.org>
> > CommitterDate: Wed, 25 Sep 2019 17:42:30 +02:00
> > 
> > sched/membarrier: Fix p->mm->membarrier_state racy load
> 
> > +	rcu_read_unlock();
> >  	if (!fallback) {
> >  		preempt_disable();
> >  		smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
> > @@ -136,6 +178,7 @@ static int membarrier_private_expedited(int flags)
> >  	}
> >  
> >  	cpus_read_lock();
> > +	rcu_read_lock();
> >  	for_each_online_cpu(cpu) {
> >  		struct task_struct *p;
> >  
> > @@ -157,8 +200,8 @@ static int membarrier_private_expedited(int flags)
> >  			else
> >  				smp_call_function_single(cpu, ipi_mb, NULL, 1);
> >  		}
> > -		rcu_read_unlock();
> >  	}
> > +	rcu_read_unlock();
> 
> I noticed this too late, but the locking in this part is now bogus:
> 
> 	rcu_read_lock();
> 	for_each_online_cpu(cpu) {
> 		struct task_struct *p;
> 
> 		/*
> 		 * Skipping the current CPU is OK even though we can be
> 		 * migrated at any point. The current CPU, at the point
> 		 * where we read raw_smp_processor_id(), is ensured to
> 		 * be in program order with respect to the caller
> 		 * thread. Therefore, we can skip this CPU from the
> 		 * iteration.
> 		 */
> 		if (cpu == raw_smp_processor_id())
> 			continue;
> 		rcu_read_lock();

Yeah, that one needs to go.

> 		p = rcu_dereference(cpu_rq(cpu)->curr);
> 		if (p && p->mm == mm)
> 			__cpumask_set_cpu(cpu, tmpmask);
> 	}
> 	rcu_read_unlock();
> 
> Note the double rcu_read_lock() ....
> 
> This bug is now upstream, so requires an urgent fix, as it should be 
> trivial to trigger with pretty much any membarrier user.

---
Subject: membarrier: Fix faulty merge

Commit 227a4aadc75b ("sched/membarrier: Fix p->mm->membarrier_state racy
load") got fat-fingered by me when merging it with other patches. It
meant to move the RCU section out of the for loop but ended up doing so
only partially, leaving a superfluous rcu_read_lock() inside, causing
havoc.

Fixes: 227a4aadc75b ("sched/membarrier: Fix p->mm->membarrier_state racy load")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/membarrier.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index a39bed2c784f..168479a7d61b 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -174,7 +174,6 @@ static int membarrier_private_expedited(int flags)
 		 */
 		if (cpu == raw_smp_processor_id())
 			continue;
-		rcu_read_lock();
 		p = rcu_dereference(cpu_rq(cpu)->curr);
 		if (p && p->mm == mm)
 			__cpumask_set_cpu(cpu, tmpmask);
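
With the stray lock removed, the whole CPU scan runs under the single
outer RCU read-side critical section taken before the loop; before the
fix, each iteration added a nesting level that the lone rcu_read_unlock()
after the loop could not balance. Schematically, the resulting loop in
kernel/sched/membarrier.c looks like this (simplified sketch, error and
fallback handling elided):

	rcu_read_lock();
	for_each_online_cpu(cpu) {
		struct task_struct *p;

		if (cpu == raw_smp_processor_id())
			continue;
		/* No per-iteration rcu_read_lock() here anymore. */
		p = rcu_dereference(cpu_rq(cpu)->curr);
		if (p && p->mm == mm)
			__cpumask_set_cpu(cpu, tmpmask);
	}
	rcu_read_unlock();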

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [tip: sched/urgent] membarrier: Fix RCU locking bug caused by faulty merge
  2019-10-01  8:50       ` Peter Zijlstra
@ 2019-10-01 19:34         ` tip-bot2 for Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2019-10-01 19:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ingo Molnar, Peter Zijlstra (Intel),
	Borislav Petkov, Chris Metcalf, Christoph Lameter,
	Eric W. Biederman, Kirill Tkhai, Linus Torvalds,
	Mathieu Desnoyers, Mike Galbraith, Oleg Nesterov,
	Paul E. McKenney, Russell King - ARM Linux admin,
	Thomas Gleixner, linux-tip-commits, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     73956fc07dd7b25d4a33ab3fdd6247c60d0b237c
Gitweb:        https://git.kernel.org/tip/73956fc07dd7b25d4a33ab3fdd6247c60d0b237c
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Tue, 01 Oct 2019 10:50:33 +02:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 01 Oct 2019 21:27:50 +02:00

membarrier: Fix RCU locking bug caused by faulty merge

The following commit:

  227a4aadc75b ("sched/membarrier: Fix p->mm->membarrier_state racy load")

got fat-fingered by me when merging it with other patches. It meant to move
the RCU section out of the for loop but ended up doing so only partially,
leaving a superfluous rcu_read_lock() inside the loop, causing havoc.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-tip-commits@vger.kernel.org
Fixes: 227a4aadc75b ("sched/membarrier: Fix p->mm->membarrier_state racy load")
Link: https://lkml.kernel.org/r/20191001085033.GP4519@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/membarrier.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index a39bed2..168479a 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -174,7 +174,6 @@ static int membarrier_private_expedited(int flags)
 		 */
 		if (cpu == raw_smp_processor_id())
 			continue;
-		rcu_read_lock();
 		p = rcu_dereference(cpu_rq(cpu)->curr);
 		if (p && p->mm == mm)
 			__cpumask_set_cpu(cpu, tmpmask);

^ permalink raw reply related	[flat|nested] 22+ messages in thread

end of thread

Thread overview: 22+ messages
2019-09-19 17:36 [RFC PATCH for 5.4 0/7] Membarrier fixes and cleanups Mathieu Desnoyers
2019-09-19 17:36 ` [RFC PATCH for 5.4 1/7] Fix: sched/membarrier: Private expedited registration check Mathieu Desnoyers
2019-09-27  8:10   ` [tip: sched/urgent] sched/membarrier: Fix private " tip-bot2 for Mathieu Desnoyers
2019-09-19 17:37 ` [RFC PATCH for 5.4 2/7] Cleanup: sched/membarrier: Remove redundant check Mathieu Desnoyers
2019-09-27  8:10   ` [tip: sched/urgent] " tip-bot2 for Mathieu Desnoyers
2019-09-19 17:37 ` [RFC PATCH for 5.4 3/7] Cleanup: sched/membarrier: Only sync_core before usermode for same mm Mathieu Desnoyers
2019-09-27  8:10   ` [tip: sched/urgent] sched/membarrier: Call sync_core only " tip-bot2 for Mathieu Desnoyers
2019-09-19 17:37 ` [RFC PATCH for 5.4 4/7] Fix: sched/membarrier: p->mm->membarrier_state racy load (v4) Mathieu Desnoyers
2019-09-27  8:10   ` [tip: sched/urgent] sched/membarrier: Fix p->mm->membarrier_state racy load tip-bot2 for Mathieu Desnoyers
2019-10-01  8:44     ` Ingo Molnar
2019-10-01  8:50       ` Peter Zijlstra
2019-10-01 19:34         ` [tip: sched/urgent] membarrier: Fix RCU locking bug caused by faulty merge tip-bot2 for Peter Zijlstra
2019-09-19 17:37 ` [RFC PATCH for 5.4 5/7] selftests: sched/membarrier: Add multi-threaded test Mathieu Desnoyers
2019-09-27  8:10   ` [tip: sched/urgent] selftests, " tip-bot2 for Mathieu Desnoyers
2019-09-19 17:37 ` [RFC PATCH for 5.4 6/7] sched/membarrier: Skip IPIs when mm->mm_users == 1 Mathieu Desnoyers
2019-09-27  8:10   ` [tip: sched/urgent] " tip-bot2 for Mathieu Desnoyers
2019-09-19 17:37 ` [RFC PATCH for 5.4 7/7] sched/membarrier: Return -ENOMEM to userspace on memory allocation failure Mathieu Desnoyers
2019-09-27  8:10   ` [tip: sched/urgent] " tip-bot2 for Mathieu Desnoyers
2019-09-23  9:06 ` [RFC PATCH for 5.4 0/7] Membarrier fixes and cleanups Peter Zijlstra
2019-09-23 14:55   ` Mathieu Desnoyers
2019-09-25  8:07     ` Peter Zijlstra
2019-09-25 15:50       ` Mathieu Desnoyers
