linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 1/2] sched: Fix exit_mm vs membarrier
@ 2020-07-28 16:00 Mathieu Desnoyers
  2020-07-28 16:00 ` [RFC PATCH 2/2] sched: membarrier: cover kthread_use_mm Mathieu Desnoyers
  2020-08-04 14:34 ` [RFC PATCH 1/2] sched: Fix exit_mm vs membarrier peterz
  0 siblings, 2 replies; 14+ messages in thread
From: Mathieu Desnoyers @ 2020-07-28 16:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Mathieu Desnoyers, Will Deacon, Paul E. McKenney,
	Nicholas Piggin, Andy Lutomirski, Thomas Gleixner,
	Linus Torvalds, Alan Stern, linux-mm

exit_mm should issue memory barriers after user-space memory accesses,
before clearing current->mm, to order user-space memory accesses
performed prior to exit_mm before clearing tsk->mm, which has the
effect of skipping the membarrier private expedited IPIs.

The membarrier system call can be issued concurrently with do_exit
if we have thread groups created with CLONE_VM but not CLONE_THREAD.
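
For illustration, a minimal user-space sketch of how such a pair of
thread groups sharing one mm can be created (error handling omitted;
the function names are only for the example):

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

static int group_b_main(void *arg)
{
        /* Runs in its own thread group, but shares the creator's mm. */
        return 0;
}

int main(void)
{
        const size_t stack_size = 64 * 1024;
        char *stack = malloc(stack_size);

        /*
         * CLONE_VM without CLONE_THREAD: shared address space, separate
         * thread group, so membarrier() issued from one group can run
         * concurrently with exit() in the other. The stack grows down,
         * hence stack + stack_size.
         */
        clone(group_b_main, stack + stack_size, CLONE_VM | SIGCHLD, NULL);
        wait(NULL);
        return 0;
}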

The following comment in exit_mm() seems rather puzzling though:

        /* more a memory barrier than a real lock */
        task_lock(current);

So considering that spinlocks are not full memory barriers nowadays,
some digging into the origins of this comment led me to 2002, at the
earliest commit tracked by Thomas Gleixner's bitkeeper era's history
at https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/

At that point, this comment was followed by:

+               /* more a memory barrier than a real lock */
+               task_lock(tsk);
+               tsk->mm = NULL;
+               task_unlock(tsk);

Which seems to indicate that grabbing the lock is really about acting as
a memory barrier, but I wonder if it is meant to be a full barrier or
just an acquire.

If a full memory barrier happens to be needed even without membarrier,
perhaps this fix also corrects other issues? It is unclear from the
comment what this memory barrier orders though. Is it the chain
exit -> waitpid, or is it related to entering lazy TLB?

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-mm@kvack.org
---
 kernel/exit.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/exit.c b/kernel/exit.c
index 727150f28103..ce272ec55cdc 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -474,6 +474,14 @@ static void exit_mm(void)
 	BUG_ON(mm != current->active_mm);
 	/* more a memory barrier than a real lock */
 	task_lock(current);
+	/*
+	 * When a thread stops operating on an address space, the loop
+	 * in membarrier_{private,global}_expedited() may not observe
+	 * that tsk->mm, and not issue an IPI. Membarrier requires a
+	 * memory barrier after accessing user-space memory, before
+	 * clearing tsk->mm.
+	 */
+	smp_mb();
 	current->mm = NULL;
 	mmap_read_unlock(mm);
 	enter_lazy_tlb(mm, current);
-- 
2.11.0



* [RFC PATCH 2/2] sched: membarrier: cover kthread_use_mm
  2020-07-28 16:00 [RFC PATCH 1/2] sched: Fix exit_mm vs membarrier Mathieu Desnoyers
@ 2020-07-28 16:00 ` Mathieu Desnoyers
  2020-08-04 14:51   ` peterz
  2020-08-04 14:34 ` [RFC PATCH 1/2] sched: Fix exit_mm vs membarrier peterz
  1 sibling, 1 reply; 14+ messages in thread
From: Mathieu Desnoyers @ 2020-07-28 16:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Mathieu Desnoyers, Will Deacon, Paul E. McKenney,
	Nicholas Piggin, Andy Lutomirski, Andrew Morton

Add comments and memory barrier to kthread_use_mm and kthread_unuse_mm
to allow the effect of membarrier(2) to apply to kthreads accessing
user-space memory as well.

Given that no prior kthread uses this guarantee and that it only affects
kthreads, adding this guarantee does not affect user-space ABI.
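
For illustration, the kind of kthread user this guarantee is meant to
cover looks roughly like the following sketch (the struct and its
fields are hypothetical, not taken from an existing kthread):

#include <linux/kthread.h>
#include <linux/mm_types.h>
#include <linux/uaccess.h>

struct my_work {                        /* hypothetical */
        struct mm_struct *mm;           /* mm reference handed to the kthread */
        int __user *uptr;               /* user-space address to read */
};

static int my_worker(void *data)
{
        struct my_work *w = data;
        int val;

        kthread_use_mm(w->mm);          /* start operating on the address space */
        if (get_user(val, w->uptr))     /* user-space access covered by membarrier */
                val = 0;
        kthread_unuse_mm(w->mm);        /* stop operating on it */
        return 0;
}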

Refine the check in membarrier_global_expedited to exclude runqueues
running the idle thread rather than all kthreads from the IPI cpumask.

This patch applies on top of this patch from Peter Zijlstra:
"mm: fix kthread_use_mm() vs TLB invalidate" currently in Andrew
Morton's tree.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 kernel/kthread.c          | 19 +++++++++++++++++++
 kernel/sched/membarrier.c |  8 ++------
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 48925b17920e..ef2435517f14 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1258,8 +1258,19 @@ void kthread_use_mm(struct mm_struct *mm)
 	finish_arch_post_lock_switch();
 #endif
 
+	/*
+	 * When a kthread starts operating on an address space, the loop
+	 * in membarrier_{private,global}_expedited() may not observe
+	 * that tsk->mm, and not issue an IPI. Membarrier requires a
+	 * memory barrier after storing to tsk->mm, before accessing
+	 * user-space memory. A full memory barrier for membarrier
+	 * {PRIVATE,GLOBAL}_EXPEDITED is implicitly provided by
+	 * mmdrop().
+	 */
 	if (active_mm != mm)
 		mmdrop(active_mm);
+	else
+		smp_mb();
 
 	to_kthread(tsk)->oldfs = get_fs();
 	set_fs(USER_DS);
@@ -1280,6 +1291,14 @@ void kthread_unuse_mm(struct mm_struct *mm)
 	set_fs(to_kthread(tsk)->oldfs);
 
 	task_lock(tsk);
+	/*
+	 * When a kthread stops operating on an address space, the loop
+	 * in membarrier_{private,global}_expedited() may not observe
+	 * that tsk->mm, and not issue an IPI. Membarrier requires a
+	 * memory barrier after accessing user-space memory, before
+	 * clearing tsk->mm.
+	 */
+	smp_mb();
 	sync_mm_rss(mm);
 	local_irq_disable();
 	tsk->mm = NULL;
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 168479a7d61b..8a294483074d 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -100,13 +100,9 @@ static int membarrier_global_expedited(void)
 		    MEMBARRIER_STATE_GLOBAL_EXPEDITED))
 			continue;
 
-		/*
-		 * Skip the CPU if it runs a kernel thread. The scheduler
-		 * leaves the prior task mm in place as an optimization when
-		 * scheduling a kthread.
-		 */
+		/* Skip the CPU if it runs the idle thread. */
 		p = rcu_dereference(cpu_rq(cpu)->curr);
-		if (p->flags & PF_KTHREAD)
+		if (is_idle_task(p))
 			continue;
 
 		__cpumask_set_cpu(cpu, tmpmask);
-- 
2.11.0



* Re: [RFC PATCH 1/2] sched: Fix exit_mm vs membarrier
  2020-07-28 16:00 [RFC PATCH 1/2] sched: Fix exit_mm vs membarrier Mathieu Desnoyers
  2020-07-28 16:00 ` [RFC PATCH 2/2] sched: membarrier: cover kthread_use_mm Mathieu Desnoyers
@ 2020-08-04 14:34 ` peterz
  2020-08-04 14:48   ` Mathieu Desnoyers
  1 sibling, 1 reply; 14+ messages in thread
From: peterz @ 2020-08-04 14:34 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, Will Deacon, Paul E. McKenney, Nicholas Piggin,
	Andy Lutomirski, Thomas Gleixner, Linus Torvalds, Alan Stern,
	linux-mm

On Tue, Jul 28, 2020 at 12:00:09PM -0400, Mathieu Desnoyers wrote:
> exit_mm should issue memory barriers after user-space memory accesses,
> before clearing current->mm, to order user-space memory accesses
> performed prior to exit_mm before clearing tsk->mm, which has the
> effect of skipping the membarrier private expedited IPIs.
> 
> The membarrier system call can be issued concurrently with do_exit
> if we have thread groups created with CLONE_VM but not CLONE_THREAD.

I still wonder what the exact failure case is though; exit_mm() is on
the exit path (as the name very much implies) and the thread is about to
die. The context switch that follows guarantees a full barrier before we
run anything else again.

> The following comment in exit_mm() seems rather puzzling though:
> 
>         /* more a memory barrier than a real lock */
>         task_lock(current);
> 
> So considering that spinlocks are not full memory barriers nowadays,
> some digging into the origins of this comment led me to 2002, at the
> earliest commit tracked by Thomas Gleixner's bitkeeper era's history
> at https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/
> 
> At that point, this comment was followed by:
> 
> +               /* more a memory barrier than a real lock */
> +               task_lock(tsk);
> +               tsk->mm = NULL;
> +               task_unlock(tsk);
> 
> Which seems to indicate that grabbing the lock is really about acting as
> a memory barrier, but I wonder if it is meant to be a full barrier or
> just an acquire.
> 
> If a full memory barrier happens to be needed even without membarrier,
> perhaps this fix also corrects other issues? It is unclear from the
> comment what this memory barrier orders though. Is it the chain
> exit -> waitpid, or is it related to entering lazy TLB?

I'm as puzzled by that comment as are you.

It does seem to be required for glorious stuff like get_task_mm()
though.
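
For reference, get_task_mm() leans on task_lock() to read ->mm stably;
roughly (paraphrased from kernel/fork.c, not verbatim):

struct mm_struct *get_task_mm(struct task_struct *task)
{
        struct mm_struct *mm;

        task_lock(task);
        mm = task->mm;
        if (mm) {
                if (task->flags & PF_KTHREAD)
                        mm = NULL;      /* kthreads only borrow an mm */
                else
                        mmget(mm);      /* pin it for the caller */
        }
        task_unlock(task);
        return mm;
}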

> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: linux-mm@kvack.org
> ---
>  kernel/exit.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 727150f28103..ce272ec55cdc 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -474,6 +474,14 @@ static void exit_mm(void)
>  	BUG_ON(mm != current->active_mm);
>  	/* more a memory barrier than a real lock */
>  	task_lock(current);
> +	/*
> +	 * When a thread stops operating on an address space, the loop
> +	 * in membarrier_{private,global}_expedited() may not observe
> +	 * that tsk->mm, and not issue an IPI. Membarrier requires a
> +	 * memory barrier after accessing user-space memory, before
> +	 * clearing tsk->mm.
> +	 */
> +	smp_mb();

So like stated above, I'm not sure how missing that IPI is going to be a
problem, we're dying...

>  	current->mm = NULL;
>  	mmap_read_unlock(mm);
>  	enter_lazy_tlb(mm, current);
> -- 
> 2.11.0
> 


* Re: [RFC PATCH 1/2] sched: Fix exit_mm vs membarrier
  2020-08-04 14:34 ` [RFC PATCH 1/2] sched: Fix exit_mm vs membarrier peterz
@ 2020-08-04 14:48   ` Mathieu Desnoyers
  2020-08-04 16:51     ` peterz
  0 siblings, 1 reply; 14+ messages in thread
From: Mathieu Desnoyers @ 2020-08-04 14:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Will Deacon, paulmck, Nicholas Piggin,
	Andy Lutomirski, Thomas Gleixner, Linus Torvalds, Alan Stern,
	linux-mm



----- On Aug 4, 2020, at 10:34 AM, Peter Zijlstra peterz@infradead.org wrote:

> On Tue, Jul 28, 2020 at 12:00:09PM -0400, Mathieu Desnoyers wrote:
>> exit_mm should issue memory barriers after user-space memory accesses,
>> before clearing current->mm, to order user-space memory accesses
>> performed prior to exit_mm before clearing tsk->mm, which has the
>> effect of skipping the membarrier private expedited IPIs.
>> 
>> The membarrier system call can be issued concurrently with do_exit
>> if we have thread groups created with CLONE_VM but not CLONE_THREAD.
> 
> I still wonder what the exact failure case is though; exit_mm() is on
> the exit path (as the name very much implies) and the thread is about to
> die. The context switch that follows guarantees a full barrier before we
> run anything else again.

Here is the scenario I have in mind:

Two thread groups are created, A and B. Thread group B is created by
issuing clone from group A with flag CLONE_VM set, but not CLONE_THREAD.
Let's assume we have a single thread within each thread group (Thread A
and Thread B).

Then AFAIU we can have:

Userspace variables:

int x = 0, y = 0;

CPU 0                   CPU 1
Thread A                Thread B
(in thread group A)     (in thread group B)

x = 1
barrier()
y = 1
exit()
exit_mm()
current->mm = NULL;
                        r1 = load y
                        membarrier()
                          skips CPU 0 (no IPI) because its current mm is NULL
                        r2 = load x
                        BUG_ON(r1 == 1 && r2 == 0)
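
For concreteness, the membarrier() step above corresponds to user-space
calls along these lines (a minimal sketch; thread group B registers the
expedited command once beforehand):

#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

static int membarrier(int cmd, unsigned int flags)
{
        return syscall(__NR_membarrier, cmd, flags);
}

static void thread_group_b_init(void)
{
        /* Register once before relying on the expedited command. */
        membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0);
}

static void thread_b_membarrier(void)
{
        /* The call issued between "r1 = load y" and "r2 = load x". */
        membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0);
}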

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: [RFC PATCH 2/2] sched: membarrier: cover kthread_use_mm
  2020-07-28 16:00 ` [RFC PATCH 2/2] sched: membarrier: cover kthread_use_mm Mathieu Desnoyers
@ 2020-08-04 14:51   ` peterz
  2020-08-04 14:59     ` Mathieu Desnoyers
  0 siblings, 1 reply; 14+ messages in thread
From: peterz @ 2020-08-04 14:51 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, Will Deacon, Paul E. McKenney, Nicholas Piggin,
	Andy Lutomirski, Andrew Morton

On Tue, Jul 28, 2020 at 12:00:10PM -0400, Mathieu Desnoyers wrote:
> Add comments and memory barrier to kthread_use_mm and kthread_unuse_mm
> to allow the effect of membarrier(2) to apply to kthreads accessing
> user-space memory as well.
> 
> Given that no prior kthread uses this guarantee and that it only affects
> kthreads, adding this guarantee does not affect user-space ABI.
> 
> Refine the check in membarrier_global_expedited to exclude runqueues
> running the idle thread rather than all kthreads from the IPI cpumask.
> 
> This patch applies on top of this patch from Peter Zijlstra:
> "mm: fix kthread_use_mm() vs TLB invalidate" currently in Andrew
> Morton's tree.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Will Deacon <will@kernel.org>
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
>  kernel/kthread.c          | 19 +++++++++++++++++++
>  kernel/sched/membarrier.c |  8 ++------
>  2 files changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 48925b17920e..ef2435517f14 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -1258,8 +1258,19 @@ void kthread_use_mm(struct mm_struct *mm)
>  	finish_arch_post_lock_switch();
>  #endif
>  
> +	/*
> +	 * When a kthread starts operating on an address space, the loop
> +	 * in membarrier_{private,global}_expedited() may not observe
> +	 * that tsk->mm, and not issue an IPI. Membarrier requires a
> +	 * memory barrier after storing to tsk->mm, before accessing
> +	 * user-space memory. A full memory barrier for membarrier
> +	 * {PRIVATE,GLOBAL}_EXPEDITED is implicitly provided by
> +	 * mmdrop().
> +	 */
>  	if (active_mm != mm)
>  		mmdrop(active_mm);
> +	else
> +		smp_mb();
>  
>  	to_kthread(tsk)->oldfs = get_fs();
>  	set_fs(USER_DS);
> @@ -1280,6 +1291,14 @@ void kthread_unuse_mm(struct mm_struct *mm)
>  	set_fs(to_kthread(tsk)->oldfs);
>  
>  	task_lock(tsk);
> +	/*
> +	 * When a kthread stops operating on an address space, the loop
> +	 * in membarrier_{private,global}_expedited() may not observe
> +	 * that tsk->mm, and not issue an IPI. Membarrier requires a
> +	 * memory barrier after accessing user-space memory, before
> +	 * clearing tsk->mm.
> +	 */
> +	smp_mb();
>  	sync_mm_rss(mm);
>  	local_irq_disable();

Would it make sense to put the smp_mb() inside the IRQ disable region?

>  	tsk->mm = NULL;
> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> index 168479a7d61b..8a294483074d 100644
> --- a/kernel/sched/membarrier.c
> +++ b/kernel/sched/membarrier.c
> @@ -100,13 +100,9 @@ static int membarrier_global_expedited(void)
>  		    MEMBARRIER_STATE_GLOBAL_EXPEDITED))
>  			continue;
>  
> -		/*
> -		 * Skip the CPU if it runs a kernel thread. The scheduler
> -		 * leaves the prior task mm in place as an optimization when
> -		 * scheduling a kthread.
> -		 */
> +		/* Skip the CPU if it runs the idle thread. */
>  		p = rcu_dereference(cpu_rq(cpu)->curr);
> -		if (p->flags & PF_KTHREAD)
> +		if (is_idle_task(p))
>  			continue;

Do we want to add a:

	WARN_ON_ONCE(current->mm);

in play_idle_precise() ?

Because, if I read this right, we rely on the idle thread not having an
mm.


* Re: [RFC PATCH 2/2] sched: membarrier: cover kthread_use_mm
  2020-08-04 14:51   ` peterz
@ 2020-08-04 14:59     ` Mathieu Desnoyers
  2020-08-04 17:01       ` peterz
  0 siblings, 1 reply; 14+ messages in thread
From: Mathieu Desnoyers @ 2020-08-04 14:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Will Deacon, paulmck, Nicholas Piggin,
	Andy Lutomirski, Andrew Morton

----- On Aug 4, 2020, at 10:51 AM, Peter Zijlstra peterz@infradead.org wrote:

> On Tue, Jul 28, 2020 at 12:00:10PM -0400, Mathieu Desnoyers wrote:
>> Add comments and memory barrier to kthread_use_mm and kthread_unuse_mm
>> to allow the effect of membarrier(2) to apply to kthreads accessing
>> user-space memory as well.
>> 
>> Given that no prior kthread uses this guarantee and that it only affects
>> kthreads, adding this guarantee does not affect user-space ABI.
>> 
>> Refine the check in membarrier_global_expedited to exclude runqueues
>> running the idle thread rather than all kthreads from the IPI cpumask.
>> 
>> This patch applies on top of this patch from Peter Zijlstra:
>> "mm: fix kthread_use_mm() vs TLB invalidate" currently in Andrew
>> Morton's tree.
>> 
>> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
>> Cc: Will Deacon <will@kernel.org>
>> Cc: Paul E. McKenney <paulmck@kernel.org>
>> Cc: Nicholas Piggin <npiggin@gmail.com>
>> Cc: Andy Lutomirski <luto@amacapital.net>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> ---
>>  kernel/kthread.c          | 19 +++++++++++++++++++
>>  kernel/sched/membarrier.c |  8 ++------
>>  2 files changed, 21 insertions(+), 6 deletions(-)
>> 
>> diff --git a/kernel/kthread.c b/kernel/kthread.c
>> index 48925b17920e..ef2435517f14 100644
>> --- a/kernel/kthread.c
>> +++ b/kernel/kthread.c
>> @@ -1258,8 +1258,19 @@ void kthread_use_mm(struct mm_struct *mm)
>>  	finish_arch_post_lock_switch();
>>  #endif
>>  
>> +	/*
>> +	 * When a kthread starts operating on an address space, the loop
>> +	 * in membarrier_{private,global}_expedited() may not observe
>> +	 * that tsk->mm, and not issue an IPI. Membarrier requires a
>> +	 * memory barrier after storing to tsk->mm, before accessing
>> +	 * user-space memory. A full memory barrier for membarrier
>> +	 * {PRIVATE,GLOBAL}_EXPEDITED is implicitly provided by
>> +	 * mmdrop().
>> +	 */
>>  	if (active_mm != mm)
>>  		mmdrop(active_mm);
>> +	else
>> +		smp_mb();
>>  
>>  	to_kthread(tsk)->oldfs = get_fs();
>>  	set_fs(USER_DS);
>> @@ -1280,6 +1291,14 @@ void kthread_unuse_mm(struct mm_struct *mm)
>>  	set_fs(to_kthread(tsk)->oldfs);
>>  
>>  	task_lock(tsk);
>> +	/*
>> +	 * When a kthread stops operating on an address space, the loop
>> +	 * in membarrier_{private,global}_expedited() may not observe
>> +	 * that tsk->mm, and not issue an IPI. Membarrier requires a
>> +	 * memory barrier after accessing user-space memory, before
>> +	 * clearing tsk->mm.
>> +	 */
>> +	smp_mb();
>>  	sync_mm_rss(mm);
>>  	local_irq_disable();
> 
> Would it make sense to put the smp_mb() inside the IRQ disable region?

I've initially placed it right after task_lock so we could eventually
have a smp_mb__after_non_raw_spinlock or something with a much better naming,
which would allow removing the extra barrier when it is implied by the
spinlock.

I don't see moving the barrier inside the irq off region as having any
significant effect as far as membarrier is concerned. Is it something you
need for the TLB flush?

> 
>>  	tsk->mm = NULL;
>> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
>> index 168479a7d61b..8a294483074d 100644
>> --- a/kernel/sched/membarrier.c
>> +++ b/kernel/sched/membarrier.c
>> @@ -100,13 +100,9 @@ static int membarrier_global_expedited(void)
>>  		    MEMBARRIER_STATE_GLOBAL_EXPEDITED))
>>  			continue;
>>  
>> -		/*
>> -		 * Skip the CPU if it runs a kernel thread. The scheduler
>> -		 * leaves the prior task mm in place as an optimization when
>> -		 * scheduling a kthread.
>> -		 */
>> +		/* Skip the CPU if it runs the idle thread. */
>>  		p = rcu_dereference(cpu_rq(cpu)->curr);
>> -		if (p->flags & PF_KTHREAD)
>> +		if (is_idle_task(p))
>>  			continue;
> 
> Do we want to add a:
> 
>	WARN_ON_ONCE(current->mm);
> 
> in play_idle_precise() ?
> 
> Because, if I read this right, we rely on the idle thread not having an
> mm.

Yes, that's a good idea.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: [RFC PATCH 1/2] sched: Fix exit_mm vs membarrier
  2020-08-04 14:48   ` Mathieu Desnoyers
@ 2020-08-04 16:51     ` peterz
  2020-08-04 17:25       ` Mathieu Desnoyers
  0 siblings, 1 reply; 14+ messages in thread
From: peterz @ 2020-08-04 16:51 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, Will Deacon, paulmck, Nicholas Piggin,
	Andy Lutomirski, Thomas Gleixner, Linus Torvalds, Alan Stern,
	linux-mm

On Tue, Aug 04, 2020 at 10:48:41AM -0400, Mathieu Desnoyers wrote:
> Here is the scenario I have in mind:

> Userspace variables:
> 
> int x = 0, y = 0;
> 
> CPU 0                   CPU 1
> Thread A                Thread B
> (in thread group A)     (in thread group B)
> 
> x = 1
> barrier()
> y = 1
> exit()
> exit_mm()
> current->mm = NULL;
>                         r1 = load y
>                         membarrier()
>                           skips CPU 0 (no IPI) because its current mm is NULL
>                         r2 = load x
>                         BUG_ON(r1 == 1 && r2 == 0)
> 

Ah, yes of course.

We really should have a bunch of these scenarios in membarrier.c.



Now, the above cannot happen because we have an unconditional
atomic_dec_and_test() in do_exit() before exit_mm(), but I'm sure
relying on that is a wee bit dodgy.
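
For reference, the relevant shape of do_exit() is roughly the following
(paraphrased, details elided); the value-returning atomic RMW implies a
full memory barrier and runs before exit_mm():

void __noreturn do_exit(long code)
{
        struct task_struct *tsk = current;
        int group_dead;

        /* ... earlier teardown elided ... */
        group_dead = atomic_dec_and_test(&tsk->signal->live);  /* full barrier */
        /* ... accounting elided ... */
        exit_mm();              /* clears current->mm under task_lock() */
        /* ... group_dead is consumed later by exit_notify() and friends ... */
}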


* Re: [RFC PATCH 2/2] sched: membarrier: cover kthread_use_mm
  2020-08-04 14:59     ` Mathieu Desnoyers
@ 2020-08-04 17:01       ` peterz
  2020-08-05 10:59         ` peterz
  0 siblings, 1 reply; 14+ messages in thread
From: peterz @ 2020-08-04 17:01 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, Will Deacon, paulmck, Nicholas Piggin,
	Andy Lutomirski, Andrew Morton

On Tue, Aug 04, 2020 at 10:59:33AM -0400, Mathieu Desnoyers wrote:
> ----- On Aug 4, 2020, at 10:51 AM, Peter Zijlstra peterz@infradead.org wrote:
> > On Tue, Jul 28, 2020 at 12:00:10PM -0400, Mathieu Desnoyers wrote:

> >>  	task_lock(tsk);
> >> +	/*
> >> +	 * When a kthread stops operating on an address space, the loop
> >> +	 * in membarrier_{private,global}_expedited() may not observe
> >> +	 * that tsk->mm, and not issue an IPI. Membarrier requires a
> >> +	 * memory barrier after accessing user-space memory, before
> >> +	 * clearing tsk->mm.
> >> +	 */
> >> +	smp_mb();
> >>  	sync_mm_rss(mm);
> >>  	local_irq_disable();
> > 
> > Would it make sense to put the smp_mb() inside the IRQ disable region?
> 
> I've initially placed it right after task_lock so we could eventually
> have a smp_mb__after_non_raw_spinlock or something with a much better naming,
> which would allow removing the extra barrier when it is implied by the
> spinlock.

Oh, right, fair enough. I'll go think about if smp_mb__after_spinlock()
will work for mutexes too.

It basically needs to upgrade atomic*_acquire() to smp_mb(). So that's
all architectures that have their own _acquire() and an actual
smp_mb__after_atomic().

Which, from the top of my head are only arm64, power and possibly riscv.
And if I then git-grep smp_mb__after_spinlock, all those seem to be
covered.

But let me do a better audit..


* Re: [RFC PATCH 1/2] sched: Fix exit_mm vs membarrier
  2020-08-04 16:51     ` peterz
@ 2020-08-04 17:25       ` Mathieu Desnoyers
  0 siblings, 0 replies; 14+ messages in thread
From: Mathieu Desnoyers @ 2020-08-04 17:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Will Deacon, paulmck, Nicholas Piggin,
	Andy Lutomirski, Thomas Gleixner, Linus Torvalds, Alan Stern,
	linux-mm

----- On Aug 4, 2020, at 12:51 PM, Peter Zijlstra peterz@infradead.org wrote:

> On Tue, Aug 04, 2020 at 10:48:41AM -0400, Mathieu Desnoyers wrote:
>> Here is the scenario I have in mind:
> 
>> Userspace variables:
>> 
>> int x = 0, y = 0;
>> 
>> CPU 0                   CPU 1
>> Thread A                Thread B
>> (in thread group A)     (in thread group B)
>> 
>> x = 1
>> barrier()
>> y = 1
>> exit()
>> exit_mm()
>> current->mm = NULL;
>>                         r1 = load y
>>                         membarrier()
>>                           skips CPU 0 (no IPI) because its current mm is NULL
>>                         r2 = load x
>>                         BUG_ON(r1 == 1 && r2 == 0)
>> 
> 
> Ah, yes of course.
> 
> We really should have a bunch of these scenarios in membarrier.c.

Good point.

> 
> 
> 
> Now, the above cannot happen because we have an unconditional
> atomic_dec_and_test() in do_exit() before exit_mm(), but I'm sure
> relying on that is a wee bit dodgy.

I am not against using this already existing barrier to provide the
guarantee we need, but it would have to be documented in the code.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: [RFC PATCH 2/2] sched: membarrier: cover kthread_use_mm
  2020-08-04 17:01       ` peterz
@ 2020-08-05 10:59         ` peterz
  2020-08-05 15:22           ` Mathieu Desnoyers
  0 siblings, 1 reply; 14+ messages in thread
From: peterz @ 2020-08-05 10:59 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: linux-kernel, Will Deacon, paulmck, Nicholas Piggin,
	Andy Lutomirski, Andrew Morton

On Tue, Aug 04, 2020 at 07:01:53PM +0200, peterz@infradead.org wrote:
> On Tue, Aug 04, 2020 at 10:59:33AM -0400, Mathieu Desnoyers wrote:
> > ----- On Aug 4, 2020, at 10:51 AM, Peter Zijlstra peterz@infradead.org wrote:
> > > On Tue, Jul 28, 2020 at 12:00:10PM -0400, Mathieu Desnoyers wrote:
> 
> > >>  	task_lock(tsk);
> > >> +	/*
> > >> +	 * When a kthread stops operating on an address space, the loop
> > >> +	 * in membarrier_{private,global}_expedited() may not observe
> > >> +	 * that tsk->mm, and not issue an IPI. Membarrier requires a
> > >> +	 * memory barrier after accessing user-space memory, before
> > >> +	 * clearing tsk->mm.
> > >> +	 */
> > >> +	smp_mb();
> > >>  	sync_mm_rss(mm);
> > >>  	local_irq_disable();
> > > 
> > > Would it make sense to put the smp_mb() inside the IRQ disable region?
> > 
> > I've initially placed it right after task_lock so we could eventually
> > have a smp_mb__after_non_raw_spinlock or something with a much better naming,
> > which would allow removing the extra barrier when it is implied by the
> > spinlock.
> 
> Oh, right, fair enough. I'll go think about if smp_mb__after_spinlock()
> will work for mutexes too.
> 
> It basically needs to upgrade atomic*_acquire() to smp_mb(). So that's
> all architectures that have their own _acquire() and an actual
> smp_mb__after_atomic().
> 
> Which, from the top of my head are only arm64, power and possibly riscv.
> And if I then git-grep smp_mb__after_spinlock, all those seem to be
> covered.
> 
> But let me do a better audit..

All I could find is csky, which, afaict, defines a superfluous
smp_mb__after_spinlock.

The relevant architectures are indeed power, arm64 and riscv, they all
have custom acquire/release and all define smp_mb__after_spinlock()
appropriately.
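
For reference, the pattern is roughly (paraphrased, not verbatim):

/*
 * Generic default (include/linux/spinlock.h): architectures whose
 * lock acquire already implies a full barrier leave this a no-op.
 */
#ifndef smp_mb__after_spinlock
#define smp_mb__after_spinlock()        do { } while (0)
#endif

/*
 * arm64 and powerpc (and riscv, with its own fence) override it with
 * a real full barrier, e.g.:
 *
 *      #define smp_mb__after_spinlock()        smp_mb()
 */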

Should we rename it to smp_mb__after_acquire()?


* Re: [RFC PATCH 2/2] sched: membarrier: cover kthread_use_mm
  2020-08-05 10:59         ` peterz
@ 2020-08-05 15:22           ` Mathieu Desnoyers
  2020-08-06 12:13             ` Will Deacon
  0 siblings, 1 reply; 14+ messages in thread
From: Mathieu Desnoyers @ 2020-08-05 15:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Will Deacon, paulmck, Nicholas Piggin,
	Andy Lutomirski, Andrew Morton

----- On Aug 5, 2020, at 6:59 AM, Peter Zijlstra peterz@infradead.org wrote:

> On Tue, Aug 04, 2020 at 07:01:53PM +0200, peterz@infradead.org wrote:
>> On Tue, Aug 04, 2020 at 10:59:33AM -0400, Mathieu Desnoyers wrote:
>> > ----- On Aug 4, 2020, at 10:51 AM, Peter Zijlstra peterz@infradead.org wrote:
>> > > On Tue, Jul 28, 2020 at 12:00:10PM -0400, Mathieu Desnoyers wrote:
>> 
>> > >>  	task_lock(tsk);
>> > >> +	/*
>> > >> +	 * When a kthread stops operating on an address space, the loop
>> > >> +	 * in membarrier_{private,global}_expedited() may not observe
>> > >> +	 * that tsk->mm, and not issue an IPI. Membarrier requires a
>> > >> +	 * memory barrier after accessing user-space memory, before
>> > >> +	 * clearing tsk->mm.
>> > >> +	 */
>> > >> +	smp_mb();
>> > >>  	sync_mm_rss(mm);
>> > >>  	local_irq_disable();
>> > > 
>> > > Would it make sense to put the smp_mb() inside the IRQ disable region?
>> > 
>> > I've initially placed it right after task_lock so we could eventually
>> > have a smp_mb__after_non_raw_spinlock or something with a much better naming,
>> > which would allow removing the extra barrier when it is implied by the
>> > spinlock.
>> 
>> Oh, right, fair enough. I'll go think about if smp_mb__after_spinlock()
>> will work for mutexes too.
>> 
>> It basically needs to upgrade atomic*_acquire() to smp_mb(). So that's
>> all architectures that have their own _acquire() and an actual
>> smp_mb__after_atomic().
>> 
>> Which, from the top of my head are only arm64, power and possibly riscv.
>> And if I then git-grep smp_mb__after_spinlock, all those seem to be
>> covered.
>> 
>> But let me do a better audit..
> 
> All I could find is csky, which, afaict, defines a superfluous
> smp_mb__after_spinlock.
> 
> The relevant architectures are indeed power, arm64 and riscv, they all
> have custom acquire/release and all define smp_mb__after_spinlock()
> appropriately.
> 
> Should we rename it to smp_mb__after_acquire() ?

As discussed over IRC, smp_mb__after_atomic_acquire() would be better, because
load_acquire and spin_lock have different semantics.

We could keep a define of smp_mb__after_spinlock to smp_mb__after_atomic_acquire
to make the transition simpler.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: [RFC PATCH 2/2] sched: membarrier: cover kthread_use_mm
  2020-08-05 15:22           ` Mathieu Desnoyers
@ 2020-08-06 12:13             ` Will Deacon
  2020-08-06 12:48               ` peterz
  2020-08-06 12:57               ` Mathieu Desnoyers
  0 siblings, 2 replies; 14+ messages in thread
From: Will Deacon @ 2020-08-06 12:13 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Peter Zijlstra, linux-kernel, paulmck, Nicholas Piggin,
	Andy Lutomirski, Andrew Morton

On Wed, Aug 05, 2020 at 11:22:36AM -0400, Mathieu Desnoyers wrote:
> ----- On Aug 5, 2020, at 6:59 AM, Peter Zijlstra peterz@infradead.org wrote:
> > On Tue, Aug 04, 2020 at 07:01:53PM +0200, peterz@infradead.org wrote:
> >> On Tue, Aug 04, 2020 at 10:59:33AM -0400, Mathieu Desnoyers wrote:
> >> > ----- On Aug 4, 2020, at 10:51 AM, Peter Zijlstra peterz@infradead.org wrote:
> >> > > On Tue, Jul 28, 2020 at 12:00:10PM -0400, Mathieu Desnoyers wrote:
> >> > >>  	task_lock(tsk);
> >> > >> +	/*
> >> > >> +	 * When a kthread stops operating on an address space, the loop
> >> > >> +	 * in membarrier_{private,global}_expedited() may not observe
> >> > >> +	 * that tsk->mm, and not issue an IPI. Membarrier requires a
> >> > >> +	 * memory barrier after accessing user-space memory, before
> >> > >> +	 * clearing tsk->mm.
> >> > >> +	 */
> >> > >> +	smp_mb();
> >> > >>  	sync_mm_rss(mm);
> >> > >>  	local_irq_disable();
> >> > > 
> >> > > Would it make sense to put the smp_mb() inside the IRQ disable region?
> >> > 
> >> > I've initially placed it right after task_lock so we could eventually
> >> > have a smp_mb__after_non_raw_spinlock or something with a much better naming,
> >> > which would allow removing the extra barrier when it is implied by the
> >> > spinlock.
> >> 
> >> Oh, right, fair enough. I'll go think about if smp_mb__after_spinlock()
> >> will work for mutexes too.
> >> 
> >> It basically needs to upgrade atomic*_acquire() to smp_mb(). So that's
> >> all architectures that have their own _acquire() and an actual
> >> smp_mb__after_atomic().
> >> 
> >> Which, from the top of my head are only arm64, power and possibly riscv.
> >> And if I then git-grep smp_mb__after_spinlock, all those seem to be
> >> covered.
> >> 
> >> But let me do a better audit..
> > 
> > All I could find is csky, which, afaict, defines a superfluous
> > smp_mb__after_spinlock.
> > 
> > The relevant architectures are indeed power, arm64 and riscv, they all
> > have custom acquire/release and all define smp_mb__after_spinlock()
> > appropriately.
> > 
> > Should we rename it to smp_mb__after_acquire()?
> 
> As discussed over IRC, smp_mb__after_atomic_acquire() would be better, because
> load_acquire and spin_lock have different semantics.

Just to clarify here, are you talking about acquire on atomic RMW operations
being different to non-RMW operations, or are you talking about
atomic_read_acquire() being different to smp_load_acquire() (which I don't
think is the case, but wanted to check)?

We need to write this stuff down.

> We could keep a define of smp_mb__after_spinlock to smp_mb__after_atomic_acquire
> to make the transition simpler.

I'm not sure I really see the benefit of the rename, to be honest with you,
especially if smp_mb__after_spinlock() doesn't disappear at the same time.
The only reason you'd use this barrier is because the atomic is hidden away
behind a locking API, otherwise you'd just have used the full-barrier variant
of the atomic op to start with, wouldn't you?

Will


* Re: [RFC PATCH 2/2] sched: membarrier: cover kthread_use_mm
  2020-08-06 12:13             ` Will Deacon
@ 2020-08-06 12:48               ` peterz
  2020-08-06 12:57               ` Mathieu Desnoyers
  1 sibling, 0 replies; 14+ messages in thread
From: peterz @ 2020-08-06 12:48 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mathieu Desnoyers, linux-kernel, paulmck, Nicholas Piggin,
	Andy Lutomirski, Andrew Morton

On Thu, Aug 06, 2020 at 01:13:46PM +0100, Will Deacon wrote:
> I'm not sure I really see the benefit of the rename, to be honest with you,
> especially if smp_mb__after_spinlock() doesn't disappear at the same time.

The reason I proposed a rename is because:

	mutex_lock(&foo);
	smp_mb__after_spinlock();

looks weird. But, afaict, it will work as expected. As the only possible
way to implement any lock() is with atomic*_acquire() or stronger.

Another possible name would be: smp_mb__after_lock().


* Re: [RFC PATCH 2/2] sched: membarrier: cover kthread_use_mm
  2020-08-06 12:13             ` Will Deacon
  2020-08-06 12:48               ` peterz
@ 2020-08-06 12:57               ` Mathieu Desnoyers
  1 sibling, 0 replies; 14+ messages in thread
From: Mathieu Desnoyers @ 2020-08-06 12:57 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, linux-kernel, paulmck, Nicholas Piggin,
	Andy Lutomirski, Andrew Morton

----- On Aug 6, 2020, at 8:13 AM, Will Deacon will@kernel.org wrote:

> On Wed, Aug 05, 2020 at 11:22:36AM -0400, Mathieu Desnoyers wrote:
>> ----- On Aug 5, 2020, at 6:59 AM, Peter Zijlstra peterz@infradead.org wrote:
>> > On Tue, Aug 04, 2020 at 07:01:53PM +0200, peterz@infradead.org wrote:
>> >> On Tue, Aug 04, 2020 at 10:59:33AM -0400, Mathieu Desnoyers wrote:
>> >> > ----- On Aug 4, 2020, at 10:51 AM, Peter Zijlstra peterz@infradead.org wrote:
>> >> > > On Tue, Jul 28, 2020 at 12:00:10PM -0400, Mathieu Desnoyers wrote:
>> >> > >>  	task_lock(tsk);
>> >> > >> +	/*
>> >> > >> +	 * When a kthread stops operating on an address space, the loop
>> >> > >> +	 * in membarrier_{private,global}_expedited() may not observe
>> >> > >> +	 * that tsk->mm, and not issue an IPI. Membarrier requires a
>> >> > >> +	 * memory barrier after accessing user-space memory, before
>> >> > >> +	 * clearing tsk->mm.
>> >> > >> +	 */
>> >> > >> +	smp_mb();
>> >> > >>  	sync_mm_rss(mm);
>> >> > >>  	local_irq_disable();
>> >> > > 
>> >> > > Would it make sense to put the smp_mb() inside the IRQ disable region?
>> >> > 
>> >> > I've initially placed it right after task_lock so we could eventually
>> >> > have a smp_mb__after_non_raw_spinlock or something with a much better naming,
>> >> > which would allow removing the extra barrier when it is implied by the
>> >> > spinlock.
>> >> 
>> >> Oh, right, fair enough. I'll go think about if smp_mb__after_spinlock()
>> >> will work for mutexes too.
>> >> 
>> >> It basically needs to upgrade atomic*_acquire() to smp_mb(). So that's
>> >> all architectures that have their own _acquire() and an actual
>> >> smp_mb__after_atomic().
>> >> 
>> >> Which, from the top of my head are only arm64, power and possibly riscv.
>> >> And if I then git-grep smp_mb__after_spinlock, all those seem to be
>> >> covered.
>> >> 
>> >> But let me do a better audit..
>> > 
>> > All I could find is csky, which, afaict, defines a superfluous
>> > smp_mb__after_spinlock.
>> > 
>> > The relevant architectures are indeed power, arm64 and riscv, they all
>> > have custom acquire/release and all define smp_mb__after_spinlock()
>> > appropriately.
>> > 
>> > Should we rename it to smp_mb__after_acquire()?
>> 
>> As discussed over IRC, smp_mb__after_atomic_acquire() would be better, because
>> load_acquire and spin_lock have different semantic.
> 
> Just to clarify here, are you talking about acquire on atomic RMW operations
> being different to non-RMW operations, or are you talking about
> atomic_read_acquire() being different to smp_load_acquire() (which I don't
> think is the case, but wanted to check)?

I was referring to the two following APIs:

- spin_lock()
- smp_load_acquire()

On x86, spin_lock() happens to be implemented with an atomic instruction, which
implies a full memory barrier. However, its smp_load_acquire() does not provide
a full memory barrier. Therefore, if we implement a smp_mb__after_acquire() as
proposed by Peter, we could expect it to cover both APIs, which is tricky to do
efficiently without adding a superfluous barrier.
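
A rough illustration of the difference (paraphrased, not verbatim):

/*
 * x86 smp_load_acquire(): a plain load plus a compiler barrier, since
 * x86 loads already have acquire semantics. This is NOT a full
 * memory barrier:
 *
 *      #define __smp_load_acquire(p)                           \
 *      ({                                                      \
 *              typeof(*p) ___p1 = READ_ONCE(*p);               \
 *              barrier();                                      \
 *              ___p1;                                          \
 *      })
 *
 * x86 spin_lock(): ends up executing a LOCK-prefixed RMW instruction,
 * which is a full memory barrier, so smp_mb__after_spinlock() can be
 * a no-op there, while a barrier that also had to cover
 * smp_load_acquire() could not be.
 */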

Hence the discussion about making its naming more specific, so it does not cover
smp_load_acquire.

> 
> We need to write this stuff down.
> 
>> We could keep a define of smp_mb__after_spinlock to smp_mb__after_atomic_acquire
>> to make the transition simpler.
> 
> I'm not sure I really see the benefit of the rename, to be honest with you,
> especially if smp_mb__after_spinlock() doesn't disappear at the same time.
> The only reason you'd use this barrier is because the atomic is hidden away
> behind a locking API, otherwise you'd just have used the full-barrier variant
> of the atomic op to start with, wouldn't you?

Good point.

As long as we can state that smp_mb__after_spinlock applies both to raw_spinlock
and non-raw spinlock (which AFAIU are mutexes on RT), I think it would suffice for
our immediate needs.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


