All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto@kernel.org>
To: x86@kernel.org
Cc: Dave Hansen <dave.hansen@intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Andy Lutomirski <luto@kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>
Subject: [PATCH 1/8] membarrier: Document why membarrier() works
Date: Tue, 15 Jun 2021 20:21:06 -0700	[thread overview]
Message-ID: <b648efcb72feb257b9fe004bd132f581805ec0d6.1623813516.git.luto@kernel.org> (raw)
In-Reply-To: <cover.1623813516.git.luto@kernel.org>

We had a nice comment at the top of membarrier.c explaining why membarrier
worked in a handful of scenarios, but that consisted more of a list of
things not to forget than an actual description of the algorithm and why it
should be expected to work.

Add a comment explaining my understanding of the algorithm.  This exposes a
couple of implementation issues that I will hopefully fix up in subsequent
patches.

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 kernel/sched/membarrier.c | 55 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index b5add64d9698..3173b063d358 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -7,6 +7,61 @@
 #include "sched.h"
 
 /*
+ * The basic principle behind the regular memory barrier mode of membarrier()
+ * is as follows.  For each CPU, membarrier() operates in one of two
+ * modes.  Either it sends an IPI or it does not. If membarrier() sends an
+ * IPI, then we have the following sequence of events:
+ *
+ * 1. membarrier() does smp_mb().
+ * 2. membarrier() does a store (the IPI request payload) that is observed by
+ *    the target CPU.
+ * 3. The target CPU does smp_mb().
+ * 4. The target CPU does a store (the completion indication) that is observed
+ *    by membarrier()'s wait-for-IPIs-to-finish request.
+ * 5. membarrier() does smp_mb().
+ *
+ * So all pre-membarrier() local accesses are visible after the IPI on the
+ * target CPU and all pre-IPI remote accesses are visible after
+ * membarrier(). IOW membarrier() has synchronized both ways with the target
+ * CPU.
+ *
+ * (This has the caveat that membarrier() does not interrupt the CPU that it's
+ * running on at the time it sends the IPIs. However, if that is the CPU on
+ * which membarrier() starts and/or finishes, membarrier() does smp_mb() and,
+ * if not, then membarrier() scheduled, and scheduling had better include a
+ * full barrier somewhere for basic correctness regardless of membarrier.)
+ *
+ * If membarrier() does not send an IPI, this means that membarrier() reads
+ * cpu_rq(cpu)->curr->mm and that the result is not equal to the target
+ * mm.  Let's assume for now that tasks never change their mm field.  The
+ * sequence of events is:
+ *
+ * 1. Target CPU switches away from the target mm (or goes lazy or has never
+ *    run the target mm in the first place). This involves smp_mb() followed
+ *    by a write to cpu_rq(cpu)->curr.
+ * 2. membarrier() does smp_mb(). (This is NOT synchronized with any action
+ *    done by the target.)
+ * 3. membarrier() observes the value written in step 1 and does *not* observe
+ *    the value written in step 5.
+ * 4. membarrier() does smp_mb().
+ * 5. Target CPU switches back to the target mm and writes to
+ *    cpu_rq(cpu)->curr. (This is NOT synchronized with any action on
+ *    membarrier()'s part.)
+ * 6. Target CPU executes smp_mb()
+ *
+ * All pre-schedule accesses on the remote CPU are visible after membarrier()
+ * because they all precede the target's write in step 1 and are synchronized
+ * to the local CPU by steps 3 and 4.  All pre-membarrier() accesses on the
+ * local CPU are visible on the remote CPU after scheduling because they
+ * happen before the smp_mb(); read in steps 2 and 3 and that read preceeds
+ * the target's smp_mb() in step 6.
+ *
+ * However, tasks can change their ->mm, e.g., via kthread_use_mm().  So
+ * tasks that switch their ->mm must follow the same rules as the scheduler
+ * changing rq->curr, and the membarrier() code needs to do both dereferences
+ * carefully.
+ *
+ *
  * For documentation purposes, here are some membarrier ordering
  * scenarios to keep in mind:
  *
-- 
2.31.1


  reply	other threads:[~2021-06-16  3:21 UTC|newest]

Thread overview: 165+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-16  3:21 [PATCH 0/8] membarrier cleanups Andy Lutomirski
2021-06-16  3:21 ` Andy Lutomirski [this message]
2021-06-16  4:00   ` [PATCH 1/8] membarrier: Document why membarrier() works Nicholas Piggin
2021-06-16  7:30     ` Peter Zijlstra
2021-06-17 23:45       ` Andy Lutomirski
2021-06-16  3:21 ` [PATCH 2/8] x86/mm: Handle unlazying membarrier core sync in the arch code Andy Lutomirski
2021-06-16  4:25   ` Nicholas Piggin
2021-06-16 18:31     ` Andy Lutomirski
2021-06-16 17:49   ` Mathieu Desnoyers
2021-06-16 17:49     ` Mathieu Desnoyers
2021-06-16 18:31     ` Andy Lutomirski
2021-06-16  3:21 ` [PATCH 3/8] membarrier: Remove membarrier_arch_switch_mm() prototype in core code Andy Lutomirski
2021-06-16  4:26   ` Nicholas Piggin
2021-06-16 17:52   ` Mathieu Desnoyers
2021-06-16 17:52     ` Mathieu Desnoyers
2021-06-16  3:21 ` [PATCH 4/8] membarrier: Make the post-switch-mm barrier explicit Andy Lutomirski
2021-06-16  4:19   ` Nicholas Piggin
2021-06-16  7:35     ` Peter Zijlstra
2021-06-16 18:41       ` Andy Lutomirski
2021-06-17  1:37         ` Nicholas Piggin
2021-06-17  2:57           ` Andy Lutomirski
2021-06-17  5:32             ` Andy Lutomirski
2021-06-17  6:51               ` Nicholas Piggin
2021-06-17 23:49                 ` Andy Lutomirski
2021-06-19  2:53                   ` Nicholas Piggin
2021-06-19  3:20                     ` Andy Lutomirski
2021-06-19  4:27                       ` Nicholas Piggin
2021-06-17  9:08               ` [RFC][PATCH] sched: Use lightweight hazard pointers to grab lazy mms Peter Zijlstra
2021-06-17  9:10                 ` Peter Zijlstra
2021-06-17 10:00                   ` Nicholas Piggin
2021-06-17  9:13                 ` Peter Zijlstra
2021-06-17 14:06                   ` Andy Lutomirski
2021-06-17  9:28                 ` Peter Zijlstra
2021-06-17 14:03                   ` Andy Lutomirski
2021-06-17 14:10                 ` Andy Lutomirski
2021-06-17 15:45                   ` Peter Zijlstra
2021-06-18  3:29                 ` Paul E. McKenney
2021-06-18  5:04                   ` Andy Lutomirski
2021-06-17 15:02               ` [PATCH 4/8] membarrier: Make the post-switch-mm barrier explicit Paul E. McKenney
2021-06-18  0:06                 ` Andy Lutomirski
2021-06-18  3:35                   ` Paul E. McKenney
2021-06-17  8:45         ` Peter Zijlstra
2021-06-16  3:21 ` [PATCH 5/8] membarrier, kthread: Use _ONCE accessors for task->mm Andy Lutomirski
2021-06-16  4:28   ` Nicholas Piggin
2021-06-16 18:08   ` Mathieu Desnoyers
2021-06-16 18:08     ` Mathieu Desnoyers
2021-06-16 18:45     ` Andy Lutomirski
2021-06-16  3:21 ` [PATCH 6/8] powerpc/membarrier: Remove special barrier on mm switch Andy Lutomirski
2021-06-16  3:21   ` Andy Lutomirski
2021-06-16  4:36   ` Nicholas Piggin
2021-06-16  4:36     ` Nicholas Piggin
2021-06-16  3:21 ` [PATCH 7/8] membarrier: Remove arm (32) support for SYNC_CORE Andy Lutomirski
2021-06-16  3:21   ` Andy Lutomirski
2021-06-16  9:28   ` Russell King (Oracle)
2021-06-16  9:28     ` Russell King (Oracle)
2021-06-16 10:16   ` Peter Zijlstra
2021-06-16 10:16     ` Peter Zijlstra
2021-06-16 10:20     ` Peter Zijlstra
2021-06-16 10:20       ` Peter Zijlstra
2021-06-16 10:34       ` Russell King (Oracle)
2021-06-16 10:34         ` Russell King (Oracle)
2021-06-16 11:10         ` Peter Zijlstra
2021-06-16 11:10           ` Peter Zijlstra
2021-06-16 13:22           ` Russell King (Oracle)
2021-06-16 13:22             ` Russell King (Oracle)
2021-06-16 15:04             ` Catalin Marinas
2021-06-16 15:04               ` Catalin Marinas
2021-06-16 15:23               ` Russell King (Oracle)
2021-06-16 15:23                 ` Russell King (Oracle)
2021-06-16 15:45                 ` Catalin Marinas
2021-06-16 15:45                   ` Catalin Marinas
2021-06-16 16:00                   ` Catalin Marinas
2021-06-16 16:00                     ` Catalin Marinas
2021-06-16 16:27                     ` Russell King (Oracle)
2021-06-16 16:27                       ` Russell King (Oracle)
2021-06-17  8:55                       ` Krzysztof Hałasa
2021-06-17  8:55                         ` Krzysztof Hałasa
2021-06-17  8:55                         ` Krzysztof Hałasa
2021-06-18 12:54                       ` Linus Walleij
2021-06-18 12:54                         ` Linus Walleij
2021-06-18 12:54                         ` Linus Walleij
2021-06-18 13:19                         ` Russell King (Oracle)
2021-06-18 13:19                           ` Russell King (Oracle)
2021-06-18 13:36                         ` Arnd Bergmann
2021-06-18 13:36                           ` Arnd Bergmann
2021-06-18 13:36                           ` Arnd Bergmann
2021-06-17 10:40   ` Mark Rutland
2021-06-17 10:40     ` Mark Rutland
2021-06-17 11:23     ` Russell King (Oracle)
2021-06-17 11:23       ` Russell King (Oracle)
2021-06-17 11:33       ` Mark Rutland
2021-06-17 11:33         ` Mark Rutland
2021-06-17 13:41         ` Andy Lutomirski
2021-06-17 13:41           ` Andy Lutomirski
2021-06-17 13:51           ` Mark Rutland
2021-06-17 13:51             ` Mark Rutland
2021-06-17 14:00             ` Andy Lutomirski
2021-06-17 14:00               ` Andy Lutomirski
2021-06-17 14:20               ` Mark Rutland
2021-06-17 14:20                 ` Mark Rutland
2021-06-17 15:01               ` Peter Zijlstra
2021-06-17 15:01                 ` Peter Zijlstra
2021-06-17 15:13                 ` Peter Zijlstra
2021-06-17 15:13                   ` Peter Zijlstra
2021-06-17 14:16             ` Mathieu Desnoyers
2021-06-17 14:16               ` Mathieu Desnoyers
2021-06-17 14:05           ` Peter Zijlstra
2021-06-17 14:05             ` Peter Zijlstra
2021-06-18  0:07   ` Andy Lutomirski
2021-06-18  0:07     ` Andy Lutomirski
2021-06-16  3:21 ` [PATCH 8/8] membarrier: Rewrite sync_core_before_usermode() and improve documentation Andy Lutomirski
2021-06-16  3:21   ` Andy Lutomirski
2021-06-16  3:21   ` Andy Lutomirski
2021-06-16  4:45   ` Nicholas Piggin
2021-06-16  4:45     ` Nicholas Piggin
2021-06-16  4:45     ` Nicholas Piggin
2021-06-16 18:52     ` Andy Lutomirski
2021-06-16 18:52       ` Andy Lutomirski
2021-06-16 18:52       ` Andy Lutomirski
2021-06-16 23:48       ` Andy Lutomirski
2021-06-16 23:48         ` Andy Lutomirski
2021-06-16 23:48         ` Andy Lutomirski
2021-06-18 15:27       ` Christophe Leroy
2021-06-18 15:27         ` Christophe Leroy
2021-06-18 15:27         ` Christophe Leroy
2021-06-16 10:20   ` Will Deacon
2021-06-16 10:20     ` Will Deacon
2021-06-16 10:20     ` Will Deacon
2021-06-16 23:58     ` Andy Lutomirski
2021-06-16 23:58       ` Andy Lutomirski
2021-06-16 23:58       ` Andy Lutomirski
2021-06-17 14:47   ` Mathieu Desnoyers
2021-06-17 14:47     ` Mathieu Desnoyers
2021-06-17 14:47     ` Mathieu Desnoyers
2021-06-17 14:47     ` Mathieu Desnoyers
2021-06-18  0:12     ` Andy Lutomirski
2021-06-18  0:12       ` Andy Lutomirski
2021-06-18  0:12       ` Andy Lutomirski
2021-06-18 16:31       ` Mathieu Desnoyers
2021-06-18 16:31         ` Mathieu Desnoyers
2021-06-18 16:31         ` Mathieu Desnoyers
2021-06-18 16:31         ` Mathieu Desnoyers
2021-06-18 19:58         ` Andy Lutomirski
2021-06-18 19:58           ` Andy Lutomirski
2021-06-18 19:58           ` Andy Lutomirski
2021-06-18 20:09           ` Mathieu Desnoyers
2021-06-18 20:09             ` Mathieu Desnoyers
2021-06-18 20:09             ` Mathieu Desnoyers
2021-06-18 20:09             ` Mathieu Desnoyers
2021-06-19  6:02             ` Nicholas Piggin
2021-06-19  6:02               ` Nicholas Piggin
2021-06-19  6:02               ` Nicholas Piggin
2021-06-19 15:50               ` Andy Lutomirski
2021-06-19 15:50                 ` Andy Lutomirski
2021-06-19 15:50                 ` Andy Lutomirski
2021-06-20  2:10                 ` Nicholas Piggin
2021-06-20  2:10                   ` Nicholas Piggin
2021-06-20  2:10                   ` Nicholas Piggin
2021-06-17 15:16   ` Mathieu Desnoyers
2021-06-17 15:16     ` Mathieu Desnoyers
2021-06-17 15:16     ` Mathieu Desnoyers
2021-06-17 15:16     ` Mathieu Desnoyers
2021-06-18  0:13     ` Andy Lutomirski
2021-06-18  0:13       ` Andy Lutomirski
2021-06-18  0:13       ` Andy Lutomirski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b648efcb72feb257b9fe004bd132f581805ec0d6.1623813516.git.luto@kernel.org \
    --to=luto@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=dave.hansen@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=npiggin@gmail.com \
    --cc=peterz@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.