From: "Paul E. McKenney" <paulmck@kernel.org>
To: Michel Lespinasse <michel@lespinasse.org>
Cc: Andy Lutomirski <luto@kernel.org>, Linux-MM <linux-mm@kvack.org>,
Laurent Dufour <ldufour@linux.ibm.com>,
Peter Zijlstra <peterz@infradead.org>,
Michal Hocko <mhocko@suse.com>,
Matthew Wilcox <willy@infradead.org>,
Rik van Riel <riel@surriel.com>,
Andrew Morton <akpm@linux-foundation.org>,
Suren Baghdasaryan <surenb@google.com>,
Joel Fernandes <joelaf@google.com>,
Rom Lemarchand <romlem@google.com>,
Linux-Kernel <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 13/37] mm: implement speculative handling in __handle_mm_fault().
Date: Thu, 29 Apr 2021 11:34:12 -0700
Message-ID: <20210429183412.GA278623@paulmck-ThinkPad-P17-Gen-1>
In-Reply-To: <20210429155250.GV975577@paulmck-ThinkPad-P17-Gen-1>
On Thu, Apr 29, 2021 at 08:52:50AM -0700, Paul E. McKenney wrote:
> On Wed, Apr 28, 2021 at 05:02:25PM -0700, Michel Lespinasse wrote:
> > On Wed, Apr 28, 2021 at 09:11:08AM -0700, Paul E. McKenney wrote:
> > > On Wed, Apr 28, 2021 at 08:13:53AM -0700, Andy Lutomirski wrote:
> > > > On Wed, Apr 28, 2021 at 8:05 AM Michel Lespinasse <michel@lespinasse.org> wrote:
> > > > >
> > > > > On Wed, Apr 07, 2021 at 08:36:01AM -0700, Andy Lutomirski wrote:
> > > > > > On 4/6/21 6:44 PM, Michel Lespinasse wrote:
> > > > > > > The page table tree is walked with local irqs disabled, which prevents
> > > > > > > page table reclamation (similarly to what fast GUP does). The logic is
> > > > > > > otherwise similar to the non-speculative path, but with additional
> > > > > > > restrictions: in the speculative path, we do not handle huge pages or
> > > > > > > > wiring new page tables.
> > > > > >
> > > > > > Not on most architectures. Quoting the actual comment in mm/gup.c:
> > > > > >
> > > > > > > * Before activating this code, please be aware that the following assumptions
> > > > > > > * are currently made:
> > > > > > > *
> > > > > > > * *) Either MMU_GATHER_RCU_TABLE_FREE is enabled, and tlb_remove_table() is used to
> > > > > > > * free pages containing page tables or TLB flushing requires IPI broadcast.
> > > > > >
> > > > > > On MMU_GATHER_RCU_TABLE_FREE architectures, you cannot make the
> > > > > > assumption that it is safe to dereference a pointer in a page table just
> > > > > > because irqs are off. You need RCU protection, too.
> > > > > >
> > > > > > You have the same error in the cover letter.
> > > > >
> > > > > Hi Andy,
> > > > >
> > > > > > Thanks for your comment. At first I thought it did not matter, because we
> > > > > only enable ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT on selected
> > > > > architectures, and I thought MMU_GATHER_RCU_TABLE_FREE is not set on
> > > > > these. But I was wrong - MMU_GATHER_RCU_TABLE_FREE is enabled on X86
> > > > > with paravirt. So I took another look at fast GUP to make sure I
> > > > > actually understand it.
> > > > >
> > > > > This brings a question about lockless_pages_from_mm() - I see it
> > > > > disabling interrupts, which it explains is necessary for disabling THP
> > > > > splitting IPIs, but I do not see it taking an RCU read lock as would
> > > > > > be necessary for preventing page table freeing on
> > > > > MMU_GATHER_RCU_TABLE_FREE configs. I figure local_irq_save()
> > > > > indirectly takes an rcu read lock somehow ? I think this is something
> > > > > I should also mention in my explanation, and I have not seen a good
> > > > > description of this on the fast GUP side...
> > > >
> > > > Sounds like a bug! That being said, based on my extremely limited
> > > > understanding of how the common RCU modes work, local_irq_save()
> > > > probably implies an RCU lock in at least some cases. Hi Paul!
> > >
> > > In modern kernels, local_irq_save() does have RCU reader semantics,
> > > meaning that synchronize_rcu() will wait for pre-existing irq-disabled
> > > regions. It will also wait for pre-existing bh-disable, preempt-disable,
> > > and of course rcu_read_lock() sections of code.
> >
> > Thanks Paul for confirming / clarifying this. BTW, it would be good to
> > add this to the rcu header files, just so people have something to
> > reference to when they depend on such behavior (like fast GUP
> > currently does).
>
> There is this in the synchronize_rcu() header block comment:
>
> * synchronize_rcu() was waiting. RCU read-side critical sections are
> * delimited by rcu_read_lock() and rcu_read_unlock(), and may be nested.
> * In addition, regions of code across which interrupts, preemption, or
> * softirqs have been disabled also serve as RCU read-side critical
> * sections. This includes hardware interrupt handlers, softirq handlers,
> * and NMI handlers.
>
> I have pulled this into a separate paragraph to increase its visibility,
> and will check out other locations in comments and documentation.
Ditto for call_rcu() and the separate paragraph.
The rcu_read_lock_bh() and rcu_read_lock_sched() header comments noted
that these act as RCU read-side critical sections, but I added similar
verbiage to rcu_dereference_bh_check() and rcu_dereference_sched_check().
Please see below for the resulting commit.
Thoughts?
Thanx, Paul
------------------------------------------------------------------------
commit 97262c64c2cf807bf06825e454c4bedd228fadfb
Author: Paul E. McKenney <paulmck@kernel.org>
Date: Thu Apr 29 11:18:01 2021 -0700
rcu: Improve comments describing RCU read-side critical sections
There are a number of places that call out the fact that preempt-disable
regions of code now act as RCU read-side critical sections, where
preempt-disable regions of code include irq-disable regions of code,
bh-disable regions of code, hardirq handlers, and NMI handlers. However,
someone relying solely on (for example) the call_rcu() header comment
might well have no idea that preempt-disable regions of code have RCU
semantics.
This commit therefore updates the header comments for
call_rcu(), synchronize_rcu(), rcu_dereference_bh_check(), and
rcu_dereference_sched_check() to call out these new(ish) forms of RCU
readers.
Reported-by: Michel Lespinasse <michel@lespinasse.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index a10480f2b4ef..c01b04ad64c4 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -532,7 +532,10 @@ do { \
* @p: The pointer to read, prior to dereferencing
* @c: The conditions under which the dereference will take place
*
- * This is the RCU-bh counterpart to rcu_dereference_check().
+ * This is the RCU-bh counterpart to rcu_dereference_check(). However,
+ * please note that in recent kernels, synchronize_rcu() waits for
+ * local_bh_disable() regions of code in addition to regions of code
+ * demarked by rcu_read_lock() and rcu_read_unlock().
*/
#define rcu_dereference_bh_check(p, c) \
__rcu_dereference_check((p), (c) || rcu_read_lock_bh_held(), __rcu)
@@ -543,6 +546,9 @@ do { \
* @c: The conditions under which the dereference will take place
*
* This is the RCU-sched counterpart to rcu_dereference_check().
+ * However, please note that in recent kernels, synchronize_rcu() waits
+ * for preemption-disabled regions of code in addition to regions of code
+ * demarked by rcu_read_lock() and rcu_read_unlock().
*/
#define rcu_dereference_sched_check(p, c) \
__rcu_dereference_check((p), (c) || rcu_read_lock_sched_held(), \
@@ -634,6 +640,12 @@ do { \
* sections, invocation of the corresponding RCU callback is deferred
* until after the all the other CPUs exit their critical sections.
*
+ * In recent kernels, synchronize_rcu() and call_rcu() also wait for
+ * regions of code with preemption disabled, including regions of code
+ * with interrupts or softirqs disabled. If your kernel is old enough
+ * for synchronize_sched() to be defined, only code enclosed within
+ * rcu_read_lock() and rcu_read_unlock() are guaranteed to be waited for.
+ *
* Note, however, that RCU callbacks are permitted to run concurrently
* with new RCU read-side critical sections. One way that this can happen
* is via the following sequence of events: (1) CPU 0 enters an RCU
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 9ea1d4eef1ad..0e76bf47d92b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3071,12 +3071,13 @@ __call_rcu(struct rcu_head *head, rcu_callback_t func)
* period elapses, in other words after all pre-existing RCU read-side
* critical sections have completed. However, the callback function
* might well execute concurrently with RCU read-side critical sections
- * that started after call_rcu() was invoked. RCU read-side critical
- * sections are delimited by rcu_read_lock() and rcu_read_unlock(), and
- * may be nested. In addition, regions of code across which interrupts,
- * preemption, or softirqs have been disabled also serve as RCU read-side
- * critical sections. This includes hardware interrupt handlers, softirq
- * handlers, and NMI handlers.
+ * that started after call_rcu() was invoked.
+ *
+ * RCU read-side critical sections are delimited by rcu_read_lock() and
+ * rcu_read_unlock(), and may be nested. In addition, regions of code
+ * across which interrupts, preemption, or softirqs have been disabled
+ * also serve as RCU read-side critical sections. This includes hardware
+ * interrupt handlers, softirq handlers, and NMI handlers.
*
* Note that all CPUs must agree that the grace period extended beyond
* all pre-existing RCU read-side critical section. On systems with more
@@ -3771,12 +3772,13 @@ static int rcu_blocking_is_gp(void)
* read-side critical sections have completed. Note, however, that
* upon return from synchronize_rcu(), the caller might well be executing
* concurrently with new RCU read-side critical sections that began while
- * synchronize_rcu() was waiting. RCU read-side critical sections are
- * delimited by rcu_read_lock() and rcu_read_unlock(), and may be nested.
- * In addition, regions of code across which interrupts, preemption, or
- * softirqs have been disabled also serve as RCU read-side critical
- * sections. This includes hardware interrupt handlers, softirq handlers,
- * and NMI handlers.
+ * synchronize_rcu() was waiting.
+ *
+ * RCU read-side critical sections are delimited by rcu_read_lock() and
+ * rcu_read_unlock(), and may be nested. In addition, regions of code
+ * across which interrupts, preemption, or softirqs have been disabled
+ * also serve as RCU read-side critical sections. This includes hardware
+ * interrupt handlers, softirq handlers, and NMI handlers.
*
* Note that this guarantee implies further memory-ordering guarantees.
* On systems with more than one CPU, when synchronize_rcu() returns,
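
As an illustration of the reader semantics described in the comments above, here is a minimal user-space toy model in C. It is not kernel code: every toy_* name, NR_CPUS, and the per-CPU bookkeeping are hypothetical and exist only for this sketch, and real RCU tracks quiescent states quite differently. The sketch only captures the rule that a grace period must wait for every region already in progress when it started, whether that region was entered via rcu_read_lock(), by disabling interrupts, or by disabling preemption, and that it need not wait for regions that begin later.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical CPU count for the toy model */

/* Hypothetical per-CPU state; not the kernel's data structures. */
struct toy_cpu {
	int rcu_nesting;	/* depth of toy_rcu_read_lock() nesting */
	int preempt_count;	/* depth of toy_preempt_disable() nesting */
	bool irqs_off;		/* simplified: irq-disable does not nest here */
	unsigned long qs_seq;	/* bumped whenever the CPU leaves all readers */
};

static struct toy_cpu cpus[NR_CPUS];
static unsigned long gp_snap[NR_CPUS];	/* qs_seq seen at grace-period start */
static bool gp_need[NR_CPUS];		/* CPU was in a reader at start */

/* Any of the three states counts as an RCU read-side critical section. */
static bool toy_in_reader(const struct toy_cpu *c)
{
	return c->rcu_nesting > 0 || c->preempt_count > 0 || c->irqs_off;
}

/* Record a quiescent state once the CPU is outside every reader. */
static void toy_note_exit(struct toy_cpu *c)
{
	if (!toy_in_reader(c))
		c->qs_seq++;
}

static void toy_rcu_read_lock(int cpu) { cpus[cpu].rcu_nesting++; }

static void toy_rcu_read_unlock(int cpu)
{
	assert(cpus[cpu].rcu_nesting > 0);
	cpus[cpu].rcu_nesting--;
	toy_note_exit(&cpus[cpu]);
}

static void toy_preempt_disable(int cpu) { cpus[cpu].preempt_count++; }

static void toy_preempt_enable(int cpu)
{
	assert(cpus[cpu].preempt_count > 0);
	cpus[cpu].preempt_count--;
	toy_note_exit(&cpus[cpu]);
}

static void toy_local_irq_save(int cpu) { cpus[cpu].irqs_off = true; }

static void toy_local_irq_restore(int cpu)
{
	cpus[cpu].irqs_off = false;
	toy_note_exit(&cpus[cpu]);
}

/* Start a grace period: remember which CPUs are inside a reader right now. */
static void toy_gp_start(void)
{
	for (int i = 0; i < NR_CPUS; i++) {
		gp_need[i] = toy_in_reader(&cpus[i]);
		gp_snap[i] = cpus[i].qs_seq;
	}
}

/* The grace period may end once every pre-existing reader has exited. */
static bool toy_gp_done(void)
{
	for (int i = 0; i < NR_CPUS; i++)
		if (gp_need[i] && cpus[i].qs_seq == gp_snap[i])
			return false;
	return true;
}
```

In this model, if CPU 0 calls toy_local_irq_save() and CPU 1 calls toy_rcu_read_lock() before toy_gp_start(), then toy_gp_done() stays false until both CPUs have left their regions. A toy_preempt_disable() region that CPU 2 enters after toy_gp_start() does not hold the grace period up.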