linux-kernel.vger.kernel.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Michel Lespinasse <michel@lespinasse.org>,
	Linux-MM <linux-mm@kvack.org>,
	Linux-Kernel <linux-kernel@vger.kernel.org>
Cc: Laurent Dufour <ldufour@linux.ibm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Michal Hocko <mhocko@suse.com>,
	Matthew Wilcox <willy@infradead.org>,
	Rik van Riel <riel@surriel.com>,
	Paul McKenney <paulmck@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Joel Fernandes <joelaf@google.com>,
	Andy Lutomirski <luto@kernel.org>
Subject: Re: [PATCH 00/29] Speculative page faults (anon vmas only)
Date: Fri, 9 Jul 2021 12:41:03 +0200
Message-ID: <fef963b1-cb28-9e75-f2b0-6744a2520e54@redhat.com>
In-Reply-To: <3047d699-2793-e051-e1eb-deef7c5764a8@redhat.com>

On 17.06.21 15:46, David Hildenbrand wrote:
> On 30.04.21 21:52, Michel Lespinasse wrote:
>> This patchset is my take on speculative page faults (spf).
>> It builds on ideas that have been previously proposed by Laurent Dufour,
>> Peter Zijlstra and others before. While Laurent's previous proposal
>> was rejected around the time of LSF/MM 2019, I am hoping we can revisit
>> this now based on what I think is a simpler and more bisectable approach,
>> much improved scaling numbers in the anonymous vma case, and the Android
>> use case that has since emerged. I will expand on these points towards
>> the end of this message.
>>
>> The patch series applies on top of linux v5.12;
>> a git tree is also available:
>> git fetch https://github.com/lespinasse/linux.git v5.12-spf-anon
>>
>> I believe these patches should be considered for merging.
>> My github also has a v5.12-spf branch which extends this mechanism
>> for handling file mapped vmas too; however I believe these are less
>> mature and I am not submitting them for inclusion at this point.
>>
>>
>> Compared to the previous (RFC) proposal, I have split out / left out
>> the file VMA handling parts, fixed some config specific build issues,
>> added a few more comments and modified the speculative fault handling
>> to use rcu_read_lock() rather than local_irq_disable() in the
>> MMU_GATHER_RCU_TABLE_FREE case.
>>
>>
>> Classical page fault processing takes the mmap read lock in order to
>> prevent races with mmap writers. In contrast, speculative fault
>> processing does not take the mmap read lock, and instead verifies,
>> when the results of the page fault are about to get committed and
>> become visible to other threads, that no mmap writers have been
>> running concurrently with the page fault. If the check fails,
>> speculative updates do not get committed and the fault is retried
>> in the usual, non-speculative way (with the mmap read lock held).
>>
>> The concurrency check is implemented using a per-mm mmap sequence count.
>> The counter is incremented at the beginning and end of each mmap write
>> operation. If the counter is initially observed to have an even value,
>> and has the same value later on, the observer can deduce that no mmap
>> writers have been running concurrently with it between those two times.
>> This is similar to a seqlock, except that readers never spin on the
>> counter value (they would instead revert to taking the mmap read lock),
>> and writers are allowed to sleep. One benefit of this approach is that
>> it requires no writer side changes, just some hooks in the mmap write
>> lock APIs that writers already use.
>>
>> The first step of a speculative page fault is to look up the vma and
>> read its contents (currently by making a copy of the vma, though in
>> principle it would be sufficient to only read the vma attributes that
>> are used in page faults). The mmap sequence count is used to verify
>> that there were no mmap writers concurrent to the lookup and copy steps.
>> Note that walking rbtrees while there may potentially be concurrent
>> writers is not an entirely new idea in linux, as latched rbtrees
>> are already doing this. This is safe as long as the lookup is
>> followed by a sequence check to verify that concurrency did not
>> actually occur (and abort the speculative fault if it did).
>>
>> The next step is to walk down the existing page table tree to find the
>> current pte entry. This is done with interrupts disabled to avoid
>> races with munmap(). Again, not an entirely new idea, as this repeats
>> a pattern already present in fast GUP. Similar precautions are also
>> taken when taking the page table lock.
> 
> Hi Michel,
> 
> I just started working on a project to reclaim page tables inside
> running processes that are no longer needed (for example, empty after
> madvise(DISCARD)). Long story short, there are scenarios where we want
> to scan for such page tables asynchronously to free up memory (which can
> be quite significant in some use cases).
> 
> Now that I (mostly) understood the complex locking, I'm looking for
> other mm features that might be "problematic" in that regard and require
> proper planning to get right (or have to run mutually exclusive).
> 
> As I essentially rip out page tables from the page table hierarchy to
> free them (in the simplest case within a VMA to get started), I
> certainly need the mmap lock in read right now to scan the page table
> hierarchy, and the mmap lock in write when actually removing a page
> table. This is similar to what khugepaged does when collapsing a THP and
> removing a page table. Of course, we could use any kind of
> synchronization mechanism (-> rcu) to make sure nobody is using a page
> table anymore before actually freeing it.
> 
> 1. I now wonder how your code actually protects against e.g., khugepaged
> and how it could protect against page table reclaim. Will we be using
> RCU while walking the page tables? That would make life easier.
> 
> 2. You mention "interrupts disabled to avoid races with munmap()". Can
> you elaborate how that is supposed to work? Shouldn't we rather be using
> RCU than manually disabling interrupts? What is the rationale?

Answering my own questions: I assume this works just like gup_fast's
lockless_pages_from_mm(), whereby we rely on an IPI when flushing the
TLB before actually freeing the page (-> mmu gather).

-- 
Thanks,

David / dhildenb
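
The per-mm mmap sequence count check described in the quoted cover
letter can be illustrated with a small user-space sketch in C11. This is
only a sketch under stated assumptions, not the patchset's code: the
names struct mm_sketch, mmap_write_begin(), mmap_write_end(),
spf_begin(), spf_validate() and spf_read_flags() are hypothetical, the
vm_flags field stands in for whatever state the mmap lock protects, and
writers are assumed to be serialized against each other by the mmap
write lock itself (not modeled here).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-mm state; not the kernel's mm_struct. */
struct mm_sketch {
	atomic_ulong mmap_seq;	/* even: no writer active, odd: writer running */
	atomic_ulong vm_flags;	/* stand-in for data protected by the mmap lock */
};

/* Hook run when the mmap write lock is taken: counter becomes odd. */
static void mmap_write_begin(struct mm_sketch *mm)
{
	atomic_fetch_add_explicit(&mm->mmap_seq, 1, memory_order_relaxed);
	/* Order the counter update before the writer's data stores. */
	atomic_thread_fence(memory_order_release);
}

/* Hook run when the mmap write lock is released: counter becomes even. */
static void mmap_write_end(struct mm_sketch *mm)
{
	atomic_fetch_add_explicit(&mm->mmap_seq, 1, memory_order_release);
}

/*
 * Reader side: sample the counter once. Unlike a seqlock reader, this
 * never spins; an odd value means "give up and take the mmap read lock".
 */
static bool spf_begin(struct mm_sketch *mm, unsigned long *seq)
{
	*seq = atomic_load_explicit(&mm->mmap_seq, memory_order_acquire);
	return (*seq & 1) == 0;
}

/* True if no writer started or finished since spf_begin() sampled seq. */
static bool spf_validate(struct mm_sketch *mm, unsigned long seq)
{
	/* Order the speculative data loads before the counter re-read. */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&mm->mmap_seq, memory_order_relaxed) == seq;
}

/*
 * One speculative read of vm_flags. If writer_interferes is set, a
 * writer is simulated between the read and the validation step, so the
 * speculation must fail and the caller would fall back to the locked path.
 */
static bool spf_read_flags(struct mm_sketch *mm, bool writer_interferes,
			   unsigned long *out)
{
	unsigned long seq, flags;

	if (!spf_begin(mm, &seq))
		return false;		/* writer active: do not speculate */

	flags = atomic_load_explicit(&mm->vm_flags, memory_order_relaxed);

	if (writer_interferes) {
		mmap_write_begin(mm);
		atomic_store_explicit(&mm->vm_flags, flags | 1UL,
				      memory_order_relaxed);
		mmap_write_end(mm);
	}

	if (!spf_validate(mm, seq))
		return false;		/* raced with a writer: retry locked */

	*out = flags;			/* safe to commit the result */
	return true;
}
```

A caller that gets false back from spf_read_flags() would retry under
the mmap read lock, which matches the fallback described in the cover
letter. The writer-side hooks are the only change writers see, which is
the property the cover letter highlights.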


Thread overview: 50+ messages
2021-04-30 19:52 [PATCH 00/29] Speculative page faults (anon vmas only) Michel Lespinasse
2021-04-30 19:52 ` [PATCH 01/29] mm: export dump_mm Michel Lespinasse
2021-04-30 19:52 ` [PATCH 02/29] mmap locking API: mmap_lock_is_contended returns a bool Michel Lespinasse
2021-04-30 19:52 ` [PATCH 03/29] mmap locking API: name the return values Michel Lespinasse
2021-04-30 19:52 ` [PATCH 04/29] do_anonymous_page: use update_mmu_tlb() Michel Lespinasse
2021-06-10  0:38   ` Suren Baghdasaryan
2021-04-30 19:52 ` [PATCH 05/29] do_anonymous_page: reduce code duplication Michel Lespinasse
2021-04-30 19:52 ` [PATCH 06/29] mm: introduce CONFIG_SPECULATIVE_PAGE_FAULT Michel Lespinasse
2021-04-30 19:52 ` [PATCH 07/29] x86/mm: define ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT Michel Lespinasse
2021-04-30 19:52 ` [PATCH 08/29] mm: add FAULT_FLAG_SPECULATIVE flag Michel Lespinasse
2021-06-10  0:58   ` Suren Baghdasaryan
2021-04-30 19:52 ` [PATCH 09/29] mm: add do_handle_mm_fault() Michel Lespinasse
2021-04-30 19:52 ` [PATCH 10/29] mm: add per-mm mmap sequence counter for speculative page fault handling Michel Lespinasse
2021-04-30 19:52 ` [PATCH 11/29] mm: rcu safe vma freeing Michel Lespinasse
2021-04-30 19:52 ` [PATCH 12/29] x86/mm: attempt speculative mm faults first Michel Lespinasse
2021-04-30 19:52 ` [PATCH 13/29] mm: add speculative_page_walk_begin() and speculative_page_walk_end() Michel Lespinasse
2021-04-30 19:52 ` [PATCH 14/29] mm: refactor __handle_mm_fault() / handle_pte_fault() Michel Lespinasse
2021-04-30 19:52 ` [PATCH 15/29] mm: implement speculative handling in __handle_mm_fault() Michel Lespinasse
2021-04-30 19:52 ` [PATCH 16/29] mm: add pte_map_lock() and pte_spinlock() Michel Lespinasse
2021-04-30 23:33   ` kernel test robot
2021-04-30 23:45   ` kernel test robot
2021-04-30 19:52 ` [PATCH 17/29] mm: implement speculative handling in do_anonymous_page() Michel Lespinasse
2021-04-30 19:52 ` [PATCH 18/29] mm: enable speculative fault handling through do_anonymous_page() Michel Lespinasse
2021-04-30 19:52 ` [PATCH 19/29] mm: implement speculative handling in do_numa_page() Michel Lespinasse
2021-04-30 19:52 ` [PATCH 20/29] mm: enable speculative fault " Michel Lespinasse
2021-04-30 19:52 ` [PATCH 21/29] mm: implement speculative handling in wp_page_copy() Michel Lespinasse
2021-04-30 19:52 ` [PATCH 22/29] mm: implement and enable speculative fault handling in handle_pte_fault() Michel Lespinasse
2021-04-30 19:52 ` [PATCH 23/29] mm: implement speculative handling in do_swap_page() Michel Lespinasse
2021-04-30 19:52 ` [PATCH 24/29] mm: enable speculative fault handling through do_swap_page() Michel Lespinasse
2021-04-30 19:52 ` [PATCH 25/29] mm: disable speculative faults for single threaded user space Michel Lespinasse
2021-04-30 19:52 ` [PATCH 26/29] mm: disable rcu safe vma freeing " Michel Lespinasse
2021-04-30 19:52 ` [PATCH 27/29] mm: anon spf statistics Michel Lespinasse
2021-04-30 22:52   ` kernel test robot
2021-04-30 19:52 ` [PATCH 28/29] arm64/mm: define ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT Michel Lespinasse
2021-04-30 19:52 ` [PATCH 29/29] arm64/mm: attempt speculative mm faults first Michel Lespinasse
2021-04-30 19:52 ` [PATCH 30/31] powerpc/mm: define ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT Michel Lespinasse
2021-04-30 19:52 ` [PATCH 31/31] powerpc/mm: attempt speculative mm faults first Michel Lespinasse
2021-04-30 22:46 ` [PATCH 00/29] Speculative page faults (anon vmas only) Michel Lespinasse
2021-05-03 18:11   ` Michel Lespinasse
2021-05-17 17:57     ` Paul E. McKenney
2021-05-20 22:10       ` Suren Baghdasaryan
2021-05-20 23:08         ` Paul E. McKenney
2021-06-01  7:41         ` Michel Lespinasse
2021-06-01 20:18           ` Paul E. McKenney
2021-06-01 20:23         ` Paul E. McKenney
2021-06-14  7:04         ` Michel Lespinasse
2021-05-01 19:56 ` Theodore Ts'o
2021-05-01 21:19   ` Michel Lespinasse
2021-06-17 13:46 ` David Hildenbrand
2021-07-09 10:41   ` David Hildenbrand [this message]
