From: David Hildenbrand <david@redhat.com>
To: Michel Lespinasse <michel@lespinasse.org>,
	Linux-MM <linux-mm@kvack.org>,
	Linux-Kernel <linux-kernel@vger.kernel.org>
Cc: Laurent Dufour <ldufour@linux.ibm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Michal Hocko <mhocko@suse.com>,
	Matthew Wilcox <willy@infradead.org>,
	Rik van Riel <riel@surriel.com>,
	Paul McKenney <paulmck@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Joel Fernandes <joelaf@google.com>,
	Andy Lutomirski <luto@kernel.org>
Subject: Re: [PATCH 00/29] Speculative page faults (anon vmas only)
Date: Fri, 9 Jul 2021 12:41:03 +0200
Message-ID: <fef963b1-cb28-9e75-f2b0-6744a2520e54@redhat.com>
In-Reply-To: <3047d699-2793-e051-e1eb-deef7c5764a8@redhat.com>

On 17.06.21 15:46, David Hildenbrand wrote:
> On 30.04.21 21:52, Michel Lespinasse wrote:
>> This patchset is my take on speculative page faults (spf).
>> It builds on ideas that have been previously proposed by Laurent Dufour,
>> Peter Zijlstra and others before. While Laurent's previous proposal
>> was rejected around the time of LSF/MM 2019, I am hoping we can revisit
>> this now based on what I think is a simpler and more bisectable approach,
>> much improved scaling numbers in the anonymous vma case, and the Android
>> use case that has since emerged. I will expand on these points towards
>> the end of this message.
>>
>> The patch series applies on top of linux v5.12;
>> a git tree is also available:
>> git fetch https://github.com/lespinasse/linux.git v5.12-spf-anon
>>
>> I believe these patches should be considered for merging.
>> My github also has a v5.12-spf branch which extends this mechanism
>> for handling file mapped vmas too; however I believe these are less
>> mature and I am not submitting them for inclusion at this point.
>>
>>
>> Compared to the previous (RFC) proposal, I have split out / left out
>> the file VMA handling parts, fixed some config specific build issues,
>> added a few more comments and modified the speculative fault handling
>> to use rcu_read_lock() rather than local_irq_disable() in the
>> MMU_GATHER_RCU_TABLE_FREE case.
>>
>>
>> Classical page fault processing takes the mmap read lock in order to
>> prevent races with mmap writers. In contrast, speculative fault
>> processing does not take the mmap read lock, and instead verifies,
>> when the results of the page fault are about to get committed and
>> become visible to other threads, that no mmap writers have been
>> running concurrently with the page fault. If the check fails,
>> speculative updates do not get committed and the fault is retried
>> in the usual, non-speculative way (with the mmap read lock held).
>>
>> The concurrency check is implemented using a per-mm mmap sequence count.
>> The counter is incremented at the beginning and end of each mmap write
>> operation. If the counter is initially observed to have an even value,
>> and has the same value later on, the observer can deduce that no mmap
>> writers have been running concurrently with it between those two times.
>> This is similar to a seqlock, except that readers never spin on the
>> counter value (they would instead revert to taking the mmap read lock),
>> and writers are allowed to sleep. One benefit of this approach is that
>> it requires no writer side changes, just some hooks in the mmap write
>> lock APIs that writers already use.
>>
>> The first step of a speculative page fault is to look up the vma and
>> read its contents (currently by making a copy of the vma, though in
>> principle it would be sufficient to only read the vma attributes that
>> are used in page faults). The mmap sequence count is used to verify
>> that there were no mmap writers concurrent to the lookup and copy steps.
>> Note that walking rbtrees while there may potentially be concurrent
>> writers is not an entirely new idea in linux, as latched rbtrees
>> are already doing this. This is safe as long as the lookup is
>> followed by a sequence check to verify that concurrency did not
>> actually occur (and abort the speculative fault if it did).
>>
>> The next step is to walk down the existing page table tree to find the
>> current pte entry. This is done with interrupts disabled to avoid
>> races with munmap(). Again, not an entirely new idea, as this repeats
>> a pattern already present in fast GUP. Similar precautions are also
>> taken when taking the page table lock.
> 
> Hi Michel,
> 
> I just started working on a project to reclaim page tables inside
> running processes that are no longer needed (for example, empty after
> madvise(DISCARD)). Long story short, there are scenarios where we want
> to scan for such page tables asynchronously to free up memory (which can
> be quite significant in some use cases).
> 
> Now that I (mostly) understood the complex locking, I'm looking for
> other mm features that might be "problematic" in that regard and require
> proper planning to get right (or be made mutually exclusive).
> 
> As I essentially rip out page tables from the page table hierarchy to
> free them (in the simplest case within a VMA to get started), I
> certainly need the mmap lock in read right now to scan the page table
> hierarchy, and the mmap lock in write when actually removing a page
> table. This is similar to how khugepaged handles collapsing a THP and
> removing a page table. Of course, we could use any kind of
> synchronization mechanism (-> rcu) to make sure nobody is using a page
> table anymore before actually freeing it.
> 
> 1. I now wonder how your code actually protects against e.g., khugepaged
> and how it could protect against page table reclaim. Will we be using
> RCU while walking the page tables? That would make life easier.
> 
> 2. You mention "interrupts disabled to avoid races with munmap()". Can
> you elaborate how that is supposed to work? Shouldn't we rather be using
> RCU than manually disabling interrupts? What is the rationale?

Answering my own questions: I assume this works just like gup_fast
(lockless_pages_from_mm()), whereby we rely on an IPI when flushing the
TLB before actually freeing the page (-> mmu gather).
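
For reference, the mmap sequence count check described in the quoted cover
letter could be sketched in userspace C roughly as follows. This is only an
illustration under my own assumptions, not code from the patch series:
struct mm_sketch, mmap_write_begin(), mmap_write_end(), spf_begin(),
spf_validate() and spf_read_flags() are all hypothetical names, C11 atomics
stand in for the kernel primitives, and the mmap write lock itself is not
modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-mm state; NOT the kernel's struct mm_struct. */
struct mm_sketch {
	atomic_ulong mmap_seq;	/* even: no writer active, odd: writer active */
	atomic_int vma_flags;	/* stand-in for vma data changed by mmap writers */
};

/* Writer side: assumed to be called with the (unmodelled) mmap write lock held. */
static void mmap_write_begin(struct mm_sketch *mm)
{
	unsigned long old = atomic_fetch_add(&mm->mmap_seq, 1);

	assert((old & 1) == 0);	/* count goes from even to odd */
}

static void mmap_write_end(struct mm_sketch *mm)
{
	unsigned long old = atomic_fetch_add(&mm->mmap_seq, 1);

	assert((old & 1) == 1);	/* count goes from odd back to even */
}

/* Reader side: sample the count; never spin, just report failure. */
static bool spf_begin(struct mm_sketch *mm, unsigned long *seq)
{
	*seq = atomic_load(&mm->mmap_seq);
	return (*seq & 1) == 0;	/* an odd value means a writer is running */
}

/* True if no writer started or finished since spf_begin() returned seq. */
static bool spf_validate(struct mm_sketch *mm, unsigned long seq)
{
	return atomic_load(&mm->mmap_seq) == seq;
}

/*
 * Speculatively copy the vma data. On failure the caller is expected to
 * retry the non-speculative way, i.e. with the mmap read lock held.
 */
static bool spf_read_flags(struct mm_sketch *mm, int *out)
{
	unsigned long seq;
	int copy;

	if (!spf_begin(mm, &seq))
		return false;
	copy = atomic_load(&mm->vma_flags);
	if (!spf_validate(mm, seq))
		return false;
	*out = copy;
	return true;
}
```

The point of the sketch is that the reader never waits on the counter: any
odd or changed value simply aborts the speculative attempt, which is what
lets writers sleep while the count is odd.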

-- 
Thanks,

David / dhildenb


