linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Suren Baghdasaryan <surenb@google.com>
Cc: akpm@linux-foundation.org, michel@lespinasse.org,
	jglisse@google.com, mhocko@suse.com, vbabka@suse.cz,
	hannes@cmpxchg.org, mgorman@techsingularity.net,
	dave@stgolabs.net, liam.howlett@oracle.com, peterz@infradead.org,
	ldufour@linux.ibm.com, paulmck@kernel.org, mingo@redhat.com,
	will@kernel.org, luto@kernel.org, songliubraving@fb.com,
	peterx@redhat.com, david@redhat.com, dhowells@redhat.com,
	hughd@google.com, bigeasy@linutronix.de,
	kent.overstreet@linux.dev, punit.agrawal@bytedance.com,
	lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com,
	chriscli@google.com, axelrasmussen@google.com, joelaf@google.com,
	minchan@google.com, rppt@kernel.org, jannh@google.com,
	shakeelb@google.com, tatashin@google.com, edumazet@google.com,
	gthelen@google.com, gurua@google.com, arjunroy@google.com,
	soheil@google.com, leewalsh@google.com, posk@google.com,
	michalechner92@googlemail.com, linux-mm@kvack.org,
	linux-arm-kernel@lists.infradead.org,
	linuxppc-dev@lists.ozlabs.org, x86@kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH v3 26/35] mm: fall back to mmap_lock if vma->anon_vma is not yet set
Date: Mon, 3 Apr 2023 20:49:22 +0100	[thread overview]
Message-ID: <ZCstwjKhn76/mxIa@casper.infradead.org> (raw)
In-Reply-To: <CAJuCfpG_ZWJs3mZkL0z7m-bBe1SmeoTZydfFocZaRbHob_89Hg@mail.gmail.com>

On Fri, Feb 17, 2023 at 08:10:35AM -0800, Suren Baghdasaryan wrote:
> On Fri, Feb 17, 2023 at 8:05 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Thu, Feb 16, 2023 at 06:14:59PM -0800, Suren Baghdasaryan wrote:
> > > On Thu, Feb 16, 2023 at 11:43 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > >
> > > > On Thu, Feb 16, 2023 at 7:44 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > > >
> > > > > On Wed, Feb 15, 2023 at 09:17:41PM -0800, Suren Baghdasaryan wrote:
> > > > > > When vma->anon_vma is not set, page fault handler will set it by either
> > > > > > reusing anon_vma of an adjacent VMA if VMAs are compatible or by
> > > > > > allocating a new one. find_mergeable_anon_vma() walks VMA tree to find
> > > > > > a compatible adjacent VMA and that requires not only the faulting VMA
> > > > > > to be stable but also the tree structure and other VMAs inside that tree.
> > > > > > Therefore locking just the faulting VMA is not enough for this search.
> > > > > > Fall back to taking mmap_lock when vma->anon_vma is not set. This
> > > > > > situation happens only on the first page fault and should not affect
> > > > > > overall performance.
> > > > >
> > > > > I think I asked this before, but don't remember getting an aswer.
> > > > > Why do we defer setting anon_vma to the first fault?  Why don't we
> > > > > set it up at mmap time?
> > > >
> > > > Yeah, I remember that conversation Matthew and I could not find the
> > > > definitive answer at the time. I'll look into that again or maybe
> > > > someone can answer it here.
> > >
> > > After looking into it again I'm still under the impression that
> > > vma->anon_vma is populated lazily (during the first page fault rather
> > > than at mmap time) to avoid doing extra work for areas which are never
> > > faulted. Though I might be missing some important detail here.
> >
> > How often does userspace call mmap() and then _never_ fault on it?
> > I appreciate that userspace might mmap() gigabytes of address space and
> > then only end up using a small amount of it, so populating it lazily
> > makes sense.  But creating a region and never faulting on it?  The only
> > use-case I can think of is loading shared libraries:
> >
> > openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
> > (...)
> > mmap(NULL, 1970000, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0ce612e000
> > mmap(0x7f0ce6154000, 1396736, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x26000) = 0x7f0ce6154000
> > mmap(0x7f0ce62a9000, 339968, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17b000) = 0x7f0ce62a9000
> > mmap(0x7f0ce62fc000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ce000) = 0x7f0ce62fc000
> > mmap(0x7f0ce6302000, 53072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f0ce6302000
> >
> > but that's a file-backed VMA, not an anon VMA.
> 
> Might the case of dup_mmap() while forking be the reason why a VMA in
> the child process might be never used while parent uses it (or visa
> versa)? Again, I'm not sure this is the reason but I can find no other
> good explanation.

I found an explanation!  Well, a partial one.  If we MAP_PRIVATE a file
mapping (like, er those ones up there) and only take read faults on it,
we can postpone allocation of the anon_vma indefinitely.  But once we
take a write fault in that VMA, we need to allocate an anon_vma for it
so that we can track the anonymous pages that have been allocated to
satisfy the copy-on-write (see do_cow_fault()).

However, I think in that caase, we could probably skip the
find_mergeable_anon_vma() step.  We don't today; we check whether
a->vm_file == b->vm_file in anon_vma_compatible, but I wonder if that
triggers often.


  reply	other threads:[~2023-04-03 19:50 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-16  5:17 [PATCH v3 00/35] Per-VMA locks Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 01/35] maple_tree: Be more cautious about dead nodes Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 02/35] maple_tree: Detect dead nodes in mas_start() Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 03/35] maple_tree: Fix freeing of nodes in rcu mode Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 04/35] maple_tree: remove extra smp_wmb() from mas_dead_leaves() Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 05/35] maple_tree: Fix write memory barrier of nodes once dead for RCU mode Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 06/35] maple_tree: Add smp_rmb() to dead node detection Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 07/35] maple_tree: Add RCU lock checking to rcu callback functions Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 08/35] mm: Enable maple tree RCU mode by default Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 09/35] mm: introduce CONFIG_PER_VMA_LOCK Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 10/35] mm: rcu safe VMA freeing Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 11/35] mm: move mmap_lock assert function definitions Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 12/35] mm: add per-VMA lock and helper functions to control it Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 13/35] mm: mark VMA as being written when changing vm_flags Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 14/35] mm/mmap: move VMA locking before vma_adjust_trans_huge call Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 15/35] mm/khugepaged: write-lock VMA while collapsing a huge page Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 16/35] mm/mmap: write-lock VMAs before merging, splitting or expanding them Suren Baghdasaryan
2023-02-23 14:51   ` Hyeonggon Yoo
2023-02-23 14:59     ` Hyeonggon Yoo
2023-02-23 17:46     ` Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 17/35] mm/mmap: write-lock VMA before shrinking or expanding it Suren Baghdasaryan
2023-02-23 20:20   ` Liam R. Howlett
2023-02-23 20:28     ` Liam R. Howlett
2023-02-23 21:16       ` Suren Baghdasaryan
2023-02-24  1:46         ` Liam R. Howlett
2023-02-24  2:06           ` Suren Baghdasaryan
2023-02-24 16:14             ` Liam R. Howlett
2023-02-24 16:19               ` Suren Baghdasaryan
2023-02-27 17:33                 ` Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 18/35] mm/mremap: write-lock VMA while remapping it to a new address range Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 19/35] mm: write-lock VMAs before removing them from VMA tree Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 20/35] mm: conditionally write-lock VMA in free_pgtables Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 21/35] mm/mmap: write-lock adjacent VMAs if they can grow into unmapped area Suren Baghdasaryan
2023-02-16 15:34   ` Liam R. Howlett
2023-02-16 19:36     ` Suren Baghdasaryan
2023-02-17 14:50       ` Liam R. Howlett
2023-02-17 15:54         ` Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 22/35] kernel/fork: assert no VMA readers during its destruction Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 23/35] mm/mmap: prevent pagefault handler from racing with mmu_notifier registration Suren Baghdasaryan
2023-02-23 20:06   ` Liam R. Howlett
2023-02-23 20:29     ` Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 24/35] mm: introduce vma detached flag Suren Baghdasaryan
2023-02-23 20:08   ` Liam R. Howlett
2023-02-23 20:34     ` Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 25/35] mm: introduce lock_vma_under_rcu to be used from arch-specific code Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 26/35] mm: fall back to mmap_lock if vma->anon_vma is not yet set Suren Baghdasaryan
2023-02-16 15:44   ` Matthew Wilcox
2023-02-16 19:43     ` Suren Baghdasaryan
2023-02-17  2:14       ` Suren Baghdasaryan
2023-02-17 10:21         ` Hyeonggon Yoo
2023-02-17 16:13           ` Suren Baghdasaryan
2023-02-17 18:49             ` Hyeonggon Yoo
2023-02-17 16:05         ` Matthew Wilcox
2023-02-17 16:10           ` Suren Baghdasaryan
2023-04-03 19:49             ` Matthew Wilcox [this message]
2023-02-16  5:17 ` [PATCH v3 27/35] mm: add FAULT_FLAG_VMA_LOCK flag Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 28/35] mm: prevent do_swap_page from handling page faults under VMA lock Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 29/35] mm: prevent userfaults to be handled under per-vma lock Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 30/35] mm: introduce per-VMA lock statistics Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 31/35] x86/mm: try VMA lock-based page fault handling first Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 32/35] arm64/mm: " Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 33/35] powerc/mm: " Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 34/35] mm/mmap: free vm_area_struct without call_rcu in exit_mmap Suren Baghdasaryan
2023-02-16  5:17 ` [PATCH v3 35/35] mm: separate vma->lock from vm_area_struct Suren Baghdasaryan
2023-02-24  9:21 ` [PATCH v3 00/35] Per-VMA locks freak07
2023-02-27 16:50   ` Davidlohr Bueso
2023-02-27 17:22     ` Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZCstwjKhn76/mxIa@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=arjunroy@google.com \
    --cc=axelrasmussen@google.com \
    --cc=bigeasy@linutronix.de \
    --cc=chriscli@google.com \
    --cc=dave@stgolabs.net \
    --cc=david@redhat.com \
    --cc=dhowells@redhat.com \
    --cc=edumazet@google.com \
    --cc=gthelen@google.com \
    --cc=gurua@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=jannh@google.com \
    --cc=jglisse@google.com \
    --cc=joelaf@google.com \
    --cc=kent.overstreet@linux.dev \
    --cc=kernel-team@android.com \
    --cc=ldufour@linux.ibm.com \
    --cc=leewalsh@google.com \
    --cc=liam.howlett@oracle.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=lstoakes@gmail.com \
    --cc=luto@kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=michalechner92@googlemail.com \
    --cc=michel@lespinasse.org \
    --cc=minchan@google.com \
    --cc=mingo@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=peterjung1337@gmail.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=posk@google.com \
    --cc=punit.agrawal@bytedance.com \
    --cc=rientjes@google.com \
    --cc=rppt@kernel.org \
    --cc=shakeelb@google.com \
    --cc=soheil@google.com \
    --cc=songliubraving@fb.com \
    --cc=surenb@google.com \
    --cc=tatashin@google.com \
    --cc=vbabka@suse.cz \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).