linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Suren Baghdasaryan <surenb@google.com>, akpm@linux-foundation.org
Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com,
	hannes@cmpxchg.org, mgorman@techsingularity.net,
	dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com,
	peterz@infradead.org, ldufour@linux.ibm.com,
	laurent.dufour@fr.ibm.com, paulmck@kernel.org, luto@kernel.org,
	songliubraving@fb.com, peterx@redhat.com, david@redhat.com,
	dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de,
	kent.overstreet@linux.dev, punit.agrawal@bytedance.com,
	lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com,
	axelrasmussen@google.com, joelaf@google.com, minchan@google.com,
	jannh@google.com, shakeelb@google.com, tatashin@google.com,
	edumazet@google.com, gthelen@google.com, gurua@google.com,
	arjunroy@google.com, soheil@google.com, hughlynch@google.com,
	leewalsh@google.com, posk@google.com, linux-mm@kvack.org,
	linux-arm-kernel@lists.infradead.org,
	linuxppc-dev@lists.ozlabs.org, x86@kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH 41/41] mm: replace rw_semaphore with atomic_t in vma_lock
Date: Tue, 10 Jan 2023 09:04:18 +0100	[thread overview]
Message-ID: <5874fea2-fc3b-5e5d-50ac-e413a11819a5@suse.cz> (raw)
In-Reply-To: <20230109205336.3665937-42-surenb@google.com>

On 1/9/23 21:53, Suren Baghdasaryan wrote:
> rw_semaphore is a sizable structure of 40 bytes and consumes
> considerable space for each vm_area_struct. However vma_lock has
> two important specifics which can be used to replace rw_semaphore
> with a simpler structure:
> 1. Readers never wait. They try to take the vma_lock and fall back to
> mmap_lock if that fails.
> 2. Only one writer at a time will ever try to write-lock a vma_lock
> because writers first take mmap_lock in write mode.
> Because of these requirements, full rw_semaphore functionality is not
> needed and we can replace rw_semaphore with an atomic variable.
> When a reader takes read lock, it increments the atomic unless the
> value is negative. If that fails read-locking is aborted and mmap_lock
> is used instead.
> When writer takes write lock, it resets atomic value to -1 if the
> current value is 0 (no readers). Since all writers take mmap_lock in
> write mode first, there can be only one writer at a time. If there
> are readers, writer will place itself into a wait queue using new
> mm_struct.vma_writer_wait waitqueue head. The last reader to release
> the vma_lock will signal the writer to wake up.
> vm_lock_seq is also moved into vma_lock and along with atomic_t they
> are nicely packed and consume 8 bytes, bringing the overhead from
> vma_lock from 44 to 16 bytes:
> 
>     slabinfo before the changes:
>      <name>            ... <objsize> <objperslab> <pagesperslab> : ...
>     vm_area_struct    ...    152   53    2 : ...
> 
>     slabinfo with vma_lock:
>      <name>            ... <objsize> <objperslab> <pagesperslab> : ...
>     rw_semaphore      ...      8  512    1 : ...

I guess the cache is called vma_lock, not rw_semaphore?

>     vm_area_struct    ...    160   51    2 : ...
> 
> Assuming 40000 vm_area_structs, memory consumption would be:
> baseline: 6040kB
> vma_lock (vm_area_structs+vma_lock): 6280kB+316kB=6596kB
> Total increase: 556kB
> 
> atomic_t might overflow if there are many competing readers, therefore
> vma_read_trylock() implements an overflow check and if that occurs it
> restors the previous value and exits with a failure to lock.
> 
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>

This patch is indeed an interesting addition indeed, but I can't help but
think it obsoletes the previous one :) We allocate an extra 8 bytes slab
object for the lock, and the pointer to it is also 8 bytes, and requires an
indirection. The vma_lock cache is not cacheline aligned (otherwise it would
be a major waste), so we have potential false sharing with up to 7 other
vma_lock's.
I'd expect if the vma_lock was placed with the relatively cold fields of
vm_area_struct, it shouldn't cause much cache ping pong when working with
that vma. Even if we don't cache align the vma to save memory (would be 192
bytes instead of 160 when aligned) and place the vma_lock and the cold
fields at the end of the vma, it may be false sharing the cacheline with the
next vma in the slab. But that's a single vma, not up to 7, so it shouldn't
be worse?



  reply	other threads:[~2023-01-10  8:05 UTC|newest]

Thread overview: 186+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-09 20:52 [PATCH 00/41] Per-VMA locks Suren Baghdasaryan
2023-01-09 20:52 ` [PATCH 01/41] maple_tree: Be more cautious about dead nodes Suren Baghdasaryan
2023-01-09 20:52 ` [PATCH 02/41] maple_tree: Detect dead nodes in mas_start() Suren Baghdasaryan
2023-01-09 20:52 ` [PATCH 03/41] maple_tree: Fix freeing of nodes in rcu mode Suren Baghdasaryan
2023-01-09 20:52 ` [PATCH 04/41] maple_tree: remove extra smp_wmb() from mas_dead_leaves() Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 05/41] maple_tree: Fix write memory barrier of nodes once dead for RCU mode Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 06/41] maple_tree: Add smp_rmb() to dead node detection Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 07/41] mm: Enable maple tree RCU mode by default Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 08/41] mm: introduce CONFIG_PER_VMA_LOCK Suren Baghdasaryan
2023-01-11  0:13   ` Davidlohr Bueso
2023-01-11  0:44     ` Suren Baghdasaryan
2023-01-11  8:23       ` Michal Hocko
2023-01-11  9:54         ` Ingo Molnar
     [not found]           ` <6be809f5554a4faaa22c287ba4224bd0@AcuMS.aculab.com>
2023-01-11 16:28             ` Suren Baghdasaryan
2023-01-11 16:44               ` Michal Hocko
2023-01-11 17:04                 ` Suren Baghdasaryan
2023-01-11 17:37                   ` Michal Hocko
2023-01-11 17:49                     ` Suren Baghdasaryan
2023-01-11 18:02                       ` Michal Hocko
2023-01-11 18:09                         ` Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 09/41] mm: rcu safe VMA freeing Suren Baghdasaryan
2023-01-17 14:25   ` Michal Hocko
2023-01-18  2:16     ` Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 10/41] mm: move mmap_lock assert function definitions Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 11/41] mm: export dump_mm() Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 12/41] mm: add per-VMA lock and helper functions to control it Suren Baghdasaryan
2023-01-17 15:04   ` Michal Hocko
2023-01-17 15:12     ` Michal Hocko
2023-01-17 21:21       ` Suren Baghdasaryan
2023-01-17 21:54         ` Matthew Wilcox
2023-01-17 22:33           ` Suren Baghdasaryan
2023-01-18  9:18           ` Michal Hocko
2023-01-17 21:08     ` Suren Baghdasaryan
2023-01-17 15:07   ` Michal Hocko
2023-01-17 21:09     ` Suren Baghdasaryan
2023-01-17 18:02   ` Jann Horn
2023-01-17 21:28     ` Suren Baghdasaryan
2023-01-17 21:45       ` Jann Horn
2023-01-17 22:36         ` Suren Baghdasaryan
2023-01-17 23:15           ` Matthew Wilcox
2023-11-22 14:04         ` Alexander Gordeev
2023-01-18 12:28     ` Michal Hocko
2023-01-18 13:23       ` Jann Horn
2023-01-18 15:11         ` Michal Hocko
2023-01-18 17:36           ` Suren Baghdasaryan
2023-01-18 21:28             ` Michal Hocko
2023-01-18 21:45               ` Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 13/41] mm: introduce vma->vm_flags modifier functions Suren Baghdasaryan
2023-01-11 15:47   ` Davidlohr Bueso
2023-01-11 17:36     ` Suren Baghdasaryan
2023-01-11 19:52       ` Davidlohr Bueso
2023-01-11 21:23         ` Suren Baghdasaryan
2023-01-17 15:09   ` Michal Hocko
2023-01-17 15:15     ` Michal Hocko
2023-01-18  2:07       ` Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 14/41] mm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASK Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 15/41] mm: replace vma->vm_flags direct modifications with modifier calls Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 16/41] mm: replace vma->vm_flags indirect modification in ksm_madvise Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 17/41] mm/mmap: move VMA locking before anon_vma_lock_write call Suren Baghdasaryan
2023-01-17 15:16   ` Michal Hocko
2023-01-18  2:01     ` Suren Baghdasaryan
2023-01-18  9:23       ` Michal Hocko
2023-01-18 18:09         ` Suren Baghdasaryan
2023-01-18 21:33           ` Michal Hocko
2023-01-18 21:48             ` Suren Baghdasaryan
2023-01-19  9:31               ` Michal Hocko
2023-01-19 18:53                 ` Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 18/41] mm/khugepaged: write-lock VMA while collapsing a huge page Suren Baghdasaryan
2023-01-17 15:25   ` Michal Hocko
2023-01-17 20:28     ` Jann Horn
2023-01-17 21:05       ` Suren Baghdasaryan
2023-01-18  9:40       ` Michal Hocko
2023-01-18 12:38         ` Jann Horn
2023-01-18 17:41         ` Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 19/41] mm/mmap: write-lock VMAs before merging, splitting or expanding them Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 20/41] mm/mmap: write-lock VMAs in vma_adjust Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 21/41] mm/mmap: write-lock VMAs affected by VMA expansion Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 22/41] mm/mremap: write-lock VMA while remapping it to a new address range Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 23/41] mm: write-lock VMAs before removing them from VMA tree Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 24/41] mm: conditionally write-lock VMA in free_pgtables Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 25/41] mm/mmap: write-lock adjacent VMAs if they can grow into unmapped area Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 26/41] kernel/fork: assert no VMA readers during its destruction Suren Baghdasaryan
2023-01-17 15:42   ` Michal Hocko
2023-01-18  1:53     ` Suren Baghdasaryan
2023-01-18  9:43       ` Michal Hocko
2023-01-18 18:06         ` Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 27/41] mm/mmap: prevent pagefault handler from racing with mmu_notifier registration Suren Baghdasaryan
2023-01-18 12:50   ` Jann Horn
2023-01-18 17:40     ` Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 28/41] mm: introduce lock_vma_under_rcu to be used from arch-specific code Suren Baghdasaryan
2023-01-17 15:47   ` Michal Hocko
2023-01-18  1:06     ` Suren Baghdasaryan
2023-01-18  2:44       ` Matthew Wilcox
2023-01-18 21:33         ` Suren Baghdasaryan
2023-01-17 21:03   ` Jann Horn
2023-01-17 23:18     ` Liam Howlett
2023-01-09 20:53 ` [PATCH 29/41] mm: fall back to mmap_lock if vma->anon_vma is not yet set Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 30/41] mm: add FAULT_FLAG_VMA_LOCK flag Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 31/41] mm: prevent do_swap_page from handling page faults under VMA lock Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 32/41] mm: prevent userfaults to be handled under per-vma lock Suren Baghdasaryan
2023-01-17 19:51   ` Jann Horn
2023-01-17 20:36     ` Jann Horn
2023-01-17 20:57       ` Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 33/41] mm: introduce per-VMA lock statistics Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 34/41] x86/mm: try VMA lock-based page fault handling first Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 35/41] arm64/mm: " Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 36/41] powerc/mm: " Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 37/41] mm: introduce mod_vm_flags_nolock Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 38/41] mm: avoid assertion in untrack_pfn Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 39/41] kernel/fork: throttle call_rcu() calls in vm_area_free Suren Baghdasaryan
2023-01-17 15:57   ` Michal Hocko
2023-01-18  1:19     ` Suren Baghdasaryan
2023-01-18  9:49       ` Michal Hocko
2023-01-18 18:04         ` Suren Baghdasaryan
2023-01-18 18:34           ` Paul E. McKenney
2023-01-18 19:01             ` Suren Baghdasaryan
2023-01-18 20:20               ` Paul E. McKenney
2023-01-19 12:52               ` Michal Hocko
2023-01-19 19:17                 ` Paul E. McKenney
2023-01-20  8:57                   ` Michal Hocko
2023-01-20 16:08                     ` Paul E. McKenney
2023-01-19 12:59   ` Michal Hocko
2023-01-19 18:52     ` Suren Baghdasaryan
2023-01-19 19:20       ` Paul E. McKenney
2023-01-19 19:47         ` Suren Baghdasaryan
2023-01-19 19:55           ` Paul E. McKenney
2023-01-20  8:52       ` Michal Hocko
2023-01-20 16:20         ` Suren Baghdasaryan
2023-01-20 16:45           ` Suren Baghdasaryan
2023-01-20 16:49             ` Matthew Wilcox
2023-01-20 17:08               ` Liam R. Howlett
2023-01-20 17:17                 ` Suren Baghdasaryan
2023-01-20 17:32                   ` Matthew Wilcox
2023-01-20 17:50                     ` Suren Baghdasaryan
2023-01-20 19:23                       ` Liam R. Howlett
2023-01-23  9:56                       ` Michal Hocko
2023-01-23 16:22                         ` Suren Baghdasaryan
2023-01-23 16:55                           ` Michal Hocko
2023-01-23 17:07                             ` Suren Baghdasaryan
2023-01-23 17:16                               ` Michal Hocko
2023-01-23 17:46                                 ` Suren Baghdasaryan
2023-01-23 18:23                                   ` Matthew Wilcox
2023-01-23 18:47                                     ` Suren Baghdasaryan
2023-01-23 19:18                                     ` Michal Hocko
2023-01-23 19:30                                       ` Matthew Wilcox
2023-01-23 19:57                                         ` Suren Baghdasaryan
2023-01-23 20:00                                         ` Michal Hocko
2023-01-23 20:08                                           ` Suren Baghdasaryan
2023-01-23 20:38                                           ` Liam R. Howlett
2023-01-20 17:21               ` Paul E. McKenney
2023-01-20 18:42                 ` Suren Baghdasaryan
2023-01-23  9:59           ` Michal Hocko
2023-01-23 17:43             ` Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 40/41] mm: separate vma->lock from vm_area_struct Suren Baghdasaryan
2023-01-17 18:33   ` Jann Horn
2023-01-17 19:01     ` Suren Baghdasaryan
2023-01-09 20:53 ` [PATCH 41/41] mm: replace rw_semaphore with atomic_t in vma_lock Suren Baghdasaryan
2023-01-10  8:04   ` Vlastimil Babka [this message]
2023-01-10 17:05     ` Suren Baghdasaryan
2023-01-16 11:14   ` Hyeonggon Yoo
2023-01-16 22:36     ` Suren Baghdasaryan
2023-01-17  4:14     ` Matthew Wilcox
2023-01-17  4:34       ` Suren Baghdasaryan
2023-01-17  5:46         ` Matthew Wilcox
2023-01-17  5:58           ` Suren Baghdasaryan
2023-01-17 18:23             ` Matthew Wilcox
2023-01-17 18:28               ` Suren Baghdasaryan
2023-01-17 20:31                 ` Michal Hocko
2023-01-17 21:00                   ` Suren Baghdasaryan
     [not found]   ` <20230116140649.2012-1-hdanton@sina.com>
2023-01-16 23:08     ` Suren Baghdasaryan
2023-01-16 23:11       ` Suren Baghdasaryan
     [not found]       ` <20230117031632.2321-1-hdanton@sina.com>
2023-01-17  4:52         ` Suren Baghdasaryan
     [not found]           ` <20230117083355.2374-1-hdanton@sina.com>
2023-01-17 18:21             ` Suren Baghdasaryan
2023-01-17 18:27               ` Matthew Wilcox
2023-01-17 18:31                 ` Suren Baghdasaryan
     [not found]                 ` <20230118062639.2839-1-hdanton@sina.com>
2023-01-18 18:35                   ` Matthew Wilcox
2023-01-17 18:11   ` Jann Horn
2023-01-17 18:26     ` Suren Baghdasaryan
2023-01-17 18:31       ` Matthew Wilcox
2023-01-17 18:36         ` Jann Horn
2023-01-17 18:49           ` Suren Baghdasaryan
2023-01-17 18:36         ` Suren Baghdasaryan
2023-01-17 18:48           ` Matthew Wilcox
2023-01-17 18:55             ` Suren Baghdasaryan
2023-01-17 18:59               ` Jann Horn
2023-01-17 19:06                 ` Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5874fea2-fc3b-5e5d-50ac-e413a11819a5@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=arjunroy@google.com \
    --cc=axelrasmussen@google.com \
    --cc=bigeasy@linutronix.de \
    --cc=dave@stgolabs.net \
    --cc=david@redhat.com \
    --cc=dhowells@redhat.com \
    --cc=edumazet@google.com \
    --cc=gthelen@google.com \
    --cc=gurua@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=hughlynch@google.com \
    --cc=jannh@google.com \
    --cc=jglisse@google.com \
    --cc=joelaf@google.com \
    --cc=kent.overstreet@linux.dev \
    --cc=kernel-team@android.com \
    --cc=laurent.dufour@fr.ibm.com \
    --cc=ldufour@linux.ibm.com \
    --cc=leewalsh@google.com \
    --cc=liam.howlett@oracle.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=lstoakes@gmail.com \
    --cc=luto@kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=michel@lespinasse.org \
    --cc=minchan@google.com \
    --cc=paulmck@kernel.org \
    --cc=peterjung1337@gmail.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=posk@google.com \
    --cc=punit.agrawal@bytedance.com \
    --cc=rientjes@google.com \
    --cc=shakeelb@google.com \
    --cc=soheil@google.com \
    --cc=songliubraving@fb.com \
    --cc=surenb@google.com \
    --cc=tatashin@google.com \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).