linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Gerald Shaefer <gerald.schaefer@linux.ibm.com>,
	 Vasily Gorbik <gor@linux.ibm.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	 Mike Rapoport <rppt@kernel.org>,
	 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	 Matthew Wilcox <willy@infradead.org>,
	David Hildenbrand <david@redhat.com>,
	 Suren Baghdasaryan <surenb@google.com>,
	 Qi Zheng <zhengqi.arch@bytedance.com>,
	Yang Shi <shy828301@gmail.com>,
	 Mel Gorman <mgorman@techsingularity.net>,
	Peter Xu <peterx@redhat.com>,
	 Peter Zijlstra <peterz@infradead.org>,
	Will Deacon <will@kernel.org>,  Yu Zhao <yuzhao@google.com>,
	Alistair Popple <apopple@nvidia.com>,
	 Ralph Campbell <rcampbell@nvidia.com>,
	Ira Weiny <ira.weiny@intel.com>,
	 Steven Price <steven.price@arm.com>,
	SeongJae Park <sj@kernel.org>,
	 Lorenzo Stoakes <lstoakes@gmail.com>,
	Huang Ying <ying.huang@intel.com>,
	 Naoya Horiguchi <naoya.horiguchi@nec.com>,
	 Christophe Leroy <christophe.leroy@csgroup.eu>,
	 Zack Rusin <zackr@vmware.com>, Jason Gunthorpe <jgg@ziepe.ca>,
	 Axel Rasmussen <axelrasmussen@google.com>,
	 Anshuman Khandual <anshuman.khandual@arm.com>,
	 Pasha Tatashin <pasha.tatashin@soleen.com>,
	 Miaohe Lin <linmiaohe@huawei.com>,
	Minchan Kim <minchan@kernel.org>,
	 Christoph Hellwig <hch@infradead.org>,
	Song Liu <song@kernel.org>,
	 Thomas Hellstrom <thomas.hellstrom@linux.intel.com>,
	 Russell King <linux@armlinux.org.uk>,
	 "David S. Miller" <davem@davemloft.net>,
	 Michael Ellerman <mpe@ellerman.id.au>,
	 "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	 Heiko Carstens <hca@linux.ibm.com>,
	 Christian Borntraeger <borntraeger@linux.ibm.com>,
	 Claudio Imbrenda <imbrenda@linux.ibm.com>,
	 Alexander Gordeev <agordeev@linux.ibm.com>,
	Jann Horn <jannh@google.com>,
	 Vishal Moola <vishal.moola@gmail.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	 linux-arm-kernel@lists.infradead.org,
	sparclinux@vger.kernel.org,  linuxppc-dev@lists.ozlabs.org,
	linux-s390@vger.kernel.org,  linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: [PATCH v2 00/12] mm: free retracted page table by RCU
Date: Tue, 20 Jun 2023 00:35:49 -0700 (PDT)	[thread overview]
Message-ID: <54cb04f-3762-987f-8294-91dafd8ebfb0@google.com> (raw)

Here is v2 third series of patches to mm (and a few architectures), based
on v6.4-rc5 with the preceding two series applied: in which khugepaged
takes advantage of pte_offset_map[_lock]() allowing for pmd transitions.
Differences from v1 are noted patch by patch below

This follows on from the v2 "arch: allow pte_offset_map[_lock]() to fail"
https://lore.kernel.org/linux-mm/a4963be9-7aa6-350-66d0-2ba843e1af44@google.com/
series of 23 posted on 2023-06-08 (and now in mm-stable - thank you),
and the v2 "mm: allow pte_offset_map[_lock]() to fail"
https://lore.kernel.org/linux-mm/c1c9a74a-bc5b-15ea-e5d2-8ec34bc921d@google.com/
series of 32 posted on 2023-06-08 (and now in mm-stable - thank you),
and replaces the v1 "mm: free retracted page table by RCU"
https://lore.kernel.org/linux-mm/35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com/
series of 12 posted on 2023-05-28 (which was bad on powerpc and s390).

The first two series were "independent": neither depending for build or
correctness on the other, but both series had to be in before this third
series is added to make the effective changes; and it would probably be
best to hold this series back until the following release, since it might
now reveal missed imbalances which the first series hoped to fix.

What is it all about?  Some mmap_lock avoidance i.e. latency reduction.
Initially just for the case of collapsing shmem or file pages to THPs:
the usefulness of MADV_COLLAPSE on shmem is being limited by that
mmap_write_lock it currently requires.

Likely to be relied upon later in other contexts e.g. freeing of
empty page tables (but that's not work I'm doing).  mmap_write_lock
avoidance when collapsing to anon THPs?  Perhaps, but again that's not
work I've done: a quick attempt was not as easy as the shmem/file case.

These changes (though of course not these exact patches) have been in
Google's data centre kernel for three years now: we do rely upon them.

Based on the preceding two series over v6.4-rc5, or any v6.4-rc; and
almost good on current mm-everything or current linux-next - just one
patch conflicts, the 10/12: I'll reply to that one with its
mm-everything or linux-next equivalent (ptent replacing *pte).

01/12 mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s
      v2: same as v1
02/12 mm/pgtable: add PAE safety to __pte_offset_map()
      v2: rename to pmdp_get_lockless_start/end() per Matthew;
          so use inlines without _irq_save(flags) macro oddity;
          add pmdp_get_lockless_sync() for use later in 09/12.
03/12 arm: adjust_pte() use pte_offset_map_nolock()
      v2: same as v1
04/12 powerpc: assert_pte_locked() use pte_offset_map_nolock()
      v2: same as v1
05/12 powerpc: add pte_free_defer() for pgtables sharing page
      v2: fix rcu_head usage to cope with concurrent deferrals;
          add para to commit message explaining rcu_head issue.
06/12 sparc: add pte_free_defer() for pte_t *pgtable_t
      v2: use page_address() instead of less common page_to_virt();
          add para to commit message explaining simple conversion;
          changed title since sparc64 pgtables do not share page.
07/12 s390: add pte_free_defer() for pgtables sharing page
      v2: complete rewrite, integrated with s390's existing pgtable
          management; temporarily using a global mm_pgtable_list_lock,
          to be restored to per-mm spinlock in a later followup patch.
08/12 mm/pgtable: add pte_free_defer() for pgtable as page
      v2: add comment on rcu_head to "Page table pages", per JannH
09/12 mm/khugepaged: retract_page_tables() without mmap or vma lock
      v2: repeat checks under ptl because UFFD, per PeterX and JannH;
          bring back mmu_notifier calls for PMD, per JannH and Jason;
          pmdp_get_lockless_sync() to issue missing interrupt if PAE.
10/12 mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock()
      v2: first check VMA, in case page tables torn down, per JannH;
          pmdp_get_lockless_sync() to issue missing interrupt if PAE;
          moved mmu_notifier after step 1, reworked final goto labels.
11/12 mm/khugepaged: delete khugepaged_collapse_pte_mapped_thps()
      v2: same as v1
12/12 mm: delete mmap_write_trylock() and vma_try_start_write()
      v2: same as v1

 arch/arm/mm/fault-armv.c            |   3 +-
 arch/powerpc/include/asm/pgalloc.h  |   4 +
 arch/powerpc/mm/pgtable-frag.c      |  51 ++++
 arch/powerpc/mm/pgtable.c           |  16 +-
 arch/s390/include/asm/pgalloc.h     |   4 +
 arch/s390/mm/pgalloc.c              | 205 +++++++++----
 arch/sparc/include/asm/pgalloc_64.h |   4 +
 arch/sparc/mm/init_64.c             |  16 +
 include/linux/mm.h                  |  17 --
 include/linux/mm_types.h            |   6 +-
 include/linux/mmap_lock.h           |  10 -
 include/linux/pgtable.h             |  10 +-
 mm/khugepaged.c                     | 481 +++++++++++-------------------
 mm/pgtable-generic.c                |  53 +++-
 14 files changed, 467 insertions(+), 413 deletions(-)

Hugh


             reply	other threads:[~2023-06-20  7:36 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-20  7:35 Hugh Dickins [this message]
2023-06-20  7:40 ` [PATCH v2 01/12] mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s Hugh Dickins
2023-06-20  7:42 ` [PATCH v2 02/12] mm/pgtable: add PAE safety to __pte_offset_map() Hugh Dickins
2023-06-20  7:43 ` [PATCH v2 03/12] arm: adjust_pte() use pte_offset_map_nolock() Hugh Dickins
2023-06-20  7:45 ` [PATCH v2 04/12] powerpc: assert_pte_locked() " Hugh Dickins
2023-06-20  7:47 ` [PATCH v2 05/12] powerpc: add pte_free_defer() for pgtables sharing page Hugh Dickins
2023-06-20 11:45   ` Jason Gunthorpe
2023-06-20 19:54     ` Hugh Dickins
2023-06-20 23:52       ` Jason Gunthorpe
2023-06-22  2:36         ` Hugh Dickins
2023-06-27 17:01           ` Jason Gunthorpe
2023-06-27 20:53             ` Hugh Dickins
2023-06-20  7:49 ` [PATCH v2 06/12] sparc: add pte_free_defer() for pte_t *pgtable_t Hugh Dickins
2023-06-20  7:51 ` [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page Hugh Dickins
2023-06-28 19:16   ` Gerald Schaefer
2023-06-29  5:08     ` Hugh Dickins
2023-06-29 15:22       ` Jason Gunthorpe
2023-06-29 15:56         ` Gerald Schaefer
2023-06-30  6:00           ` Hugh Dickins
2023-07-02  4:32             ` Hugh Dickins
2023-07-04 13:40               ` Alexander Gordeev
2023-07-04 16:03                 ` Hugh Dickins
2023-07-04 15:19               ` Gerald Schaefer
2023-07-04 17:03                 ` Hugh Dickins
2023-07-05 12:55                   ` Gerald Schaefer
2023-07-06  1:20                     ` Hugh Dickins
2023-07-06 15:02                       ` Gerald Schaefer
2023-07-06 19:45                         ` Hugh Dickins
2023-07-10 17:21                     ` Jason Gunthorpe
2023-07-05  6:46               ` Alexander Gordeev
2023-07-06  0:52                 ` Hugh Dickins
2023-07-07 14:37                   ` Gerald Schaefer
2023-07-03 16:10             ` Gerald Schaefer
2023-06-29 13:59     ` Alexander Gordeev
2023-06-29 15:43       ` Gerald Schaefer
2023-06-30 13:38   ` Claudio Imbrenda
2023-06-30 15:28     ` Hugh Dickins
2023-06-30 16:25       ` Claudio Imbrenda
2023-06-30 19:22         ` Hugh Dickins
2023-07-03 11:00           ` Claudio Imbrenda
2023-07-03 21:29             ` Jason Gunthorpe
2023-06-20  7:53 ` [PATCH v2 08/12] mm/pgtable: add pte_free_defer() for pgtable as page Hugh Dickins
2023-06-20  7:54 ` [PATCH v2 09/12] mm/khugepaged: retract_page_tables() without mmap or vma lock Hugh Dickins
2023-06-20  7:56 ` [PATCH v2 10/12] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock() Hugh Dickins
2023-06-20  8:04   ` [PATCH mm " Hugh Dickins
2023-06-20  7:58 ` [PATCH v2 11/12] mm/khugepaged: delete khugepaged_collapse_pte_mapped_thps() Hugh Dickins
2023-06-20  7:59 ` [PATCH v2 12/12] mm: delete mmap_write_trylock() and vma_try_start_write() Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54cb04f-3762-987f-8294-91dafd8ebfb0@google.com \
    --to=hughd@google.com \
    --cc=agordeev@linux.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=anshuman.khandual@arm.com \
    --cc=apopple@nvidia.com \
    --cc=axelrasmussen@google.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=davem@davemloft.net \
    --cc=david@redhat.com \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=hch@infradead.org \
    --cc=imbrenda@linux.ibm.com \
    --cc=ira.weiny@intel.com \
    --cc=jannh@google.com \
    --cc=jgg@ziepe.ca \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=lstoakes@gmail.com \
    --cc=mgorman@techsingularity.net \
    --cc=mike.kravetz@oracle.com \
    --cc=minchan@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=naoya.horiguchi@nec.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rcampbell@nvidia.com \
    --cc=rppt@kernel.org \
    --cc=shy828301@gmail.com \
    --cc=sj@kernel.org \
    --cc=song@kernel.org \
    --cc=sparclinux@vger.kernel.org \
    --cc=steven.price@arm.com \
    --cc=surenb@google.com \
    --cc=thomas.hellstrom@linux.intel.com \
    --cc=vbabka@suse.cz \
    --cc=vishal.moola@gmail.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    --cc=yuzhao@google.com \
    --cc=zackr@vmware.com \
    --cc=zhengqi.arch@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).