From: Hugh Dickins <hughd@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
	Mike Rapoport <rppt@kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	David Hildenbrand <david@redhat.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	Yang Shi <shy828301@gmail.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Peter Xu <peterx@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Will Deacon <will@kernel.org>,
	Yu Zhao <yuzhao@google.com>,
	Alistair Popple <apopple@nvidia.com>,
	Ralph Campbell <rcampbell@nvidia.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Steven Price <steven.price@arm.com>,
	SeongJae Park <sj@kernel.org>,
	Lorenzo Stoakes <lstoakes@gmail.com>,
	Huang Ying <ying.huang@intel.com>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Zack Rusin <zackr@vmware.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Minchan Kim <minchan@kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	Song Liu <song@kernel.org>,
	Thomas Hellstrom <thomas.hellstrom@linux.intel.com>,
	Russell King <linux@armlinux.org.uk>,
	"David S. Miller" <davem@davemloft.net>,
	Michael Ellerman <mpe@ellerman.id.au>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Jann Horn <jannh@google.com>,
	Vishal Moola <vishal.moola@gmail.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Zi Yan <ziy@nvidia.com>,
	linux-arm-kernel@lists.infradead.org,
	sparclinux@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org,
	linux-s390@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: [PATCH v3 00/13] mm: free retracted page table by RCU
Date: Tue, 11 Jul 2023 21:27:09 -0700 (PDT)
Message-ID: <7cd843a9-aa80-14f-5eb2-33427363c20@google.com>

Here is v3 of the series of patches to mm (and a few architectures),
based on v6.5-rc1 which includes the preceding two series (thank you!):
in which khugepaged takes advantage of pte_offset_map[_lock]() allowing
for pmd transitions. Differences from v1 and v2 are noted patch by
patch below.

This replaces the v2 "mm: free retracted page table by RCU"
https://lore.kernel.org/linux-mm/54cb04f-3762-987f-8294-91dafd8ebfb0@google.com/
series of 12 posted on 2023-06-20.

What is it all about? Some mmap_lock avoidance i.e. latency reduction.
Initially just for the case of collapsing shmem or file pages to THPs:
the usefulness of MADV_COLLAPSE on shmem is being limited by that
mmap_write_lock it currently requires.

Likely to be relied upon later in other contexts e.g. freeing of empty
page tables (but that's not work I'm doing). mmap_write_lock avoidance
when collapsing to anon THPs? Perhaps, but again that's not work I've
done: a quick attempt was not as easy as the shmem/file case.

These changes (though of course not these exact patches) have been in
Google's data centre kernel for three years now: we do rely upon them.

Based on v6.5-rc1; and almost good on current mm-unstable or current
linux-next - just one patch conflicts, the 12/13: I'll reply to that
one with its mm-unstable or linux-next equivalent (vma_assert_locked()
has been added next to where vma_try_start_write() is being removed).

01/13 mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s
      v3: same as v1
02/13 mm/pgtable: add PAE safety to __pte_offset_map()
      v3: same as v2
      v2: rename to pmdp_get_lockless_start/end() per Matthew;
          so use inlines without _irq_save(flags) macro oddity;
          add pmdp_get_lockless_sync() for use later in 09/13.
03/13 arm: adjust_pte() use pte_offset_map_nolock()
      v3: same as v1
04/13 powerpc: assert_pte_locked() use pte_offset_map_nolock()
      v3: same as v1
05/13 powerpc: add pte_free_defer() for pgtables sharing page
      v3: much simpler version, following suggestion by Jason
      v2: fix rcu_head usage to cope with concurrent deferrals;
          add para to commit message explaining rcu_head issue.
06/13 sparc: add pte_free_defer() for pte_t *pgtable_t
      v3: same as v2
      v2: use page_address() instead of less common page_to_virt();
          add para to commit message explaining simple conversion;
          changed title since sparc64 pgtables do not share page.
07/13 s390: add pte_free_defer() for pgtables sharing page
      v3: much simpler version, following suggestion by Gerald
      v2: complete rewrite, integrated with s390's existing pgtable
          management; temporarily using a global mm_pgtable_list_lock,
          to be restored to per-mm spinlock in a later followup patch.
08/13 mm/pgtable: add pte_free_defer() for pgtable as page
      v3: same as v2
      v2: add comment on rcu_head to "Page table pages", per JannH
09/13 mm/khugepaged: retract_page_tables() without mmap or vma lock
      v3: same as v2
      v2: repeat checks under ptl because UFFD, per PeterX and JannH;
          bring back mmu_notifier calls for PMD, per JannH and Jason;
          pmdp_get_lockless_sync() to issue missing interrupt if PAE.
10/13 mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock()
      v3: updated to using ptent instead of *pte
      v2: first check VMA, in case page tables torn down, per JannH;
          pmdp_get_lockless_sync() to issue missing interrupt if PAE;
          moved mmu_notifier after step 1, reworked final goto labels.
11/13 mm/khugepaged: delete khugepaged_collapse_pte_mapped_thps()
      v3: rediffed
      v2: same as v1
12/13 mm: delete mmap_write_trylock() and vma_try_start_write()
      v3: rediffed (different diff needed for mm-unstable or linux-next)
      v2: same as v1
13/13 mm/pgtable: notes on pte_offset_map[_lock]()
      v3: new: JannH asked for more helpful comment, this is my attempt;
          could be moved to be the first in the series.

 arch/arm/mm/fault-armv.c            |   3 +-
 arch/powerpc/include/asm/pgalloc.h  |   4 +
 arch/powerpc/mm/pgtable-frag.c      |  29 +-
 arch/powerpc/mm/pgtable.c           |  16 +-
 arch/s390/include/asm/pgalloc.h     |   4 +
 arch/s390/mm/pgalloc.c              |  80 ++++-
 arch/sparc/include/asm/pgalloc_64.h |   4 +
 arch/sparc/mm/init_64.c             |  16 +
 include/linux/mm.h                  |  17 --
 include/linux/mm_types.h            |   4 +
 include/linux/mmap_lock.h           |  10 -
 include/linux/pgtable.h             |  10 +-
 mm/khugepaged.c                     | 481 +++++++++++------------------
 mm/pgtable-generic.c                |  97 +++++-
 14 files changed, 404 insertions(+), 371 deletions(-)

Hugh