From: David Hildenbrand <david@redhat.com>
To: Ryan Roberts <ryan.roberts@arm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>,
	Yin Fengwei <fengwei.yin@intel.com>, Yu Zhao <yuzhao@google.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Yang Shi <shy828301@gmail.com>, "Huang, Ying" <ying.huang@intel.com>,
	Zi Yan <ziy@nvidia.com>, Luis Chamberlain <mcgrof@kernel.org>,
	Itaru Kitayama <itaru.kitayama@gmail.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	John Hubbard <jhubbard@nvidia.com>, David Rientjes <rientjes@google.com>,
	Vlastimil Babka <vbabka@suse.cz>, Hugh Dickins <hughd@google.com>,
	Kefeng Wang <wangkefeng.wang@huawei.com>, Barry Song <21cnbao@gmail.com>,
	Alistair Popple <apopple@nvidia.com>
Cc: linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v9 04/10] mm: thp: Support allocation of anonymous multi-size THP
Date: Tue, 12 Dec 2023 16:02:36 +0100
Message-ID: <2bebcf33-e8b7-468d-86cc-31d6eb355b66@redhat.com>
In-Reply-To: <20231207161211.2374093-5-ryan.roberts@arm.com>

On 07.12.23 17:12, Ryan Roberts wrote:
> Introduce the logic to allow THP to be configured (through the new sysfs
> interface we just added) to allocate large folios to back anonymous
> memory, which are larger than the base page size but smaller than
> PMD-size. We call this new THP extension "multi-size THP" (mTHP).
>
> mTHP continues to be PTE-mapped, but in many cases can still provide
> similar benefits to traditional PMD-sized THP: Page faults are
> significantly reduced (by a factor of e.g. 4, 8, 16, etc. depending on
> the configured order), but latency spikes are much less prominent
> because the size of each page isn't as huge as the PMD-sized variant and
> there is less memory to clear in each page fault. The number of per-page
> operations (e.g. ref counting, rmap management, lru list management) is
> also significantly reduced since those ops now become per-folio.

I'll note that with always-pte-mapped-thp it will be much easier to
support incremental page clearing (e.g., zero only parts of the folio
and map the remainder in a PROT_NONE-like fashion whereby we'll zero on
the next page fault). With a PMD-sized thp, you have to eventually
place/rip out page tables to achieve that.

>
> Some architectures also employ TLB compression mechanisms to squeeze
> more entries in when a set of PTEs are virtually and physically
> contiguous and appropriately aligned. In this case, TLB misses will
> occur less often.
>
> The new behaviour is disabled by default, but can be enabled at runtime
> by writing to /sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled
> (see documentation in previous commit). The long term aim is to change
> the default to include suitable lower orders, but there are some risks
> around internal fragmentation that need to be better understood first.
>
> Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> Tested-by: John Hubbard <jhubbard@nvidia.com>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  include/linux/huge_mm.h |   6 ++-
>  mm/memory.c             | 111 ++++++++++++++++++++++++++++++++++++----
>  2 files changed, 106 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 609c153bae57..fa7a38a30fc6 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -68,9 +68,11 @@ extern struct kobj_attribute shmem_enabled_attr;
>  #define HPAGE_PMD_NR (1<<HPAGE_PMD_ORDER)

[...]
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +static struct folio *alloc_anon_folio(struct vm_fault *vmf)
> +{
> +	struct vm_area_struct *vma = vmf->vma;
> +	unsigned long orders;
> +	struct folio *folio;
> +	unsigned long addr;
> +	pte_t *pte;
> +	gfp_t gfp;
> +	int order;
> +
> +	/*
> +	 * If uffd is active for the vma we need per-page fault fidelity to
> +	 * maintain the uffd semantics.
> +	 */
> +	if (unlikely(userfaultfd_armed(vma)))
> +		goto fallback;
> +
> +	/*
> +	 * Get a list of all the (large) orders below PMD_ORDER that are enabled
> +	 * for this vma. Then filter out the orders that can't be allocated over
> +	 * the faulting address and still be fully contained in the vma.
> +	 */
> +	orders = thp_vma_allowable_orders(vma, vma->vm_flags, false, true, true,
> +					  BIT(PMD_ORDER) - 1);
> +	orders = thp_vma_suitable_orders(vma, vmf->address, orders);
> +
> +	if (!orders)
> +		goto fallback;
> +
> +	pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK);
> +	if (!pte)
> +		return ERR_PTR(-EAGAIN);
> +
> +	/*
> +	 * Find the highest order where the aligned range is completely
> +	 * pte_none(). Note that all remaining orders will be completely
> +	 * pte_none().
> +	 */
> +	order = highest_order(orders);
> +	while (orders) {
> +		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
> +		if (pte_range_none(pte + pte_index(addr), 1 << order))
> +			break;
> +		order = next_order(&orders, order);
> +	}
> +
> +	pte_unmap(pte);
> +
> +	/* Try allocating the highest of the remaining orders. */
> +	gfp = vma_thp_gfp_mask(vma);
> +	while (orders) {
> +		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
> +		folio = vma_alloc_folio(gfp, order, vma, addr, true);
> +		if (folio) {
> +			clear_huge_page(&folio->page, vmf->address, 1 << order);
> +			return folio;
> +		}
> +		order = next_order(&orders, order);
> +	}
> +
> +fallback:
> +	return vma_alloc_zeroed_movable_folio(vma, vmf->address);
> +}
> +#else
> +#define alloc_anon_folio(vmf) \
> +		vma_alloc_zeroed_movable_folio((vmf)->vma, (vmf)->address)
> +#endif

A neater alternative might be

static struct folio *alloc_anon_folio(struct vm_fault *vmf)
{
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
	/* magic */
fallback:
#endif
	return vma_alloc_zeroed_movable_folio((vmf)->vma, (vmf)->address);
}

[...]

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb