From: Joao Martins <joao.m.martins@oracle.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Linux MM <linux-mm@kvack.org>,
Vishal Verma <vishal.l.verma@intel.com>,
Dave Jiang <dave.jiang@intel.com>,
Naoya Horiguchi <naoya.horiguchi@nec.com>,
Matthew Wilcox <willy@infradead.org>,
Jason Gunthorpe <jgg@ziepe.ca>,
John Hubbard <jhubbard@nvidia.com>,
Jane Chu <jane.chu@oracle.com>,
Muchun Song <songmuchun@bytedance.com>,
Mike Kravetz <mike.kravetz@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>,
Jonathan Corbet <corbet@lwn.net>,
Linux NVDIMM <nvdimm@lists.linux.dev>,
Linux Doc Mailing List <linux-doc@vger.kernel.org>
Subject: Re: [PATCH v3 08/14] mm/sparse-vmemmap: populate compound pagemaps
Date: Wed, 28 Jul 2021 16:35:44 +0100
Message-ID: <131e77ec-6de4-8401-e7b0-7ff12abac04c@oracle.com>
In-Reply-To: <CAPcyv4jPWSeP3jOKiEy0ko4Yy5SgAFmuD64ABgv=cRxHaQM7ew@mail.gmail.com>
On 7/28/21 7:55 AM, Dan Williams wrote:
> On Wed, Jul 14, 2021 at 12:36 PM Joao Martins <joao.m.martins@oracle.com> wrote:
>>
>> A compound pagemap is a dev_pagemap with @align > PAGE_SIZE and it
>
> Maybe s/compound pagemap/compound devmap/ per the other planned usage
> of "devmap" in the implementation?
>
Yeap. I am replacing pagemap with devmap -- hopefully better done than
the s/align/geometry/ rename, which still has some leftovers in this series.
>> means that pages are mapped at a given huge page alignment and use
>> compound pages as opposed to order-0 pages.
>>
>> Take advantage of the fact that most tail pages look the same (except
>> the first two) to minimize struct page overhead. Allocate a separate
>> page for the vmemmap area which contains the head page and separate for
>> the next 64 pages. The rest of the subsections then reuse this tail
>> vmemmap page to initialize the rest of the tail pages.
>>
>> Sections are arch-dependent (e.g. on x86 it's 64M, 128M or 512M) and
>> when initializing compound pagemap with big enough @align (e.g. 1G
>
> s/@align/@geometry/?
>
Yeap (and the previous mention too in the hunk before this one).
>> PUD) it will cross various sections.
>
> s/will cross various/may cross multiple/
>
OK
>> To be able to reuse tail pages
>> across sections belonging to the same gigantic page, fetch the
>> @range being mapped (nr_ranges + 1). If the section being mapped is
>> not offset 0 of the @align, then lookup the PFN of the struct page
>> address that precedes it and use that to populate the entire
>> section.
>
> This sounds like code being read aloud. I would just say something like:
>
> "The vmemmap code needs to consult @pgmap so that multiple sections
> that all map the same tail data can refer back to the first copy of
> that data for a given gigantic page."
>
Fixed.
>>
>> On compound pagemaps with 2M align, this mechanism lets 6 pages be
>> saved out of the 8 PFNs necessary to set the subsection's
>> 512 struct pages being mapped. On a 1G compound pagemap it saves
>> 4094 pages.
>>
>> Altmap isn't supported yet, given various restrictions in altmap pfn
>> allocator, thus fallback to the already in use vmemmap_populate(). It
>> is worth noting that altmap for devmap mappings was there to relieve the
>> pressure of inordinate amounts of memmap space to map terabytes of pmem.
>> With compound pages the motivation for altmaps for pmem gets reduced.
>
> Looks good just some minor comments / typo fixes, and some requests
> for a few more helper functions.
>
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> Documentation/vm/vmemmap_dedup.rst | 27 +++++-
>> include/linux/mm.h | 2 +-
>> mm/memremap.c | 1 +
>> mm/sparse-vmemmap.c | 133 +++++++++++++++++++++++++++--
>> 4 files changed, 151 insertions(+), 12 deletions(-)
>>
>> diff --git a/Documentation/vm/vmemmap_dedup.rst b/Documentation/vm/vmemmap_dedup.rst
>> index 215ae2ef3bce..42830a667c2a 100644
>> --- a/Documentation/vm/vmemmap_dedup.rst
>> +++ b/Documentation/vm/vmemmap_dedup.rst
>> @@ -2,9 +2,12 @@
>>
>> .. _vmemmap_dedup:
>>
>> -==================================
>> -Free some vmemmap pages of HugeTLB
>> -==================================
>> +=================================================
>> +Free some vmemmap pages of HugeTLB and Device DAX
>
> How about "A vmemmap diet for HugeTLB and Device DAX"
>
> ...because in the HugeTLB case it is dynamically remapping and freeing
> the pages after the fact, while Device-DAX is avoiding the allocation
> in the first instance.
>
Yeap. Better title indeed, fixed.
[...]
>> +static int __meminit vmemmap_populate_compound_pages(unsigned long start_pfn,
>> + unsigned long start,
>> + unsigned long end, int node,
>> + struct dev_pagemap *pgmap)
>> +{
>> + unsigned long offset, size, addr;
>> +
>> + /*
>> + * For compound pages bigger than section size (e.g. x86 1G compound
>> + * pages with 2M subsection size) fill the rest of sections as tail
>> + * pages.
>> + *
>> + * Note that memremap_pages() resets @nr_range value and will increment
>> + * it after each range successful onlining. Thus the value or @nr_range
>> + * at section memmap populate corresponds to the in-progress range
>> + * being onlined here.
>> + */
>> + offset = PFN_PHYS(start_pfn) - pgmap->ranges[pgmap->nr_range].start;
>> + if (!IS_ALIGNED(offset, pgmap_geometry(pgmap)) &&
>> + pgmap_geometry(pgmap) > SUBSECTION_SIZE) {
>
> How about moving the last 3 lines plus the comment to a helper so this
> becomes something like:
>
> if (compound_section_index(start_pfn, pgmap))
>
> ...where it is clear that for the Nth section in a compound page where
> N is > 0, it can lookup the page data to reuse.
>
Definitely more readable.
Here's what I have so far (already with the change of
pgmap_geometry() to return the number of pages):
+/*
+ * For compound pages bigger than section size (e.g. x86 1G compound
+ * pages with 2M subsection size) fill the rest of sections as tail
+ * pages.
+ *
+ * Note that memremap_pages() resets @nr_range value and will increment
+ * it after each successful range onlining. Thus the value of @nr_range
+ * at section memmap populate time corresponds to the in-progress range
+ * being onlined here.
+ */
+static bool compound_section_index(unsigned long start_pfn,
+ struct dev_pagemap *pgmap)
+{
+ unsigned long geometry_size = pgmap_geometry(pgmap) << PAGE_SHIFT;
+ unsigned long offset = PFN_PHYS(start_pfn) -
+ pgmap->ranges[pgmap->nr_range].start;
+
+ return !IS_ALIGNED(offset, geometry_size) &&
+ geometry_size > SUBSECTION_SIZE;
+}
+
static int __meminit vmemmap_populate_compound_pages(unsigned long start_pfn,
unsigned long start,
unsigned long end, int node,
struct dev_pagemap *pgmap)
{
- unsigned long geometry_size = pgmap_geometry(pgmap) << PAGE_SHIFT;
unsigned long offset, size, addr;
- /*
- * For compound pages bigger than section size (e.g. x86 1G compound
- * pages with 2M subsection size) fill the rest of sections as tail
- * pages.
- *
- * Note that memremap_pages() resets @nr_range value and will increment
- * it after each range successful onlining. Thus the value or @nr_range
- * at section memmap populate corresponds to the in-progress range
- * being onlined here.
- */
- offset = PFN_PHYS(start_pfn) - pgmap->ranges[pgmap->nr_range].start;
- if (!IS_ALIGNED(offset, geometry_size) &&
- geometry_size > SUBSECTION_SIZE) {
+ if (compound_section_index(start_pfn, pgmap)) {
pte_t *ptep;
addr = start - PAGE_SIZE;
>
>> + pte_t *ptep;
>> +
>> + addr = start - PAGE_SIZE;
>> +
>> + /*
>> + * Sections are populated sequently and in sucession meaning
>> + * this section being populated wouldn't start if the
>> + * preceding one wasn't successful. So there is a guarantee that
>> + * the previous struct pages are mapped when trying to lookup
>> + * the last tail page.
>
> I think you can cut this down to:
>
> "Assuming sections are populated sequentially, the previous section's
> page data can be reused."
>
OK.
> ...and maybe this can be a helper like:
>
> compound_section_tail_page()?
>
It makes this patch more readable.
Although doing this means we might need a compound_section_tail_huge_page() (...)
>
>> + * the last tail page.
>
>> + ptep = pte_offset_kernel(pmd_off_k(addr), addr);
>> + if (!ptep)
>> + return -ENOMEM;
>> +
>> + /*
>> + * Reuse the page that was populated in the prior iteration
>> + * with just tail struct pages.
>> + */
>> + return vmemmap_populate_range(start, end, node,
>> + pte_page(*ptep));
>> + }
The last patch separates the above check and uses the PMD (and the @offset) to reuse the
PMD of the compound_section_tail_page(). So this might mean that we introduce
in the last patch some sort of compound_section_tail_huge_page() for the pmd page.
So far, the second change doesn't seem to translate into an obvious improvement in readability.
Pasted below; here's compound_section_tail_page() [...]
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index d7419b5d54d7..31f94802c095 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -673,6 +673,23 @@ static bool __meminit compound_section_index(unsigned long start_pfn,
geometry_size > SUBSECTION_SIZE;
}
+static struct page * __meminit compound_section_tail_page(unsigned long addr)
+{
+ pte_t *ptep;
+
+ addr -= PAGE_SIZE;
+
+ /*
+ * Assuming sections are populated sequentially, the previous section's
+ * page data can be reused.
+ */
+ ptep = pte_offset_kernel(pmd_off_k(addr), addr);
+ if (!ptep)
+ return NULL;
+
+ return pte_page(*ptep);
+}
+
static int __meminit vmemmap_populate_compound_pages(unsigned long start_pfn,
unsigned long start,
unsigned long end, int node,
@@ -681,27 +698,17 @@ static int __meminit vmemmap_populate_compound_pages(unsigned long
start_pfn,
unsigned long offset, size, addr;
if (compound_section_index(start_pfn, pgmap)) {
- pte_t *ptep;
-
- addr = start - PAGE_SIZE;
+ struct page *page;
- /*
- * Sections are populated sequently and in sucession meaning
- * this section being populated wouldn't start if the
- * preceding one wasn't successful. So there is a guarantee that
- * the previous struct pages are mapped when trying to lookup
- * the last tail page.
- */
- ptep = pte_offset_kernel(pmd_off_k(addr), addr);
- if (!ptep)
+ page = compound_section_tail_page(start);
+ if (!page)
return -ENOMEM;
/*
* Reuse the page that was populated in the prior iteration
* with just tail struct pages.
*/
- return vmemmap_populate_range(start, end, node,
- pte_page(*ptep));
+ return vmemmap_populate_range(start, end, node, page);
}
size = min(end - start, pgmap_geometry(pgmap) * sizeof(struct page));
[...] And here's compound_section_tail_huge_page() (for the last patch in the series):
@@ -690,6 +727,33 @@ static struct page * __meminit compound_section_tail_page(unsigned
long addr)
return pte_page(*ptep);
}
+static struct page * __meminit compound_section_tail_huge_page(unsigned long addr,
+ unsigned long offset, struct dev_pagemap *pgmap)
+{
+ unsigned long geometry_size = pgmap_geometry(pgmap) << PAGE_SHIFT;
+ pmd_t *pmdp;
+
+ addr -= PAGE_SIZE;
+
+ /*
+ * Assuming sections are populated sequentially, the previous section's
+ * page data can be reused.
+ */
+ pmdp = pmd_off_k(addr);
+ if (!pmdp)
+ return ERR_PTR(-ENOMEM);
+
+ /*
+ * Reuse the tail pages vmemmap pmd page
+ * See layout diagram in Documentation/vm/vmemmap_dedup.rst
+ */
+ if (offset % geometry_size > PFN_PHYS(PAGES_PER_SECTION))
+ return pmd_page(*pmdp);
+
+ /* No reusable PMD, fall back to PTE tail page */
+ return NULL;
+}
+
static int __meminit vmemmap_populate_compound_pages(unsigned long start_pfn,
unsigned long start,
unsigned long end, int node,
@@ -697,14 +761,22 @@ static int __meminit vmemmap_populate_compound_pages(unsigned long
start_pfn,
{
unsigned long offset, size, addr;
- if (compound_section_index(start_pfn, pgmap)) {
- struct page *page;
+ if (compound_section_index(start_pfn, pgmap, &offset)) {
+ struct page *page, *hpage;
+
+ hpage = compound_section_tail_huge_page(start, offset, pgmap);
+ if (IS_ERR(hpage))
+ return -ENOMEM;
+ else if (hpage)
+ return vmemmap_populate_pmd_range(start, end, node,
+ hpage);
page = compound_section_tail_page(start);
if (!page)
return -ENOMEM;
/*
+ * Populate the tail pages vmemmap pmd page.
* Reuse the page that was populated in the prior iteration
* with just tail struct pages.
*/