nvdimm.lists.linux.dev archive mirror
From: Joao Martins <joao.m.martins@oracle.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Linux MM <linux-mm@kvack.org>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	Matthew Wilcox <willy@infradead.org>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	John Hubbard <jhubbard@nvidia.com>,
	Jane Chu <jane.chu@oracle.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Linux NVDIMM <nvdimm@lists.linux.dev>,
	Linux Doc Mailing List <linux-doc@vger.kernel.org>
Subject: Re: [PATCH v3 08/14] mm/sparse-vmemmap: populate compound pagemaps
Date: Wed, 28 Jul 2021 16:35:44 +0100	[thread overview]
Message-ID: <131e77ec-6de4-8401-e7b0-7ff12abac04c@oracle.com> (raw)
In-Reply-To: <CAPcyv4jPWSeP3jOKiEy0ko4Yy5SgAFmuD64ABgv=cRxHaQM7ew@mail.gmail.com>


On 7/28/21 7:55 AM, Dan Williams wrote:
> On Wed, Jul 14, 2021 at 12:36 PM Joao Martins <joao.m.martins@oracle.com> wrote:
>>
>> A compound pagemap is a dev_pagemap with @align > PAGE_SIZE and it
> 
> Maybe s/compound pagemap/compound devmap/ per the other planned usage
> of "devmap" in the implementation?
> 
Yeap. I am replacing pagemap with devmap -- hopefully better done than
the s/align/geometry rename, which still has some leftovers in this series.

>> means that pages are mapped at a given huge page alignment and utilize
>> compound pages as opposed to order-0 pages.
>>
>> Take advantage of the fact that most tail pages look the same (except
>> the first two) to minimize struct page overhead. Allocate a separate
>> page for the vmemmap area which contains the head page and a separate one for
>> the next 64 pages. The rest of the subsections then reuse this tail
>> vmemmap page to initialize the rest of the tail pages.
>>
>> Sections are arch-dependent (e.g. on x86 it's 64M, 128M or 512M) and
>> when initializing a compound pagemap with big enough @align (e.g. 1G
> 
> s/@align/@geometry/?
> 
Yeap (and the previous mention too in the hunk before this one).

>> PUD) it will cross various sections.
> 
> s/will cross various/may cross multiple/
> 
OK

>> To be able to reuse tail pages
>> across sections belonging to the same gigantic page, fetch the
>> @range being mapped (nr_ranges + 1).  If the section being mapped is
>> not offset 0 of the @align, then lookup the PFN of the struct page
>> address that precedes it and use that to populate the entire
>> section.
> 
> This sounds like code being read aloud. I would just say something like:
> 
> "The vmemmap code needs to consult @pgmap so that multiple sections
> that all map the same tail data can refer back to the first copy of
> that data for a given gigantic page."
> 
Fixed.

>>
>> On compound pagemaps with 2M align, this mechanism lets 6 pages be
>> saved out of the 8 PFNs necessary to map the subsection's
>> 512 struct pages. On a 1G compound pagemap it saves
>> 4094 pages.
>>
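(For reference, the arithmetic behind those numbers, assuming 64 bytes per
struct page:

   2M geometry:  512 struct pages * 64 bytes = 32K, i.e. 8 vmemmap pages,
                 of which 2 remain after reuse (head + shared tail) -> 6 saved
   1G geometry:  262144 struct pages * 64 bytes = 16M, i.e. 4096 vmemmap pages,
                 of which 2 remain after reuse -> 4094 saved)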
>> Altmap isn't supported yet, given various restrictions in the altmap pfn
>> allocator, thus fall back to the already in-use vmemmap_populate().  It
>> is worth noting that altmap for devmap mappings was there to relieve the
>> pressure of inordinate amounts of memmap space to map terabytes of pmem.
>> With compound pages the motivation for altmaps for pmem gets reduced.
> 
> Looks good, just some minor comments / typo fixes, and some requests
> for a few more helper functions.
> 
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>  Documentation/vm/vmemmap_dedup.rst |  27 +++++-
>>  include/linux/mm.h                 |   2 +-
>>  mm/memremap.c                      |   1 +
>>  mm/sparse-vmemmap.c                | 133 +++++++++++++++++++++++++++--
>>  4 files changed, 151 insertions(+), 12 deletions(-)
>>
>> diff --git a/Documentation/vm/vmemmap_dedup.rst b/Documentation/vm/vmemmap_dedup.rst
>> index 215ae2ef3bce..42830a667c2a 100644
>> --- a/Documentation/vm/vmemmap_dedup.rst
>> +++ b/Documentation/vm/vmemmap_dedup.rst
>> @@ -2,9 +2,12 @@
>>
>>  .. _vmemmap_dedup:
>>
>> -==================================
>> -Free some vmemmap pages of HugeTLB
>> -==================================
>> +=================================================
>> +Free some vmemmap pages of HugeTLB and Device DAX
> 
> How about "A vmemmap diet for HugeTLB and Device DAX"
> 
> ...because in the HugeTLB case it is dynamically remapping and freeing
> the pages after the fact, while Device-DAX is avoiding the allocation
> in the first instance.
> 
Yeap. Better title indeed, fixed.

[...]

>> +static int __meminit vmemmap_populate_compound_pages(unsigned long start_pfn,
>> +                                                    unsigned long start,
>> +                                                    unsigned long end, int node,
>> +                                                    struct dev_pagemap *pgmap)
>> +{
>> +       unsigned long offset, size, addr;
>> +
>> +       /*
>> +        * For compound pages bigger than section size (e.g. x86 1G compound
>> +        * pages with 2M subsection size) fill the rest of sections as tail
>> +        * pages.
>> +        *
>> +        * Note that memremap_pages() resets @nr_range value and will increment
>> +        * it after each range successful onlining. Thus the value or @nr_range
>> +        * at section memmap populate corresponds to the in-progress range
>> +        * being onlined here.
>> +        */
>> +       offset = PFN_PHYS(start_pfn) - pgmap->ranges[pgmap->nr_range].start;
>> +       if (!IS_ALIGNED(offset, pgmap_geometry(pgmap)) &&
>> +           pgmap_geometry(pgmap) > SUBSECTION_SIZE) {
> 
> How about moving the last 3 lines plus the comment to a helper so this
> becomes something like:
> 
> if (compound_section_index(start_pfn, pgmap))
> 
> ...where it is clear that for the Nth section in a compound page where
> N is > 0, it can lookup the page data to reuse.
> 
Definitely more readable.

Here's what I have so far (already with the change
of pgmap_geometry() to return the number of pages):

+/*
+ * For compound pages bigger than section size (e.g. x86 1G compound
+ * pages with 2M subsection size) fill the rest of sections as tail
+ * pages.
+ *
+ * Note that memremap_pages() resets @nr_range value and will increment
+ * it after each successful range onlining. Thus the value of @nr_range
+ * at section memmap populate time corresponds to the in-progress range
+ * being onlined here.
+ */
+static bool compound_section_index(unsigned long start_pfn,
+                                  struct dev_pagemap *pgmap)
+{
+       unsigned long geometry_size = pgmap_geometry(pgmap) << PAGE_SHIFT;
+       unsigned long offset = PFN_PHYS(start_pfn) -
+               pgmap->ranges[pgmap->nr_range].start;
+
+       return !IS_ALIGNED(offset, geometry_size) &&
+               geometry_size > SUBSECTION_SIZE;
+}
+
 static int __meminit vmemmap_populate_compound_pages(unsigned long start_pfn,
                                                     unsigned long start,
                                                     unsigned long end, int node,
                                                     struct dev_pagemap *pgmap)
 {
-       unsigned long geometry_size = pgmap_geometry(pgmap) << PAGE_SHIFT;
        unsigned long offset, size, addr;

-       /*
-        * For compound pages bigger than section size (e.g. x86 1G compound
-        * pages with 2M subsection size) fill the rest of sections as tail
-        * pages.
-        *
-        * Note that memremap_pages() resets @nr_range value and will increment
-        * it after each range successful onlining. Thus the value or @nr_range
-        * at section memmap populate corresponds to the in-progress range
-        * being onlined here.
-        */
-       offset = PFN_PHYS(start_pfn) - pgmap->ranges[pgmap->nr_range].start;
-       if (!IS_ALIGNED(offset, geometry_size) &&
-           geometry_size > SUBSECTION_SIZE) {
+       if (compound_section_index(start_pfn, pgmap)) {
                pte_t *ptep;

                addr = start - PAGE_SIZE;
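
To spell out what the helper checks with an example (taking x86's 128M sections
and a 1G geometry): a section whose offset into the range is a multiple of 1G,
i.e. the first section of each gigantic page, makes it return false and gets a
fully populated memmap; the remaining sections of that gigantic page (offsets
128M..896M into it) make it return true and only reuse the already-populated
tail pages. Which is exactly the "Nth section with N > 0" case you describe.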


> 
>> +               pte_t *ptep;
>> +
>> +               addr = start - PAGE_SIZE;
>> +
>> +               /*
>> +                * Sections are populated sequently and in sucession meaning
>> +                * this section being populated wouldn't start if the
>> +                * preceding one wasn't successful. So there is a guarantee that
>> +                * the previous struct pages are mapped when trying to lookup
>> +                * the last tail page.
> 
> I think you can cut this down to:
> 
> "Assuming sections are populated sequentially, the previous section's
> page data can be reused."
> 
OK.

> ...and maybe this can be a helper like:
> 
> compound_section_tail_page()?
> 
It makes this patch more readable.

Although doing this means we might need a compound_section_tail_huge_page() (...)

> 
>> +                * the last tail page.
> 
>> +               ptep = pte_offset_kernel(pmd_off_k(addr), addr);
>> +               if (!ptep)
>> +                       return -ENOMEM;
>> +
>> +               /*
>> +                * Reuse the page that was populated in the prior iteration
>> +                * with just tail struct pages.
>> +                */
>> +               return vmemmap_populate_range(start, end, node,
>> +                                             pte_page(*ptep));
>> +       }

The last patch separates the above check and uses the PMD (and the @offset) to reuse the
PMD of the compound_section_tail_page(). So this might mean that we introduce
in the last patch some sort of compound_section_tail_huge_page() for the pmd page.
So far the second change doesn't seem to translate into an obvious improvement in readability.

Pasted below is compound_section_tail_page() [...]

diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index d7419b5d54d7..31f94802c095 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -673,6 +673,23 @@ static bool __meminit compound_section_index(unsigned long start_pfn,
                geometry_size > SUBSECTION_SIZE;
 }

+static struct page * __meminit compound_section_tail_page(unsigned long addr)
+{
+       pte_t *ptep;
+
+       addr -= PAGE_SIZE;
+
+       /*
+        * Assuming sections are populated sequentially, the previous section's
+        * page data can be reused.
+        */
+       ptep = pte_offset_kernel(pmd_off_k(addr), addr);
+       if (!ptep)
+               return NULL;
+
+       return pte_page(*ptep);
+}
+
 static int __meminit vmemmap_populate_compound_pages(unsigned long start_pfn,
                                                     unsigned long start,
                                                     unsigned long end, int node,
@@ -681,27 +698,17 @@ static int __meminit vmemmap_populate_compound_pages(unsigned long
start_pfn,
        unsigned long offset, size, addr;

        if (compound_section_index(start_pfn, pgmap)) {
-               pte_t *ptep;
-
-               addr = start - PAGE_SIZE;
+               struct page *page;

-               /*
-                * Sections are populated sequently and in sucession meaning
-                * this section being populated wouldn't start if the
-                * preceding one wasn't successful. So there is a guarantee that
-                * the previous struct pages are mapped when trying to lookup
-                * the last tail page.
-                */
-               ptep = pte_offset_kernel(pmd_off_k(addr), addr);
-               if (!ptep)
+               page = compound_section_tail_page(start);
+               if (!page)
                        return -ENOMEM;

                /*
                 * Reuse the page that was populated in the prior iteration
                 * with just tail struct pages.
                 */
-               return vmemmap_populate_range(start, end, node,
-                                             pte_page(*ptep));
+               return vmemmap_populate_range(start, end, node, page);
        }

        size = min(end - start, pgmap_geometry(pgmap) * sizeof(struct page));




[...] And here's compound_section_tail_huge_page() (for the last patch in the series):


@@ -690,6 +727,33 @@ static struct page * __meminit compound_section_tail_page(unsigned
long addr)
        return pte_page(*ptep);
 }

+static struct page * __meminit compound_section_tail_huge_page(unsigned long addr,
+                               unsigned long offset, struct dev_pagemap *pgmap)
+{
+       unsigned long geometry_size = pgmap_geometry(pgmap) << PAGE_SHIFT;
+       pmd_t *pmdp;
+
+       addr -= PAGE_SIZE;
+
+       /*
+        * Assuming sections are populated sequentially, the previous section's
+        * page data can be reused.
+        */
+       pmdp = pmd_off_k(addr);
+       if (!pmdp)
+               return ERR_PTR(-ENOMEM);
+
+       /*
+        * Reuse the tail pages vmemmap pmd page
+        * See layout diagram in Documentation/vm/vmemmap_dedup.rst
+        */
+       if (offset % geometry_size > PFN_PHYS(PAGES_PER_SECTION))
+               return pmd_page(*pmdp);
+
+       /* No reusable PMD, fall back to the PTE tail page */
+       return NULL;
+}
+
 static int __meminit vmemmap_populate_compound_pages(unsigned long start_pfn,
                                                     unsigned long start,
                                                     unsigned long end, int node,
@@ -697,14 +761,22 @@ static int __meminit vmemmap_populate_compound_pages(unsigned long
start_pfn,
 {
        unsigned long offset, size, addr;

-       if (compound_section_index(start_pfn, pgmap)) {
-               struct page *page;
+       if (compound_section_index(start_pfn, pgmap, &offset)) {
+               struct page *page, *hpage;
+
+               hpage = compound_section_tail_huge_page(start, offset, pgmap);
+               if (IS_ERR(hpage))
+                       return -ENOMEM;
+               else if (hpage)
+                       return vmemmap_populate_pmd_range(start, end, node,
+                                                         hpage);

                page = compound_section_tail_page(start);
                if (!page)
                        return -ENOMEM;

                /*
+                * Populate the tail pages vmemmap pmd page.
                 * Reuse the page that was populated in the prior iteration
                 * with just tail struct pages.
                 */
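
One note on the offset % geometry_size check above, in case it reads odd: taking
128M sections as the example (where a section's memmap is exactly one 2M PMD),
the first tail section of a gigantic page sits right after the section holding
the head page, and that section's PTE page also maps the head vmemmap page, so
there is nothing tail-only to reuse yet and we fall back to the PTE tail page.
Only from the second tail section onwards does the preceding section map nothing
but the shared tail page, and then its vmemmap PMD page can be reused wholesale.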


Thread overview: 78+ messages
2021-07-14 19:35 [PATCH v3 00/14] mm, sparse-vmemmap: Introduce compound pagemaps Joao Martins
2021-07-14 19:35 ` [PATCH v3 01/14] memory-failure: fetch compound_head after pgmap_pfn_valid() Joao Martins
2021-07-15  0:17   ` Dan Williams
2021-07-15  2:51   ` [External] " Muchun Song
2021-07-15  6:40     ` Christoph Hellwig
2021-07-15  9:19       ` Muchun Song
2021-07-15 13:17     ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 02/14] mm/page_alloc: split prep_compound_page into head and tail subparts Joao Martins
2021-07-15  0:19   ` Dan Williams
2021-07-15  2:53   ` [External] " Muchun Song
2021-07-15 13:17     ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 03/14] mm/page_alloc: refactor memmap_init_zone_device() page init Joao Martins
2021-07-15  0:20   ` Dan Williams
2021-07-14 19:35 ` [PATCH v3 04/14] mm/memremap: add ZONE_DEVICE support for compound pages Joao Martins
2021-07-15  1:08   ` Dan Williams
2021-07-15 12:52     ` Joao Martins
2021-07-15 13:06       ` Joao Martins
2021-07-15 19:48       ` Dan Williams
2021-07-30 16:13         ` Joao Martins
2021-07-22  0:38       ` Jane Chu
2021-07-22 10:56         ` Joao Martins
2021-07-15 12:59     ` Christoph Hellwig
2021-07-15 13:15       ` Joao Martins
2021-07-15  6:48   ` Christoph Hellwig
2021-07-15 13:15     ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 05/14] mm/sparse-vmemmap: add a pgmap argument to section activation Joao Martins
2021-07-28  5:56   ` Dan Williams
2021-07-28  9:43     ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 06/14] mm/sparse-vmemmap: refactor core of vmemmap_populate_basepages() to helper Joao Martins
2021-07-28  6:04   ` Dan Williams
2021-07-28 10:48     ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 07/14] mm/hugetlb_vmemmap: move comment block to Documentation/vm Joao Martins
2021-07-15  2:47   ` [External] " Muchun Song
2021-07-15 13:16     ` Joao Martins
2021-07-28  6:09   ` Dan Williams
2021-07-14 19:35 ` [PATCH v3 08/14] mm/sparse-vmemmap: populate compound pagemaps Joao Martins
2021-07-28  6:55   ` Dan Williams
2021-07-28 15:35     ` Joao Martins [this message]
2021-07-28 18:03       ` Dan Williams
2021-07-28 18:54         ` Joao Martins
2021-07-28 20:04           ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 09/14] mm/page_alloc: reuse tail struct pages for " Joao Martins
2021-07-28  7:28   ` Dan Williams
2021-07-28 15:56     ` Joao Martins
2021-07-28 16:08       ` Dan Williams
2021-07-28 16:12         ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 10/14] device-dax: use ALIGN() for determining pgoff Joao Martins
2021-07-28  7:29   ` Dan Williams
2021-07-28 15:56     ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 11/14] device-dax: ensure dev_dax->pgmap is valid for dynamic devices Joao Martins
2021-07-28  7:30   ` Dan Williams
2021-07-28 15:56     ` Joao Martins
2021-08-06 12:28       ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 12/14] device-dax: compound pagemap support Joao Martins
2021-07-14 23:36   ` Dan Williams
2021-07-15 12:00     ` Joao Martins
2021-07-27 23:51       ` Dan Williams
2021-07-28  9:36         ` Joao Martins
2021-07-28 18:51           ` Dan Williams
2021-07-28 18:59             ` Joao Martins
2021-07-28 19:03               ` Dan Williams
2021-07-14 19:35 ` [PATCH v3 13/14] mm/gup: grab head page refcount once for group of subpages Joao Martins
2021-07-28 19:55   ` Dan Williams
2021-07-28 20:07     ` Joao Martins
2021-07-28 20:23       ` Dan Williams
2021-08-25 19:10         ` Joao Martins
2021-08-25 19:15           ` Matthew Wilcox
2021-08-25 19:26             ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 14/14] mm/sparse-vmemmap: improve memory savings for compound pud geometry Joao Martins
2021-07-28 20:03   ` Dan Williams
2021-07-28 20:08     ` Joao Martins
2021-07-14 21:48 ` [PATCH v3 00/14] mm, sparse-vmemmap: Introduce compound pagemaps Andrew Morton
2021-07-14 23:47   ` Dan Williams
2021-07-22  2:24   ` Matthew Wilcox
2021-07-22 10:53     ` Joao Martins
2021-07-27 23:23       ` Dan Williams
2021-08-02 10:40         ` Joao Martins
2021-08-02 14:06           ` Dan Williams
