nvdimm.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Joao Martins <joao.m.martins@oracle.com>
Cc: Linux MM <linux-mm@kvack.org>,
	Vishal Verma <vishal.l.verma@intel.com>,
	 Dave Jiang <dave.jiang@intel.com>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	 Matthew Wilcox <willy@infradead.org>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	John Hubbard <jhubbard@nvidia.com>,
	 Jane Chu <jane.chu@oracle.com>,
	Muchun Song <songmuchun@bytedance.com>,
	 Mike Kravetz <mike.kravetz@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Jonathan Corbet <corbet@lwn.net>,
	Linux NVDIMM <nvdimm@lists.linux.dev>,
	 Linux Doc Mailing List <linux-doc@vger.kernel.org>
Subject: Re: [PATCH v3 12/14] device-dax: compound pagemap support
Date: Tue, 27 Jul 2021 16:51:43 -0700	[thread overview]
Message-ID: <CAPcyv4hRQhG+0ika-wbxSFYrpmMJHxxX456qE64PMxDoxS+Fwg@mail.gmail.com> (raw)
In-Reply-To: <f7217b61-c845-eaed-501e-c9e7067a6b87@oracle.com>

On Thu, Jul 15, 2021 at 5:01 AM Joao Martins <joao.m.martins@oracle.com> wrote:
>
> On 7/15/21 12:36 AM, Dan Williams wrote:
> > On Wed, Jul 14, 2021 at 12:36 PM Joao Martins <joao.m.martins@oracle.com> wrote:
> >>
> >> Use the newly added compound pagemap facility which maps the assigned dax
> >> ranges as compound pages at a page size of @align. Currently, this means,
> >> that region/namespace bootstrap would take considerably less, given that
> >> you would initialize considerably less pages.
> >>
> >> On setups with 128G NVDIMMs the initialization with DRAM stored struct
> >> pages improves from ~268-358 ms to ~78-100 ms with 2M pages, and to less
> >> than a 1msec with 1G pages.
> >>
> >> dax devices are created with a fixed @align (huge page size) which is
> >> enforced through as well at mmap() of the device. Faults, consequently
> >> happen too at the specified @align specified at the creation, and those
> >> don't change through out dax device lifetime. MCEs poisons a whole dax
> >> huge page, as well as splits occurring at the configured page size.
> >>
> >
> > Hi Joao,
> >
> > With this patch I'm hitting the following with the 'device-dax' test [1].
> >
> Ugh, I can reproduce it too -- apologies for the oversight.

No worries.

>
> This patch is not the culprit, the flaw is early in the series, specifically the fourth patch.
>
> It needs this chunk below change on the fourth patch due to the existing elevated page ref
> count at zone device memmap init. put_page() called here in memunmap_pages():
>
> for (i = 0; i < pgmap->nr_ranges; i++)
>         for_each_device_pfn(pfn, pgmap, i)
>                 put_page(pfn_to_page(pfn));
>
> ... on a zone_device compound memmap would otherwise always decrease head page refcount by
> @geometry pfn amount (leading to the aforementioned splat you reported).
>
> diff --git a/mm/memremap.c b/mm/memremap.c
> index b0e7b8cf3047..79a883af788e 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -102,15 +102,15 @@ static unsigned long pfn_end(struct dev_pagemap *pgmap, int range_id)
>         return (range->start + range_len(range)) >> PAGE_SHIFT;
>  }
>
> -static unsigned long pfn_next(unsigned long pfn)
> +static unsigned long pfn_next(struct dev_pagemap *pgmap, unsigned long pfn)
>  {
>         if (pfn % 1024 == 0)
>                 cond_resched();
> -       return pfn + 1;
> +       return pfn + pgmap_pfn_geometry(pgmap);

The cond_resched() would need to be fixed up too to something like:

if (pfn % (1024 << pgmap_geometry_order(pgmap)))
    cond_resched();

...because the goal is to take a break every 1024 iterations, not
every 1024 pfns.

>  }
>
>  #define for_each_device_pfn(pfn, map, i) \
> -       for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); pfn = pfn_next(pfn))
> +       for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); pfn = pfn_next(map, pfn))
>
>  static void dev_pagemap_kill(struct dev_pagemap *pgmap)
>  {
>
> It could also get this hunk below, but it is sort of redundant provided we won't touch
> tail page refcount through out the devmap pages lifetime. This setting of tail pages
> refcount to zero was in pre-v5.14 series, but it got removed under the assumption it comes
> from the page allocator (where tail pages are already zeroed in refcount).

Wait, devmap pages never see the page allocator?

>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 96975edac0a8..469a7aa5cf38 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6623,6 +6623,7 @@ static void __ref memmap_init_compound(struct page *page, unsigned
> long pfn,
>                 __init_zone_device_page(page + i, pfn + i, zone_idx,
>                                         nid, pgmap);
>                 prep_compound_tail(page, i);
> +               set_page_count(page + i, 0);

Looks good to me and perhaps a for elevated tail page refcount at
teardown as a sanity check that the tail pages was never pinned
directly?

>
>                 /*
>                  * The first and second tail pages need to

  reply	other threads:[~2021-07-27 23:51 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-14 19:35 [PATCH v3 00/14] mm, sparse-vmemmap: Introduce compound pagemaps Joao Martins
2021-07-14 19:35 ` [PATCH v3 01/14] memory-failure: fetch compound_head after pgmap_pfn_valid() Joao Martins
2021-07-15  0:17   ` Dan Williams
2021-07-15  2:51   ` [External] " Muchun Song
2021-07-15  6:40     ` Christoph Hellwig
2021-07-15  9:19       ` Muchun Song
2021-07-15 13:17     ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 02/14] mm/page_alloc: split prep_compound_page into head and tail subparts Joao Martins
2021-07-15  0:19   ` Dan Williams
2021-07-15  2:53   ` [External] " Muchun Song
2021-07-15 13:17     ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 03/14] mm/page_alloc: refactor memmap_init_zone_device() page init Joao Martins
2021-07-15  0:20   ` Dan Williams
2021-07-14 19:35 ` [PATCH v3 04/14] mm/memremap: add ZONE_DEVICE support for compound pages Joao Martins
2021-07-15  1:08   ` Dan Williams
2021-07-15 12:52     ` Joao Martins
2021-07-15 13:06       ` Joao Martins
2021-07-15 19:48       ` Dan Williams
2021-07-30 16:13         ` Joao Martins
2021-07-22  0:38       ` Jane Chu
2021-07-22 10:56         ` Joao Martins
2021-07-15 12:59     ` Christoph Hellwig
2021-07-15 13:15       ` Joao Martins
2021-07-15  6:48   ` Christoph Hellwig
2021-07-15 13:15     ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 05/14] mm/sparse-vmemmap: add a pgmap argument to section activation Joao Martins
2021-07-28  5:56   ` Dan Williams
2021-07-28  9:43     ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 06/14] mm/sparse-vmemmap: refactor core of vmemmap_populate_basepages() to helper Joao Martins
2021-07-28  6:04   ` Dan Williams
2021-07-28 10:48     ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 07/14] mm/hugetlb_vmemmap: move comment block to Documentation/vm Joao Martins
2021-07-15  2:47   ` [External] " Muchun Song
2021-07-15 13:16     ` Joao Martins
2021-07-28  6:09   ` Dan Williams
2021-07-14 19:35 ` [PATCH v3 08/14] mm/sparse-vmemmap: populate compound pagemaps Joao Martins
2021-07-28  6:55   ` Dan Williams
2021-07-28 15:35     ` Joao Martins
2021-07-28 18:03       ` Dan Williams
2021-07-28 18:54         ` Joao Martins
2021-07-28 20:04           ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 09/14] mm/page_alloc: reuse tail struct pages for " Joao Martins
2021-07-28  7:28   ` Dan Williams
2021-07-28 15:56     ` Joao Martins
2021-07-28 16:08       ` Dan Williams
2021-07-28 16:12         ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 10/14] device-dax: use ALIGN() for determining pgoff Joao Martins
2021-07-28  7:29   ` Dan Williams
2021-07-28 15:56     ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 11/14] device-dax: ensure dev_dax->pgmap is valid for dynamic devices Joao Martins
2021-07-28  7:30   ` Dan Williams
2021-07-28 15:56     ` Joao Martins
2021-08-06 12:28       ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 12/14] device-dax: compound pagemap support Joao Martins
2021-07-14 23:36   ` Dan Williams
2021-07-15 12:00     ` Joao Martins
2021-07-27 23:51       ` Dan Williams [this message]
2021-07-28  9:36         ` Joao Martins
2021-07-28 18:51           ` Dan Williams
2021-07-28 18:59             ` Joao Martins
2021-07-28 19:03               ` Dan Williams
2021-07-14 19:35 ` [PATCH v3 13/14] mm/gup: grab head page refcount once for group of subpages Joao Martins
2021-07-28 19:55   ` Dan Williams
2021-07-28 20:07     ` Joao Martins
2021-07-28 20:23       ` Dan Williams
2021-08-25 19:10         ` Joao Martins
2021-08-25 19:15           ` Matthew Wilcox
2021-08-25 19:26             ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 14/14] mm/sparse-vmemmap: improve memory savings for compound pud geometry Joao Martins
2021-07-28 20:03   ` Dan Williams
2021-07-28 20:08     ` Joao Martins
2021-07-14 21:48 ` [PATCH v3 00/14] mm, sparse-vmemmap: Introduce compound pagemaps Andrew Morton
2021-07-14 23:47   ` Dan Williams
2021-07-22  2:24   ` Matthew Wilcox
2021-07-22 10:53     ` Joao Martins
2021-07-27 23:23       ` Dan Williams
2021-08-02 10:40         ` Joao Martins
2021-08-02 14:06           ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPcyv4hRQhG+0ika-wbxSFYrpmMJHxxX456qE64PMxDoxS+Fwg@mail.gmail.com \
    --to=dan.j.williams@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=dave.jiang@intel.com \
    --cc=jane.chu@oracle.com \
    --cc=jgg@ziepe.ca \
    --cc=jhubbard@nvidia.com \
    --cc=joao.m.martins@oracle.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=nvdimm@lists.linux.dev \
    --cc=songmuchun@bytedance.com \
    --cc=vishal.l.verma@intel.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).