From: Dan Williams <dan.j.williams@intel.com>
To: Joao Martins <joao.m.martins@oracle.com>
Cc: Linux MM <linux-mm@kvack.org>,
Vishal Verma <vishal.l.verma@intel.com>,
Dave Jiang <dave.jiang@intel.com>,
Naoya Horiguchi <naoya.horiguchi@nec.com>,
Matthew Wilcox <willy@infradead.org>,
Jason Gunthorpe <jgg@ziepe.ca>,
John Hubbard <jhubbard@nvidia.com>,
Jane Chu <jane.chu@oracle.com>,
Muchun Song <songmuchun@bytedance.com>,
Mike Kravetz <mike.kravetz@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>,
Jonathan Corbet <corbet@lwn.net>,
Linux NVDIMM <nvdimm@lists.linux.dev>,
Linux Doc Mailing List <linux-doc@vger.kernel.org>
Subject: Re: [PATCH v3 12/14] device-dax: compound pagemap support
Date: Wed, 28 Jul 2021 11:51:45 -0700 [thread overview]
Message-ID: <CAPcyv4hmoS8QSfBY6z07w9Ywjdq8WkROFVn3b_1bsE9i9_j3UA@mail.gmail.com> (raw)
In-Reply-To: <156c4fb8-46c5-d8ae-b953-837b86516ded@oracle.com>
On Wed, Jul 28, 2021 at 2:36 AM Joao Martins <joao.m.martins@oracle.com> wrote:
>
> On 7/28/21 12:51 AM, Dan Williams wrote:
> > On Thu, Jul 15, 2021 at 5:01 AM Joao Martins <joao.m.martins@oracle.com> wrote:
> >> On 7/15/21 12:36 AM, Dan Williams wrote:
> >>> On Wed, Jul 14, 2021 at 12:36 PM Joao Martins <joao.m.martins@oracle.com> wrote:
> >> This patch is not the culprit, the flaw is early in the series, specifically the fourth patch.
> >>
> >> It needs this chunk below change on the fourth patch due to the existing elevated page ref
> >> count at zone device memmap init. put_page() called here in memunmap_pages():
> >>
> >> for (i = 0; i < pgmap->nr_ranges; i++)
> >> for_each_device_pfn(pfn, pgmap, i)
> >> put_page(pfn_to_page(pfn));
> >>
> >> ... on a zone_device compound memmap would otherwise always decrease head page refcount by
> >> @geometry pfn amount (leading to the aforementioned splat you reported).
> >>
> >> diff --git a/mm/memremap.c b/mm/memremap.c
> >> index b0e7b8cf3047..79a883af788e 100644
> >> --- a/mm/memremap.c
> >> +++ b/mm/memremap.c
> >> @@ -102,15 +102,15 @@ static unsigned long pfn_end(struct dev_pagemap *pgmap, int range_id)
> >> return (range->start + range_len(range)) >> PAGE_SHIFT;
> >> }
> >>
> >> -static unsigned long pfn_next(unsigned long pfn)
> >> +static unsigned long pfn_next(struct dev_pagemap *pgmap, unsigned long pfn)
> >> {
> >> if (pfn % 1024 == 0)
> >> cond_resched();
> >> - return pfn + 1;
> >> + return pfn + pgmap_pfn_geometry(pgmap);
> >
> > The cond_resched() would need to be fixed up too to something like:
> >
> > if (pfn % (1024 << pgmap_geometry_order(pgmap)))
> > cond_resched();
> >
> > ...because the goal is to take a break every 1024 iterations, not
> > every 1024 pfns.
> >
>
> Ah, good point.
>
> >> }
> >>
> >> #define for_each_device_pfn(pfn, map, i) \
> >> - for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); pfn = pfn_next(pfn))
> >> + for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); pfn = pfn_next(map, pfn))
> >>
> >> static void dev_pagemap_kill(struct dev_pagemap *pgmap)
> >> {
> >>
> >> It could also get this hunk below, but it is sort of redundant provided we won't touch
> >> tail page refcount through out the devmap pages lifetime. This setting of tail pages
> >> refcount to zero was in pre-v5.14 series, but it got removed under the assumption it comes
> >> from the page allocator (where tail pages are already zeroed in refcount).
> >
> > Wait, devmap pages never see the page allocator?
> >
> "where tail pages are already zeroed in refcount" this actually meant 'freshly allocated
> pages' and I was referring to commit 7118fc2906e2 ("hugetlb: address ref count racing in
> prep_compound_gigantic_page") that removed set_page_count() because the setting of page
> ref count to zero was redundant.
Ah, maybe include that reference in the changelog?
>
> Albeit devmap pages don't come from page allocator, you know separate zone and these pages
> aren't part of the regular page pools (e.g. accessible via alloc_pages()), as you are
> aware. Unless of course, we reassign them via dax_kmem, but then the way we map the struct
> pages would be regular without any devmap stuff.
Got it. I think with the back reference to that commit (7118fc2906e2)
it resolves my confusion.
>
> >>
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index 96975edac0a8..469a7aa5cf38 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -6623,6 +6623,7 @@ static void __ref memmap_init_compound(struct page *page, unsigned
> >> long pfn,
> >> __init_zone_device_page(page + i, pfn + i, zone_idx,
> >> nid, pgmap);
> >> prep_compound_tail(page, i);
> >> + set_page_count(page + i, 0);
> >
> > Looks good to me and perhaps a for elevated tail page refcount at
> > teardown as a sanity check that the tail pages was never pinned
> > directly?
> >
> Sorry didn't follow completely.
>
> You meant to set tail page refcount back to 1 at teardown if it was kept to 0 (e.g.
> memunmap_pages() after put_page()) or that the refcount is indeed kept to zero after the
> put_page() in memunmap_pages() ?
The latter, i.e. would it be worth it to check that a tail page did
not get accidentally pinned instead of a head page? I'm also ok to
leave out that sanity checking for now.
next prev parent reply other threads:[~2021-07-28 18:52 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-14 19:35 [PATCH v3 00/14] mm, sparse-vmemmap: Introduce compound pagemaps Joao Martins
2021-07-14 19:35 ` [PATCH v3 01/14] memory-failure: fetch compound_head after pgmap_pfn_valid() Joao Martins
2021-07-15 0:17 ` Dan Williams
2021-07-15 2:51 ` [External] " Muchun Song
2021-07-15 6:40 ` Christoph Hellwig
2021-07-15 9:19 ` Muchun Song
2021-07-15 13:17 ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 02/14] mm/page_alloc: split prep_compound_page into head and tail subparts Joao Martins
2021-07-15 0:19 ` Dan Williams
2021-07-15 2:53 ` [External] " Muchun Song
2021-07-15 13:17 ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 03/14] mm/page_alloc: refactor memmap_init_zone_device() page init Joao Martins
2021-07-15 0:20 ` Dan Williams
2021-07-14 19:35 ` [PATCH v3 04/14] mm/memremap: add ZONE_DEVICE support for compound pages Joao Martins
2021-07-15 1:08 ` Dan Williams
2021-07-15 12:52 ` Joao Martins
2021-07-15 13:06 ` Joao Martins
2021-07-15 19:48 ` Dan Williams
2021-07-30 16:13 ` Joao Martins
2021-07-22 0:38 ` Jane Chu
2021-07-22 10:56 ` Joao Martins
2021-07-15 12:59 ` Christoph Hellwig
2021-07-15 13:15 ` Joao Martins
2021-07-15 6:48 ` Christoph Hellwig
2021-07-15 13:15 ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 05/14] mm/sparse-vmemmap: add a pgmap argument to section activation Joao Martins
2021-07-28 5:56 ` Dan Williams
2021-07-28 9:43 ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 06/14] mm/sparse-vmemmap: refactor core of vmemmap_populate_basepages() to helper Joao Martins
2021-07-28 6:04 ` Dan Williams
2021-07-28 10:48 ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 07/14] mm/hugetlb_vmemmap: move comment block to Documentation/vm Joao Martins
2021-07-15 2:47 ` [External] " Muchun Song
2021-07-15 13:16 ` Joao Martins
2021-07-28 6:09 ` Dan Williams
2021-07-14 19:35 ` [PATCH v3 08/14] mm/sparse-vmemmap: populate compound pagemaps Joao Martins
2021-07-28 6:55 ` Dan Williams
2021-07-28 15:35 ` Joao Martins
2021-07-28 18:03 ` Dan Williams
2021-07-28 18:54 ` Joao Martins
2021-07-28 20:04 ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 09/14] mm/page_alloc: reuse tail struct pages for " Joao Martins
2021-07-28 7:28 ` Dan Williams
2021-07-28 15:56 ` Joao Martins
2021-07-28 16:08 ` Dan Williams
2021-07-28 16:12 ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 10/14] device-dax: use ALIGN() for determining pgoff Joao Martins
2021-07-28 7:29 ` Dan Williams
2021-07-28 15:56 ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 11/14] device-dax: ensure dev_dax->pgmap is valid for dynamic devices Joao Martins
2021-07-28 7:30 ` Dan Williams
2021-07-28 15:56 ` Joao Martins
2021-08-06 12:28 ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 12/14] device-dax: compound pagemap support Joao Martins
2021-07-14 23:36 ` Dan Williams
2021-07-15 12:00 ` Joao Martins
2021-07-27 23:51 ` Dan Williams
2021-07-28 9:36 ` Joao Martins
2021-07-28 18:51 ` Dan Williams [this message]
2021-07-28 18:59 ` Joao Martins
2021-07-28 19:03 ` Dan Williams
2021-07-14 19:35 ` [PATCH v3 13/14] mm/gup: grab head page refcount once for group of subpages Joao Martins
2021-07-28 19:55 ` Dan Williams
2021-07-28 20:07 ` Joao Martins
2021-07-28 20:23 ` Dan Williams
2021-08-25 19:10 ` Joao Martins
2021-08-25 19:15 ` Matthew Wilcox
2021-08-25 19:26 ` Joao Martins
2021-07-14 19:35 ` [PATCH v3 14/14] mm/sparse-vmemmap: improve memory savings for compound pud geometry Joao Martins
2021-07-28 20:03 ` Dan Williams
2021-07-28 20:08 ` Joao Martins
2021-07-14 21:48 ` [PATCH v3 00/14] mm, sparse-vmemmap: Introduce compound pagemaps Andrew Morton
2021-07-14 23:47 ` Dan Williams
2021-07-22 2:24 ` Matthew Wilcox
2021-07-22 10:53 ` Joao Martins
2021-07-27 23:23 ` Dan Williams
2021-08-02 10:40 ` Joao Martins
2021-08-02 14:06 ` Dan Williams
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAPcyv4hmoS8QSfBY6z07w9Ywjdq8WkROFVn3b_1bsE9i9_j3UA@mail.gmail.com \
--to=dan.j.williams@intel.com \
--cc=akpm@linux-foundation.org \
--cc=corbet@lwn.net \
--cc=dave.jiang@intel.com \
--cc=jane.chu@oracle.com \
--cc=jgg@ziepe.ca \
--cc=jhubbard@nvidia.com \
--cc=joao.m.martins@oracle.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mike.kravetz@oracle.com \
--cc=naoya.horiguchi@nec.com \
--cc=nvdimm@lists.linux.dev \
--cc=songmuchun@bytedance.com \
--cc=vishal.l.verma@intel.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).