linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>, Linux MM <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions
Date: Tue, 12 Jan 2021 01:18:20 -0800	[thread overview]
Message-ID: <CAPcyv4gXaUswZS_a4-oiQZWVZ4QDJrKps4XGs=xxLE0Ls=PSmg@mail.gmail.com> (raw)
In-Reply-To: <75bb1429-d133-d303-a67a-be16c654ada8@redhat.com>

On Thu, Jan 7, 2021 at 1:16 AM David Hildenbrand <david@redhat.com> wrote:
>
> [...]
>
> >>> Well, I would love to have no surprises either. So far there was not
> >>> actual argument why the pmem reserved space cannot be fully initialized.
> >>
> >> Yes, I'm still hoping Dan can clarify that.
> >
> > Complexity and effective utility (once pfn_to_online_page() is fixed)
> > are the roadblocks in my mind. The altmap is there to allow for PMEM
> > capacity to be used as memmap space, so there would need to be code to
> > break that circular dependency and allocate a memmap for the metadata
> > space from DRAM and the rest of the memmap space for the data capacity
> > from pmem itself. That memmap-for-pmem-metadata will still represent
> > offline pages. So once pfn_to_online_page() is fixed, what pfn-walker
> > is going to be doing pfn_to_page() on PMEM metadata? Secondly, there
>
> Assume I do
>
> pgmap = get_dev_pagemap(pfn, NULL);
> if (pgmap)
>         return pfn_to_page(pfn);
> return NULL;
>
> on a random pfn because I want to inspect ZONE_DEVICE PFNs.

I keep getting hung up on the motivation to do random pfn inspection?

The problems we have found to date have required different solutions.
The KVM bug didn't use get_dev_pagemap() to inspect the pfn because it
could rely on the fact that the page already had an elevated reference
count. The get_user_pages() path only looks up ZONE_DEVICE pfns when
it see {pte,pmd,pud}_devmap set in the page table entry. pfn walkers
have been a problem, but with pfn_to_online_page() fixed what is the
remaining motivation to inspect ZONE_DEVICE pfns?

> IIUC, the memmap I get might usually be initialized, except we're
> hitting a PFN in the reserved altmap space. Correct?

The pagemap currently returns true for every pfn in the range
including those in the altmap.



>
> Do we expect this handling to not be valid - i.e., initialization to be
> of no utility? (to make it fly we would have to explicitly check for
> PFNs in the altmap reserved space, which wouldn't be too problematic)
>
> > is a PMEM namespace mode called "raw" that eschews DAX and 'struct
> > page' for pmem and just behaves like a memory-backed block device. The
> > end result is still that pfn walkers need to discover if a PMEM pfn
> > has a page, so I don't see what "sometimes there's an
> > memmap-for-pmem-metadata" buys us?
>
> Right, but that's as easy as doing a pfn_valid() first.
>
>
> Let me try to express what I (and I think Michal) mean:
>
> In pagemap_range(), we
>
> 1. move_pfn_range_to_zone()->memmap_init_zone(): initialize the memmap
> of the PMEM device used as memmap itself ("self host", confusing). We
> skip initializing the memmap for the the reserved altmap space.
>
> 2. memmap_init_zone_device(): initialize the memmap of everything
> outside of the altmap space.
>
> IIUC, the memmap of the reserved altmap space remains uninitialized.
> What speaks against just initializing that part in e.g., 2. or similarly
> after 1.?
>
>
> I'm planning on documenting the result of this discussion in the code,
> so people don't have to scratch their head whenever stumbling over the
> altmap reserved space.
>
> >
> >>
> >>> On the other hand making sure that pfn_to_online_page sounds like the
> >>> right thing to do. And having an explicit check for zone device there in
> >>> a slow path makes sense to me.
> >>
> >> As I said, I'd favor to simplify and just get rid of the special case,
> >> instead of coming up with increasingly complex ways to deal with it.
> >> pfn_to_online_page() used to be simple, essentially checking a single
> >> flag was sufficient in most setups.
> >
> > I think the logic to throw away System RAM that might collide with
> > PMEM and soft-reserved memory within a section is on the order of the
> > same code complexity as the patch proposed here, no? Certainly the
> > throw-away concept itself is easier to grasp, but I don't think that
> > would be reflected in the code patch to achieve it... willing to be
> > proved wrong with a patch.
>
> Well, at least it sounds easier to go over memblock holes and
> align-up/down some relevant PFNs to section boundaries, ending up with
> no affect to runtime performance later (e.g., pfn_to_online_page()). But
> I agree that most probably the devil is in the detail - e.g., handling
> different kind of holes (different flavors of "reserved") and syncing up
> other data structures (e.g., resource tree).
>
> I don't have time to look into that right now, but might look into it in
> the future. For now I'm fine with this approach, assuming we don't
> discover other corner cases that turn it even more complex. I'm happy
> that we finally talk about it and fix it!
>
>
> --
> Thanks,
>
> David / dhildenb
>

  reply	other threads:[~2021-01-12  9:19 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-06  4:07 [PATCH] mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions Dan Williams
2021-01-06  9:55 ` Michal Hocko
2021-01-12  9:15   ` Dan Williams
2021-01-06  9:56 ` David Hildenbrand
2021-01-06 10:42   ` Michal Hocko
2021-01-06 11:22     ` David Hildenbrand
2021-01-06 11:38       ` Michal Hocko
2021-01-06 20:02       ` Dan Williams
2021-01-07  9:15         ` David Hildenbrand
2021-01-12  9:18           ` Dan Williams [this message]
2021-01-12  9:44             ` David Hildenbrand
2021-01-12 20:52               ` Dan Williams
2021-01-06 10:04 ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAPcyv4gXaUswZS_a4-oiQZWVZ4QDJrKps4XGs=xxLE0Ls=PSmg@mail.gmail.com' \
    --to=dan.j.williams@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).