All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Toshiki Fukasawa <t-fukasawa@vx.jp.nec.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"mhocko@kernel.org" <mhocko@kernel.org>,
	"adobriyan@gmail.com" <adobriyan@gmail.com>,
	"hch@lst.de" <hch@lst.de>,
	"longman@redhat.com" <longman@redhat.com>,
	"sfr@canb.auug.org.au" <sfr@canb.auug.org.au>,
	"mst@redhat.com" <mst@redhat.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Junichi Nomura <j-nomura@ce.jp.nec.com>
Subject: Re: [RFC PATCH v2] mm: initialize struct pages reserved by ZONE_DEVICE driver.
Date: Mon, 9 Sep 2019 14:05:59 +0200	[thread overview]
Message-ID: <c6198acd-8ff7-c40c-cb4e-f0f12f841b38@redhat.com> (raw)
In-Reply-To: <CAPcyv4gA4mcDEPeCFokn_jy5gX62cK0U40EzL7M8c0iDO7U7bg@mail.gmail.com>

On 09.09.19 13:53, Dan Williams wrote:
> On Mon, Sep 9, 2019 at 1:11 AM David Hildenbrand <david@redhat.com> wrote:
> [..]
>>>> It seems that SECTION_IS_ONLINE and SECTION_MARKED_PRESENT can be used to
>>>> distinguish uninitialized struct pages if we can apply them to ZONE_DEVICE,
>>>> but that is no longer necessary with this approach.
>>>
>>> Let's take a step back here to understand the issues I am aware of. I
>>> think we should solve this for good now:
>>>
>>> A PFN walker takes a look at a random PFN at a random point in time. It
>>> finds a PFN with SECTION_MARKED_PRESENT && !SECTION_IS_ONLINE. The
>>> options are:
>>>
>>> 1. It is buddy memory (add_memory()) that has not been online yet. The
>>> memmap contains garbage. Don't access.
>>>
>>> 2. It is ZONE_DEVICE memory with a valid memmap. Access it.
>>>
>>> 3. It is ZONE_DEVICE memory with an invalid memmap, because the section
>>> is only partially present: E.g., device starts at offset 64MB within a
>>> section or the device ends at offset 64MB within a section. Don't access it.
>>>
>>> 4. It is ZONE_DEVICE memory with an invalid memmap, because the memmap
>>> was not initialized yet. memmap_init_zone_device() did not yet succeed
>>> after dropping the mem_hotplug lock in mm/memremap.c. Don't access it.
>>>
>>> 5. It is reserved ZONE_DEVICE memory ("pages mapped, but reserved for
>>> driver") with an invalid memmap. Don't access it.
>>>
>>> I can see that your patch tries to make #5 vanish by initializing the
>>> memmap, fair enough. #3 and #4 can't be detected. The PFN walker could
>>> still stumble over uninitialized memmaps.
>>>
>>
>> FWIW, I thinkg having something like pfn_zone_device(), similarly
>> implemented like pfn_zone_device_reserved() could be one solution to
>> most issues.
> 
> I've been thinking of a replacement for PTE_DEVMAP with section-level,
> or sub-section level flags. The section-level flag would still require
> a call to get_dev_pagemap() to validate that the pfn is not section in
> the subsection case which seems to be entirely too much overhead. If
> ZONE_DEVICE is to be a first class citizen in pfn walkers I think it
> would be worth the cost to double the size of subsection_map and to
> identify whether a sub-section is ZONE_DEVICE, or not.
> 
> Thoughts?
> 

I thought about this last week and came up with something like

1. Convert SECTION_IS_ONLINE to SECTION IS_ACTIVE

2. Make pfn_to_online_page() also check that it's not ZONE_DEVICE.
Online pfns are limited to !ZONE_DEVICE.

3. Extend subsection_map to an additional active_map

4. Set SECTION IS_ACTIVE *iff* the whole active_map is set. This keeps
most accesses of pfn_to_online_page() fast. If !SECTION IS_ACTIVE, check
the active_map.

5. Set sub-sections active/unactive in
move_pfn_range_to_zone()/remove_pfn_range_from_zone() - see "[PATCH v4
0/8] mm/memory_hotplug: Shrink zones before removing memory​" for the
latter.

6. Set boot memory properly active (this is a tricky bit :/ ).

However, it turned out too complex for my taste (and limited time to
spend on this), so I abandoned that idea for now. If somebody wants to
pick that up, fine.

-- 

Thanks,

David / dhildenb

  reply	other threads:[~2019-09-09 12:06 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-06  8:09 [RFC PATCH v2] mm: initialize struct pages reserved by ZONE_DEVICE driver Toshiki Fukasawa
2019-09-06  8:45 ` David Hildenbrand
2019-09-06 10:02   ` Toshiki Fukasawa
2019-09-06 10:35     ` David Hildenbrand
2019-09-09  5:48       ` Toshiki Fukasawa
2019-09-09  7:46         ` David Hildenbrand
2019-09-09  8:10           ` David Hildenbrand
2019-09-09 11:53             ` Dan Williams
2019-09-09 11:53               ` Dan Williams
2019-09-09 12:05               ` David Hildenbrand [this message]
2019-09-10  9:21                 ` Dan Williams
2019-09-10  9:21                   ` Dan Williams
2019-09-10 10:10                   ` David Hildenbrand
2019-09-17  2:34           ` Toshiki Fukasawa
2019-09-17  7:13             ` David Hildenbrand
2019-09-17  9:32               ` Toshiki Fukasawa
2019-09-17 10:20                 ` David Hildenbrand
2019-09-18  2:16                   ` Toshiki Fukasawa
2019-09-17 15:49               ` Waiman Long
2019-09-17 16:21                 ` Qian Cai
2019-09-17 16:21                   ` Qian Cai
2019-09-17 17:04                   ` Waiman Long
2019-09-17 20:23                     ` David Hildenbrand
2019-09-18  2:28                     ` Toshiki Fukasawa
2019-09-18  7:30           ` David Hildenbrand
2019-09-10 14:01 ` Michal Hocko
2019-09-10 14:53   ` Dan Williams
2019-09-10 14:53     ` Dan Williams
2019-09-10 17:35     ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c6198acd-8ff7-c40c-cb4e-f0f12f841b38@redhat.com \
    --to=david@redhat.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=hch@lst.de \
    --cc=j-nomura@ce.jp.nec.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=longman@redhat.com \
    --cc=mhocko@kernel.org \
    --cc=mst@redhat.com \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=sfr@canb.auug.org.au \
    --cc=t-fukasawa@vx.jp.nec.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.