All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: akpm@linux-foundation.org, David Hildenbrand <david@redhat.com>,
	Oscar Salvador <osalvador@suse.de>,
	linux-mm@kvack.org, linux-nvdimm@lists.01.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 3/5] mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions
Date: Wed, 20 Jan 2021 11:30:07 +0100	[thread overview]
Message-ID: <20210120103007.GH9371@dhcp22.suse.cz> (raw)
In-Reply-To: <161058500675.1840162.7887862152161279354.stgit@dwillia2-desk3.amr.corp.intel.com>

On Wed 13-01-21 16:43:26, Dan Williams wrote:
> While pfn_to_online_page() is able to determine pfn_valid() at
> subsection granularity it is not able to reliably determine if a given
> pfn is also online if the section is mixes ZONE_{NORMAL,MOVABLE} with
> ZONE_DEVICE. This means that pfn_to_online_page() may return invalid
> @page objects. For example with a memory map like:
> 
> 100000000-1fbffffff : System RAM
>   142000000-143002e16 : Kernel code
>   143200000-143713fff : Kernel rodata
>   143800000-143b15b7f : Kernel data
>   144227000-144ffffff : Kernel bss
> 1fc000000-2fbffffff : Persistent Memory (legacy)
>   1fc000000-2fbffffff : namespace0.0
> 
> This command:
> 
> echo 0x1fc000000 > /sys/devices/system/memory/soft_offline_page
> 
> ...succeeds when it should fail. When it succeeds it touches
> an uninitialized page and may crash or cause other damage (see
> dissolve_free_huge_page()).
> 
> While the memory map above is contrived via the memmap=ss!nn kernel
> command line option, the collision happens in practice on shipping
> platforms. The memory controller resources that decode spans of
> physical address space are a limited resource. One technique
> platform-firmware uses to conserve those resources is to share a decoder
> across 2 devices to keep the address range contiguous. Unfortunately the
> unit of operation of a decoder is 64MiB while the Linux section size is
> 128MiB. This results in situations where, without subsection hotplug
> memory mappings with different lifetimes collide into one object that
> can only express one lifetime.

Thank you this is a very useful insight to have in the changelog.

> Update move_pfn_range_to_zone() to flag (SECTION_TAINT_ZONE_DEVICE) a
> section that mixes ZONE_DEVICE pfns with other online pfns. With
> SECTION_TAINT_ZONE_DEVICE to delineate, pfn_to_online_page() can fall
> back to a slow-path check for ZONE_DEVICE pfns in an online section. In
> the fast path online_section() for a full ZONE_DEVICE section returns
> false.
> 
> Because the collision case is rare, and for simplicity, the
> SECTION_TAINT_ZONE_DEVICE flag is never cleared once set.
> 
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Reported-by: Michal Hocko <mhocko@suse.com>
> Reported-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Acked-by: Michal Hocko <mhocko@suse.com>

I do not want to bikeshed but online_device_section is quite confusing.
device_mixed_section would sound like a better name to me.
-- 
Michal Hocko
SUSE Labs
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

WARNING: multiple messages have this Message-ID (diff)
From: Michal Hocko <mhocko@suse.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: akpm@linux-foundation.org, David Hildenbrand <david@redhat.com>,
	Oscar Salvador <osalvador@suse.de>,
	linux-mm@kvack.org, linux-nvdimm@lists.01.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 3/5] mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions
Date: Wed, 20 Jan 2021 11:30:07 +0100	[thread overview]
Message-ID: <20210120103007.GH9371@dhcp22.suse.cz> (raw)
In-Reply-To: <161058500675.1840162.7887862152161279354.stgit@dwillia2-desk3.amr.corp.intel.com>

On Wed 13-01-21 16:43:26, Dan Williams wrote:
> While pfn_to_online_page() is able to determine pfn_valid() at
> subsection granularity it is not able to reliably determine if a given
> pfn is also online if the section is mixes ZONE_{NORMAL,MOVABLE} with
> ZONE_DEVICE. This means that pfn_to_online_page() may return invalid
> @page objects. For example with a memory map like:
> 
> 100000000-1fbffffff : System RAM
>   142000000-143002e16 : Kernel code
>   143200000-143713fff : Kernel rodata
>   143800000-143b15b7f : Kernel data
>   144227000-144ffffff : Kernel bss
> 1fc000000-2fbffffff : Persistent Memory (legacy)
>   1fc000000-2fbffffff : namespace0.0
> 
> This command:
> 
> echo 0x1fc000000 > /sys/devices/system/memory/soft_offline_page
> 
> ...succeeds when it should fail. When it succeeds it touches
> an uninitialized page and may crash or cause other damage (see
> dissolve_free_huge_page()).
> 
> While the memory map above is contrived via the memmap=ss!nn kernel
> command line option, the collision happens in practice on shipping
> platforms. The memory controller resources that decode spans of
> physical address space are a limited resource. One technique
> platform-firmware uses to conserve those resources is to share a decoder
> across 2 devices to keep the address range contiguous. Unfortunately the
> unit of operation of a decoder is 64MiB while the Linux section size is
> 128MiB. This results in situations where, without subsection hotplug
> memory mappings with different lifetimes collide into one object that
> can only express one lifetime.

Thank you this is a very useful insight to have in the changelog.

> Update move_pfn_range_to_zone() to flag (SECTION_TAINT_ZONE_DEVICE) a
> section that mixes ZONE_DEVICE pfns with other online pfns. With
> SECTION_TAINT_ZONE_DEVICE to delineate, pfn_to_online_page() can fall
> back to a slow-path check for ZONE_DEVICE pfns in an online section. In
> the fast path online_section() for a full ZONE_DEVICE section returns
> false.
> 
> Because the collision case is rare, and for simplicity, the
> SECTION_TAINT_ZONE_DEVICE flag is never cleared once set.
> 
> Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Reported-by: Michal Hocko <mhocko@suse.com>
> Reported-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: David Hildenbrand <david@redhat.com>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Acked-by: Michal Hocko <mhocko@suse.com>

I do not want to bikeshed but online_device_section is quite confusing.
device_mixed_section would sound like a better name to me.
-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2021-01-20 10:30 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-14  0:43 [PATCH v4 0/5] mm: Fix pfn_to_online_page() with respect to ZONE_DEVICE Dan Williams
2021-01-14  0:43 ` Dan Williams
2021-01-14  0:43 ` [PATCH v4 1/5] mm: Move pfn_to_online_page() out of line Dan Williams
2021-01-14  0:43   ` Dan Williams
2021-01-20 10:11   ` Michal Hocko
2021-01-20 10:11     ` Michal Hocko
2021-01-14  0:43 ` [PATCH v4 2/5] mm: Teach pfn_to_online_page() to consider subsection validity Dan Williams
2021-01-14  0:43   ` Dan Williams
2021-01-20 10:24   ` Michal Hocko
2021-01-20 10:24     ` Michal Hocko
2021-01-14  0:43 ` [PATCH v4 3/5] mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions Dan Williams
2021-01-14  0:43   ` Dan Williams
2021-01-20 10:30   ` Michal Hocko [this message]
2021-01-20 10:30     ` Michal Hocko
2021-01-14  0:43 ` [PATCH v4 4/5] mm: Fix page reference leak in soft_offline_page() Dan Williams
2021-01-14  0:43   ` Dan Williams
2021-01-14  1:49   ` HORIGUCHI NAOYA(堀口 直也)
2021-01-14  1:49     ` HORIGUCHI NAOYA(堀口 直也)
2021-01-14  6:18     ` Dan Williams
2021-01-14  6:18       ` Dan Williams
2021-01-14  6:18       ` Dan Williams
2021-01-14  6:30       ` Oscar Salvador
2021-01-14  6:30         ` Oscar Salvador
2021-01-14  7:10       ` HORIGUCHI NAOYA(堀口 直也)
2021-01-14  7:10         ` HORIGUCHI NAOYA(堀口 直也)
2021-01-17 22:01   ` Andrew Morton
2021-01-17 22:01     ` Andrew Morton
2021-01-17 22:35     ` Dan Williams
2021-01-17 22:35       ` Dan Williams
2021-01-17 22:35       ` Dan Williams
2021-01-14  0:43 ` [PATCH v4 5/5] mm: Fix memory_failure() handling of dax-namespace metadata Dan Williams
2021-01-14  0:43   ` Dan Williams
2021-01-14  2:25   ` HORIGUCHI NAOYA(堀口 直也)
2021-01-14  2:25     ` HORIGUCHI NAOYA(堀口 直也)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210120103007.GH9371@dhcp22.suse.cz \
    --to=mhocko@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=osalvador@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.