linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: Mike Rapoport <rppt@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Baoquan He <bhe@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Michal Hocko <mhocko@kernel.org>,
	Mike Rapoport <rppt@linux.ibm.com>, Qian Cai <cai@lca.pw>,
	Vlastimil Babka <vbabka@suse.cz>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	stable@vger.kernel.org, Dan Williams <dan.j.williams@intel.com>
Subject: Re: [PATCH v2 1/2] mm: memblock: enforce overlap of memory.memblock and memory.reserved
Date: Mon, 14 Dec 2020 08:58:20 -0500	[thread overview]
Message-ID: <X9dvfDcSrlEj5y6K@redhat.com> (raw)
In-Reply-To: <a512bd63-b171-3ed5-6996-2c99b6c9a226@redhat.com>

On Mon, Dec 14, 2020 at 12:18:07PM +0100, David Hildenbrand wrote:
> On 14.12.20 12:12, Mike Rapoport wrote:
> > On Mon, Dec 14, 2020 at 11:11:35AM +0100, David Hildenbrand wrote:
> >> On 09.12.20 22:43, Mike Rapoport wrote:
> >>> From: Mike Rapoport <rppt@linux.ibm.com>
> >>>
> >>> memblock does not require that the reserved memory ranges will be a subset
> >>> of memblock.memory.
> >>>
> >>> As the result there maybe reserved pages that are not in the range of any
> >>> zone or node because zone and node boundaries are detected based on
> >>> memblock.memory and pages that only present in memblock.reserved are not
> >>> taken into account during zone/node size detection.
> >>>
> >>> Make sure that all ranges in memblock.reserved are added to memblock.memory
> >>> before calculating node and zone boundaries.
> >>>
> >>> Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
> >>> Reported-by: Andrea Arcangeli <aarcange@redhat.com>
> >>> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> >>> ---
> >>>  include/linux/memblock.h |  1 +
> >>>  mm/memblock.c            | 24 ++++++++++++++++++++++++
> >>>  mm/page_alloc.c          |  7 +++++++
> >>>  3 files changed, 32 insertions(+)
> >>>
> >>> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> >>> index ef131255cedc..e64dae2dd1ce 100644
> >>> --- a/include/linux/memblock.h
> >>> +++ b/include/linux/memblock.h
> >>> @@ -120,6 +120,7 @@ int memblock_clear_nomap(phys_addr_t base, phys_addr_t size);
> >>>  unsigned long memblock_free_all(void);
> >>>  void reset_node_managed_pages(pg_data_t *pgdat);
> >>>  void reset_all_zones_managed_pages(void);
> >>> +void memblock_enforce_memory_reserved_overlap(void);
> >>>  
> >>>  /* Low level functions */
> >>>  void __next_mem_range(u64 *idx, int nid, enum memblock_flags flags,
> >>> diff --git a/mm/memblock.c b/mm/memblock.c
> >>> index b68ee86788af..9277aca642b2 100644
> >>> --- a/mm/memblock.c
> >>> +++ b/mm/memblock.c
> >>> @@ -1857,6 +1857,30 @@ void __init_memblock memblock_trim_memory(phys_addr_t align)
> >>>  	}
> >>>  }
> >>>  
> >>> +/**
> >>> + * memblock_enforce_memory_reserved_overlap - make sure every range in
> >>> + * @memblock.reserved is covered by @memblock.memory
> >>> + *
> >>> + * The data in @memblock.memory is used to detect zone and node boundaries
> >>> + * during initialization of the memory map and the page allocator. Make
> >>> + * sure that every memory range present in @memblock.reserved is also added
> >>> + * to @memblock.memory even if the architecture specific memory
> >>> + * initialization failed to do so
> >>> + */
> >>> +void __init memblock_enforce_memory_reserved_overlap(void)
> >>> +{
> >>> +	phys_addr_t start, end;
> >>> +	int nid;
> >>> +	u64 i;
> >>> +
> >>> +	__for_each_mem_range(i, &memblock.reserved, &memblock.memory,
> >>> +			     NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, &nid) {
> >>> +		pr_warn("memblock: reserved range [%pa-%pa] is not in memory\n",
> >>> +			&start, &end);
> >>> +		memblock_add_node(start, (end - start), nid);
> >>> +	}
> >>> +}
> >>> +
> >>>  void __init_memblock memblock_set_current_limit(phys_addr_t limit)
> >>>  {
> >>>  	memblock.current_limit = limit;
> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>> index eaa227a479e4..dbc57dbbacd8 100644
> >>> --- a/mm/page_alloc.c
> >>> +++ b/mm/page_alloc.c
> >>> @@ -7436,6 +7436,13 @@ void __init free_area_init(unsigned long *max_zone_pfn)
> >>>  	memset(arch_zone_highest_possible_pfn, 0,
> >>>  				sizeof(arch_zone_highest_possible_pfn));
> >>>  
> >>> +	/*
> >>> +	 * Some architectures (e.g. x86) have reserved pages outside of
> >>> +	 * memblock.memory. Make sure these pages are taken into account
> >>> +	 * when detecting zone and node boundaries
> >>> +	 */
> >>> +	memblock_enforce_memory_reserved_overlap();
> >>> +
> >>>  	start_pfn = find_min_pfn_with_active_regions();
> >>>  	descending = arch_has_descending_max_zone_pfns();
> >>>  
> >>>
> >>
> >> CCing Dan.
> >>
> >> This implies that any memory that is E820_TYPE_SOFT_RESERVED that was
> >> reserved via memblock_reserve() will be added via memblock_add_node() as
> >> well, resulting in all such memory getting a memmap allocated right when
> >> booting up, right?
> >>
> >> IIRC, there are use cases where that is absolutely not desired.
> > 
> > Hmm, if this is the case we need entirely different solution to ensure
> > that we don't have partial pageblocks in a zone and we have all the
> > memory map initialized to a known state.
> > 
> >> Am I missing something? (@Dan?)
> > 
> > BTW, @Dan, why did you need to memblock_reserve(E820_TYPE_SOFT_RESERVED)
> > without memblock_add()ing it?
> 
> I suspect to cover cases where it might partially span memory sections
> (or even sub-sections). Maybe we should focus on initializing that part
> only - meaning, not adding all memory to .memory but only !section
> aligned pieces.

We had that information left in the memblock data structure with the
previous implementation in -mm (before adding all memblock.reserved to
memblock.memory). To avoid destroying that information we'll need a
new flag for each range that is not originally in memblock.memory:

===
What you suggest would require adding extra information to flag which
ranges must not have a direct mapping, but that information is already
in memblock today, for each range in memblock_reserved but not in
memblock.memory or did I misunderstand how that no-direct-map detail works?
===

I guess I was too optimistic that this was already implemented, thanks
for noticing.

For the record, I didn't have time to test the new implementation
yet. Since I'm running the "hack" on all machines things have been
stable on v5.9. I'm actually curious if the hack would also fail boot
on the CI system or not, that would help localize the issue into the
implicit memblock_add at least. The memblock debug output won't give
us a direct reproducer, but we can try to generate one by reproducing
the same e820 map in seabios.

Andrea


  reply	other threads:[~2020-12-14 14:00 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-09 21:43 [PATCH v2 0/2] mm: fix initialization of struct page for holes in memory layout Mike Rapoport
2020-12-09 21:43 ` [PATCH v2 1/2] mm: memblock: enforce overlap of memory.memblock and memory.reserved Mike Rapoport
2020-12-10  9:28   ` Greg KH
2020-12-14 10:11   ` David Hildenbrand
2020-12-14 11:12     ` Mike Rapoport
2020-12-14 11:18       ` David Hildenbrand
2020-12-14 13:58         ` Andrea Arcangeli [this message]
2020-12-09 21:43 ` [PATCH v2 2/2] mm: fix initialization of struct page for holes in memory layout Mike Rapoport
2020-12-10  1:51   ` Andrea Arcangeli
2020-12-10  9:29   ` Greg KH
2021-01-04 19:03   ` Qian Cai
2021-01-05  8:24     ` Mike Rapoport
2021-01-05 18:45       ` Qian Cai
2021-01-06  8:05         ` Mike Rapoport
2021-01-06 21:04           ` Qian Cai
2021-01-10 15:39             ` Mike Rapoport
2021-01-11 15:06               ` Qian Cai
2021-01-11 17:47                 ` Mike Rapoport

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=X9dvfDcSrlEj5y6K@redhat.com \
    --to=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=cai@lca.pw \
    --cc=dan.j.williams@intel.com \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=rppt@kernel.org \
    --cc=rppt@linux.ibm.com \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).