mm-commits.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mike Rapoport <rppt@linux.ibm.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: akpm@linux-foundation.org, bhe@redhat.com, cai@lca.pw,
	david@redhat.com, mgorman@suse.de, mhocko@kernel.org,
	mm-commits@vger.kernel.org, stable@vger.kernel.org,
	vbabka@suse.cz
Subject: Re: + mm-initialize-struct-pages-in-reserved-regions-outside-of-the-zone-ranges.patch added to -mm tree
Date: Mon, 7 Dec 2020 18:50:37 +0200	[thread overview]
Message-ID: <20201207165037.GH1112728@linux.ibm.com> (raw)
In-Reply-To: <X80xQ6mvSwJZ0RvC@redhat.com>

On Sun, Dec 06, 2020 at 02:30:11PM -0500, Andrea Arcangeli wrote:
> On Sun, Dec 06, 2020 at 08:48:49AM +0200, Mike Rapoport wrote:
> > I don't see why we need all this complexity when a simple fixup was
> > enough.
> 
> Note the memblock bug crashed my systems:
> 
> 	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
> 
> Please note how zone_spans_pfn expands:
> 
> static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
> {
> 	return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone);
> }
> 
> For pfn_to_page(0), the zone_start_pfn is 1 for DMA zone. pfn is 0.

I don't know why you keep calling it "memblock bug" as pfn 0 was not
part of DMA zone at least for ten years.

If you backport this VM_BUG_ON_PAGE() to 2.6.36 that has no memblock on
x86 it will happily trigger there.

The problem, as I see it, in, hmm, questionable semantics for
!E820_TYPE_RAM ranges on x86.

I keep asking this simple question again and again and I still have no
clear "yes" or "no" answer:

	Is PFN 0 on x86 memory?

Let's step back for a second and instead of blaming memblock in lack of
enforcement for .memory and .reserved to overlap and talking about how
memblock core was not robust enough we'll get to the point where we have
clear defintions of what e820 ranges represent.

> > > +	/*
> > > +	 * memblock.reserved isn't enforced to overlap with
> > > +	 * memblock.memory so initialize the struct pages for
> > > +	 * memblock.reserved too in case it wasn't overlapping.
> > > +	 *
> > > +	 * If any struct page associated with a memblock.reserved
> > > +	 * range isn't overlapping with a zone range, it'll be left
> > > +	 * uninitialized, ideally with PagePoison, and it'll be a more
> > > +	 * easily detectable error.
> > > +	 */
> > > +	for_each_res_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
> > > +		start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn);
> > > +		end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn);
> > > +
> > > +		if (end_pfn > start_pfn)
> > > +			pgcnt += init_unavailable_range(start_pfn, end_pfn,
> > > +							zone, nid);
> > > +	}
> > 
> > This means we are going iterate over all memory allocated before
> > free_area_ini() from memblock extra time. One time here and another time
> > in reserve_bootmem_region().
> > And this can be substantial for CMA and alloc_large_system_hash().
> 
> The above loops over memory.reserved only.

Right, this loops over memory.reserved which happens to include CMA and
the memory allocated in alloc_large_system_hash(). This can be a lot.
And memmap_init() runs before threads are available so this cannot be
parallelized.
The deferred memmap initialization starts much later than this.

> > > @@ -7126,7 +7155,13 @@ unsigned long __init node_map_pfn_alignm
> > >   */
> > >  unsigned long __init find_min_pfn_with_active_regions(void)
> > >  {
> > > -	return PHYS_PFN(memblock_start_of_DRAM());
> > > +	/*
> > > +	 * reserved regions must be included so that their page
> > > +	 * structure can be part of a zone and obtain a valid zoneid
> > > +	 * before __SetPageReserved().
> > > +	 */
> > > +	return min(PHYS_PFN(memblock_start_of_DRAM()),
> > > +		   PHYS_PFN(memblock.reserved.regions[0].base));
> > 
> > So this implies that reserved memory starts before memory. Don't you
> > find this weird?
> 
> The above is a more complex way to write the below, which will make it
> more clear there's never overlap:
> 
> 		if (entry->type == E820_TYPE_SOFT_RESERVED)
> 			memblock_reserve(entry->addr, entry->size);
> 		else if (entry->type == E820_TYPE_RAM || entry->type == E820_TYPE_RESERVED_KERN)
> 			memblock_add(entry->addr, entry->size);

So what about E820_TYPE_ACPI or E820_TYPE_NVS?
How should they be treated?

It happend that we memblock_reserve() the first page, but what if the
BIOS marks pfn 1 as E820_TYPE_ACPI? It's neither in .memory nor in
.reserved. Does it belong to zone 0 and node 0?

Or, if, say, node 1 starts at 4G and the BIOS claimed several first pages
of node 1 as type 20?
What should we do then?

> Thanks,
> Andrea
> 

-- 
Sincerely yours,
Mike.

  reply	other threads:[~2020-12-07 16:51 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-06  0:54 + mm-initialize-struct-pages-in-reserved-regions-outside-of-the-zone-ranges.patch added to -mm tree akpm
2020-12-06  6:48 ` Mike Rapoport
2020-12-06 19:30   ` Andrea Arcangeli
2020-12-07 16:50     ` Mike Rapoport [this message]
2020-12-08  2:57       ` Andrea Arcangeli
2020-12-08 21:46         ` Mike Rapoport
2020-12-08 23:13           ` Andrea Arcangeli
2020-12-09 21:42             ` Mike Rapoport
2020-12-07  8:58 ` David Hildenbrand
2020-12-07  9:45   ` Mike Rapoport

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201207165037.GH1112728@linux.ibm.com \
    --to=rppt@linux.ibm.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=cai@lca.pw \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=mm-commits@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).