linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mike Rapoport <rppt@linux.ibm.com>
To: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Vlastimil Babka <vbabka@suse.cz>, Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Qian Cai <cai@lca.pw>,
	Michal Hocko <mhocko@kernel.org>,
	linux-kernel@vger.kernel.org, Baoquan He <bhe@redhat.com>
Subject: Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages
Date: Wed, 25 Nov 2020 23:04:14 +0200	[thread overview]
Message-ID: <20201125210414.GO123287@linux.ibm.com> (raw)
In-Reply-To: <a4cc62ba-8066-3e9c-cead-98cd74d313dd@redhat.com>

On Wed, Nov 25, 2020 at 08:27:21PM +0100, David Hildenbrand wrote:
> On 25.11.20 19:28, Andrea Arcangeli wrote:
> > On Wed, Nov 25, 2020 at 07:45:30AM +0100, David Hildenbrand wrote:
> >
> > What would need to call pfn_zone in between first and second stage?
> > 
> > If something calls pfn_zone in between first and second stage isn't it
> > a feature if it crashes the kernel at boot?
> > 
> > Note: I suggested 0xff kernel crashing "until the second stage comes
> > around" during meminit at boot, not permanently.
> 
> Yes, then it makes sense - if we're able to come up with a way to
> initialize any memmap we might have - including actual memory holes that
> have a memmap.
> 
> > 
> > 		/*
> > 		 * Use a fake node/zone (0) for now. Some of these pages
> > 		 * (in memblock.reserved but not in memblock.memory) will
> > 		 * get re-initialized via reserve_bootmem_region() later.
> > 		 */
> > 
> > Specifically I relied on the comment "get re-initialized via
> > reserve_bootmem_region() later".
> 
> Yes, but there is a "Some of these" :)
> 
> Boot a VM with "-M 4000" and observe the memmap in the last section -
> they won't get initialized a second time.
> 
> > 
> > I assumed the second stage overwrites the 0,0 to the real zoneid/nid
> > value, which is clearly not happening, hence it'd be preferable to get
> > a crash at boot reliably.
> > 
> > Now I have CONFIG_DEFERRED_STRUCT_PAGE_INIT=n so the second stage
> > calling init_reserved_page(start_pfn) won't do much with
> > CONFIG_DEFERRED_STRUCT_PAGE_INIT=n but I already tried to enable
> > CONFIG_DEFERRED_STRUCT_PAGE_INIT=y yesterday and it didn't help, the
> > page->flags were still wrong for reserved pages in the "Unknown E820
> > type" region.

I think the very root cause is how e820__memblock_setup() registers
memory with memblock:

		if (entry->type == E820_TYPE_SOFT_RESERVED)
			memblock_reserve(entry->addr, entry->size);

		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
			continue;

		memblock_add(entry->addr, entry->size);

From that point the system has inconsistent view of RAM in both
memblock.memory and memblock.reserved and, which is then translated to
memmap etc.

Unfortunately, simply adding all RAM to memblock is not possible as
there are systems that for them "the addresses listed in the reserved
range must never be accessed, or (as we discovered) even be reachable by
an active page table entry" [1].

[1] https://lore.kernel.org/lkml/20200528151510.GA6154@raspberrypi/

-- 
Sincerely yours,
Mike.

  parent reply	other threads:[~2020-11-25 21:04 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-23 21:25 compaction: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn)) Qian Cai
2020-04-24  3:43 ` Baoquan He
2020-04-24 13:45   ` Qian Cai
2020-05-05 12:43     ` Baoquan He
2020-05-05 13:20       ` Qian Cai
2020-05-11  1:21         ` Baoquan He
2020-04-26 14:41 ` Mike Rapoport
2020-04-27 13:45   ` Qian Cai
2020-11-21 19:45 ` [PATCH 0/1] VM_BUG_ON_PAGE(!zone_spans_pfn) in set_pfnblock_flags_mask Andrea Arcangeli
2020-11-21 19:45   ` [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages Andrea Arcangeli
2020-11-21 19:53     ` Andrea Arcangeli
2020-11-23 11:26       ` David Hildenbrand
2020-11-23 13:01     ` Vlastimil Babka
2020-11-24 13:32       ` Mel Gorman
2020-11-24 20:56         ` Andrea Arcangeli
2020-11-25 10:30           ` Mel Gorman
2020-11-25 17:59             ` Andrea Arcangeli
2020-11-26 10:47               ` Mel Gorman
2020-12-06  2:26                 ` Andrea Arcangeli
2020-12-06 23:47                   ` Mel Gorman
2020-11-25  5:34       ` Andrea Arcangeli
2020-11-25  6:45         ` David Hildenbrand
2020-11-25  8:51           ` Mike Rapoport
2020-11-25 10:39           ` Mel Gorman
2020-11-25 11:04             ` David Hildenbrand
2020-11-25 11:41               ` David Hildenbrand
2020-11-25 18:47                 ` Andrea Arcangeli
2020-11-25 13:33               ` Mel Gorman
2020-11-25 13:41                 ` David Hildenbrand
2020-11-25 18:28           ` Andrea Arcangeli
2020-11-25 19:27             ` David Hildenbrand
2020-11-25 20:41               ` Andrea Arcangeli
2020-11-25 21:13                 ` David Hildenbrand
2020-11-25 21:04               ` Mike Rapoport [this message]
2020-11-25 21:38                 ` Andrea Arcangeli
2020-11-26  9:36                   ` Mike Rapoport
2020-11-26 10:05                     ` David Hildenbrand
2020-11-26 17:46                       ` Mike Rapoport
2020-11-29 12:32                         ` Mike Rapoport
2020-12-02  0:44                           ` Andrea Arcangeli
2020-12-02 17:39                             ` Mike Rapoport
2020-12-03  6:23                               ` Andrea Arcangeli
2020-12-03 10:51                                 ` Mike Rapoport
2020-12-03 17:31                                   ` Andrea Arcangeli
2020-12-06  8:09                                     ` Mike Rapoport
2020-11-26 18:15                       ` Andrea Arcangeli
2020-11-26 18:29                     ` Andrea Arcangeli
2020-11-26 19:44                       ` Mike Rapoport
2020-11-26 20:30                         ` Andrea Arcangeli
2020-11-26 21:03                           ` Mike Rapoport
2020-11-26 19:21                     ` Andrea Arcangeli
2020-11-25 12:08         ` Vlastimil Babka
2020-11-25 13:32           ` David Hildenbrand
2020-11-25 14:13             ` Mike Rapoport
2020-11-25 14:42               ` David Hildenbrand
2020-11-26 10:51                 ` Mel Gorman
2020-11-25 19:14               ` Andrea Arcangeli
2020-11-25 19:01           ` Andrea Arcangeli
2020-11-25 19:33             ` David Hildenbrand
2020-11-26  3:40         ` Andrea Arcangeli
2020-11-26 10:43           ` Mike Rapoport

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201125210414.GO123287@linux.ibm.com \
    --to=rppt@linux.ibm.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=cai@lca.pw \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).