From: Mike Rapoport <rppt@linux.ibm.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Hildenbrand <david@redhat.com>,
Vlastimil Babka <vbabka@suse.cz>, Mel Gorman <mgorman@suse.de>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, Qian Cai <cai@lca.pw>,
Michal Hocko <mhocko@kernel.org>,
linux-kernel@vger.kernel.org, Baoquan He <bhe@redhat.com>
Subject: Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages
Date: Thu, 3 Dec 2020 12:51:07 +0200 [thread overview]
Message-ID: <20201203105107.GR123287@linux.ibm.com> (raw)
In-Reply-To: <X8iERlMGfNCLxab5@redhat.com>
On Thu, Dec 03, 2020 at 01:23:02AM -0500, Andrea Arcangeli wrote:
> On Wed, Dec 02, 2020 at 07:39:23PM +0200, Mike Rapoport wrote:
> > Hmm, do you have any logs? It worked on my box and with various memory
> > configurations in qemu.
>
> It crashes in reserve_bootmem_region so no logs at all since the
> serial console isn't active yet.
>
> > I believe that the proper solution would be to have all the memory
> > reserved by the firmware added to memblock.memory, like all other
> > architectures do, but I'm not sure if we can easily and reliably
> > determine what of E820 reserved types are actually backed by RAM.
> > So this is not feasible as a short term solution.
> >
> > My patch [1], though, is an immediate improvement and I think it's worth
> > trying to fix off-by-one's that prevent your system from booting.
> >
> > [1] https://lore.kernel.org/lkml/20201201181502.2340-1-rppt@kernel.org
>
> Yes that's the patch I applied.
>
> Relevant points after debugging:
>
> 1) can you try again with DEBUG_VM=y?
Mea cupla :)
> 2) DEBUG_VM=y already enforces the memset(page, 0xff, sizeof(struct
> page)) on all struct pages allocated, exactly I was suggesting to
> do in the previous email. I wonder why we're not doing that even
> with DEBUG_VM=n, perhaps it's too slow for TiB systems. See
> page_init_poison().
>
> 3) given 2), your patch below by deleting "0,0" initialization
> achieves the debug feature to keep PagePoison forever on all
> uninitialized pages, imagine PagePoison is really
> NO_NODEID/NO_ZONEID and doesn't need handling other than a crash.
>
> - __init_single_page(pfn_to_page(pfn), pfn, 0, 0);
>
> 4) because of the feature in 3) I now hit the PagePoison VM_BUG_ON
> because pfn 0 wasn't initialized when reserve_bootmem_region tries
> to call __SetPageReserved on a PagePoison page.
So this is the off-by-one a was talking about. My patch removed
initializaion of the hole before the memblock.memory
> 5) pfn 0 is the classical case where pfn 0 is in a reserved zone in
> memblock.reserve that doesn't overlap any memblock.memory zone.
And, IMHO, this is the fundamental BUG.
There is RAM at address 0, there is a DIMM there, so why on earth this
should not be a part of memblock.memory?
> The memblock_start_of_DRAM moves the start of the DMA zone above
> the pfn 0, so then pfn 0 already ends up in the no zone land David
> mentioned where it's not trivial to decide to give it a zoneid if
> it's not even spanned in the zone.
>
> Not just the zoneid, there's not even a nid.
>
> So we have a pfn with no zoneid, and no nid, and not part of the
> zone ranges, not part of the nid ranges not part of the
> memory.memblock.
We have a pfn that should have been in memblock.memory, in ZONE_DMA and
in the first node with memory. If it was not trimmed from there
> We can't really skip the initialization of the the pfn 0, it has to
> get the zoneid 0 treatment or if pfn 1 ends up in compaction code,
> we'd crash again. (although we'd crash with a nice PagePoison(page)
> == true behavior)
Agree. Struct page for pfn should get the same zoneid and nodeid as pfn 1.
For me the below fixup worked (both with and without DEBUG_VM=y).
From 84a1c2531374706f3592a638523278aa29aaa448 Mon Sep 17 00:00:00 2001
From: Mike Rapoport <rppt@linux.ibm.com>
Date: Thu, 3 Dec 2020 11:40:17 +0200
Subject: [PATCH] fixup for "mm: refactor initialization of stuct page for holes"
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
---
mm/page_alloc.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ce2bdaabdf96..86fde4424e87 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6227,7 +6227,8 @@ void __init __weak memmap_init(unsigned long size, int nid,
unsigned long zone,
unsigned long range_start_pfn)
{
- unsigned long start_pfn, end_pfn, next_pfn = 0;
+ static unsigned long hole_start_pfn;
+ unsigned long start_pfn, end_pfn;
unsigned long range_end_pfn = range_start_pfn + size;
u64 pgcnt = 0;
int i;
@@ -6235,7 +6236,6 @@ void __init __weak memmap_init(unsigned long size, int nid,
for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn);
end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn);
- next_pfn = clamp(next_pfn, range_start_pfn, range_end_pfn);
if (end_pfn > start_pfn) {
size = end_pfn - start_pfn;
@@ -6243,10 +6243,10 @@ void __init __weak memmap_init(unsigned long size, int nid,
MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
}
- if (next_pfn < start_pfn)
- pgcnt += init_unavailable_range(next_pfn, start_pfn,
- zone, nid);
- next_pfn = end_pfn;
+ if (hole_start_pfn < start_pfn)
+ pgcnt += init_unavailable_range(hole_start_pfn,
+ start_pfn, zone, nid);
+ hole_start_pfn = end_pfn;
}
/*
@@ -6256,8 +6256,8 @@ void __init __weak memmap_init(unsigned long size, int nid,
* considered initialized. Make sure that memmap has a well defined
* state.
*/
- if (next_pfn < range_end_pfn)
- pgcnt += init_unavailable_range(next_pfn, range_end_pfn,
+ if (hole_start_pfn < range_end_pfn)
+ pgcnt += init_unavailable_range(hole_start_pfn, range_end_pfn,
zone, nid);
if (pgcnt)
--
2.28.0
...
> From 89469f063c192ae70aea0bd6e1e2a7e99894e5b6 Mon Sep 17 00:00:00 2001
> From: Andrea Arcangeli <aarcange@redhat.com>
> Date: Wed, 2 Dec 2020 23:23:26 -0500
> Subject: [PATCH 1/1] mm: initialize struct pages in reserved regions outside
> of the zone ranges
>
> pfn 0 wasn't initialized and the PagePoison remained set when
> reserve_bootmem_region() called __SetPageReserved, inducing a silent
> boot failure with DEBUG_VM (and correctly so, because the crash
> signaled the nodeid/nid of pfn 0 would be again wrong).
>
> Without this change, the pfn 0 part of a memblock.reserved range,
> isn't in any zone spanned range, nor in any nid spanned range, when we
> initialize the memblock.memory holes pfn 0 won't be included because
> it has no nid and no zoneid.
>
> There's no enforcement that all memblock.reserved ranges must overlap
> memblock.memory ranges, so the memblock.reserved ranges also require
> an explicit initialization and the zones and nid ranges need to be
> extended to include all memblock.reserved ranges with struct pages, or
> they'll be left uninitialized with PagePoison as it happened to pfn 0.
I rather think that we should enforce the overlap between
memblock.reserved and memblock.memory. If there are regions that cannot
be mapped, we have other ways to deal with that (e.g. MEMBLOCK_NOMAP)
instead of dropping these ranges from memblock.memory.
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
> include/linux/memblock.h | 17 +++++++++---
> mm/debug.c | 3 ++-
> mm/memblock.c | 4 +--
> mm/page_alloc.c | 57 ++++++++++++++++++++++++++++++++--------
> 4 files changed, 63 insertions(+), 18 deletions(-)
>
--
Sincerely yours,
Mike.
next prev parent reply other threads:[~2020-12-03 10:51 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-23 21:25 compaction: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn)) Qian Cai
2020-04-24 3:43 ` Baoquan He
2020-04-24 13:45 ` Qian Cai
2020-05-05 12:43 ` Baoquan He
2020-05-05 13:20 ` Qian Cai
2020-05-11 1:21 ` Baoquan He
2020-04-26 14:41 ` Mike Rapoport
2020-04-27 13:45 ` Qian Cai
2020-11-21 19:45 ` [PATCH 0/1] VM_BUG_ON_PAGE(!zone_spans_pfn) in set_pfnblock_flags_mask Andrea Arcangeli
2020-11-21 19:45 ` [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages Andrea Arcangeli
2020-11-21 19:53 ` Andrea Arcangeli
2020-11-23 11:26 ` David Hildenbrand
2020-11-23 13:01 ` Vlastimil Babka
2020-11-24 13:32 ` Mel Gorman
2020-11-24 20:56 ` Andrea Arcangeli
2020-11-25 10:30 ` Mel Gorman
2020-11-25 17:59 ` Andrea Arcangeli
2020-11-26 10:47 ` Mel Gorman
2020-12-06 2:26 ` Andrea Arcangeli
2020-12-06 23:47 ` Mel Gorman
2020-11-25 5:34 ` Andrea Arcangeli
2020-11-25 6:45 ` David Hildenbrand
2020-11-25 8:51 ` Mike Rapoport
2020-11-25 10:39 ` Mel Gorman
2020-11-25 11:04 ` David Hildenbrand
2020-11-25 11:41 ` David Hildenbrand
2020-11-25 18:47 ` Andrea Arcangeli
2020-11-25 13:33 ` Mel Gorman
2020-11-25 13:41 ` David Hildenbrand
2020-11-25 18:28 ` Andrea Arcangeli
2020-11-25 19:27 ` David Hildenbrand
2020-11-25 20:41 ` Andrea Arcangeli
2020-11-25 21:13 ` David Hildenbrand
2020-11-25 21:04 ` Mike Rapoport
2020-11-25 21:38 ` Andrea Arcangeli
2020-11-26 9:36 ` Mike Rapoport
2020-11-26 10:05 ` David Hildenbrand
2020-11-26 17:46 ` Mike Rapoport
2020-11-29 12:32 ` Mike Rapoport
2020-12-02 0:44 ` Andrea Arcangeli
2020-12-02 17:39 ` Mike Rapoport
2020-12-03 6:23 ` Andrea Arcangeli
2020-12-03 10:51 ` Mike Rapoport [this message]
2020-12-03 17:31 ` Andrea Arcangeli
2020-12-06 8:09 ` Mike Rapoport
2020-11-26 18:15 ` Andrea Arcangeli
2020-11-26 18:29 ` Andrea Arcangeli
2020-11-26 19:44 ` Mike Rapoport
2020-11-26 20:30 ` Andrea Arcangeli
2020-11-26 21:03 ` Mike Rapoport
2020-11-26 19:21 ` Andrea Arcangeli
2020-11-25 12:08 ` Vlastimil Babka
2020-11-25 13:32 ` David Hildenbrand
2020-11-25 14:13 ` Mike Rapoport
2020-11-25 14:42 ` David Hildenbrand
2020-11-26 10:51 ` Mel Gorman
2020-11-25 19:14 ` Andrea Arcangeli
2020-11-25 19:01 ` Andrea Arcangeli
2020-11-25 19:33 ` David Hildenbrand
2020-11-26 3:40 ` Andrea Arcangeli
2020-11-26 10:43 ` Mike Rapoport
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201203105107.GR123287@linux.ibm.com \
--to=rppt@linux.ibm.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=bhe@redhat.com \
--cc=cai@lca.pw \
--cc=david@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@suse.de \
--cc=mhocko@kernel.org \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).