linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Mike Rapoport <rppt@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>,
	Vlastimil Babka <vbabka@suse.cz>, Mel Gorman <mgorman@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Qian Cai <cai@lca.pw>,
	Michal Hocko <mhocko@kernel.org>,
	linux-kernel@vger.kernel.org, Baoquan He <bhe@redhat.com>
Subject: Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages
Date: Wed, 25 Nov 2020 16:38:16 -0500	[thread overview]
Message-ID: <X77OyM8utmWcq1Di@redhat.com> (raw)
In-Reply-To: <20201125210414.GO123287@linux.ibm.com>

On Wed, Nov 25, 2020 at 11:04:14PM +0200, Mike Rapoport wrote:
> I think the very root cause is how e820__memblock_setup() registers
> memory with memblock:
> 
> 		if (entry->type == E820_TYPE_SOFT_RESERVED)
> 			memblock_reserve(entry->addr, entry->size);
> 
> 		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
> 			continue;
> 
> 		memblock_add(entry->addr, entry->size);
> 
> From that point the system has inconsistent view of RAM in both
> memblock.memory and memblock.reserved and, which is then translated to
> memmap etc.
> 
> Unfortunately, simply adding all RAM to memblock is not possible as
> there are systems that for them "the addresses listed in the reserved
> range must never be accessed, or (as we discovered) even be reachable by
> an active page table entry" [1].
> 
> [1] https://lore.kernel.org/lkml/20200528151510.GA6154@raspberrypi/

It looks like what's missing is a blockmem_reserve which I don't think
would interfere at all with the issue above since it won't create
direct mapping and it'll simply invoke the second stage that wasn't
invoked here.

I guess this would have a better chance to have the second
initialization stage run in reserve_bootmem_region and it would likely
solve the problem without breaking E820_TYPE_RESERVED which is known
by the kernel:

> 		if (entry->type == E820_TYPE_SOFT_RESERVED)
> 			memblock_reserve(entry->addr, entry->size);
> 

+		if (entry->type == 20)
+			memblock_reserve(entry->addr, entry->size);

> 		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
> 			continue;
> 

This is however just to show the problem, I didn't check what type 20
is.

To me it doesn't look the root cause though, the root cause is that if
you don't call memblock_reserve the page->flags remains uninitialized.

I think the page_alloc.c need to be more robust and detect at least if
if holes within zones (but ideally all pfn_valid of all struct pages
in system even if beyond the end of the zone) aren't being initialized
in the second stage without relying on the arch code to remember to
call memblock_reserve.

In fact it's not clear why memblock_reserve even exists, that
information can be calculated reliably by page_alloc in function of
memblock.memory alone by walking all nodes and all zones. It doesn't
even seem to help in destroying the direct mapping,
reserve_bootmem_region just initializes the struct pages so it doesn't
need a special memeblock_reserved to find those ranges.

In fact it's scary that codes then does stuff like this trusting the
memblock_reserve is nearly complete information (which obviously isn't
given type 20 doesn't get queued and I got that type 20 in all my systems):

	for_each_reserved_mem_region(i, &start, &end) {
		if (addr >= start && addr_end <= end)
			return true;
	}

That code in irq-gic-v3-its.c should stop using
for_each_reserved_mem_region and start doing
pfn_valid(addr>>PAGE_SHIFT) if
PageReserved(pfn_to_page(addr>>PAGE_SHIFT)) instead.

At best memory.reserved should be calculated automatically by the
page_alloc.c based on the zone_start_pfn/zone_end_pfn and not passed
by the e820 caller, instead of adding the memory_reserve call for type
20 we should delete the memory_reserve function.

Thanks,
Andrea


  reply	other threads:[~2020-11-25 21:38 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-23 21:25 compaction: VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn)) Qian Cai
2020-04-24  3:43 ` Baoquan He
2020-04-24 13:45   ` Qian Cai
2020-05-05 12:43     ` Baoquan He
2020-05-05 13:20       ` Qian Cai
2020-05-11  1:21         ` Baoquan He
2020-04-26 14:41 ` Mike Rapoport
2020-04-27 13:45   ` Qian Cai
2020-11-21 19:45 ` [PATCH 0/1] VM_BUG_ON_PAGE(!zone_spans_pfn) in set_pfnblock_flags_mask Andrea Arcangeli
2020-11-21 19:45   ` [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages Andrea Arcangeli
2020-11-21 19:53     ` Andrea Arcangeli
2020-11-23 11:26       ` David Hildenbrand
2020-11-23 13:01     ` Vlastimil Babka
2020-11-24 13:32       ` Mel Gorman
2020-11-24 20:56         ` Andrea Arcangeli
2020-11-25 10:30           ` Mel Gorman
2020-11-25 17:59             ` Andrea Arcangeli
2020-11-26 10:47               ` Mel Gorman
2020-12-06  2:26                 ` Andrea Arcangeli
2020-12-06 23:47                   ` Mel Gorman
2020-11-25  5:34       ` Andrea Arcangeli
2020-11-25  6:45         ` David Hildenbrand
2020-11-25  8:51           ` Mike Rapoport
2020-11-25 10:39           ` Mel Gorman
2020-11-25 11:04             ` David Hildenbrand
2020-11-25 11:41               ` David Hildenbrand
2020-11-25 18:47                 ` Andrea Arcangeli
2020-11-25 13:33               ` Mel Gorman
2020-11-25 13:41                 ` David Hildenbrand
2020-11-25 18:28           ` Andrea Arcangeli
2020-11-25 19:27             ` David Hildenbrand
2020-11-25 20:41               ` Andrea Arcangeli
2020-11-25 21:13                 ` David Hildenbrand
2020-11-25 21:04               ` Mike Rapoport
2020-11-25 21:38                 ` Andrea Arcangeli [this message]
2020-11-26  9:36                   ` Mike Rapoport
2020-11-26 10:05                     ` David Hildenbrand
2020-11-26 17:46                       ` Mike Rapoport
2020-11-29 12:32                         ` Mike Rapoport
2020-12-02  0:44                           ` Andrea Arcangeli
2020-12-02 17:39                             ` Mike Rapoport
2020-12-03  6:23                               ` Andrea Arcangeli
2020-12-03 10:51                                 ` Mike Rapoport
2020-12-03 17:31                                   ` Andrea Arcangeli
2020-12-06  8:09                                     ` Mike Rapoport
2020-11-26 18:15                       ` Andrea Arcangeli
2020-11-26 18:29                     ` Andrea Arcangeli
2020-11-26 19:44                       ` Mike Rapoport
2020-11-26 20:30                         ` Andrea Arcangeli
2020-11-26 21:03                           ` Mike Rapoport
2020-11-26 19:21                     ` Andrea Arcangeli
2020-11-25 12:08         ` Vlastimil Babka
2020-11-25 13:32           ` David Hildenbrand
2020-11-25 14:13             ` Mike Rapoport
2020-11-25 14:42               ` David Hildenbrand
2020-11-26 10:51                 ` Mel Gorman
2020-11-25 19:14               ` Andrea Arcangeli
2020-11-25 19:01           ` Andrea Arcangeli
2020-11-25 19:33             ` David Hildenbrand
2020-11-26  3:40         ` Andrea Arcangeli
2020-11-26 10:43           ` Mike Rapoport

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=X77OyM8utmWcq1Di@redhat.com \
    --to=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhe@redhat.com \
    --cc=cai@lca.pw \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=rppt@linux.ibm.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).