linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Baoquan He <bhe@redhat.com>
To: Mike Rapoport <rppt@linux.ibm.com>, mgorman@suse.de, david@redhat.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	akpm@linux-foundation.org, cai@lca.pw, mhocko@kernel.org
Subject: Re: [PATCH] mm/compaction: Fix the incorrect hole in fast_isolate_freepages()
Date: Tue, 26 May 2020 16:45:43 +0800	[thread overview]
Message-ID: <20200526084543.GG26955@MiWiFi-R3L-srv> (raw)
In-Reply-To: <20200522142053.GW1059226@linux.ibm.com>

On 05/22/20 at 05:20pm, Mike Rapoport wrote:
> Hello Baoquan,
> 
> On Fri, May 22, 2020 at 03:25:24PM +0800, Baoquan He wrote:
> > On 05/22/20 at 03:01pm, Baoquan He wrote:
> > > 
> > > So let's add these unavailable ranges into memblock and reserve them
> > > in init_unavailable_range() instead. With this change, they will be added
> > > into appropriate node and zone in memmap_init(), and initialized in
> > > reserve_bootmem_region() just like any other memblock reserved regions.
> > 
> > Seems this is not right. They can't get nid in init_unavailable_range().
> > Adding e820 ranges may let them get nid. But the hole range won't be
> > added to memblock, and still has the issue.
> > 
> > Nack this one for now, still considering.
> 
> Why won't we add  the e820 reserved ranges to memblock.memory during
> early boot as I suggested?
> 
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index c5399e80c59c..b0940c618ed9 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -1301,8 +1301,11 @@ void __init e820__memblock_setup(void)
>  		if (end != (resource_size_t)end)
>  			continue;
>  
> -		if (entry->type == E820_TYPE_SOFT_RESERVED)
> +		if (entry->type == E820_TYPE_SOFT_RESERVED ||
> +		    entry->type == E820_TYPE_RESERVED) {
> +			memblock_add(entry->addr, entry->size);
>  			memblock_reserve(entry->addr, entry->size);
> +		}
>  
>  		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
>  			continue;
> 
> The setting of node later  in numa_init() will assign the proper node
> for these regions as it does for the usable memory.

Yes, if it's only related to e820 reserved region, this truly works.

However, it also has ACPI table regions. That's why I changed to call
the problematic area as firmware reserved ranges later.

Bisides, you can see below line, there's another reserved region which only
occupies one page in one memory seciton. If adding to memblock.memory, we also
will build struct mem_section and the relevant struct pages for the whole
section. And then the holes around that page will be added and initialized in
init_unavailable_mem(). numa_init() will assign proper node for memblock.memory
and memblock.reserved, but won't assign proper node for the holes.

~~~
[    0.000000] BIOS-e820: [mem 0x00000000fed80000-0x00000000fed80fff] reserved
~~~

So I still think we should not add firmware reserved range into
memblock for fixing this issue.

And, the fix in the original patch seems necessary. You can see in
compaction code, the migration source is chosen from LRU pages or
movable pages, the migration target has to be got from Buddy. However,
only the min_pfn in fast_isolate_freepages(), it's calculated by
distance between cc->free_pfn - cc->migrate_pfn, we can't guarantee it's
safe, then use it as the target to handle.

Hi David,

Meanwhile, I checked the history of init_unavailable_mem(). Till below
commit From David, the unavailable ranges began to be added to zone 0
and node 0. Before that, we only zero the struct page of unavailable
ranges and mark it as Reserved. Am wondering if we have to add it to
node 0 and zone 0. From below commit, I don't get why. Could you help
clarify so that I get what I missed?

commit 4b094b7851bf4bf551ad456195d3f26e1c03bd74
Author: David Hildenbrand <david@redhat.com>
Date:   Mon Feb 3 17:33:55 2020 -0800

    mm/page_alloc.c: initialize memmap of unavailable memory directly

I have sent mail to Naoya trying to make clear question I have,
see if we can do something to clean up the mess here.


[    0.000000] BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000008bfff] usable
[    0.000000] BIOS-e820: [mem 0x000000000008c000-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000028328fff] usable
[    0.000000] BIOS-e820: [mem 0x0000000028329000-0x0000000028568fff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000028569000-0x0000000028d85fff] usable
[    0.000000] BIOS-e820: [mem 0x0000000028d86000-0x0000000028ee5fff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000028ee6000-0x0000000029a04fff] usable
[    0.000000] BIOS-e820: [mem 0x0000000029a05000-0x0000000029a08fff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000029a09000-0x0000000029ee4fff] usable
[    0.000000] BIOS-e820: [mem 0x0000000029ee5000-0x0000000029ef2fff] ACPI data
[    0.000000] BIOS-e820: [mem 0x0000000029ef3000-0x0000000029f22fff] usable
[    0.000000] BIOS-e820: [mem 0x0000000029f23000-0x0000000029f23fff] ACPI data
[    0.000000] BIOS-e820: [mem 0x0000000029f24000-0x0000000029f24fff] usable
[    0.000000] BIOS-e820: [mem 0x0000000029f25000-0x0000000029f28fff] ACPI data
[    0.000000] BIOS-e820: [mem 0x0000000029f29000-0x0000000029f29fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x0000000029f2a000-0x0000000029f2bfff] ACPI data
[    0.000000] BIOS-e820: [mem 0x0000000029f2c000-0x0000000029f2cfff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000029f2d000-0x0000000029f2ffff] ACPI data
[    0.000000] BIOS-e820: [mem 0x0000000029f30000-0x0000000029ffdfff] usable
[    0.000000] BIOS-e820: [mem 0x0000000029ffe000-0x000000002a80afff] reserved
[    0.000000] BIOS-e820: [mem 0x000000002a80b000-0x000000002a80efff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x000000002a80f000-0x000000002a81ffff] reserved
[    0.000000] BIOS-e820: [mem 0x000000002a820000-0x000000002a822fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x000000002a823000-0x0000000033a22fff] usable
[    0.000000] BIOS-e820: [mem 0x0000000033a23000-0x0000000033a32fff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000033a33000-0x0000000033a42fff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x0000000033a43000-0x0000000033a52fff] ACPI data
[    0.000000] BIOS-e820: [mem 0x0000000033a53000-0x0000000077ffffff] usable
[    0.000000] BIOS-e820: [mem 0x0000000078000000-0x000000008fffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fed80000-0x00000000fed80fff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000087effffff] usable
[    0.000000] BIOS-e820: [mem 0x000000087f000000-0x000000087fffffff] reserved


  reply	other threads:[~2020-05-26  8:45 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-21  1:44 [PATCH] mm/compaction: Fix the incorrect hole in fast_isolate_freepages() Baoquan He
2020-05-21  9:26 ` Mike Rapoport
2020-05-21 15:52   ` Baoquan He
2020-05-21 17:18     ` Mike Rapoport
2020-05-22  7:01       ` Baoquan He
2020-05-22  7:25         ` Baoquan He
2020-05-22 14:20           ` Mike Rapoport
2020-05-26  8:45             ` Baoquan He [this message]
2020-05-26  8:55               ` David Hildenbrand
2020-05-26 11:32               ` Mike Rapoport
2020-05-26 11:49                 ` David Hildenbrand
2020-05-28  9:07                   ` Baoquan He
2020-05-28  9:08                     ` David Hildenbrand
2020-05-28 15:15                     ` Steve Wahl
2020-06-01 11:42                       ` Mike Rapoport
2020-06-01 13:31                         ` Baoquan He
2020-05-21  9:36 ` Mel Gorman
2020-05-21 13:38   ` Mike Rapoport
2020-05-21 15:41   ` Baoquan He
2020-05-28  8:59 [PATCH] mm/compaction: Fix the incorrect hole in fast_isolate_freepages() ^[ Baoquan He
2020-05-28  9:08 ` Baoquan He

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200526084543.GG26955@MiWiFi-R3L-srv \
    --to=bhe@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=cai@lca.pw \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=rppt@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).