Linux-mm Archive on lore.kernel.org
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Mike Rapoport <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	<linux-sh@vger.kernel.org>, Peter Zijlstra <peterz@infradead.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	<linux-mips@vger.kernel.org>, Max Filippov <jcmvbkbc@gmail.com>,
	"Paul Mackerras" <paulus@samba.org>, <sparclinux@vger.kernel.org>,
	<linux-riscv@lists.infradead.org>, Will Deacon <will@kernel.org>,
	"Stafford Horne" <shorne@gmail.com>, <linux-s390@vger.kernel.org>,
	<linux-c6x-dev@linux-c6x.org>,
	Yoshinori Sato <ysato@users.sourceforge.jp>,
	Michael Ellerman <mpe@ellerman.id.au>, <x86@kernel.org>,
	Russell King <linux@armlinux.org.uk>,
	Mike Rapoport <rppt@linux.ibm.com>,
	<clang-built-linux@googlegroups.com>,
	Ingo Molnar <mingo@redhat.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	<uclinux-h8-devel@lists.sourceforge.jp>,
	<linux-xtensa@linux-xtensa.org>, <openrisc@lists.librecores.org>,
	Borislav Petkov <bp@alien8.de>,
	"Andy Lutomirski" <luto@kernel.org>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	<linux-arm-kernel@lists.infradead.org>,
	Michal Simek <monstr@monstr.eu>, <linux-mm@kvack.org>,
	<linuxppc-dev@lists.ozlabs.org>, <linux-kernel@vger.kernel.org>,
	<iommu@lists.linux-foundation.org>,
	"Palmer Dabbelt" <palmer@dabbelt.com>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH 04/15] arm64: numa: simplify dummy_numa_init()
Date: Wed, 29 Jul 2020 09:30:31 +0100
Message-ID: <20200729093031.0000316b@Huawei.com>
In-Reply-To: <20200728051153.1590-5-rppt@kernel.org>

On Tue, 28 Jul 2020 08:11:42 +0300
Mike Rapoport <rppt@kernel.org> wrote:

> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> dummy_numa_init() loops over memblock.memory and passes nid=0 to
> numa_add_memblk() which essentially wraps memblock_set_node(). However,
> memblock_set_node() can cope with entire memory span itself, so the loop
> over memblock.memory regions is redundant.
> 
> Replace the loop with a single call to memblock_set_node() to the entire
> memory.

Hi Mike,

I had a similar patch I was going to post shortly, so I can add a bit more
on the advantages of this one.

Beyond cleaning up, it also fixes a crash seen with buggy ACPI firmware in
which the SRAT table covers some but not all of the memory in the EFI
memory map.  Stealing bits from the draft cover letter I had for that...

> This issue can be easily triggered by having an SRAT table which fails
> to cover all elements of the EFI memory map.
> 
> This firmware error is detected and a warning is printed, e.g.
> "NUMA: Warning: invalid memblk node 64 [mem 0x240000000-0x27fffffff]"
> At that point we fall back to dummy_numa_init().
> 
> However, the failed ACPI init has left us with our memblocks all broken
> up as we split them when trying to assign them to NUMA nodes.
> 
> We then iterate over the memblocks and add them to node 0.
> 
> for_each_memblock(memory, mblk) {
> 	ret = numa_add_memblk(0, mblk->base, mblk->base + mblk->size);
> 	if (!ret)
> 		continue;
> 	pr_err("NUMA init failed\n");
> 	return ret;
> }
> 
> numa_add_memblk() calls memblock_set_node() which merges regions that
> were previously split up during the earlier attempt to add them to different
> nodes during parsing of SRAT.
> 
> This means elements are moved in the memblock array and we can end up
> in a different memblock after the call to numa_add_memblk().
> Result is:
> 
> Unable to handle kernel paging request at virtual address 0000000000003a40
> Mem abort info:
>   ESR = 0x96000004
>   EC = 0x25: DABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
> Data abort info:
>   ISV = 0, ISS = 0x00000004
>   CM = 0, WnR = 0
> [0000000000003a40] user address but active_mm is swapper
> Internal error: Oops: 96000004 [#1] PREEMPT SMP
> 
> ...
> 
> Call trace:
>   sparse_init_nid+0x5c/0x2b0
>   sparse_init+0x138/0x170
>   bootmem_init+0x80/0xe0
>   setup_arch+0x2a0/0x5fc
>   start_kernel+0x8c/0x648
> 
> As an illustrative example:
> EFI table has one block of memory.
> memblks[0] = [0...0x2f]  so we start with a single memblock.
> 
> SRAT has
> [0x00...0x0f] in node 0
> [0x10...0x1f] in node 1
> but no entry covering 
> [0x20...0x2f].
> 
> Whilst parsing SRAT the single memblock is broken into 3.
> memblks[0] = [0x00...0x0f] in node 0
> memblks[1] = [0x10...0x1f] in node 1
> memblks[2] = [0x20...0x2f] in node MAX_NUM_NODES (invalid value)
> 
> A sanity check parse then detects the invalid section and acpi_numa_init
> fails.  We then fall back to the dummy path.
> 
> That iterates over the memblocks.  We'll use i as an index into the array
> of memblocks.
> 
> i = 0
> memblks[0] = [0x00...0x0f] set to node 0.
>    merge doesn't do anything because the neighbouring memblock is still in node 1.
> 
> i = 1
> memblks[1] = [0x10...0x1f] set to node 0.
>    merge combines memblock 0 and 1 to give a new set of memblocks.
> 
> memblks[0] = [0x00...0x1f] in node 0
> memblks[1] = [0x20...0x2f] in node MAX_NUM_NODES.
> 
> i = 2 is off the end of the now reduced array of memblocks, so we exit the
> loop.  (If we restarted the loop here everything would be fine.)
> 
> Later sparse_init_nid tries to use the node of the second memblock to index
> something and boom.
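
The walk-through above can be reproduced with a small userspace toy model.
This is only a sketch: struct toy_region, struct toy_table and every toy_*
helper are hypothetical names made up for illustration, not kernel APIs, and
the index-based walk is an assumption standing in for for_each_memblock().
The invalid node id of 64 is borrowed from the warning line quoted earlier.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_INVALID_NID 64	/* stand-in for the invalid node id */
#define TOY_MAX_REGIONS 8

struct toy_region {
	unsigned long base;
	unsigned long size;
	int nid;
};

struct toy_table {
	size_t cnt;
	struct toy_region r[TOY_MAX_REGIONS];
};

/* Merge adjacent regions with the same nid, shifting later entries down. */
static void toy_merge(struct toy_table *t)
{
	size_t i = 0;

	while (t->cnt > 1 && i < t->cnt - 1) {
		struct toy_region *a = &t->r[i];
		struct toy_region *b = &t->r[i + 1];

		if (a->base + a->size != b->base || a->nid != b->nid) {
			i++;
			continue;
		}
		a->size += b->size;
		for (size_t j = i + 1; j + 1 < t->cnt; j++)
			t->r[j] = t->r[j + 1];
		t->cnt--;
	}
}

/* Retag exactly one region, then merge: models one loop iteration. */
static void toy_set_node_at(struct toy_table *t, size_t idx, int nid)
{
	assert(idx < t->cnt);
	t->r[idx].nid = nid;
	toy_merge(t);
}

/* The old pattern: walk by index while the table shrinks underneath. */
static void toy_dummy_init_loop(struct toy_table *t)
{
	for (size_t i = 0; i < t->cnt; i++)
		toy_set_node_at(t, i, 0);
}

/* The three regions left behind by the failed SRAT parse in the example. */
static struct toy_table toy_example(void)
{
	struct toy_table t = {
		.cnt = 3,
		.r = {
			{ 0x00, 0x10, 0 },
			{ 0x10, 0x10, 1 },
			{ 0x20, 0x10, TOY_INVALID_NID },
		},
	};
	return t;
}
```

Running toy_dummy_init_loop() on the table from toy_example() leaves two
regions, and the second one still carries TOY_INVALID_NID, because the merge
at i = 1 shrinks the table before the loop ever visits what used to be
entry 2.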


> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>

Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  arch/arm64/mm/numa.c | 13 +++++--------
>  1 file changed, 5 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index aafcee3e3f7e..0cbdbcc885fb 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -423,19 +423,16 @@ static int __init numa_init(int (*init_func)(void))
>   */
>  static int __init dummy_numa_init(void)
>  {
> +	phys_addr_t start = memblock_start_of_DRAM();
> +	phys_addr_t end = memblock_end_of_DRAM();
>  	int ret;
> -	struct memblock_region *mblk;
>  
>  	if (numa_off)
>  		pr_info("NUMA disabled\n"); /* Forced off on command line. */
> -	pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
> -		memblock_start_of_DRAM(), memblock_end_of_DRAM() - 1);
> -
> -	for_each_memblock(memory, mblk) {
> -		ret = numa_add_memblk(0, mblk->base, mblk->base + mblk->size);
> -		if (!ret)
> -			continue;
> +	pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n", start, end - 1);
>  
> +	ret = numa_add_memblk(0, start, end);
> +	if (ret) {
>  		pr_err("NUMA init failed\n");
>  		return ret;
>  	}
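
For comparison, here is a rough userspace sketch of why a single call over
the whole span avoids the problem.  All names (struct toy_region,
struct toy_table, toy_set_node_range) are hypothetical and exist only for
this illustration.  The sketch assumes the requested range lines up with
region boundaries, so the region splitting that the real
memblock_set_node() performs is left out.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_INVALID_NID 64	/* stand-in for the invalid node id */
#define TOY_MAX_REGIONS 8

struct toy_region {
	unsigned long base;
	unsigned long size;
	int nid;
};

struct toy_table {
	size_t cnt;
	struct toy_region r[TOY_MAX_REGIONS];
};

/*
 * Retag every region inside [start, end) and only then coalesce.
 * No caller-side iteration is needed, so nothing can be skipped when
 * the table shrinks.
 */
static void toy_set_node_range(struct toy_table *t, unsigned long start,
			       unsigned long end, int nid)
{
	size_t i, out = 0;

	assert(t->cnt <= TOY_MAX_REGIONS);

	/* Pass 1: retag all regions that lie fully inside the range. */
	for (i = 0; i < t->cnt; i++) {
		struct toy_region *r = &t->r[i];

		if (r->base >= start && r->base + r->size <= end)
			r->nid = nid;
	}

	/* Pass 2: coalesce adjacent same-node regions in place. */
	for (i = 1; i < t->cnt; i++) {
		struct toy_region *prev = &t->r[out];
		struct toy_region *cur = &t->r[i];

		if (prev->base + prev->size == cur->base &&
		    prev->nid == cur->nid)
			prev->size += cur->size;
		else
			t->r[++out] = *cur;
	}
	if (t->cnt)
		t->cnt = out + 1;
}
```

Starting from three regions [0x00, 0x10) in node 0, [0x10, 0x20) in node 1
and [0x20, 0x30) in the invalid node, a single
toy_set_node_range(&t, 0x00, 0x30, 0) leaves one region covering
[0x00, 0x30) in node 0.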
