* [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up
@ 2020-12-17 20:12 Roman Gushchin
  2020-12-17 20:12 ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Roman Gushchin
  2020-12-20  6:48 ` [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up Mike Rapoport
  0 siblings, 2 replies; 49+ messages in thread
From: Roman Gushchin @ 2020-12-17 20:12 UTC (permalink / raw)
  To: Andrew Morton, Mike Rapoport, linux-mm
  Cc: Joonsoo Kim, Rik van Riel, Michal Hocko, linux-kernel,
	kernel-team, Roman Gushchin

Currently cma areas without a fixed base are allocated close to the
end of the node. This placement is sub-optimal because of compaction:
it brings pages into the cma area. In particular, it can bring in hot
executable pages, even if there is plenty of free memory on the
machine. This results in cma allocation failures.

Instead let's place cma areas close to the beginning of a node.
In this case the compaction will help to free cma areas, resulting
in better cma allocation success rates.

If there is enough memory let's try to allocate bottom-up starting
with 4GB to exclude any possible interference with DMA32. On smaller
machines, or in case of a failure, stick with the old behavior.

16GB vm, 2GB cma area:
With this patch:
[    0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G
[    0.002928] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
[    0.002930] cma: Reserved 2048 MiB at 0x0000000100000000
[    0.002931] hugetlb_cma: reserved 2048 MiB on node 0

Without this patch:
[    0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G
[    0.002930] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
[    0.002933] cma: Reserved 2048 MiB at 0x00000003c0000000
[    0.002934] hugetlb_cma: reserved 2048 MiB on node 0

v2:
  - switched to memblock_set_bottom_up(true), by Mike
  - start with 4GB, by Mike

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 mm/cma.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/mm/cma.c b/mm/cma.c
index 7f415d7cda9f..21fd40c092f0 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -337,6 +337,22 @@ int __init cma_declare_contiguous_nid(phys_addr_t base,
 			limit = highmem_start;
 		}
 
+		/*
+		 * If there is enough memory, try a bottom-up allocation first.
+		 * It will place the new cma area close to the start of the node
+		 * and guarantee that the compaction is moving pages out of the
+		 * cma area and not into it.
+		 * Avoid using first 4GB to not interfere with constrained zones
+		 * like DMA/DMA32.
+		 */
+		if (!memblock_bottom_up() &&
+		    memblock_end >= SZ_4G + size) {
+			memblock_set_bottom_up(true);
+			addr = memblock_alloc_range_nid(size, alignment, SZ_4G,
+							limit, nid, true);
+			memblock_set_bottom_up(false);
+		}
+
 		if (!addr) {
 			addr = memblock_alloc_range_nid(size, alignment, base,
 					limit, nid, true);
-- 
2.26.2
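
The placement policy described in this patch can be modelled in userspace.
The following is only a rough sketch, not kernel code: struct free_range,
find_bottom_up(), find_top_down() and place_cma() are hypothetical names,
the free ranges are assumed to be sorted by ascending address, and the
real memblock allocator also handles reserved regions, NUMA nodes and
flags that are ignored here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_1G (1ULL << 30)
#define SZ_4G (4 * SZ_1G)

/* Hypothetical stand-in for one free memory region: [start, end). */
struct free_range {
	uint64_t start;
	uint64_t end;
};

/* Lowest aligned address in [lo, hi) that fits size bytes, or 0 if none. */
static uint64_t find_bottom_up(const struct free_range *r, size_t n,
			       uint64_t lo, uint64_t hi,
			       uint64_t size, uint64_t align)
{
	assert(align && !(align & (align - 1)));	/* power of two */

	for (size_t i = 0; i < n; i++) {
		uint64_t s = r[i].start > lo ? r[i].start : lo;
		uint64_t e = r[i].end < hi ? r[i].end : hi;

		s = (s + align - 1) & ~(align - 1);	/* round up */
		if (s < e && e - s >= size)
			return s;
	}
	return 0;
}

/* Highest aligned address in [lo, hi) that fits size bytes, or 0 if none. */
static uint64_t find_top_down(const struct free_range *r, size_t n,
			      uint64_t lo, uint64_t hi,
			      uint64_t size, uint64_t align)
{
	assert(align && !(align & (align - 1)));	/* power of two */

	for (size_t i = n; i-- > 0; ) {
		uint64_t s = r[i].start > lo ? r[i].start : lo;
		uint64_t e = r[i].end < hi ? r[i].end : hi;
		uint64_t cand;

		if (s >= e || e - s < size)
			continue;
		cand = (e - size) & ~(align - 1);	/* round down */
		if (cand >= s)
			return cand;
	}
	return 0;
}

/*
 * Sketch of the patch's policy: if memory extends far enough past 4G,
 * try a bottom-up search starting at 4G; otherwise, or if that search
 * fails, fall back to the old top-down search over the whole range.
 */
static uint64_t place_cma(const struct free_range *r, size_t n,
			  uint64_t mem_end, uint64_t size, uint64_t align)
{
	uint64_t addr = 0;

	if (mem_end >= SZ_4G + size)
		addr = find_bottom_up(r, n, SZ_4G, mem_end, size, align);
	if (!addr)
		addr = find_top_down(r, n, 0, mem_end, size, align);
	return addr;
}
```

With an assumed layout of free RAM in [1 MiB, 3 GiB) and [4 GiB, 17 GiB),
a 2 GiB request and an assumed 1 GiB alignment, place_cma() picks the
start of the second range, while find_top_down() alone picks the last
aligned 2 GiB below the end of memory. The layout and alignment are
guesses chosen for illustration, not values taken from the test VM.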



* [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2020-12-17 20:12 [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up Roman Gushchin
@ 2020-12-17 20:12 ` Roman Gushchin
  2020-12-19 14:52     ` Wonhyuk Yang
                     ` (3 more replies)
  2020-12-20  6:48 ` [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up Mike Rapoport
  1 sibling, 4 replies; 49+ messages in thread
From: Roman Gushchin @ 2020-12-17 20:12 UTC (permalink / raw)
  To: Andrew Morton, Mike Rapoport, linux-mm
  Cc: Joonsoo Kim, Rik van Riel, Michal Hocko, linux-kernel,
	kernel-team, Roman Gushchin

With kaslr the kernel image is placed at a random place, so starting
the bottom-up allocation with the kernel_end can result in an
allocation failure and a warning like this one:

[    0.002920] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
[    0.002921] ------------[ cut here ]------------
[    0.002922] memblock: bottom-up allocation failed, memory hotremove may be affected
[    0.002937] WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a
[    0.002937] Modules linked in:
[    0.002939] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169
[    0.002940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
[    0.002942] RIP: 0010:memblock_find_in_range_node+0x178/0x25a
[    0.002944] Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c
[    0.002945] RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
[    0.002947] RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff
[    0.002948] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046
[    0.002948] RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb
[    0.002949] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000
[    0.002950] R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000
[    0.002952] FS:  0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000
[    0.002953] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.002954] CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0
[    0.002956] Call Trace:
[    0.002961]  ? memblock_alloc_range_nid+0x8d/0x11e
[    0.002963]  ? cma_declare_contiguous_nid+0x2c4/0x38c
[    0.002964]  ? hugetlb_cma_reserve+0xdc/0x128
[    0.002968]  ? flush_tlb_one_kernel+0xc/0x20
[    0.002969]  ? native_set_fixmap+0x82/0xd0
[    0.002971]  ? flat_get_apic_id+0x5/0x10
[    0.002973]  ? register_lapic_address+0x8e/0x97
[    0.002975]  ? setup_arch+0x8a5/0xc3f
[    0.002978]  ? start_kernel+0x66/0x547
[    0.002980]  ? load_ucode_bsp+0x4c/0xcd
[    0.002982]  ? secondary_startup_64_no_verify+0xb0/0xbb
[    0.002986] random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0
[    0.002988] ---[ end trace f151227d0b39be70 ]---

At the same time, the kernel image is protected with memblock_reserve(),
so we can just start searching at PAGE_SIZE. In this case the
bottom-up allocation has the same chance of success as a top-down
allocation, so there is no reason to fall back in case of a
failure. Altogether, this simplifies the logic.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 mm/memblock.c | 49 ++++++-------------------------------------------
 1 file changed, 6 insertions(+), 43 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index b68ee86788af..10bd7d1ef0f4 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -275,14 +275,6 @@ __memblock_find_range_top_down(phys_addr_t start, phys_addr_t end,
  *
  * Find @size free area aligned to @align in the specified range and node.
  *
- * When allocation direction is bottom-up, the @start should be greater
- * than the end of the kernel image. Otherwise, it will be trimmed. The
- * reason is that we want the bottom-up allocation just near the kernel
- * image so it is highly likely that the allocated memory and the kernel
- * will reside in the same node.
- *
- * If bottom-up allocation failed, will try to allocate memory top-down.
- *
  * Return:
  * Found address on success, 0 on failure.
  */
@@ -291,8 +283,6 @@ static phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
 					phys_addr_t end, int nid,
 					enum memblock_flags flags)
 {
-	phys_addr_t kernel_end, ret;
-
 	/* pump up @end */
 	if (end == MEMBLOCK_ALLOC_ACCESSIBLE ||
 	    end == MEMBLOCK_ALLOC_KASAN)
@@ -301,40 +291,13 @@ static phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
 	/* avoid allocating the first page */
 	start = max_t(phys_addr_t, start, PAGE_SIZE);
 	end = max(start, end);
-	kernel_end = __pa_symbol(_end);
-
-	/*
-	 * try bottom-up allocation only when bottom-up mode
-	 * is set and @end is above the kernel image.
-	 */
-	if (memblock_bottom_up() && end > kernel_end) {
-		phys_addr_t bottom_up_start;
-
-		/* make sure we will allocate above the kernel */
-		bottom_up_start = max(start, kernel_end);
 
-		/* ok, try bottom-up allocation first */
-		ret = __memblock_find_range_bottom_up(bottom_up_start, end,
-						      size, align, nid, flags);
-		if (ret)
-			return ret;
-
-		/*
-		 * we always limit bottom-up allocation above the kernel,
-		 * but top-down allocation doesn't have the limit, so
-		 * retrying top-down allocation may succeed when bottom-up
-		 * allocation failed.
-		 *
-		 * bottom-up allocation is expected to be fail very rarely,
-		 * so we use WARN_ONCE() here to see the stack trace if
-		 * fail happens.
-		 */
-		WARN_ONCE(IS_ENABLED(CONFIG_MEMORY_HOTREMOVE),
-			  "memblock: bottom-up allocation failed, memory hotremove may be affected\n");
-	}
-
-	return __memblock_find_range_top_down(start, end, size, align, nid,
-					      flags);
+	if (memblock_bottom_up())
+		return __memblock_find_range_bottom_up(start, end, size, align,
+						       nid, flags);
+	else
+		return __memblock_find_range_top_down(start, end, size, align,
+						      nid, flags);
 }
 
 /**
-- 
2.26.2
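
The effect of dropping the kernel_end clamp can be illustrated with a
small userspace model. This is a simplified sketch under stated
assumptions: old_bottom_up_start(), new_bottom_up_start() and fits() are
hypothetical helpers, the search window [start, end) is treated as one
contiguous free area, and the reservation of the kernel image itself is
not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old behaviour: a bottom-up search never started below the kernel end. */
static uint64_t old_bottom_up_start(uint64_t start, uint64_t kernel_end)
{
	return start > kernel_end ? start : kernel_end;
}

/* New behaviour: only the first page is skipped, whatever the direction. */
static uint64_t new_bottom_up_start(uint64_t start, uint64_t page_size)
{
	return start > page_size ? start : page_size;
}

/* True if an aligned block of size bytes fits in [start, end). */
static bool fits(uint64_t start, uint64_t end, uint64_t size, uint64_t align)
{
	uint64_t s;

	assert(align && !(align & (align - 1)));	/* power of two */
	s = (start + align - 1) & ~(align - 1);		/* round up */
	return s < end && end - s >= size;
}
```

For example, with made-up values of start = 4 GiB, end = 9 GiB, a kernel
image ending near 0x1fb000000 and a 2 GiB request aligned to 1 GiB, the
old start is pushed up to the kernel end and only 1 GiB remains, so the
search fails; the new start stays at 4 GiB and the request fits.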



* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2020-12-17 20:12 ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Roman Gushchin
@ 2020-12-19 14:52     ` Wonhyuk Yang
  2020-12-20  6:49   ` Mike Rapoport
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 49+ messages in thread
From: Wonhyuk Yang @ 2020-12-19 14:52 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, Mike Rapoport, linux-mm, Joonsoo Kim,
	Rik van Riel, Michal Hocko, linux-kernel, kernel-team

Hi Roman,

On Fri, Dec 18, 2020 at 5:12 AM Roman Gushchin <guro@fb.com> wrote:
>
> With kaslr the kernel image is placed at a random place, so starting
> the bottom-up allocation with the kernel_end can result in an
> allocation failure and a warning like this one:
>
> [    0.002920] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
> [    0.002921] ------------[ cut here ]------------
> [    0.002922] memblock: bottom-up allocation failed, memory hotremove may be affected
> [    0.002937] WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a
> [    0.002956] Call Trace:
> [    0.002961]  ? memblock_alloc_range_nid+0x8d/0x11e
> [    0.002963]  ? cma_declare_contiguous_nid+0x2c4/0x38c
> [    0.002964]  ? hugetlb_cma_reserve+0xdc/0x128
> [    0.002968]  ? flush_tlb_one_kernel+0xc/0x20
> [    0.002969]  ? native_set_fixmap+0x82/0xd0
> [    0.002971]  ? flat_get_apic_id+0x5/0x10
> [    0.002973]  ? register_lapic_address+0x8e/0x97
> [    0.002975]  ? setup_arch+0x8a5/0xc3f
> [    0.002978]  ? start_kernel+0x66/0x547
> [    0.002980]  ? load_ucode_bsp+0x4c/0xcd
> [    0.002982]  ? secondary_startup_64_no_verify+0xb0/0xbb
> [    0.002986] random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0
>
> At the same time, the kernel image is protected with memblock_reserve(),
> so we can just start searching at PAGE_SIZE. In this case the
> bottom-up allocation has the same chance of success as a top-down
> allocation, so there is no reason to fall back in case of a
> failure. Altogether, this simplifies the logic.

I figured out that this behavior was introduced by
commit 79442ed189acb ("memblock.c: introduce bottom-up allocation mode")

According to this commit, the purpose of bottom-up allocation is to
allocate memory from the unhotpluggable node.


* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2020-12-19 14:52     ` Wonhyuk Yang
  (?)
@ 2020-12-19 17:05     ` Roman Gushchin
  -1 siblings, 0 replies; 49+ messages in thread
From: Roman Gushchin @ 2020-12-19 17:05 UTC (permalink / raw)
  To: Wonhyuk Yang
  Cc: Andrew Morton, Mike Rapoport, linux-mm, Joonsoo Kim,
	Rik van Riel, Michal Hocko, linux-kernel, kernel-team

On Sat, Dec 19, 2020 at 11:52:19PM +0900, Wonhyuk Yang wrote:
> Hi Roman,
> 
> On Fri, Dec 18, 2020 at 5:12 AM Roman Gushchin <guro@fb.com> wrote:
> >
> > With kaslr the kernel image is placed at a random place, so starting
> > the bottom-up allocation with the kernel_end can result in an
> > allocation failure and a warning like this one:
> >
> > [    0.002920] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
> > [    0.002921] ------------[ cut here ]------------
> > [    0.002922] memblock: bottom-up allocation failed, memory hotremove may be affected
> > [    0.002937] WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a
> > [    0.002956] Call Trace:
> > [    0.002961]  ? memblock_alloc_range_nid+0x8d/0x11e
> > [    0.002963]  ? cma_declare_contiguous_nid+0x2c4/0x38c
> > [    0.002964]  ? hugetlb_cma_reserve+0xdc/0x128
> > [    0.002968]  ? flush_tlb_one_kernel+0xc/0x20
> > [    0.002969]  ? native_set_fixmap+0x82/0xd0
> > [    0.002971]  ? flat_get_apic_id+0x5/0x10
> > [    0.002973]  ? register_lapic_address+0x8e/0x97
> > [    0.002975]  ? setup_arch+0x8a5/0xc3f
> > [    0.002978]  ? start_kernel+0x66/0x547
> > [    0.002980]  ? load_ucode_bsp+0x4c/0xcd
> > [    0.002982]  ? secondary_startup_64_no_verify+0xb0/0xbb
> > [    0.002986] random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0
> >
> > At the same time, the kernel image is protected with memblock_reserve(),
> > so we can just start searching at PAGE_SIZE. In this case the
> > bottom-up allocation has the same chance of success as a top-down
> > allocation, so there is no reason to fall back in case of a
> > failure. Altogether, this simplifies the logic.
> 
> I figured out that this behavior was introduced by
> commit 79442ed189acb ("memblock.c: introduce bottom-up allocation mode")
> 
> According to this commit, the purpose of bottom-up allocation is to
> allocate memory from the unhotpluggable node.

Hi Wonhyuk,

Correct! And it remains this way; we just don't need to skip
all the memory before kernel_end.

Thanks!


* Re: [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up
  2020-12-17 20:12 [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up Roman Gushchin
  2020-12-17 20:12 ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Roman Gushchin
@ 2020-12-20  6:48 ` Mike Rapoport
  2020-12-21 17:05   ` Roman Gushchin
  1 sibling, 1 reply; 49+ messages in thread
From: Mike Rapoport @ 2020-12-20  6:48 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, Joonsoo Kim, Rik van Riel, Michal Hocko,
	linux-kernel, kernel-team

On Thu, Dec 17, 2020 at 12:12:13PM -0800, Roman Gushchin wrote:
> Currently cma areas without a fixed base are allocated close to the
> end of the node. This placement is sub-optimal because of compaction:
> it brings pages into the cma area. In particular, it can bring in hot
> executable pages, even if there is plenty of free memory on the
> machine. This results in cma allocation failures.
> 
> Instead let's place cma areas close to the beginning of a node.
> In this case the compaction will help to free cma areas, resulting
> in better cma allocation success rates.
> 
> If there is enough memory let's try to allocate bottom-up starting
> with 4GB to exclude any possible interference with DMA32. On smaller
> machines, or in case of a failure, stick with the old behavior.
> 
> 16GB vm, 2GB cma area:
> With this patch:
> [    0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G
> [    0.002928] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
> [    0.002930] cma: Reserved 2048 MiB at 0x0000000100000000
> [    0.002931] hugetlb_cma: reserved 2048 MiB on node 0
> 
> Without this patch:
> [    0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G
> [    0.002930] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
> [    0.002933] cma: Reserved 2048 MiB at 0x00000003c0000000
> [    0.002934] hugetlb_cma: reserved 2048 MiB on node 0
> 
> v2:
>   - switched to memblock_set_bottom_up(true), by Mike
>   - start with 4GB, by Mike
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>

With one nit below 

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

> ---
>  mm/cma.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/mm/cma.c b/mm/cma.c
> index 7f415d7cda9f..21fd40c092f0 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -337,6 +337,22 @@ int __init cma_declare_contiguous_nid(phys_addr_t base,
>  			limit = highmem_start;
>  		}
>  
> +		/*
> +		 * If there is enough memory, try a bottom-up allocation first.
> +		 * It will place the new cma area close to the start of the node
> +		 * and guarantee that the compaction is moving pages out of the
> +		 * cma area and not into it.
> +		 * Avoid using first 4GB to not interfere with constrained zones
> +		 * like DMA/DMA32.
> +		 */
> +		if (!memblock_bottom_up() &&
> +		    memblock_end >= SZ_4G + size) {

This seems short enough to fit a single line

> +			memblock_set_bottom_up(true);
> +			addr = memblock_alloc_range_nid(size, alignment, SZ_4G,
> +							limit, nid, true);
> +			memblock_set_bottom_up(false);
> +		}
> +
>  		if (!addr) {
>  			addr = memblock_alloc_range_nid(size, alignment, base,
>  					limit, nid, true);
> -- 
> 2.26.2
> 

-- 
Sincerely yours,
Mike.


* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2020-12-17 20:12 ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Roman Gushchin
  2020-12-19 14:52     ` Wonhyuk Yang
@ 2020-12-20  6:49   ` Mike Rapoport
  2021-01-22  4:37       ` Thiago Jung Bauermann
  2021-02-28  4:18   ` Florian Fainelli
  2021-03-23 18:19   ` [tip: x86/boot] x86/setup: Consolidate early memory reservations tip-bot2 for Mike Rapoport
  3 siblings, 1 reply; 49+ messages in thread
From: Mike Rapoport @ 2020-12-20  6:49 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, Joonsoo Kim, Rik van Riel, Michal Hocko,
	linux-kernel, kernel-team

On Thu, Dec 17, 2020 at 12:12:14PM -0800, Roman Gushchin wrote:
> With kaslr the kernel image is placed at a random place, so starting
> the bottom-up allocation with the kernel_end can result in an
> allocation failure and a warning like this one:
> 
> [    0.002920] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
> [    0.002921] ------------[ cut here ]------------
> [    0.002922] memblock: bottom-up allocation failed, memory hotremove may be affected
> [    0.002937] WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a
> [    0.002937] Modules linked in:
> [    0.002939] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169
> [    0.002940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
> [    0.002942] RIP: 0010:memblock_find_in_range_node+0x178/0x25a
> [    0.002944] Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c
> [    0.002945] RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
> [    0.002947] RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff
> [    0.002948] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046
> [    0.002948] RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb
> [    0.002949] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000
> [    0.002950] R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000
> [    0.002952] FS:  0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000
> [    0.002953] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.002954] CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0
> [    0.002956] Call Trace:
> [    0.002961]  ? memblock_alloc_range_nid+0x8d/0x11e
> [    0.002963]  ? cma_declare_contiguous_nid+0x2c4/0x38c
> [    0.002964]  ? hugetlb_cma_reserve+0xdc/0x128
> [    0.002968]  ? flush_tlb_one_kernel+0xc/0x20
> [    0.002969]  ? native_set_fixmap+0x82/0xd0
> [    0.002971]  ? flat_get_apic_id+0x5/0x10
> [    0.002973]  ? register_lapic_address+0x8e/0x97
> [    0.002975]  ? setup_arch+0x8a5/0xc3f
> [    0.002978]  ? start_kernel+0x66/0x547
> [    0.002980]  ? load_ucode_bsp+0x4c/0xcd
> [    0.002982]  ? secondary_startup_64_no_verify+0xb0/0xbb
> [    0.002986] random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0
> [    0.002988] ---[ end trace f151227d0b39be70 ]---
> 
> At the same time, the kernel image is protected with memblock_reserve(),
> so we can just start searching at PAGE_SIZE. In this case the
> bottom-up allocation has the same chance of success as a top-down
> allocation, so there is no reason to fall back in case of a
> failure. Altogether, this simplifies the logic.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

> ---
>  mm/memblock.c | 49 ++++++-------------------------------------------
>  1 file changed, 6 insertions(+), 43 deletions(-)
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index b68ee86788af..10bd7d1ef0f4 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -275,14 +275,6 @@ __memblock_find_range_top_down(phys_addr_t start, phys_addr_t end,
>   *
>   * Find @size free area aligned to @align in the specified range and node.
>   *
> - * When allocation direction is bottom-up, the @start should be greater
> - * than the end of the kernel image. Otherwise, it will be trimmed. The
> - * reason is that we want the bottom-up allocation just near the kernel
> - * image so it is highly likely that the allocated memory and the kernel
> - * will reside in the same node.
> - *
> - * If bottom-up allocation failed, will try to allocate memory top-down.
> - *
>   * Return:
>   * Found address on success, 0 on failure.
>   */
> @@ -291,8 +283,6 @@ static phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
>  					phys_addr_t end, int nid,
>  					enum memblock_flags flags)
>  {
> -	phys_addr_t kernel_end, ret;
> -
>  	/* pump up @end */
>  	if (end == MEMBLOCK_ALLOC_ACCESSIBLE ||
>  	    end == MEMBLOCK_ALLOC_KASAN)
> @@ -301,40 +291,13 @@ static phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t size,
>  	/* avoid allocating the first page */
>  	start = max_t(phys_addr_t, start, PAGE_SIZE);
>  	end = max(start, end);
> -	kernel_end = __pa_symbol(_end);
> -
> -	/*
> -	 * try bottom-up allocation only when bottom-up mode
> -	 * is set and @end is above the kernel image.
> -	 */
> -	if (memblock_bottom_up() && end > kernel_end) {
> -		phys_addr_t bottom_up_start;
> -
> -		/* make sure we will allocate above the kernel */
> -		bottom_up_start = max(start, kernel_end);
>  
> -		/* ok, try bottom-up allocation first */
> -		ret = __memblock_find_range_bottom_up(bottom_up_start, end,
> -						      size, align, nid, flags);
> -		if (ret)
> -			return ret;
> -
> -		/*
> -		 * we always limit bottom-up allocation above the kernel,
> -		 * but top-down allocation doesn't have the limit, so
> -		 * retrying top-down allocation may succeed when bottom-up
> -		 * allocation failed.
> -		 *
> -		 * bottom-up allocation is expected to be fail very rarely,
> -		 * so we use WARN_ONCE() here to see the stack trace if
> -		 * fail happens.
> -		 */
> -		WARN_ONCE(IS_ENABLED(CONFIG_MEMORY_HOTREMOVE),
> -			  "memblock: bottom-up allocation failed, memory hotremove may be affected\n");
> -	}
> -
> -	return __memblock_find_range_top_down(start, end, size, align, nid,
> -					      flags);
> +	if (memblock_bottom_up())
> +		return __memblock_find_range_bottom_up(start, end, size, align,
> +						       nid, flags);
> +	else
> +		return __memblock_find_range_top_down(start, end, size, align,
> +						      nid, flags);
>  }
>  
>  /**
> -- 
> 2.26.2
> 

-- 
Sincerely yours,
Mike.


* Re: [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up
  2020-12-20  6:48 ` [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up Mike Rapoport
@ 2020-12-21 17:05   ` Roman Gushchin
  2020-12-23  4:06     ` Andrew Morton
  0 siblings, 1 reply; 49+ messages in thread
From: Roman Gushchin @ 2020-12-21 17:05 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, linux-mm, Joonsoo Kim, Rik van Riel, Michal Hocko,
	linux-kernel, kernel-team

On Sun, Dec 20, 2020 at 08:48:48AM +0200, Mike Rapoport wrote:
> On Thu, Dec 17, 2020 at 12:12:13PM -0800, Roman Gushchin wrote:
> > Currently cma areas without a fixed base are allocated close to the
> > end of the node. This placement is sub-optimal because of compaction:
> > it brings pages into the cma area. In particular, it can bring in hot
> > executable pages, even if there is plenty of free memory on the
> > machine. This results in cma allocation failures.
> > 
> > Instead let's place cma areas close to the beginning of a node.
> > In this case the compaction will help to free cma areas, resulting
> > in better cma allocation success rates.
> > 
> > If there is enough memory let's try to allocate bottom-up starting
> > with 4GB to exclude any possible interference with DMA32. On smaller
> > machines, or in case of a failure, stick with the old behavior.
> > 
> > 16GB vm, 2GB cma area:
> > With this patch:
> > [    0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G
> > [    0.002928] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
> > [    0.002930] cma: Reserved 2048 MiB at 0x0000000100000000
> > [    0.002931] hugetlb_cma: reserved 2048 MiB on node 0
> > 
> > Without this patch:
> > [    0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G
> > [    0.002930] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
> > [    0.002933] cma: Reserved 2048 MiB at 0x00000003c0000000
> > [    0.002934] hugetlb_cma: reserved 2048 MiB on node 0
> > 
> > v2:
> >   - switched to memblock_set_bottom_up(true), by Mike
> >   - start with 4GB, by Mike
> > 
> > Signed-off-by: Roman Gushchin <guro@fb.com>
> 
> With one nit below 
> 
> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
> 
> > ---
> >  mm/cma.c | 16 ++++++++++++++++
> >  1 file changed, 16 insertions(+)
> > 
> > diff --git a/mm/cma.c b/mm/cma.c
> > index 7f415d7cda9f..21fd40c092f0 100644
> > --- a/mm/cma.c
> > +++ b/mm/cma.c
> > @@ -337,6 +337,22 @@ int __init cma_declare_contiguous_nid(phys_addr_t base,
> >  			limit = highmem_start;
> >  		}
> >  
> > +		/*
> > +		 * If there is enough memory, try a bottom-up allocation first.
> > +		 * It will place the new cma area close to the start of the node
> > +		 * and guarantee that the compaction is moving pages out of the
> > +		 * cma area and not into it.
> > +		 * Avoid using first 4GB to not interfere with constrained zones
> > +		 * like DMA/DMA32.
> > +		 */
> > +		if (!memblock_bottom_up() &&
> > +		    memblock_end >= SZ_4G + size) {
>

Hi Mike!

> This seems short enough to fit a single line

Indeed. An updated version below.

Thank you for the review of the series!

I assume it's simpler to route both patches through the mm tree.
What do you think?

Thanks!

--

From f88bd0a425c7181bd26a4cf900e6924a7b521419 Mon Sep 17 00:00:00 2001
From: Roman Gushchin <guro@fb.com>
Date: Mon, 14 Dec 2020 20:20:52 -0800
Subject: [PATCH v3 1/2] mm: cma: allocate cma areas bottom-up

Currently cma areas without a fixed base are allocated close to the
end of the node. This placement is sub-optimal because of compaction:
it brings pages into the cma area. In particular, it can bring in hot
executable pages, even if there is plenty of free memory on the
machine. This results in cma allocation failures.

Instead let's place cma areas close to the beginning of a node.
In this case the compaction will help to free cma areas, resulting
in better cma allocation success rates.

If there is enough memory let's try to allocate bottom-up starting
with 4GB to exclude any possible interference with DMA32. On smaller
machines, or in case of a failure, stick with the old behavior.

16GB vm, 2GB cma area:
With this patch:
[    0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G
[    0.002928] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
[    0.002930] cma: Reserved 2048 MiB at 0x0000000100000000
[    0.002931] hugetlb_cma: reserved 2048 MiB on node 0

Without this patch:
[    0.000000] Command line: root=/dev/vda3 rootflags=subvol=/root systemd.unified_cgroup_hierarchy=1 enforcing=0 console=ttyS0,115200 hugetlb_cma=2G
[    0.002930] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
[    0.002933] cma: Reserved 2048 MiB at 0x00000003c0000000
[    0.002934] hugetlb_cma: reserved 2048 MiB on node 0

v3:
  - code alignment fix, by Mike
v2:
  - switched to memblock_set_bottom_up(true), by Mike
  - start with 4GB, by Mike

Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
---
 mm/cma.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/mm/cma.c b/mm/cma.c
index 20c4f6f40037..4fe74c9d83b0 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -336,6 +336,21 @@ int __init cma_declare_contiguous_nid(phys_addr_t base,
 			limit = highmem_start;
 		}
 
+		/*
+		 * If there is enough memory, try a bottom-up allocation first.
+		 * It will place the new cma area close to the start of the node
+		 * and guarantee that the compaction is moving pages out of the
+		 * cma area and not into it.
+		 * Avoid using first 4GB to not interfere with constrained zones
+		 * like DMA/DMA32.
+		 */
+		if (!memblock_bottom_up() && memblock_end >= SZ_4G + size) {
+			memblock_set_bottom_up(true);
+			addr = memblock_alloc_range_nid(size, alignment, SZ_4G,
+							limit, nid, true);
+			memblock_set_bottom_up(false);
+		}
+
 		if (!addr) {
 			addr = memblock_alloc_range_nid(size, alignment, base,
 					limit, nid, true);
-- 
2.26.2
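
The placement difference described in the commit message can be sketched in
userspace C. This is only an illustration under stated assumptions: the helper
names (find_bottom_up, find_top_down) are hypothetical and are not memblock
API, a single free range stands in for the real memory map, and the range
[4 GiB, 17 GiB) with 1 GiB alignment is an assumed layout for the 16GB VM in
the logs above.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

#define SZ_1G 0x40000000ULL
#define SZ_4G 0x100000000ULL

/* Round addr up to a power-of-two alignment. */
static phys_addr_t align_up(phys_addr_t addr, phys_addr_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/* Round addr down to a power-of-two alignment. */
static phys_addr_t align_down(phys_addr_t addr, phys_addr_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return addr & ~(align - 1);
}

/*
 * Hypothetical helper: lowest aligned base in [start, end) that fits
 * size bytes, or 0 if nothing fits.
 */
static phys_addr_t find_bottom_up(phys_addr_t start, phys_addr_t end,
				  phys_addr_t size, phys_addr_t align)
{
	phys_addr_t cand = align_up(start, align);

	return (cand < end && end - cand >= size) ? cand : 0;
}

/*
 * Hypothetical helper: highest aligned base in [start, end) that fits
 * size bytes, or 0 if nothing fits.
 */
static phys_addr_t find_top_down(phys_addr_t start, phys_addr_t end,
				 phys_addr_t size, phys_addr_t align)
{
	phys_addr_t cand;

	if (end < size)
		return 0;
	cand = align_down(end - size, align);
	return (cand >= start) ? cand : 0;
}
```

With the assumed range, find_bottom_up(SZ_4G, 0x440000000ULL, 2 * SZ_1G, SZ_1G)
yields 0x100000000 and find_top_down() with the same arguments yields
0x3c0000000, which lines up with the two "cma: Reserved" lines quoted in the
commit message.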


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up
  2020-12-21 17:05   ` Roman Gushchin
@ 2020-12-23  4:06     ` Andrew Morton
  2020-12-23 16:35       ` Roman Gushchin
  0 siblings, 1 reply; 49+ messages in thread
From: Andrew Morton @ 2020-12-23  4:06 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Mike Rapoport, linux-mm, Joonsoo Kim, Rik van Riel, Michal Hocko,
	linux-kernel, kernel-team

On Mon, 21 Dec 2020 09:05:51 -0800 Roman Gushchin <guro@fb.com> wrote:

> Subject: [PATCH v3 1/2] mm: cma: allocate cma areas bottom-up

i386 allmodconfig:

In file included from ./include/vdso/const.h:5,
                 from ./include/linux/const.h:4,
                 from ./include/linux/bits.h:5,
                 from ./include/linux/bitops.h:6,
                 from ./include/linux/kernel.h:11,
                 from ./include/asm-generic/bug.h:20,
                 from ./arch/x86/include/asm/bug.h:93,
                 from ./include/linux/bug.h:5,
                 from ./include/linux/mmdebug.h:5,
                 from ./include/linux/mm.h:9,
                 from ./include/linux/memblock.h:13,
                 from mm/cma.c:24:
mm/cma.c: In function ‘cma_declare_contiguous_nid’:
./include/uapi/linux/const.h:20:19: warning: conversion from ‘long long unsigned int’ to ‘phys_addr_t’ {aka ‘unsigned int’} changes value from ‘4294967296’ to ‘0’ [-Woverflow]
 #define __AC(X,Y) (X##Y)
                   ^~~~~~
./include/uapi/linux/const.h:21:18: note: in expansion of macro ‘__AC’
 #define _AC(X,Y) __AC(X,Y)
                  ^~~~
./include/linux/sizes.h:46:18: note: in expansion of macro ‘_AC’
 #define SZ_4G    _AC(0x100000000, ULL)
                  ^~~
mm/cma.c:349:53: note: in expansion of macro ‘SZ_4G’
    addr = memblock_alloc_range_nid(size, alignment, SZ_4G,
                                                     ^~~~~
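
The truncation the compiler reports can be reproduced in a small userspace
sketch. The helper names below are hypothetical; uint32_t stands in for a
32-bit phys_addr_t, and SZ_4G is redefined locally with the same value as in
include/linux/sizes.h.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4G 0x100000000ULL

/*
 * Models passing a 64-bit constant where a 32-bit phys_addr_t is
 * expected: only the low 32 bits survive the conversion.
 */
static uint32_t to_phys_addr32(uint64_t value)
{
	return (uint32_t)value;
}

/* Self-check: 4 GiB wraps to 0, and 4 GiB + 4 KiB wraps to 4 KiB. */
void check_truncation(void)
{
	assert(to_phys_addr32(SZ_4G) == 0);
	assert(to_phys_addr32(SZ_4G + 0x1000) == 0x1000);
}
```

This matches the "changes value from '4294967296' to '0'" text in the warning.
Presumably the diagnostic is issued when the constant argument is converted,
whether or not the enclosing branch can ever be taken on such a configuration.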


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up
  2020-12-23  4:06     ` Andrew Morton
@ 2020-12-23 16:35       ` Roman Gushchin
  2020-12-23 22:10         ` Mike Rapoport
  0 siblings, 1 reply; 49+ messages in thread
From: Roman Gushchin @ 2020-12-23 16:35 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Rapoport, linux-mm, Joonsoo Kim, Rik van Riel, Michal Hocko,
	linux-kernel, kernel-team

On Tue, Dec 22, 2020 at 08:06:06PM -0800, Andrew Morton wrote:
> On Mon, 21 Dec 2020 09:05:51 -0800 Roman Gushchin <guro@fb.com> wrote:
> 
> > Subject: [PATCH v3 1/2] mm: cma: allocate cma areas bottom-up
> 
> i386 allmodconfig:
> 
> In file included from ./include/vdso/const.h:5,
>                  from ./include/linux/const.h:4,
>                  from ./include/linux/bits.h:5,
>                  from ./include/linux/bitops.h:6,
>                  from ./include/linux/kernel.h:11,
>                  from ./include/asm-generic/bug.h:20,
>                  from ./arch/x86/include/asm/bug.h:93,
>                  from ./include/linux/bug.h:5,
>                  from ./include/linux/mmdebug.h:5,
>                  from ./include/linux/mm.h:9,
>                  from ./include/linux/memblock.h:13,
>                  from mm/cma.c:24:
> mm/cma.c: In function ‘cma_declare_contiguous_nid’:
> ./include/uapi/linux/const.h:20:19: warning: conversion from ‘long long unsigned int’ to ‘phys_addr_t’ {aka ‘unsigned int’} changes value from ‘4294967296’ to ‘0’ [-Woverflow]
>  #define __AC(X,Y) (X##Y)
>                    ^~~~~~
> ./include/uapi/linux/const.h:21:18: note: in expansion of macro ‘__AC’
>  #define _AC(X,Y) __AC(X,Y)
>                   ^~~~
> ./include/linux/sizes.h:46:18: note: in expansion of macro ‘_AC’
>  #define SZ_4G    _AC(0x100000000, ULL)
>                   ^~~
> mm/cma.c:349:53: note: in expansion of macro ‘SZ_4G’
>     addr = memblock_alloc_range_nid(size, alignment, SZ_4G,
>                                                      ^~~~~
> 

I thought that (!memblock_bottom_up() && memblock_end >= SZ_4G + size)
can't be true on a 32-bit platform, so the whole if clause can be compiled out.
Maybe it's because memblock_end can be equal to SZ_4G and if the size == 0...

I have no better idea than wrapping everything into
#if BITS_PER_LONG > 32
#endif.

Thanks!

--

diff --git a/mm/cma.c b/mm/cma.c
index 4fe74c9d83b0..5d69b498603a 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -344,12 +344,14 @@ int __init cma_declare_contiguous_nid(phys_addr_t base,
                 * Avoid using first 4GB to not interfere with constrained zones
                 * like DMA/DMA32.
                 */
+#if BITS_PER_LONG > 32
                if (!memblock_bottom_up() && memblock_end >= SZ_4G + size) {
                        memblock_set_bottom_up(true);
                        addr = memblock_alloc_range_nid(size, alignment, SZ_4G,
                                                        limit, nid, true);
                        memblock_set_bottom_up(false);
                }
+#endif
 
                if (!addr) {
                        addr = memblock_alloc_range_nid(size, alignment, base,

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up
  2020-12-23 16:35       ` Roman Gushchin
@ 2020-12-23 22:10         ` Mike Rapoport
  2020-12-28 19:36           ` Roman Gushchin
  0 siblings, 1 reply; 49+ messages in thread
From: Mike Rapoport @ 2020-12-23 22:10 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Andrew Morton, linux-mm, Joonsoo Kim, Rik van Riel, Michal Hocko,
	linux-kernel, kernel-team

On Wed, Dec 23, 2020 at 08:35:37AM -0800, Roman Gushchin wrote:
> On Tue, Dec 22, 2020 at 08:06:06PM -0800, Andrew Morton wrote:
> > On Mon, 21 Dec 2020 09:05:51 -0800 Roman Gushchin <guro@fb.com> wrote:
> > 
> > > Subject: [PATCH v3 1/2] mm: cma: allocate cma areas bottom-up
> > 
> > i386 allmodconfig:
> > 
> > In file included from ./include/vdso/const.h:5,
> >                  from ./include/linux/const.h:4,
> >                  from ./include/linux/bits.h:5,
> >                  from ./include/linux/bitops.h:6,
> >                  from ./include/linux/kernel.h:11,
> >                  from ./include/asm-generic/bug.h:20,
> >                  from ./arch/x86/include/asm/bug.h:93,
> >                  from ./include/linux/bug.h:5,
> >                  from ./include/linux/mmdebug.h:5,
> >                  from ./include/linux/mm.h:9,
> >                  from ./include/linux/memblock.h:13,
> >                  from mm/cma.c:24:
> > mm/cma.c: In function ‘cma_declare_contiguous_nid’:
> > ./include/uapi/linux/const.h:20:19: warning: conversion from ‘long long unsigned int’ to ‘phys_addr_t’ {aka ‘unsigned int’} changes value from ‘4294967296’ to ‘0’ [-Woverflow]
> >  #define __AC(X,Y) (X##Y)
> >                    ^~~~~~
> > ./include/uapi/linux/const.h:21:18: note: in expansion of macro ‘__AC’
> >  #define _AC(X,Y) __AC(X,Y)
> >                   ^~~~
> > ./include/linux/sizes.h:46:18: note: in expansion of macro ‘_AC’
> >  #define SZ_4G    _AC(0x100000000, ULL)
> >                   ^~~
> > mm/cma.c:349:53: note: in expansion of macro ‘SZ_4G’
> >     addr = memblock_alloc_range_nid(size, alignment, SZ_4G,
> >                                                      ^~~~~
> > 
> 
> I thought that (!memblock_bottom_up() && memblock_end >= SZ_4G + size)
> can't be true on a 32-bit platform, so the whole if clause can be compiled out.
> Maybe it's because memblock_end can be equal to SZ_4G and if the size == 0...
> 
> I have no better idea than wrapping everything into
> #if BITS_PER_LONG > 32
> #endif.

32-bit systems can have more than 32 bits in the physical address.
I think a better option would be to use CONFIG_PHYS_ADDR_T_64BIT.
 
> Thanks!
> 
> --
> 
> diff --git a/mm/cma.c b/mm/cma.c
> index 4fe74c9d83b0..5d69b498603a 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -344,12 +344,14 @@ int __init cma_declare_contiguous_nid(phys_addr_t base,
>                  * Avoid using first 4GB to not interfere with constrained zones
>                  * like DMA/DMA32.
>                  */
> +#if BITS_PER_LONG > 32
>                 if (!memblock_bottom_up() && memblock_end >= SZ_4G + size) {
>                         memblock_set_bottom_up(true);
>                         addr = memblock_alloc_range_nid(size, alignment, SZ_4G,
>                                                         limit, nid, true);
>                         memblock_set_bottom_up(false);
>                 }
> +#endif
>  
>                 if (!addr) {
>                         addr = memblock_alloc_range_nid(size, alignment, base,

-- 
Sincerely yours,
Mike.
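
The distinction between word size and physical address width can be sketched
as follows. This is a userspace illustration only: PHYS_ADDR_T_64BIT is a
hypothetical stand-in for CONFIG_PHYS_ADDR_T_64BIT, can_try_above_4g() is not
a kernel function, and the memblock_bottom_up() part of the real condition is
left out. The point is that the guard follows the width of phys_addr_t rather
than BITS_PER_LONG, so a 32-bit kernel with a 64-bit phys_addr_t (for example
x86 with PAE) would still take the bottom-up path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for CONFIG_PHYS_ADDR_T_64BIT. Remove this
 * define to model a configuration with a 32-bit phys_addr_t.
 */
#define PHYS_ADDR_T_64BIT 1

#ifdef PHYS_ADDR_T_64BIT
typedef uint64_t phys_addr_t;
#else
typedef uint32_t phys_addr_t;
#endif

#define SZ_4G 0x100000000ULL

/* True if a region of size bytes could fit between 4 GiB and memblock_end. */
static bool can_try_above_4g(phys_addr_t memblock_end, phys_addr_t size)
{
#ifdef PHYS_ADDR_T_64BIT
	return memblock_end >= SZ_4G + size;
#else
	/* Nothing is addressable above 4 GiB, so SZ_4G is never used here. */
	(void)memblock_end;
	(void)size;
	return false;
#endif
}

/* Self-check; the 64-bit numbers assume memory ending at 17 GiB. */
void check_guard(void)
{
#ifdef PHYS_ADDR_T_64BIT
	assert(can_try_above_4g(0x440000000ULL, 0x80000000ULL));
	assert(!can_try_above_4g(SZ_4G, 0x80000000ULL));
#else
	assert(!can_try_above_4g(0xffffffffU, 0x80000000U));
#endif
}
```

Because the 32-bit branch never mentions SZ_4G, no 64-bit constant is
converted to a 32-bit type there, which is the conversion the i386
allmodconfig warning complained about.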

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up
  2020-12-23 22:10         ` Mike Rapoport
@ 2020-12-28 19:36           ` Roman Gushchin
  0 siblings, 0 replies; 49+ messages in thread
From: Roman Gushchin @ 2020-12-28 19:36 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, linux-mm, Joonsoo Kim, Rik van Riel, Michal Hocko,
	linux-kernel, kernel-team

On Thu, Dec 24, 2020 at 12:10:39AM +0200, Mike Rapoport wrote:
> On Wed, Dec 23, 2020 at 08:35:37AM -0800, Roman Gushchin wrote:
> > On Tue, Dec 22, 2020 at 08:06:06PM -0800, Andrew Morton wrote:
> > > On Mon, 21 Dec 2020 09:05:51 -0800 Roman Gushchin <guro@fb.com> wrote:
> > > 
> > > > Subject: [PATCH v3 1/2] mm: cma: allocate cma areas bottom-up
> > > 
> > > i386 allmodconfig:
> > > 
> > > In file included from ./include/vdso/const.h:5,
> > >                  from ./include/linux/const.h:4,
> > >                  from ./include/linux/bits.h:5,
> > >                  from ./include/linux/bitops.h:6,
> > >                  from ./include/linux/kernel.h:11,
> > >                  from ./include/asm-generic/bug.h:20,
> > >                  from ./arch/x86/include/asm/bug.h:93,
> > >                  from ./include/linux/bug.h:5,
> > >                  from ./include/linux/mmdebug.h:5,
> > >                  from ./include/linux/mm.h:9,
> > >                  from ./include/linux/memblock.h:13,
> > >                  from mm/cma.c:24:
> > > mm/cma.c: In function ‘cma_declare_contiguous_nid’:
> > > ./include/uapi/linux/const.h:20:19: warning: conversion from ‘long long unsigned int’ to ‘phys_addr_t’ {aka ‘unsigned int’} changes value from ‘4294967296’ to ‘0’ [-Woverflow]
> > >  #define __AC(X,Y) (X##Y)
> > >                    ^~~~~~
> > > ./include/uapi/linux/const.h:21:18: note: in expansion of macro ‘__AC’
> > >  #define _AC(X,Y) __AC(X,Y)
> > >                   ^~~~
> > > ./include/linux/sizes.h:46:18: note: in expansion of macro ‘_AC’
> > >  #define SZ_4G    _AC(0x100000000, ULL)
> > >                   ^~~
> > > mm/cma.c:349:53: note: in expansion of macro ‘SZ_4G’
> > >     addr = memblock_alloc_range_nid(size, alignment, SZ_4G,
> > >                                                      ^~~~~
> > > 
> > 
> > I thought that (!memblock_bottom_up() && memblock_end >= SZ_4G + size)
> > can't be true on a 32-bit platform, so the whole if clause can be compiled out.
> > Maybe it's because memblock_end can be equal to SZ_4G and if the size == 0...
> > 
> > I have no better idea than wrapping everything into
> > #if BITS_PER_LONG > 32
> > #endif.
> 
> 32-bit systems can have more than 32 bits in the physical address.
> I think a better option would be to use CONFIG_PHYS_ADDR_T_64BIT.

I agree. An updated fixup below.

Andrew, can you please replace the previous fixup with this one?

Thanks!

--

diff --git a/mm/cma.c b/mm/cma.c
index 4fe74c9d83b0..0ba69cd16aeb 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -344,12 +344,14 @@ int __init cma_declare_contiguous_nid(phys_addr_t base,
                 * Avoid using first 4GB to not interfere with constrained zones
                 * like DMA/DMA32.
                 */
+#ifdef CONFIG_PHYS_ADDR_T_64BIT
                if (!memblock_bottom_up() && memblock_end >= SZ_4G + size) {
                        memblock_set_bottom_up(true);
                        addr = memblock_alloc_range_nid(size, alignment, SZ_4G,
                                                        limit, nid, true);
                        memblock_set_bottom_up(false);
                }
+#endif
 
                if (!addr) {
                        addr = memblock_alloc_range_nid(size, alignment, base,

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2020-12-20  6:49   ` Mike Rapoport
@ 2021-01-22  4:37       ` Thiago Jung Bauermann
  0 siblings, 0 replies; 49+ messages in thread
From: Thiago Jung Bauermann @ 2021-01-22  4:37 UTC (permalink / raw)
  To: rppt
  Cc: akpm, guro, iamjoonsoo.kim, Ram Pai, Konrad Rzeszutek Wilk,
	Satheesh Rajendran, kernel-team, linux-kernel, linux-mm,
	linuxppc-dev, mhocko, riel, Thiago Jung Bauermann

Mike Rapoport <rppt@kernel.org> writes:

> > Signed-off-by: Roman Gushchin <guro@fb.com>
> 
> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

I've seen a couple of spurious triggers of the WARN_ONCE() removed by this
patch. This happens on some ppc64le bare metal (powernv) server machines with
CONFIG_SWIOTLB=y and crashkernel=4G, as described in a candidate patch I posted
to solve this issue in a different way:

https://lore.kernel.org/linuxppc-dev/20201218062103.76102-1-bauerman@linux.ibm.com/

Since this patch solves that problem, is it possible to include it in the next
feasible v5.11-rcX, with the following tag?

Fixes: 8fabc623238e ("powerpc: Ensure that swiotlb buffer is allocated from low memory")

This is because reverting the commit above also solves the problem on the
machines where I've seen this issue.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2021-01-22  4:37       ` Thiago Jung Bauermann
@ 2021-01-24  2:09         ` Andrew Morton
  -1 siblings, 0 replies; 49+ messages in thread
From: Andrew Morton @ 2021-01-24  2:09 UTC (permalink / raw)
  To: Thiago Jung Bauermann
  Cc: rppt, guro, iamjoonsoo.kim, Ram Pai, Konrad Rzeszutek Wilk,
	Satheesh Rajendran, kernel-team, linux-kernel, linux-mm,
	linuxppc-dev, mhocko, riel

On Fri, 22 Jan 2021 01:37:14 -0300 Thiago Jung Bauermann <bauerman@linux.ibm.com> wrote:

> Mike Rapoport <rppt@kernel.org> writes:
> 
> > > Signed-off-by: Roman Gushchin <guro@fb.com>
> > 
> > Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
> 
> I've seen a couple of spurious triggers of the WARN_ONCE() removed by this
> patch. This happens on some ppc64le bare metal (powernv) server machines with
> CONFIG_SWIOTLB=y and crashkernel=4G, as described in a candidate patch I posted
> to solve this issue in a different way:
> 
> https://lore.kernel.org/linuxppc-dev/20201218062103.76102-1-bauerman@linux.ibm.com/
> 
> Since this patch solves that problem, is it possible to include it in the next
> feasible v5.11-rcX, with the following tag?

We could do this, if we're confident that this patch doesn't depend on
[1/2] "mm: cma: allocate cma areas bottom-up"?  I think it is...

> Fixes: 8fabc623238e ("powerpc: Ensure that swiotlb buffer is allocated from low memory")

I added that.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2021-01-24  2:09         ` Andrew Morton
@ 2021-01-24  7:34           ` Mike Rapoport
  -1 siblings, 0 replies; 49+ messages in thread
From: Mike Rapoport @ 2021-01-24  7:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Thiago Jung Bauermann, guro, iamjoonsoo.kim, Ram Pai,
	Konrad Rzeszutek Wilk, Satheesh Rajendran, kernel-team,
	linux-kernel, linux-mm, linuxppc-dev, mhocko, riel

On Sat, Jan 23, 2021 at 06:09:11PM -0800, Andrew Morton wrote:
> On Fri, 22 Jan 2021 01:37:14 -0300 Thiago Jung Bauermann <bauerman@linux.ibm.com> wrote:
> 
> > Mike Rapoport <rppt@kernel.org> writes:
> > 
> > > > Signed-off-by: Roman Gushchin <guro@fb.com>
> > > 
> > > Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > I've seen a couple of spurious triggers of the WARN_ONCE() removed by this
> > patch. This happens on some ppc64le bare metal (powernv) server machines with
> > CONFIG_SWIOTLB=y and crashkernel=4G, as described in a candidate patch I posted
> > to solve this issue in a different way:
> > 
> > https://lore.kernel.org/linuxppc-dev/20201218062103.76102-1-bauerman@linux.ibm.com/
> > 
> > Since this patch solves that problem, is it possible to include it in the next
> > feasible v5.11-rcX, with the following tag?
> 
> We could do this, if we're confident that this patch doesn't depend on
> [1/2] "mm: cma: allocate cma areas bottom-up"?  I think it is...

I think it does not depend on cma bottom-up allocation, it's rather the other
way around: without this, CMA bottom-up allocation could fail with KASLR
enabled.

Still, this patch may need updates to the way x86 does early reservations:

https://lore.kernel.org/lkml/20210115083255.12744-1-rppt@kernel.org
 
> > Fixes: 8fabc623238e ("powerpc: Ensure that swiotlb buffer is allocated from low memory")
> 
> I added that.
> 
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2021-01-24  7:34           ` Mike Rapoport
@ 2021-01-26  0:30             ` Thiago Jung Bauermann
  -1 siblings, 0 replies; 49+ messages in thread
From: Thiago Jung Bauermann @ 2021-01-26  0:30 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, guro, iamjoonsoo.kim, Ram Pai,
	Konrad Rzeszutek Wilk, Satheesh Rajendran, kernel-team,
	linux-kernel, linux-mm, linuxppc-dev, mhocko, riel


Mike Rapoport <rppt@kernel.org> writes:

> On Sat, Jan 23, 2021 at 06:09:11PM -0800, Andrew Morton wrote:
>> On Fri, 22 Jan 2021 01:37:14 -0300 Thiago Jung Bauermann <bauerman@linux.ibm.com> wrote:
>> 
>> > Mike Rapoport <rppt@kernel.org> writes:
>> > 
>> > > > Signed-off-by: Roman Gushchin <guro@fb.com>
>> > > 
>> > > Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
>> > 
>> > I've seen a couple of spurious triggers of the WARN_ONCE() removed by this
>> > patch. This happens on some ppc64le bare metal (powernv) server machines with
>> > CONFIG_SWIOTLB=y and crashkernel=4G, as described in a candidate patch I posted
>> > to solve this issue in a different way:
>> > 
>> > https://lore.kernel.org/linuxppc-dev/20201218062103.76102-1-bauerman@linux.ibm.com/
>> > 
>> > Since this patch solves that problem, is it possible to include it in the next
>> > feasible v5.11-rcX, with the following tag?
>> 
>> We could do this,

Thanks!

>> if we're confident that this patch doesn't depend on
>> [1/2] "mm: cma: allocate cma areas bottom-up"?  I think it is...
>
> I think it does not depend on cma bottom-up allocation, it's rather the other
> way around: without this, CMA bottom-up allocation could fail with KASLR
> enabled.

I agree. Conceptually, this could have been patch 1 in this series.

> Still, this patch may need updates to the way x86 does early reservations:
>
> https://lore.kernel.org/lkml/20210115083255.12744-1-rppt@kernel.org

Ah, I wasn't aware of this. Thanks for fixing those issues. That series
seems to be well accepted.

>> > Fixes: 8fabc623238e ("powerpc: Ensure that swiotlb buffer is allocated from low memory")
>> 
>> I added that.

Thanks!
-- 
Thiago Jung Bauermann
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2021-01-24  7:34           ` Mike Rapoport
@ 2021-02-08 23:58             ` Thiago Jung Bauermann
  -1 siblings, 0 replies; 49+ messages in thread
From: Thiago Jung Bauermann @ 2021-02-08 23:58 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andrew Morton, riel, kernel-team, Ram Pai, linux-kernel, mhocko,
	linux-mm, Satheesh Rajendran, Konrad Rzeszutek Wilk,
	iamjoonsoo.kim, guro, linuxppc-dev


Mike Rapoport <rppt@kernel.org> writes:

> On Sat, Jan 23, 2021 at 06:09:11PM -0800, Andrew Morton wrote:
>> On Fri, 22 Jan 2021 01:37:14 -0300 Thiago Jung Bauermann <bauerman@linux.ibm.com> wrote:
>> 
>> > Mike Rapoport <rppt@kernel.org> writes:
>> > 
>> > > > Signed-off-by: Roman Gushchin <guro@fb.com>
>> > > 
>> > > Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
>> > 
>> > I've seen a couple of spurious triggers of the WARN_ONCE() removed by this
>> > patch. This happens on some ppc64le bare metal (powernv) server machines with
>> > CONFIG_SWIOTLB=y and crashkernel=4G, as described in a candidate patch I posted
>> > to solve this issue in a different way:
>> > 
>> > https://lore.kernel.org/linuxppc-dev/20201218062103.76102-1-bauerman@linux.ibm.com/
>> > 
>> > Since this patch solves that problem, is it possible to include it in the next
>> > feasible v5.11-rcX, with the following tag?
>> 
>> We could do this, if we're confident that this patch doesn't depend on
>> [1/2] "mm: cma: allocate cma areas bottom-up"?  I think it is...
>
> I think it does not depend on cma bottom-up allocation, it's rather the other
> way around: without this, CMA bottom-up allocation could fail with KASLR
> enabled.

I noticed that this patch is now upstream as:

2dcb39645441 memblock: do not start bottom-up allocations with kernel_end

> Still, this patch may need updates to the way x86 does early reservations:
>
> https://lore.kernel.org/lkml/20210115083255.12744-1-rppt@kernel.org

... but the patches from this link still aren't. Isn't this a potential
problem for x86?

The patch series on the link above is now superseded by v2:

https://lore.kernel.org/linux-mm/20210128105711.10428-1-rppt@kernel.org/

-- 
Thiago Jung Bauermann
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
@ 2021-02-08 23:58             ` Thiago Jung Bauermann
  0 siblings, 0 replies; 49+ messages in thread
From: Thiago Jung Bauermann @ 2021-02-08 23:58 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: riel, iamjoonsoo.kim, Ram Pai, linux-kernel, mhocko, linux-mm,
	Satheesh Rajendran, guro, Konrad Rzeszutek Wilk, Andrew Morton,
	linuxppc-dev, kernel-team


Mike Rapoport <rppt@kernel.org> writes:

> On Sat, Jan 23, 2021 at 06:09:11PM -0800, Andrew Morton wrote:
>> On Fri, 22 Jan 2021 01:37:14 -0300 Thiago Jung Bauermann <bauerman@linux.ibm.com> wrote:
>> 
>> > Mike Rapoport <rppt@kernel.org> writes:
>> > 
>> > > > Signed-off-by: Roman Gushchin <guro@fb.com>
>> > > 
>> > > Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
>> > 
>> > I've seen a couple of spurious triggers of the WARN_ONCE() removed by this
>> > patch. This happens on some ppc64le bare metal (powernv) server machines with
>> > CONFIG_SWIOTLB=y and crashkernel=4G, as described in a candidate patch I posted
>> > to solve this issue in a different way:
>> > 
>> > https://lore.kernel.org/linuxppc-dev/20201218062103.76102-1-bauerman@linux.ibm.com/
>> > 
>> > Since this patch solves that problem, is it possible to include it in the next
>> > feasible v5.11-rcX, with the following tag?
>> 
>> We could do this, if we're confident that this patch doesn't depend on
>> [1/2] "mm: cma: allocate cma areas bottom-up"?  I think it is...
>
> A think it does not depend on cma bottom-up allocation, it's rather the other
> way around: without this CMA bottom-up allocation could fail with KASLR
> enabled.

I noticed that this patch is now upstream as:

2dcb39645441 memblock: do not start bottom-up allocations with kernel_end

> Still, this patch may need updates to the way x86 does early reservations:
>
> https://lore.kernel.org/lkml/20210115083255.12744-1-rppt@kernel.org

... but the patches from this link still aren't. Isn't this a potential
problem for x86?

The patch series on the link above is now superseded by v2:

https://lore.kernel.org/linux-mm/20210128105711.10428-1-rppt@kernel.org/

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2020-12-17 20:12 ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Roman Gushchin
  2020-12-19 14:52     ` Wonhyuk Yang
  2020-12-20  6:49   ` Mike Rapoport
@ 2021-02-28  4:18   ` Florian Fainelli
  2021-02-28  9:00     ` Mike Rapoport
  2021-03-23 18:19   ` [tip: x86/boot] x86/setup: Consolidate early memory reservations tip-bot2 for Mike Rapoport
  3 siblings, 1 reply; 49+ messages in thread
From: Florian Fainelli @ 2021-02-28  4:18 UTC (permalink / raw)
  To: Roman Gushchin, Andrew Morton, Mike Rapoport, linux-mm,
	Kamal Dasu, linux-mips, Thomas Bogendoerfer, Paul Cercueil,
	Serge Semin, Jiaxun Yang, rppt, iamjoonsoo.kim, riel
  Cc: Joonsoo Kim, Rik van Riel, Michal Hocko, linux-kernel, kernel-team



On 12/17/2020 12:12 PM, Roman Gushchin wrote:
> With kaslr the kernel image is placed at a random place, so starting
> the bottom-up allocation with the kernel_end can result in an
> allocation failure and a warning like this one:
> 
> [    0.002920] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
> [    0.002921] ------------[ cut here ]------------
> [    0.002922] memblock: bottom-up allocation failed, memory hotremove may be affected
> [    0.002937] WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a
> [    0.002937] Modules linked in:
> [    0.002939] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169
> [    0.002940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
> [    0.002942] RIP: 0010:memblock_find_in_range_node+0x178/0x25a
> [    0.002944] Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c
> [    0.002945] RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
> [    0.002947] RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff
> [    0.002948] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046
> [    0.002948] RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb
> [    0.002949] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000
> [    0.002950] R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000
> [    0.002952] FS:  0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000
> [    0.002953] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.002954] CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0
> [    0.002956] Call Trace:
> [    0.002961]  ? memblock_alloc_range_nid+0x8d/0x11e
> [    0.002963]  ? cma_declare_contiguous_nid+0x2c4/0x38c
> [    0.002964]  ? hugetlb_cma_reserve+0xdc/0x128
> [    0.002968]  ? flush_tlb_one_kernel+0xc/0x20
> [    0.002969]  ? native_set_fixmap+0x82/0xd0
> [    0.002971]  ? flat_get_apic_id+0x5/0x10
> [    0.002973]  ? register_lapic_address+0x8e/0x97
> [    0.002975]  ? setup_arch+0x8a5/0xc3f
> [    0.002978]  ? start_kernel+0x66/0x547
> [    0.002980]  ? load_ucode_bsp+0x4c/0xcd
> [    0.002982]  ? secondary_startup_64_no_verify+0xb0/0xbb
> [    0.002986] random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0
> [    0.002988] ---[ end trace f151227d0b39be70 ]---
> 
> At the same time, the kernel image is protected with memblock_reserve(),
> so we can just start searching at PAGE_SIZE. In this case the
> bottom-up allocation has the same chance of success as a top-down
> allocation, so there is no reason to fall back in the case of a
> failure. Altogether it simplifies the logic.
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>

Hi Roman, Thomas and other linux-mips folks,

Kamal and I have been unable to boot v5.11 on MIPS since this
commit; reverting it makes our MIPS platforms boot successfully. We do
not see a warning like the one in the commit message. Instead, what
happens appears to be a corrupted Device Tree, which prevents the parsing
of the "rdb" node and leads to the interrupt controllers not being
registered, and the system eventually does not boot.

The Device Tree is built into the kernel image and resides at
arch/mips/boot/dts/brcm/bcm97435svmb.dts.

Do you have any idea what could be wrong with MIPS specifically here?

Thanks!
-- 
Florian


* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2021-02-28  4:18   ` Florian Fainelli
@ 2021-02-28  9:00     ` Mike Rapoport
  2021-02-28 18:19       ` Florian Fainelli
  0 siblings, 1 reply; 49+ messages in thread
From: Mike Rapoport @ 2021-02-28  9:00 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Roman Gushchin, Andrew Morton, linux-mm, Kamal Dasu, linux-mips,
	Thomas Bogendoerfer, Paul Cercueil, Serge Semin, Jiaxun Yang,
	iamjoonsoo.kim, riel, Michal Hocko, linux-kernel, kernel-team

Hi Florian,

On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
> 
> On 12/17/2020 12:12 PM, Roman Gushchin wrote:
> > With kaslr the kernel image is placed at a random place, so starting
> > the bottom-up allocation with the kernel_end can result in an
> > allocation failure and a warning like this one:
> > 
> > [    0.002920] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
> > [    0.002921] ------------[ cut here ]------------
> > [    0.002922] memblock: bottom-up allocation failed, memory hotremove may be affected
> > [    0.002937] WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a
> > [    0.002937] Modules linked in:
> > [    0.002939] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169
> > [    0.002940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
> > [    0.002942] RIP: 0010:memblock_find_in_range_node+0x178/0x25a
> > [    0.002944] Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c
> > [    0.002945] RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
> > [    0.002947] RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff
> > [    0.002948] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046
> > [    0.002948] RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb
> > [    0.002949] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000
> > [    0.002950] R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000
> > [    0.002952] FS:  0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000
> > [    0.002953] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    0.002954] CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0
> > [    0.002956] Call Trace:
> > [    0.002961]  ? memblock_alloc_range_nid+0x8d/0x11e
> > [    0.002963]  ? cma_declare_contiguous_nid+0x2c4/0x38c
> > [    0.002964]  ? hugetlb_cma_reserve+0xdc/0x128
> > [    0.002968]  ? flush_tlb_one_kernel+0xc/0x20
> > [    0.002969]  ? native_set_fixmap+0x82/0xd0
> > [    0.002971]  ? flat_get_apic_id+0x5/0x10
> > [    0.002973]  ? register_lapic_address+0x8e/0x97
> > [    0.002975]  ? setup_arch+0x8a5/0xc3f
> > [    0.002978]  ? start_kernel+0x66/0x547
> > [    0.002980]  ? load_ucode_bsp+0x4c/0xcd
> > [    0.002982]  ? secondary_startup_64_no_verify+0xb0/0xbb
> > [    0.002986] random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0
> > [    0.002988] ---[ end trace f151227d0b39be70 ]---
> > 
> > At the same time, the kernel image is protected with memblock_reserve(),
> > so we can just start searching at PAGE_SIZE. In this case the
> > bottom-up allocation has the same chance of success as a top-down
> > allocation, so there is no reason to fall back in the case of a
> > failure. Altogether it simplifies the logic.
> > 
> > Signed-off-by: Roman Gushchin <guro@fb.com>
> 
> Hi Roman, Thomas and other linux-mips folks,
> 
> Kamal and I have been unable to boot v5.11 on MIPS since this
> commit; reverting it makes our MIPS platforms boot successfully. We do
> not see a warning like the one in the commit message. Instead, what
> happens appears to be a corrupted Device Tree, which prevents the parsing
> of the "rdb" node and leads to the interrupt controllers not being
> registered, and the system eventually does not boot.
> 
> The Device Tree is built into the kernel image and resides at
> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
> 
> Do you have any idea what could be wrong with MIPS specifically here?

Apparently there is a memblock allocation in one of the functions called
from arch_mem_init() between plat_mem_setup() and
early_init_fdt_reserve_self().
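
The ordering hazard suspected here can be modelled with a small userspace
sketch. This is not the memblock API: every name below (toy_memblock,
toy_reserve, toy_alloc_bottom_up) is hypothetical, and the blob address and
sizes used in the usage note are made up for illustration. It only shows the
general point that a bottom-up first-fit search starting at PAGE_SIZE will hand
out any range that has not been reserved yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE    0x1000u  /* stand-in for PAGE_SIZE, value assumed */
#define TOY_MAX_RESERVED 16

/* A reserved range [base, base + size). */
struct toy_region {
	uint64_t base;
	uint64_t size;
};

/* Toy allocator state: memory is [0, mem_end) plus a list of reservations. */
struct toy_memblock {
	uint64_t mem_end;
	struct toy_region reserved[TOY_MAX_RESERVED];
	size_t nr_reserved;
};

/* True if [a, a + a_size) and [b, b + b_size) share at least one byte. */
static bool toy_overlaps(uint64_t a, uint64_t a_size, uint64_t b, uint64_t b_size)
{
	return a < b + b_size && b < a + a_size;
}

/* Record a reservation; later allocations will step over it. */
static void toy_reserve(struct toy_memblock *mb, uint64_t base, uint64_t size)
{
	assert(mb->nr_reserved < TOY_MAX_RESERVED);
	mb->reserved[mb->nr_reserved].base = base;
	mb->reserved[mb->nr_reserved].size = size;
	mb->nr_reserved++;
}

/*
 * Bottom-up first fit starting at TOY_PAGE_SIZE. align must be a power of two.
 * Returns the allocated base (and reserves it), or 0 if nothing fits.
 */
static uint64_t toy_alloc_bottom_up(struct toy_memblock *mb, uint64_t size, uint64_t align)
{
	uint64_t base = (TOY_PAGE_SIZE + align - 1) & ~(align - 1);

	while (base + size <= mb->mem_end) {
		bool collided = false;

		for (size_t i = 0; i < mb->nr_reserved; i++) {
			const struct toy_region *r = &mb->reserved[i];

			if (toy_overlaps(base, size, r->base, r->size)) {
				/* Skip past the reservation and rescan. */
				base = (r->base + r->size + align - 1) & ~(align - 1);
				collided = true;
				break;
			}
		}
		if (!collided) {
			toy_reserve(mb, base, size);
			return base;
		}
	}
	return 0;
}
```

With a made-up blob at 0x1000 of size 0x3000, calling
toy_alloc_bottom_up(&mb, 0x2000, 0x40) before toy_reserve(&mb, 0x1000, 0x3000)
returns 0x1000, which lands on top of the blob. Doing the reservation first
makes the same call return 0x4000. Whether this is what actually happens on the
affected MIPS boards is exactly what the debug output should tell us.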

If you have serial output available that early, we can try to track it down by
forcing memblock_debug in mm/memblock.c to 1:

diff --git a/mm/memblock.c b/mm/memblock.c
index afaefa8fc6ab..83034245f8d5 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -151,7 +151,7 @@ static __refdata struct memblock_type *memblock_memory = &memblock.memory;
                        pr_info(fmt, ##__VA_ARGS__);                    \
        } while (0)
 
-static int memblock_debug __initdata_memblock;
+static int memblock_debug __initdata_memblock = 1;
 static bool system_has_some_mirror __initdata_memblock = false;
 static int memblock_can_resize __initdata_memblock;
 static int memblock_memory_in_slab __initdata_memblock = 0;


Regardless, I think that moving DT self reservation just after
plat_mem_setup() is safe and it'll make things more robust.

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 279be0153f8b..f476b99a7bcd 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -623,6 +623,8 @@ static void __init arch_mem_init(char **cmdline_p)
 {
 	/* call board setup routine */
 	plat_mem_setup();
+	early_init_fdt_reserve_self();
+	early_init_fdt_scan_reserved_mem();
 	memblock_set_bottom_up(true);
 
 	bootcmdline_init();
@@ -636,9 +638,6 @@ static void __init arch_mem_init(char **cmdline_p)
 
 	check_kernel_sections_mem();
 
-	early_init_fdt_reserve_self();
-	early_init_fdt_scan_reserved_mem();
-
 #ifndef CONFIG_NUMA
 	memblock_set_node(0, PHYS_ADDR_MAX, &memblock.memory, 0);
 #endif
 
-- 
Sincerely yours,
Mike.


* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2021-02-28  9:00     ` Mike Rapoport
@ 2021-02-28 18:19       ` Florian Fainelli
  2021-02-28 23:08         ` Serge Semin
  0 siblings, 1 reply; 49+ messages in thread
From: Florian Fainelli @ 2021-02-28 18:19 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Roman Gushchin, Andrew Morton, linux-mm, Kamal Dasu, linux-mips,
	Thomas Bogendoerfer, Paul Cercueil, Serge Semin, Jiaxun Yang,
	iamjoonsoo.kim, riel, Michal Hocko, linux-kernel, kernel-team

Hi Mike,

On 2/28/2021 1:00 AM, Mike Rapoport wrote:
> Hi Florian,
> 
> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
>>
>> On 12/17/2020 12:12 PM, Roman Gushchin wrote:
>>> With kaslr the kernel image is placed at a random place, so starting
>>> the bottom-up allocation with the kernel_end can result in an
>>> allocation failure and a warning like this one:
>>>
>>> [    0.002920] hugetlb_cma: reserve 2048 MiB, up to 2048 MiB per node
>>> [    0.002921] ------------[ cut here ]------------
>>> [    0.002922] memblock: bottom-up allocation failed, memory hotremove may be affected
>>> [    0.002937] WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x178/0x25a
>>> [    0.002937] Modules linked in:
>>> [    0.002939] CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #1169
>>> [    0.002940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
>>> [    0.002942] RIP: 0010:memblock_find_in_range_node+0x178/0x25a
>>> [    0.002944] Code: e9 6d ff ff ff 48 85 c0 0f 85 da 00 00 00 80 3d 9b 35 df 00 00 75 15 48 c7 c7 c0 75 59 88 c6 05 8b 35 df 00 01 e8 25 8a fa ff <0f> 0b 48 c7 44 24 20 ff ff ff ff 44 89 e6 44 89 ea 48 c7 c1 70 5c
>>> [    0.002945] RSP: 0000:ffffffff88803d18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
>>> [    0.002947] RAX: 0000000000000000 RBX: 0000000240000000 RCX: 00000000ffffdfff
>>> [    0.002948] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000046
>>> [    0.002948] RBP: 0000000100000000 R08: ffffffff88922788 R09: 0000000000009ffb
>>> [    0.002949] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000000000
>>> [    0.002950] R13: 0000000000000000 R14: 0000000080000000 R15: 00000001fb42c000
>>> [    0.002952] FS:  0000000000000000(0000) GS:ffffffff88f71000(0000) knlGS:0000000000000000
>>> [    0.002953] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [    0.002954] CR2: ffffa080fb401000 CR3: 00000001fa80a000 CR4: 00000000000406b0
>>> [    0.002956] Call Trace:
>>> [    0.002961]  ? memblock_alloc_range_nid+0x8d/0x11e
>>> [    0.002963]  ? cma_declare_contiguous_nid+0x2c4/0x38c
>>> [    0.002964]  ? hugetlb_cma_reserve+0xdc/0x128
>>> [    0.002968]  ? flush_tlb_one_kernel+0xc/0x20
>>> [    0.002969]  ? native_set_fixmap+0x82/0xd0
>>> [    0.002971]  ? flat_get_apic_id+0x5/0x10
>>> [    0.002973]  ? register_lapic_address+0x8e/0x97
>>> [    0.002975]  ? setup_arch+0x8a5/0xc3f
>>> [    0.002978]  ? start_kernel+0x66/0x547
>>> [    0.002980]  ? load_ucode_bsp+0x4c/0xcd
>>> [    0.002982]  ? secondary_startup_64_no_verify+0xb0/0xbb
>>> [    0.002986] random: get_random_bytes called from __warn+0xab/0x110 with crng_init=0
>>> [    0.002988] ---[ end trace f151227d0b39be70 ]---
>>>
>>> At the same time, the kernel image is protected with memblock_reserve(),
>>> so we can just start searching at PAGE_SIZE. In this case the
>>> bottom-up allocation has the same chance of success as a top-down
>>> allocation, so there is no reason to fall back in the case of a
>>> failure. Altogether it simplifies the logic.
>>>
>>> Signed-off-by: Roman Gushchin <guro@fb.com>
>>
>> Hi Roman, Thomas and other linux-mips folks,
>>
>> Kamal and I have been unable to boot v5.11 on MIPS since this
>> commit; reverting it makes our MIPS platforms boot successfully. We do
>> not see a warning like the one in the commit message. Instead, what
>> happens appears to be a corrupted Device Tree, which prevents the parsing
>> of the "rdb" node and leads to the interrupt controllers not being
>> registered, and the system eventually does not boot.
>>
>> The Device Tree is built into the kernel image and resides at
>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
>>
>> Do you have any idea what could be wrong with MIPS specifically here?
> 
> Apparently there is a memblock allocation in one of the functions called
> from arch_mem_init() between plat_mem_setup() and
> early_init_fdt_reserve_self().
> 
> If you have serial output available that early, we can try to track it down by
> forcing memblock_debug in mm/memblock.c to 1:
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index afaefa8fc6ab..83034245f8d5 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -151,7 +151,7 @@ static __refdata struct memblock_type *memblock_memory = &memblock.memory;
>                         pr_info(fmt, ##__VA_ARGS__);                    \
>         } while (0)
>  
> -static int memblock_debug __initdata_memblock;
> +static int memblock_debug __initdata_memblock = 1;
>  static bool system_has_some_mirror __initdata_memblock = false;
>  static int memblock_can_resize __initdata_memblock;
>  static int memblock_memory_in_slab __initdata_memblock = 0;
> 
> 
> Regardless, I think that moving DT self reservation just after
> plat_mem_setup() is safe and it'll make things more robust.
> 
> diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
> index 279be0153f8b..f476b99a7bcd 100644
> --- a/arch/mips/kernel/setup.c
> +++ b/arch/mips/kernel/setup.c
> @@ -623,6 +623,8 @@ static void __init arch_mem_init(char **cmdline_p)
>  {
>  	/* call board setup routine */
>  	plat_mem_setup();
> +	early_init_fdt_reserve_self();
> +	early_init_fdt_scan_reserved_mem();
>  	memblock_set_bottom_up(true);
>  
>  	bootcmdline_init();
> @@ -636,9 +638,6 @@ static void __init arch_mem_init(char **cmdline_p)
>  
>  	check_kernel_sections_mem();
>  
> -	early_init_fdt_reserve_self();
> -	early_init_fdt_scan_reserved_mem();
> -
>  #ifndef CONFIG_NUMA
>  	memblock_set_node(0, PHYS_ADDR_MAX, &memblock.memory, 0);
>  #endif

Thanks a lot for taking a look! The current/broken memblock=debug output
looks like this:

[    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
(mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
Feb 28 10:01:50 PST 2021
[    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
[    0.000000] FPU revision is: 00130001
[    0.000000] memblock_add: [0x00000000-0x0fffffff]
early_init_dt_scan_memory+0x160/0x1e0
[    0.000000] memblock_add: [0x20000000-0x4fffffff]
early_init_dt_scan_memory+0x160/0x1e0
[    0.000000] memblock_add: [0x90000000-0xcfffffff]
early_init_dt_scan_memory+0x160/0x1e0
[    0.000000] MIPS: machine is Broadcom BCM97435SVMB
[    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
[    0.000000] printk: bootconsole [ns16550a0] enabled
[    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
setup_arch+0x128/0x69c
[    0.000000] memblock_reserve: [0x00010000-0x018313cf]
setup_arch+0x1f8/0x69c
[    0.000000] Initrd not found or empty - disabling initrd
[    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
from=0x00000000 max_addr=0x00000000
early_init_dt_alloc_memory_arch+0x40/0x84
[    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
from=0x00000000 max_addr=0x00000000
early_init_dt_alloc_memory_arch+0x40/0x84
[    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
from=0x00000000 max_addr=0x00000000
early_init_dt_alloc_memory_arch+0x40/0x84
[    0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_reserve: [0x0096a000-0x00969fff]
setup_arch+0x3fc/0x69c
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
[    0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
[    0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
[    0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
bytes.
[    0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
linesize 32 bytes
[    0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
[    0.000000] memblock_reserve: [0x0000c000-0x0000cfff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
[    0.000000] memblock_reserve: [0x0000d000-0x0000dfff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
[    0.000000] memblock_reserve: [0x0000e000-0x0000efff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000]   HighMem  [mem 0x0000000010000000-0x00000000cfffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000]   node   0: [mem 0x0000000020000000-0x000000004fffffff]
[    0.000000]   node   0: [mem 0x0000000090000000-0x00000000cfffffff]
[    0.000000] Initmem setup node 0 [mem
0x0000000000000000-0x00000000cfffffff]
[    0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
from=0x00000000 max_addr=0x00000000
alloc_node_mem_map.constprop.135+0x6c/0xc8
[    0.000000] memblock_reserve: [0x01831400-0x032313ff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
[    0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
[    0.000000] memblock_reserve: [0x0000bc80-0x0000bdff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x80000000 reserved size = 0x0322f032
[    0.000000]  memory.cnt  = 0x3
[    0.000000]  memory[0x0]     [0x00000000-0x0fffffff], 0x10000000
bytes flags: 0x0
[    0.000000]  memory[0x1]     [0x20000000-0x4fffffff], 0x30000000
bytes flags: 0x0
[    0.000000]  memory[0x2]     [0x90000000-0xcfffffff], 0x40000000
bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0xa
[    0.000000]  reserved[0x0]   [0x00001000-0x00003aa0], 0x00002aa1
bytes flags: 0x0
[    0.000000]  reserved[0x1]   [0x00003aa4-0x0000ba64], 0x00007fc1
bytes flags: 0x0
[    0.000000]  reserved[0x2]   [0x0000ba80-0x0000ba9f], 0x00000020
bytes flags: 0x0
[    0.000000]  reserved[0x3]   [0x0000bb00-0x0000bb1f], 0x00000020
bytes flags: 0x0
[    0.000000]  reserved[0x4]   [0x0000bb80-0x0000bb9f], 0x00000020
bytes flags: 0x0
[    0.000000]  reserved[0x5]   [0x0000bc00-0x0000bc1f], 0x00000020
bytes flags: 0x0
[    0.000000]  reserved[0x6]   [0x0000bc80-0x0000bdff], 0x00000180
bytes flags: 0x0
[    0.000000]  reserved[0x7]   [0x0000c000-0x0000efff], 0x00003000
bytes flags: 0x0
[    0.000000]  reserved[0x8]   [0x00010000-0x018313cf], 0x018213d0
bytes flags: 0x0
[    0.000000]  reserved[0x9]   [0x01831400-0x032313ff], 0x01a00000
bytes flags: 0x0
[    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
[    0.000000] memblock_reserve: [0x0000be00-0x0000be1d]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
[    0.000000] memblock_reserve: [0x0000be80-0x0000be9d]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
[    0.000000] memblock_reserve: [0x0000f000-0x0000ffff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
[    0.000000] memblock_reserve: [0x03231400-0x032323ff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
[    0.000000] memblock_reserve: [0x03233000-0x0327afff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_free: [0x03245000-0x03244fff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] memblock_free: [0x03257000-0x03256fff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] memblock_free: [0x03269000-0x03268fff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] memblock_free: [0x0327b000-0x0327afff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
[    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
[    0.000000] memblock_reserve: [0x0000bf00-0x0000bf03]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
[    0.000000] memblock_reserve: [0x0000bf80-0x0000bf83]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
[    0.000000] memblock_reserve: [0x03232400-0x0323240f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
[    0.000000] memblock_reserve: [0x03232480-0x0323248f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
[    0.000000] memblock_reserve: [0x03232500-0x0323257f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
[    0.000000] memblock_reserve: [0x03232580-0x032325db]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
[    0.000000] memblock_reserve: [0x03232600-0x032328ff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
[    0.000000] memblock_reserve: [0x03232900-0x03232c03]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
[    0.000000] memblock_reserve: [0x03232c80-0x03232d3f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_free: [0x0000f000-0x0000ffff]
pcpu_embed_first_chunk+0x838/0x884
[    0.000000] memblock_free: [0x03231400-0x032323ff]
pcpu_embed_first_chunk+0x850/0x884
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 523776
[    0.000000] Kernel command line: console=ttyS0,115200 earlycon
[    0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
[    0.000000] memblock_reserve: [0x0327b000-0x0329afff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
bytes, linear)
[    0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
[    0.000000] memblock_reserve: [0x0329b000-0x032aafff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
bytes, linear)
[    0.000000] memblock_reserve: [0x00000000-0x000003ff]
trap_init+0x70/0x4e8
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
cma-reserved, 1835008K highmem)
[    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu:     RCU event tracing is enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
is 25 jiffies.
[    0.000000] NR_IRQS: 256
[    0.000000] OF: Bad cell count for /rdb
[    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
[    0.000000] OF: of_irq_init: children remain, but no parents
[    0.000000] random: get_random_bytes called from
start_kernel+0x444/0x654 with crng_init=0
[    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
wraps every 8589934590000000ns

With your patch applied, which unfortunately did not fix the boot, we have the
following:

[    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
(mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #86 SMP Sun
Feb 28 10:04:54 PST 2021
[    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
[    0.000000] FPU revision is: 00130001
[    0.000000] memblock_add: [0x00000000-0x0fffffff]
early_init_dt_scan_memory+0x160/0x1e0
[    0.000000] memblock_add: [0x20000000-0x4fffffff]
early_init_dt_scan_memory+0x160/0x1e0
[    0.000000] memblock_add: [0x90000000-0xcfffffff]
early_init_dt_scan_memory+0x160/0x1e0
[    0.000000] MIPS: machine is Broadcom BCM97435SVMB
[    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
setup_arch+0x60/0x6a4
[    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
[    0.000000] printk: bootconsole [ns16550a0] enabled
[    0.000000] memblock_reserve: [0x00010000-0x018313cf]
setup_arch+0x200/0x6a4
[    0.000000] Initrd not found or empty - disabling initrd
[    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
from=0x00000000 max_addr=0x00000000
early_init_dt_alloc_memory_arch+0x40/0x84
[    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
from=0x00000000 max_addr=0x00000000
early_init_dt_alloc_memory_arch+0x40/0x84
[    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
from=0x00000000 max_addr=0x00000000
early_init_dt_alloc_memory_arch+0x40/0x84
[    0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_reserve: [0x0096a000-0x00969fff]
setup_arch+0x404/0x6a4
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 setup_arch+0x4e8/0x6a4
[    0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 setup_arch+0x4e8/0x6a4
[    0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 setup_arch+0x4e8/0x6a4
[    0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
bytes.
[    0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
linesize 32 bytes
[    0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
[    0.000000] memblock_reserve: [0x0000c000-0x0000cfff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
[    0.000000] memblock_reserve: [0x0000d000-0x0000dfff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
[    0.000000] memblock_reserve: [0x0000e000-0x0000efff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000]   HighMem  [mem 0x0000000010000000-0x00000000cfffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000]   node   0: [mem 0x0000000020000000-0x000000004fffffff]
[    0.000000]   node   0: [mem 0x0000000090000000-0x00000000cfffffff]
[    0.000000] Initmem setup node 0 [mem
0x0000000000000000-0x00000000cfffffff]
[    0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
from=0x00000000 max_addr=0x00000000
alloc_node_mem_map.constprop.135+0x6c/0xc8
[    0.000000] memblock_reserve: [0x01831400-0x032313ff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
[    0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
[    0.000000] memblock_reserve: [0x0000bc80-0x0000bdff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x80000000 reserved size = 0x0322f032
[    0.000000]  memory.cnt  = 0x3
[    0.000000]  memory[0x0]     [0x00000000-0x0fffffff], 0x10000000
bytes flags: 0x0
[    0.000000]  memory[0x1]     [0x20000000-0x4fffffff], 0x30000000
bytes flags: 0x0
[    0.000000]  memory[0x2]     [0x90000000-0xcfffffff], 0x40000000
bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0xa
[    0.000000]  reserved[0x0]   [0x00001000-0x00003aa0], 0x00002aa1
bytes flags: 0x0
[    0.000000]  reserved[0x1]   [0x00003aa4-0x0000ba64], 0x00007fc1
bytes flags: 0x0
[    0.000000]  reserved[0x2]   [0x0000ba80-0x0000ba9f], 0x00000020
bytes flags: 0x0
[    0.000000]  reserved[0x3]   [0x0000bb00-0x0000bb1f], 0x00000020
bytes flags: 0x0
[    0.000000]  reserved[0x4]   [0x0000bb80-0x0000bb9f], 0x00000020
bytes flags: 0x0
[    0.000000]  reserved[0x5]   [0x0000bc00-0x0000bc1f], 0x00000020
bytes flags: 0x0
[    0.000000]  reserved[0x6]   [0x0000bc80-0x0000bdff], 0x00000180
bytes flags: 0x0
[    0.000000]  reserved[0x7]   [0x0000c000-0x0000efff], 0x00003000
bytes flags: 0x0
[    0.000000]  reserved[0x8]   [0x00010000-0x018313cf], 0x018213d0
bytes flags: 0x0
[    0.000000]  reserved[0x9]   [0x01831400-0x032313ff], 0x01a00000
bytes flags: 0x0
[    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
[    0.000000] memblock_reserve: [0x0000be00-0x0000be1d]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
[    0.000000] memblock_reserve: [0x0000be80-0x0000be9d]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
[    0.000000] memblock_reserve: [0x0000f000-0x0000ffff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
[    0.000000] memblock_reserve: [0x03231400-0x032323ff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
[    0.000000] memblock_reserve: [0x03233000-0x0327afff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_free: [0x03245000-0x03244fff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] memblock_free: [0x03257000-0x03256fff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] memblock_free: [0x03269000-0x03268fff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] memblock_free: [0x0327b000-0x0327afff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
[    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
[    0.000000] memblock_reserve: [0x0000bf00-0x0000bf03]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
[    0.000000] memblock_reserve: [0x0000bf80-0x0000bf83]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
[    0.000000] memblock_reserve: [0x03232400-0x0323240f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
[    0.000000] memblock_reserve: [0x03232480-0x0323248f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
[    0.000000] memblock_reserve: [0x03232500-0x0323257f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
[    0.000000] memblock_reserve: [0x03232580-0x032325db]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
[    0.000000] memblock_reserve: [0x03232600-0x032328ff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
[    0.000000] memblock_reserve: [0x03232900-0x03232c03]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
[    0.000000] memblock_reserve: [0x03232c80-0x03232d3f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_free: [0x0000f000-0x0000ffff]
pcpu_embed_first_chunk+0x838/0x884
[    0.000000] memblock_free: [0x03231400-0x032323ff]
pcpu_embed_first_chunk+0x850/0x884
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 523776
[    0.000000] Kernel command line: console=ttyS0,115200 earlycon
[    0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
[    0.000000] memblock_reserve: [0x0327b000-0x0329afff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
bytes, linear)
[    0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
[    0.000000] memblock_reserve: [0x0329b000-0x032aafff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
bytes, linear)
[    0.000000] memblock_reserve: [0x00000000-0x000003ff]
trap_init+0x70/0x4e8
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
cma-reserved, 1835008K highmem)
[    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu:     RCU event tracing is enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
is 25 jiffies.
[    0.000000] NR_IRQS: 256
[    0.000000] OF: Bad cell count for /rdb
[    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
[    0.000000] OF: of_irq_init: children remain, but no parents
[    0.000000] random: get_random_bytes called from
start_kernel+0x444/0x654 with crng_init=0
[    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
wraps every 8589934590000000ns

With only the revert of f787b0b4502cde50c3583432d6cb9bd8306fc242
("memblock: do not start bottom-up allocations with kernel_end") and an
unmodified arch/mips/kernel/setup.c, this boots successfully:

[    0.000000] Linux version 5.11.0-gf787b0b4502c (florian@locahost)
(mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #88 SMP Sun
Feb 28 10:13:21 PST 2021
[    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
[    0.000000] FPU revision is: 00130001
[    0.000000] memblock_add: [0x00000000-0x0fffffff]
early_init_dt_scan_memory+0x160/0x1e0
[    0.000000] memblock_add: [0x20000000-0x4fffffff]
early_init_dt_scan_memory+0x160/0x1e0
[    0.000000] memblock_add: [0x90000000-0xcfffffff]
early_init_dt_scan_memory+0x160/0x1e0
[    0.000000] MIPS: machine is Broadcom BCM97435SVMB
[    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
[    0.000000] printk: bootconsole [ns16550a0] enabled
[    0.000000] memblock_reserve: [0x00aa9600-0x00aac0a0]
setup_arch+0x128/0x69c
[    0.000000] memblock_reserve: [0x00010000-0x018313cf]
setup_arch+0x1f8/0x69c
[    0.000000] Initrd not found or empty - disabling initrd
[    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
from=0x00000000 max_addr=0x00000000
early_init_dt_alloc_memory_arch+0x40/0x84
[    0.000000] memblock_reserve: [0x01831400-0x01833ea0]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
from=0x00000000 max_addr=0x00000000
early_init_dt_alloc_memory_arch+0x40/0x84
[    0.000000] memblock_reserve: [0x01833ea4-0x0183be4b]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
from=0x00000000 max_addr=0x00000000
early_init_dt_alloc_memory_arch+0x40/0x84
[    0.000000] memblock_reserve: [0x018313d0-0x018313e8]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_reserve: [0x0096c000-0x0096bfff]
setup_arch+0x3fc/0x69c
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
[    0.000000] memblock_reserve: [0x0183be80-0x0183be9f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
[    0.000000] memblock_reserve: [0x0183bf00-0x0183bf1f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
[    0.000000] memblock_reserve: [0x0183bf80-0x0183bf9f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
bytes.
[    0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
linesize 32 bytes
[    0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
[    0.000000] memblock_reserve: [0x0183c000-0x0183cfff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
[    0.000000] memblock_reserve: [0x0183d000-0x0183dfff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
[    0.000000] memblock_reserve: [0x0183e000-0x0183efff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000]   HighMem  [mem 0x0000000010000000-0x00000000cfffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
[    0.000000]   node   0: [mem 0x0000000020000000-0x000000004fffffff]
[    0.000000]   node   0: [mem 0x0000000090000000-0x00000000cfffffff]
[    0.000000] Initmem setup node 0 [mem
0x0000000000000000-0x00000000cfffffff]
[    0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
from=0x00000000 max_addr=0x00000000
alloc_node_mem_map.constprop.135+0x6c/0xc8
[    0.000000] memblock_reserve: [0x0183f000-0x0323efff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
[    0.000000] memblock_reserve: [0x0323f000-0x0323f01f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
[    0.000000] memblock_reserve: [0x0323f080-0x0323f1ff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] MEMBLOCK configuration:
[    0.000000]  memory size = 0x80000000 reserved size = 0x0322f032
[    0.000000]  memory.cnt  = 0x3
[    0.000000]  memory[0x0]     [0x00000000-0x0fffffff], 0x10000000
bytes flags: 0x0
[    0.000000]  memory[0x1]     [0x20000000-0x4fffffff], 0x30000000
bytes flags: 0x0
[    0.000000]  memory[0x2]     [0x90000000-0xcfffffff], 0x40000000
bytes flags: 0x0
[    0.000000]  reserved.cnt  = 0x8
[    0.000000]  reserved[0x0]   [0x00010000-0x018313e8], 0x018213e9
bytes flags: 0x0
[    0.000000]  reserved[0x1]   [0x01831400-0x01833ea0], 0x00002aa1
bytes flags: 0x0
[    0.000000]  reserved[0x2]   [0x01833ea4-0x0183be4b], 0x00007fa8
bytes flags: 0x0
[    0.000000]  reserved[0x3]   [0x0183be80-0x0183be9f], 0x00000020
bytes flags: 0x0
[    0.000000]  reserved[0x4]   [0x0183bf00-0x0183bf1f], 0x00000020
bytes flags: 0x0
[    0.000000]  reserved[0x5]   [0x0183bf80-0x0183bf9f], 0x00000020
bytes flags: 0x0
[    0.000000]  reserved[0x6]   [0x0183c000-0x0323f01f], 0x01a03020
bytes flags: 0x0
[    0.000000]  reserved[0x7]   [0x0323f080-0x0323f1ff], 0x00000180
bytes flags: 0x0
[    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
[    0.000000] memblock_reserve: [0x0323f200-0x0323f21d]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
[    0.000000] memblock_reserve: [0x0323f280-0x0323f29d]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
[    0.000000] memblock_reserve: [0x03240000-0x03240fff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
[    0.000000] memblock_reserve: [0x03241000-0x03241fff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
[    0.000000] memblock_reserve: [0x03242000-0x03289fff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_free: [0x03254000-0x03253fff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] memblock_free: [0x03266000-0x03265fff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] memblock_free: [0x03278000-0x03277fff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] memblock_free: [0x0328a000-0x03289fff]
pcpu_embed_first_chunk+0x7a0/0x884
[    0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
[    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
[    0.000000] memblock_reserve: [0x0323f300-0x0323f303]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
[    0.000000] memblock_reserve: [0x0323f380-0x0323f383]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
[    0.000000] memblock_reserve: [0x0323f400-0x0323f40f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
[    0.000000] memblock_reserve: [0x0323f480-0x0323f48f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
[    0.000000] memblock_reserve: [0x0323f500-0x0323f57f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
[    0.000000] memblock_reserve: [0x0323f580-0x0323f5db]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
[    0.000000] memblock_reserve: [0x0323f600-0x0323f8ff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
[    0.000000] memblock_reserve: [0x0323f900-0x0323fc03]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
[    0.000000] memblock_reserve: [0x0323fc80-0x0323fd3f]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] memblock_free: [0x03240000-0x03240fff]
pcpu_embed_first_chunk+0x838/0x884
[    0.000000] memblock_free: [0x03241000-0x03241fff]
pcpu_embed_first_chunk+0x850/0x884
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 523776
[    0.000000] Kernel command line: console=ttyS0,115200 earlycon
[    0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
[    0.000000] memblock_reserve: [0x0328a000-0x032a9fff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
bytes, linear)
[    0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
[    0.000000] memblock_reserve: [0x032aa000-0x032b9fff]
memblock_alloc_range_nid+0xf8/0x198
[    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
bytes, linear)
[    0.000000] memblock_reserve: [0x00000000-0x000003ff]
trap_init+0x70/0x4e8
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 2045272K/2097152K available (8226K kernel code,
1078K rwdata, 1336K rodata, 13800K init, 260K bss, 51880K reserved, 0K
cma-reserved, 1835008K highmem)
[    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu:     RCU event tracing is enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
is 25 jiffies.
[    0.000000] NR_IRQS: 256
[    0.000000] irq_bcm7038_l1: registered BCM7038 L1 intc
(/rdb/interrupt-controller@41b500, IRQs: 128)
[    0.000000] irq_brcmstb_l2: registered L2 intc
(/rdb/interrupt-controller@403000, parent irq: 52)
[    0.000000] irq_bcm7120_l2: registered BCM7120 L2 intc
(/rdb/interrupt-controller@406780, parent IRQ(s): 2)
[    0.000000] irq_bcm7120_l2: registered BCM7120 L2 intc
(/rdb/interrupt-controller@409480, parent IRQ(s): 3)
[    0.000000] irq_brcmstb_l2: registered L2 intc
(/rdb/interrupt-controller@408440, parent irq: 54)
[    0.000000] irq_brcmstb_l2: registered L2 intc
(/rdb/interrupt-controller@41b000, parent irq: 24)
[    0.000000] irq_brcmstb_l2: registered L2 intc
(/rdb/interrupt-controller@41bd00, parent irq: 25)
[    0.000000] random: get_random_bytes called from
start_kernel+0x444/0x654 with crng_init=0
[    0.000000] clocksource: MIPS: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 10882621761 ns
[    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
wraps every 8589934590000000ns

The DTB is located at this offset within vmlinux:

37084: 80aac0a1      0 OBJECT  GLOBAL DEFAULT       10
__dtb_bcm97435svmb_end
48909: 80aa9600      0 OBJECT  GLOBAL DEFAULT       10
__dtb_bcm97435svmb_begin

0x8000_0000 maps to physical address 0x0 on these MIPS platforms.
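
The symbol addresses above can be tied back to the reserved range in the
log with a little arithmetic. This is a minimal sketch, assuming the usual
32-bit MIPS KSEG0 layout (an unmapped, cached 512 MiB window starting at
0x80000000 that maps onto physical 0x0). The helper name kseg0_to_phys()
is hypothetical and is not a kernel API:

```c
#include <assert.h>
#include <stdint.h>

#define KSEG0_BASE 0x80000000u	/* start of the unmapped cached segment */
#define KSEG0_SIZE 0x20000000u	/* 512 MiB window (assumed standard layout) */

/* Hypothetical helper: translate a KSEG0 virtual address to physical. */
static uint32_t kseg0_to_phys(uint32_t vaddr)
{
	/* Only addresses inside the KSEG0 window are valid input here. */
	assert(vaddr >= KSEG0_BASE && vaddr - KSEG0_BASE < KSEG0_SIZE);
	return vaddr - KSEG0_BASE;
}
```

With these definitions, kseg0_to_phys(0x80aa9600) gives 0x00aa9600 and
kseg0_to_phys(0x80aac0a1) - 1 gives 0x00aac0a0, which is the
[0x00aa9600-0x00aac0a0] range reserved from setup_arch() in the log above.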
-- 
Florian


* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2021-02-28 18:19       ` Florian Fainelli
@ 2021-02-28 23:08         ` Serge Semin
  2021-03-01  3:50             ` Florian Fainelli
  0 siblings, 1 reply; 49+ messages in thread
From: Serge Semin @ 2021-02-28 23:08 UTC (permalink / raw)
  To: Florian Fainelli, Mike Rapoport, Thomas Bogendoerfer
  Cc: Serge Semin, Roman Gushchin, Andrew Morton, linux-mm, Kamal Dasu,
	linux-mips, Paul Cercueil, Jiaxun Yang, iamjoonsoo.kim, riel,
	Michal Hocko, linux-kernel, kernel-team

Hi folks,
What you've got here seems to be a more complicated problem than it
first appears. Please see my comments below.

(Note that I've discarded the parts of the email logs that are of no
interest to the discovered problem. Please also note that I don't have
any Broadcom hardware to test the solution suggested below.)

On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
> Hi Mike,
> 
> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
> > Hi Florian,
> > 
> > On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
> >>

> >> [...]

> >>
> >> Hi Roman, Thomas and other linux-mips folks,
> >>
> >> Kamal and I have been unable to boot v5.11 on MIPS since this
> >> commit; reverting it makes our MIPS platforms boot successfully. We do
> >> not see a warning like the one in the commit message; instead, what
> >> appears to happen is a corrupted Device Tree, which prevents the parsing
> >> of the "rdb" node and leads to the interrupt controllers not being
> >> registered, and the system eventually not booting.
> >>
> >> The Device Tree is built-into the kernel image and resides at
> >> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
> >>
> >> Do you have any idea what could be wrong with MIPS specifically here?

Most likely the problem you've discovered has been there for quite
some time. The patch you are referring to just triggered it by
extending the early allocation range. Before that patch was accepted,
early memory allocations were performed in the range:
[kernel_end, RAM_END].
The patch changed that, so early allocations are now done within
[RAM_START + PAGE_SIZE, RAM_END].

In normal situations it's safe to do that as long as all the critical
memory regions (including any memory residing below the kernel) have
been reserved. But if memory holding some critical structures hasn't
been reserved, the kernel may hand it out, for instance for early
initializations, with unpredictable but usually unpleasant
consequences.
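
To illustrate the difference between the two ranges, here is a toy model
of a bottom-up search over a sorted list of reserved ranges. It is only a
sketch and not the memblock implementation: struct range, align_up() and
find_bottom_up() are hypothetical names, and PAGE_SIZE is assumed to be
0x1000.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open physical range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

static uint64_t align_up(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
	return (x + align - 1) & ~(align - 1);
}

/*
 * Toy bottom-up search: return the lowest aligned address at or above
 * 'min' where 'size' bytes fit below 'max' without touching a reserved
 * range. 'rsv' must be sorted by start and non-overlapping.
 * Returns UINT64_MAX when nothing fits.
 */
static uint64_t find_bottom_up(const struct range *rsv, size_t n,
			       uint64_t min, uint64_t max,
			       uint64_t size, uint64_t align)
{
	uint64_t cand = align_up(min, align);
	size_t i;

	for (i = 0; i < n; i++) {
		if (rsv[i].end <= cand)
			continue;	/* reserved range lies below the candidate */
		if (cand + size <= rsv[i].start)
			break;		/* request fits in the gap before it */
		cand = align_up(rsv[i].end, align);	/* skip past it */
	}
	return (cand + size <= max) ? cand : UINT64_MAX;
}
```

Taking the kernel reservation from the logs, [0x00010000, 0x018313d0), as
the only reserved range, a 10913-byte request with align=0x40 and
min=0x1000 lands at 0x00001000, while the same request with min set to
kernel_end (0x018313d0) lands at 0x01831400. Those are the two addresses
the first early_init_dt_alloc_memory_arch() allocation gets in the failing
and in the reverted boot log respectively.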

> > 
> > Apparently there is a memblock allocation in one of the functions called
> > from arch_mem_init() between plat_mem_setup() and
> > early_init_fdt_reserve_self().

Mike, alas, according to the log provided by Florian that's not the cause
of the problem. Please see my considerations below.

> [...]
> 
> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
> Feb 28 10:01:50 PST 2021
> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
> [    0.000000] FPU revision is: 00130001

> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
> early_init_dt_scan_memory+0x160/0x1e0
> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
> early_init_dt_scan_memory+0x160/0x1e0
> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
> early_init_dt_scan_memory+0x160/0x1e0

Here the memory has been added to the memblock allocator.

> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
> [    0.000000] printk: bootconsole [ns16550a0] enabled

> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
> setup_arch+0x128/0x69c

Here the fdt memory has been reserved. (Note it's built into the
kernel.)

> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
> setup_arch+0x1f8/0x69c

Here the kernel itself, together with the built-in dtb, has been reserved.
So far so good.

> [    0.000000] Initrd not found or empty - disabling initrd

> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
> from=0x00000000 max_addr=0x00000000
> early_init_dt_alloc_memory_arch+0x40/0x84
> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
> from=0x00000000 max_addr=0x00000000
> early_init_dt_alloc_memory_arch+0x40/0x84
> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
> memblock_alloc_range_nid+0xf8/0x198

The log above most likely belongs to the call-chain:
setup_arch()
+-> arch_mem_init()
    +-> device_tree_init() - BMIPS specific method
        +-> unflatten_and_copy_device_tree()

In other words, here we've copied the fdt from its original location
[0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
it into [0x00003aa4-0x0000ba4b].
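
The ranges quoted here can be cross-checked against the allocation sizes
in the log. A small sketch, with the hypothetical helpers range_size() and
range_within() operating on inclusive [start, end] ranges as memblock
prints them:

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of an inclusive range [start, end]. */
static uint32_t range_size(uint32_t start, uint32_t end)
{
	assert(end >= start);
	return end - start + 1;
}

/* Non-zero when inclusive range [a0, a1] lies fully inside [b0, b1]. */
static int range_within(uint32_t a0, uint32_t a1, uint32_t b0, uint32_t b1)
{
	return a0 >= b0 && a1 <= b1;
}
```

range_size(0x00aa7600, 0x00aaa0a0) and range_size(0x00001000, 0x00003aa0)
both give 10913, the size of the first allocation, and
range_size(0x00003aa4, 0x0000ba4b) gives 32680, the size of the second.
range_within(0x00aa7600, 0x00aaa0a0, 0x00010000, 0x018313cf) is true, so
the original fdt sits inside the reserved kernel image, whereas both new
buffers end below 0x00010000, that is, below the kernel.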

The problem is that a bit later the following call-chain is performed:
setup_arch()
+-> plat_smp_setup()
    +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
        +-> if (!board_ebase_setup)
                 board_ebase_setup = &bmips_ebase_setup;

So when the CPU traps are initialized, the bmips_ebase_setup() method
is called. What trap_init() does then isn't compatible with the
allocation performed by the unflatten_and_copy_device_tree() method.
See the next comment.

> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
> from=0x00000000 max_addr=0x00000000
> early_init_dt_alloc_memory_arch+0x40/0x84
> [    0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_reserve: [0x0096a000-0x00969fff]
> setup_arch+0x3fc/0x69c
> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> [    0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> [    0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> [    0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
> bytes.
> [    0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
> linesize 32 bytes
> [    0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> [    0.000000] memblock_reserve: [0x0000c000-0x0000cfff]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> [    0.000000] memblock_reserve: [0x0000d000-0x0000dfff]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> [    0.000000] memblock_reserve: [0x0000e000-0x0000efff]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] Zone ranges:
> [    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
> [    0.000000]   HighMem  [mem 0x0000000010000000-0x00000000cfffffff]
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
> [    0.000000]   node   0: [mem 0x0000000020000000-0x000000004fffffff]
> [    0.000000]   node   0: [mem 0x0000000090000000-0x00000000cfffffff]
> [    0.000000] Initmem setup node 0 [mem
> 0x0000000000000000-0x00000000cfffffff]
> [    0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
> from=0x00000000 max_addr=0x00000000
> alloc_node_mem_map.constprop.135+0x6c/0xc8
> [    0.000000] memblock_reserve: [0x01831400-0x032313ff]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
> [    0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
> [    0.000000] memblock_reserve: [0x0000bc80-0x0000bdff]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] MEMBLOCK configuration:
> [    0.000000]  memory size = 0x80000000 reserved size = 0x0322f032
> [    0.000000]  memory.cnt  = 0x3
> [    0.000000]  memory[0x0]     [0x00000000-0x0fffffff], 0x10000000
> bytes flags: 0x0
> [    0.000000]  memory[0x1]     [0x20000000-0x4fffffff], 0x30000000
> bytes flags: 0x0
> [    0.000000]  memory[0x2]     [0x90000000-0xcfffffff], 0x40000000
> bytes flags: 0x0
> [    0.000000]  reserved.cnt  = 0xa
> [    0.000000]  reserved[0x0]   [0x00001000-0x00003aa0], 0x00002aa1
> bytes flags: 0x0
> [    0.000000]  reserved[0x1]   [0x00003aa4-0x0000ba64], 0x00007fc1
> bytes flags: 0x0
> [    0.000000]  reserved[0x2]   [0x0000ba80-0x0000ba9f], 0x00000020
> bytes flags: 0x0
> [    0.000000]  reserved[0x3]   [0x0000bb00-0x0000bb1f], 0x00000020
> bytes flags: 0x0
> [    0.000000]  reserved[0x4]   [0x0000bb80-0x0000bb9f], 0x00000020
> bytes flags: 0x0
> [    0.000000]  reserved[0x5]   [0x0000bc00-0x0000bc1f], 0x00000020
> bytes flags: 0x0
> [    0.000000]  reserved[0x6]   [0x0000bc80-0x0000bdff], 0x00000180
> bytes flags: 0x0
> [    0.000000]  reserved[0x7]   [0x0000c000-0x0000efff], 0x00003000
> bytes flags: 0x0
> [    0.000000]  reserved[0x8]   [0x00010000-0x018313cf], 0x018213d0
> bytes flags: 0x0
> [    0.000000]  reserved[0x9]   [0x01831400-0x032313ff], 0x01a00000
> bytes flags: 0x0
> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
> [    0.000000] memblock_reserve: [0x0000be00-0x0000be1d]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
> [    0.000000] memblock_reserve: [0x0000be80-0x0000be9d]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
> [    0.000000] memblock_reserve: [0x0000f000-0x0000ffff]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
> [    0.000000] memblock_reserve: [0x03231400-0x032323ff]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
> from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
> [    0.000000] memblock_reserve: [0x03233000-0x0327afff]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_free: [0x03245000-0x03244fff]
> pcpu_embed_first_chunk+0x7a0/0x884
> [    0.000000] memblock_free: [0x03257000-0x03256fff]
> pcpu_embed_first_chunk+0x7a0/0x884
> [    0.000000] memblock_free: [0x03269000-0x03268fff]
> pcpu_embed_first_chunk+0x7a0/0x884
> [    0.000000] memblock_free: [0x0327b000-0x0327afff]
> pcpu_embed_first_chunk+0x7a0/0x884
> [    0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
> [    0.000000] memblock_reserve: [0x0000bf00-0x0000bf03]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
> [    0.000000] memblock_reserve: [0x0000bf80-0x0000bf83]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
> [    0.000000] memblock_reserve: [0x03232400-0x0323240f]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
> [    0.000000] memblock_reserve: [0x03232480-0x0323248f]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
> [    0.000000] memblock_reserve: [0x03232500-0x0323257f]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
> [    0.000000] memblock_reserve: [0x03232580-0x032325db]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
> [    0.000000] memblock_reserve: [0x03232600-0x032328ff]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
> [    0.000000] memblock_reserve: [0x03232900-0x03232c03]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
> [    0.000000] memblock_reserve: [0x03232c80-0x03232d3f]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] memblock_free: [0x0000f000-0x0000ffff]
> pcpu_embed_first_chunk+0x838/0x884
> [    0.000000] memblock_free: [0x03231400-0x032323ff]
> pcpu_embed_first_chunk+0x850/0x884
> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 523776
> [    0.000000] Kernel command line: console=ttyS0,115200 earlycon
> [    0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
> [    0.000000] memblock_reserve: [0x0327b000-0x0329afff]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
> bytes, linear)
> [    0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
> [    0.000000] memblock_reserve: [0x0329b000-0x032aafff]
> memblock_alloc_range_nid+0xf8/0x198
> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
> bytes, linear)

> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
> trap_init+0x70/0x4e8

Most likely the corruption happens somewhere around here. The log line
above shows the memory for the NMI/reset vectors being reserved:
arch/mips/kernel/traps.c: trap_init(void): Line 2373.

But then the board_ebase_setup() pointer is dereferenced and called.
It was set to bmips_ebase_setup() earlier, and that function overwrites
the ebase variable with 0x80001000, since this is a CPU_BMIPS5000 CPU.
So any further calls to functions like
set_handler()/set_except_vector()/set_vi_srs_handler()/etc may corrupt
the memory above 0x80001000, which, as we have discovered, belongs to
the fdt and the unflattened device tree.

> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
> cma-reserved, 1835008K highmem)
> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> [    0.000000] rcu: Hierarchical RCU implementation.
> [    0.000000] rcu:     RCU event tracing is enabled.
> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
> is 25 jiffies.
> [    0.000000] NR_IRQS: 256

> [    0.000000] OF: Bad cell count for /rdb
> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
> [    0.000000] OF: of_irq_init: children remain, but no parents

So this is the first place where the consequence of the corruption
shows up. Luckily it's just the "Bad cell count" error. We could have
got a much less obvious symptom here, up to a crash somewhere further
on...

> [    0.000000] random: get_random_bytes called from
> start_kernel+0x444/0x654 with crng_init=0
> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
> wraps every 8589934590000000ns

> 
> and with your patch applied which unfortunately did not work we have the
> following:
>
> [...]

So a patch like this should work around the corruption:

--- a/arch/mips/bmips/setup.c
+++ b/arch/mips/bmips/setup.c
@@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
 
 	__dt_setup_arch(dtb);
 
+	memblock_reserve(0x0, 0x1000 + 0x100*64);
+
 	for (q = bmips_quirk_list; q->quirk_fn; q++) {
 		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
 					     q->compatible)) {

But the main question is how to fix the problem in general. At least
for Broadcom CPUs the reservation needs to be performed before
device_tree_init() is called, since the latter is the very first
method which starts allocating from memblock. So the best candidate is
to use plat_mem_setup() for the reservation, right after the memory is
added to the memblock allocator by the __dt_setup_arch() call. In
addition, we need to take into account the amount of memory each type
of Broadcom CPU needs for the exception vectors. So a function like
this could be used to reserve the exception vectors memory:

static void bmips_ebase_reserve(void)
{
	phys_addr_t base, size = VECTORSPACING*64;

	switch (current_cpu_type()) {
	case CPU_BMIPS4350:
		return;
	case CPU_BMIPS3300:
	case CPU_BMIPS4380:
		base = 0x0400;
		break;
	case CPU_BMIPS5000:
		base = 0x1000;
		break;
	default:
		return;
	}

	memblock_reserve(base, size);
}

Though I am not sure it's correct. At least on P5600 the vector spacing
is configurable.

Anyway, all of that concerns the Broadcom CPUs. But we can run into the
same problem on other platforms whose developers weren't careful
enough to reserve all the critical memory sections in the platform
code, especially now that the patch introduced by Roman has been
merged into the kernel.

-Sergey

> 
> [...]
> -- 
> Florian

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2021-02-28 23:08         ` Serge Semin
@ 2021-03-01  3:50             ` Florian Fainelli
  0 siblings, 0 replies; 49+ messages in thread
From: Florian Fainelli @ 2021-03-01  3:50 UTC (permalink / raw)
  To: Serge Semin, Mike Rapoport, Thomas Bogendoerfer
  Cc: Serge Semin, Roman Gushchin, Andrew Morton, linux-mm, Kamal Dasu,
	Paul Cercueil, Jiaxun Yang, iamjoonsoo.kim, riel, Michal Hocko,
	linux-kernel, kernel-team,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE

Hi Serge,

On 2/28/2021 3:08 PM, Serge Semin wrote:
> Hi folks,
> What you've got here seems a more complicated problem than it
> could originally look like. Please, see my comments below.
> 
> (Note I've discarded some of the email logs, which of no interest
> to the discovered problem. Please also note that I haven't got any
> Broadcom hardware to test out a solution suggested below.)
> 
> On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
>> Hi Mike,
>>
>> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
>>> Hi Florian,
>>>
>>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
>>>>
> 
>>>> [...]
> 
>>>>
>>>> Hi Roman, Thomas and other linux-mips folks,
>>>>
>>>> Kamal and myself have been unable to boot v5.11 on MIPS since this
>>>> commit, reverting it makes our MIPS platforms boot successfully. We do
>>>> not see a warning like this one in the commit message, instead what
>>>> happens appear to be a corrupted Device Tree which prevents the parsing
>>>> of the "rdb" node and leading to the interrupt controllers not being
>>>> registered, and the system eventually not booting.
>>>>
>>>> The Device Tree is built-into the kernel image and resides at
>>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
>>>>
>>>> Do you have any idea what could be wrong with MIPS specifically here?
> 
> Most likely the problem you've discovered has been there for quite
> some time. The patch you are referring to just caused it to be
> triggered by extending the early allocation range. See before that
> patch was accepted the early memory allocations had been performed
> in the range:
> [kernel_end, RAM_END].
> The patch changed that, so the early allocations are done within
> [RAM_START + PAGE_SIZE, RAM_END].
> 
> In normal situations it's safe to do that as long as all the critical
> memory regions (including the memory residing a space below the
> kernel) have been reserved. But as soon as a memory with some critical
> structures haven't been reserved, the kernel may allocate it to be used
> for instance for early initializations with obviously unpredictable but
> most of the times unpleasant consequences.
> 
>>>
>>> Apparently there is a memblock allocation in one of the functions called
>>> from arch_mem_init() between plat_mem_setup() and
>>> early_init_fdt_reserve_self().
> 
> Mike, alas according to the log provided by Florian that's not the reason
> of the problem. Please, see my considerations below.
> 
>> [...]
>>
>> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
>> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
>> Feb 28 10:01:50 PST 2021
>> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
>> [    0.000000] FPU revision is: 00130001
> 
>> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
>> early_init_dt_scan_memory+0x160/0x1e0
>> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
>> early_init_dt_scan_memory+0x160/0x1e0
>> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
>> early_init_dt_scan_memory+0x160/0x1e0
> 
> Here the memory has been added to the memblock allocator.
> 
>> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
>> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
>> [    0.000000] printk: bootconsole [ns16550a0] enabled
> 
>> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
>> setup_arch+0x128/0x69c
> 
> Here the fdt memory has been reserved. (Note it's built into the
> kernel.)
> 
>> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
>> setup_arch+0x1f8/0x69c
> 
> Here the kernel itself together with built-in dtb have been reserved.
> So far so good.
> 
>> [    0.000000] Initrd not found or empty - disabling initrd
> 
>> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
>> from=0x00000000 max_addr=0x00000000
>> early_init_dt_alloc_memory_arch+0x40/0x84
>> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
>> from=0x00000000 max_addr=0x00000000
>> early_init_dt_alloc_memory_arch+0x40/0x84
>> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
>> memblock_alloc_range_nid+0xf8/0x198
> 
> The log above most likely belongs to the call-chain:
> setup_arch()
> +-> arch_mem_init()
>     +-> device_tree_init() - BMIPS specific method
>         +-> unflatten_and_copy_device_tree()
> 
> So to speak here we've copied the fdt from the original space
> [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
> it to [0x00003aa4-0x0000ba4b].
> 
> The problem is that a bit later the next call-chain is performed:
> setup_arch()
> +-> plat_smp_setup()
>     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
>         +-> if (!board_ebase_setup)
>                  board_ebase_setup = &bmips_ebase_setup;
> 
> So at the moment of the CPU traps initialization the bmips_ebase_setup()
> method is called. What trap_init() does isn't compatible with the
> allocation performed by the unflatten_and_copy_device_tree() method.
> See the next comment.
> 
>> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
>> from=0x00000000 max_addr=0x00000000
>> early_init_dt_alloc_memory_arch+0x40/0x84
>> [    0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_reserve: [0x0096a000-0x00969fff]
>> setup_arch+0x3fc/0x69c
>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>> [    0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>> [    0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>> [    0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
>> bytes.
>> [    0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
>> linesize 32 bytes
>> [    0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>> [    0.000000] memblock_reserve: [0x0000c000-0x0000cfff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>> [    0.000000] memblock_reserve: [0x0000d000-0x0000dfff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>> [    0.000000] memblock_reserve: [0x0000e000-0x0000efff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] Zone ranges:
>> [    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
>> [    0.000000]   HighMem  [mem 0x0000000010000000-0x00000000cfffffff]
>> [    0.000000] Movable zone start for each node
>> [    0.000000] Early memory node ranges
>> [    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
>> [    0.000000]   node   0: [mem 0x0000000020000000-0x000000004fffffff]
>> [    0.000000]   node   0: [mem 0x0000000090000000-0x00000000cfffffff]
>> [    0.000000] Initmem setup node 0 [mem
>> 0x0000000000000000-0x00000000cfffffff]
>> [    0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
>> from=0x00000000 max_addr=0x00000000
>> alloc_node_mem_map.constprop.135+0x6c/0xc8
>> [    0.000000] memblock_reserve: [0x01831400-0x032313ff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
>> [    0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
>> [    0.000000] memblock_reserve: [0x0000bc80-0x0000bdff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] MEMBLOCK configuration:
>> [    0.000000]  memory size = 0x80000000 reserved size = 0x0322f032
>> [    0.000000]  memory.cnt  = 0x3
>> [    0.000000]  memory[0x0]     [0x00000000-0x0fffffff], 0x10000000
>> bytes flags: 0x0
>> [    0.000000]  memory[0x1]     [0x20000000-0x4fffffff], 0x30000000
>> bytes flags: 0x0
>> [    0.000000]  memory[0x2]     [0x90000000-0xcfffffff], 0x40000000
>> bytes flags: 0x0
>> [    0.000000]  reserved.cnt  = 0xa
>> [    0.000000]  reserved[0x0]   [0x00001000-0x00003aa0], 0x00002aa1
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x1]   [0x00003aa4-0x0000ba64], 0x00007fc1
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x2]   [0x0000ba80-0x0000ba9f], 0x00000020
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x3]   [0x0000bb00-0x0000bb1f], 0x00000020
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x4]   [0x0000bb80-0x0000bb9f], 0x00000020
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x5]   [0x0000bc00-0x0000bc1f], 0x00000020
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x6]   [0x0000bc80-0x0000bdff], 0x00000180
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x7]   [0x0000c000-0x0000efff], 0x00003000
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x8]   [0x00010000-0x018313cf], 0x018213d0
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x9]   [0x01831400-0x032313ff], 0x01a00000
>> bytes flags: 0x0
>> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
>> [    0.000000] memblock_reserve: [0x0000be00-0x0000be1d]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
>> [    0.000000] memblock_reserve: [0x0000be80-0x0000be9d]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
>> [    0.000000] memblock_reserve: [0x0000f000-0x0000ffff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
>> [    0.000000] memblock_reserve: [0x03231400-0x032323ff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
>> from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
>> [    0.000000] memblock_reserve: [0x03233000-0x0327afff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_free: [0x03245000-0x03244fff]
>> pcpu_embed_first_chunk+0x7a0/0x884
>> [    0.000000] memblock_free: [0x03257000-0x03256fff]
>> pcpu_embed_first_chunk+0x7a0/0x884
>> [    0.000000] memblock_free: [0x03269000-0x03268fff]
>> pcpu_embed_first_chunk+0x7a0/0x884
>> [    0.000000] memblock_free: [0x0327b000-0x0327afff]
>> pcpu_embed_first_chunk+0x7a0/0x884
>> [    0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
>> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
>> [    0.000000] memblock_reserve: [0x0000bf00-0x0000bf03]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
>> [    0.000000] memblock_reserve: [0x0000bf80-0x0000bf83]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
>> [    0.000000] memblock_reserve: [0x03232400-0x0323240f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
>> [    0.000000] memblock_reserve: [0x03232480-0x0323248f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
>> [    0.000000] memblock_reserve: [0x03232500-0x0323257f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
>> [    0.000000] memblock_reserve: [0x03232580-0x032325db]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
>> [    0.000000] memblock_reserve: [0x03232600-0x032328ff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
>> [    0.000000] memblock_reserve: [0x03232900-0x03232c03]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
>> [    0.000000] memblock_reserve: [0x03232c80-0x03232d3f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_free: [0x0000f000-0x0000ffff]
>> pcpu_embed_first_chunk+0x838/0x884
>> [    0.000000] memblock_free: [0x03231400-0x032323ff]
>> pcpu_embed_first_chunk+0x850/0x884
>> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 523776
>> [    0.000000] Kernel command line: console=ttyS0,115200 earlycon
>> [    0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
>> [    0.000000] memblock_reserve: [0x0327b000-0x0329afff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
>> bytes, linear)
>> [    0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
>> [    0.000000] memblock_reserve: [0x0329b000-0x032aafff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
>> bytes, linear)
> 
>> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
>> trap_init+0x70/0x4e8
> 
> Most likely someplace here the corruption has happened. The log above
> has just reserved a memory for NMI/reset vectors:
> arch/mips/kernel/traps.c: trap_init(void): Line 2373.
> 
> But then the board_ebase_setup() pointer is dereferenced and called,
> which has been initialized with bmips_ebase_setup() earlier and which
> overwrites the ebase variable with: 0x80001000 as this is
> CPU_BMIPS5000 CPU. So any further calls of the functions like
> set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a
> corruption of the memory above 0x80001000, which as we have discovered
> belongs to fdt and unflattened device tree.
> 
>> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
>> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
>> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
>> cma-reserved, 1835008K highmem)
>> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
>> [    0.000000] rcu: Hierarchical RCU implementation.
>> [    0.000000] rcu:     RCU event tracing is enabled.
>> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
>> is 25 jiffies.
>> [    0.000000] NR_IRQS: 256
> 
>> [    0.000000] OF: Bad cell count for /rdb
>> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
>> [    0.000000] OF: of_irq_init: children remain, but no parents
> 
> So here is the first time we have got the consequence of the corruption
> popped up. Luckily it's just the "Bad cells count" error. We could have
> got much less obvious log here up to getting a crash at some place
> further...
> 
>> [    0.000000] random: get_random_bytes called from
>> start_kernel+0x444/0x654 with crng_init=0
>> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
>> wraps every 8589934590000000ns
> 
>>
>> and with your patch applied which unfortunately did not work we have the
>> following:
>>
>> [...]
> 
> So a patch like this shall workaround the corruption:
> 
> --- a/arch/mips/bmips/setup.c
> +++ b/arch/mips/bmips/setup.c
> @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
>  
>  	__dt_setup_arch(dtb);
>  
> +	memblock_reserve(0x0, 0x1000 + 0x100*64);
> +
>  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
>  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
>  					     q->compatible)) {

This patch works, thanks a lot for the troubleshooting and analysis! How
about the following, which works as well and should be more universal
since it does not require each architecture to provide an appropriate
call to memblock_reserve():

diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index e0352958e2f7..b0a173b500e8 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -2367,10 +2367,7 @@ void __init trap_init(void)

        if (!cpu_has_mips_r2_r6) {
                ebase = CAC_BASE;
-               ebase_pa = virt_to_phys((void *)ebase);
                vec_size = 0x400;
-
-               memblock_reserve(ebase_pa, vec_size);
        } else {
                if (cpu_has_veic || cpu_has_vint)
                        vec_size = 0x200 + VECTORSPACING*64;
@@ -2410,6 +2407,14 @@ void __init trap_init(void)

        if (board_ebase_setup)
                board_ebase_setup();
+
+       /* board_ebase_setup() can change the exception base address
+        * reserve it now after changes were made.
+        */
+       if (!cpu_has_mips_r2_r6) {
+               ebase_pa = virt_to_phys((void *)ebase);
+               memblock_reserve(ebase_pa, vec_size);
+       }
        per_cpu_trap_init(true);
        memblock_set_bottom_up(false);
-- 
Florian

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
@ 2021-03-01  3:50             ` Florian Fainelli
  0 siblings, 0 replies; 49+ messages in thread
From: Florian Fainelli @ 2021-03-01  3:50 UTC (permalink / raw)
  To: Serge Semin, Mike Rapoport, Thomas Bogendoerfer
  Cc: Serge Semin, Roman Gushchin, Andrew Morton, linux-mm, Kamal Dasu,
	Paul Cercueil, Jiaxun Yang, iamjoonsoo.kim, riel, Michal Hocko,
	linux-kernel, kernel-team,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE

Hi Serge,

On 2/28/2021 3:08 PM, Serge Semin wrote:
> Hi folks,
> What you've got here seems a more complicated problem than it
> could originally look like. Please, see my comments below.
> 
> (Note I've discarded some of the email logs, which of no interest
> to the discovered problem. Please also note that I haven't got any
> Broadcom hardware to test out a solution suggested below.)
> 
> On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
>> Hi Mike,
>>
>> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
>>> Hi Florian,
>>>
>>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
>>>>
> 
>>>> [...]
> 
>>>>
>>>> Hi Roman, Thomas and other linux-mips folks,
>>>>
>>>> Kamal and myself have been unable to boot v5.11 on MIPS since this
>>>> commit, reverting it makes our MIPS platforms boot successfully. We do
>>>> not see a warning like this one in the commit message, instead what
>>>> happens appear to be a corrupted Device Tree which prevents the parsing
>>>> of the "rdb" node and leading to the interrupt controllers not being
>>>> registered, and the system eventually not booting.
>>>>
>>>> The Device Tree is built-into the kernel image and resides at
>>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
>>>>
>>>> Do you have any idea what could be wrong with MIPS specifically here?
> 
> Most likely the problem you've discovered has been there for quite
> some time. The patch you are referring to just caused it to be
> triggered by extending the early allocation range. See before that
> patch was accepted the early memory allocations had been performed
> in the range:
> [kernel_end, RAM_END].
> The patch changed that, so the early allocations are done within
> [RAM_START + PAGE_SIZE, RAM_END].
> 
> In normal situations it's safe to do that as long as all the critical
> memory regions (including the memory residing a space below the
> kernel) have been reserved. But as soon as a memory with some critical
> structures haven't been reserved, the kernel may allocate it to be used
> for instance for early initializations with obviously unpredictable but
> most of the times unpleasant consequences.
> 
>>>
>>> Apparently there is a memblock allocation in one of the functions called
>>> from arch_mem_init() between plat_mem_setup() and
>>> early_init_fdt_reserve_self().
> 
> Mike, alas according to the log provided by Florian that's not the reason
> of the problem. Please, see my considerations below.
> 
>> [...]
>>
>> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
>> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
>> Feb 28 10:01:50 PST 2021
>> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
>> [    0.000000] FPU revision is: 00130001
> 
>> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
>> early_init_dt_scan_memory+0x160/0x1e0
>> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
>> early_init_dt_scan_memory+0x160/0x1e0
>> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
>> early_init_dt_scan_memory+0x160/0x1e0
> 
> Here the memory has been added to the memblock allocator.
> 
>> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
>> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
>> [    0.000000] printk: bootconsole [ns16550a0] enabled
> 
>> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
>> setup_arch+0x128/0x69c
> 
> Here the fdt memory has been reserved. (Note it's built into the
> kernel.)
> 
>> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
>> setup_arch+0x1f8/0x69c
> 
> Here the kernel itself together with built-in dtb have been reserved.
> So far so good.
> 
>> [    0.000000] Initrd not found or empty - disabling initrd
> 
>> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
>> from=0x00000000 max_addr=0x00000000
>> early_init_dt_alloc_memory_arch+0x40/0x84
>> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
>> from=0x00000000 max_addr=0x00000000
>> early_init_dt_alloc_memory_arch+0x40/0x84
>> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
>> memblock_alloc_range_nid+0xf8/0x198
> 
> The log above most likely belongs to the call-chain:
> setup_arch()
> +-> arch_mem_init()
>     +-> device_tree_init() - BMIPS specific method
>         +-> unflatten_and_copy_device_tree()
> 
> So to speak here we've copied the fdt from the original space
> [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
> it to [0x00003aa4-0x0000ba4b].
> 
> The problem is that a bit later the next call-chain is performed:
> setup_arch()
> +-> plat_smp_setup()
>     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
>         +-> if (!board_ebase_setup)
>                  board_ebase_setup = &bmips_ebase_setup;
> 
> So at the moment of the CPU traps initialization the bmips_ebase_setup()
> method is called. What trap_init() does isn't compatible with the
> allocation performed by the unflatten_and_copy_device_tree() method.
> See the next comment.
> 
>> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
>> from=0x00000000 max_addr=0x00000000
>> early_init_dt_alloc_memory_arch+0x40/0x84
>> [    0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_reserve: [0x0096a000-0x00969fff]
>> setup_arch+0x3fc/0x69c
>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>> [    0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>> [    0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>> [    0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
>> bytes.
>> [    0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
>> linesize 32 bytes
>> [    0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>> [    0.000000] memblock_reserve: [0x0000c000-0x0000cfff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>> [    0.000000] memblock_reserve: [0x0000d000-0x0000dfff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>> [    0.000000] memblock_reserve: [0x0000e000-0x0000efff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] Zone ranges:
>> [    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
>> [    0.000000]   HighMem  [mem 0x0000000010000000-0x00000000cfffffff]
>> [    0.000000] Movable zone start for each node
>> [    0.000000] Early memory node ranges
>> [    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
>> [    0.000000]   node   0: [mem 0x0000000020000000-0x000000004fffffff]
>> [    0.000000]   node   0: [mem 0x0000000090000000-0x00000000cfffffff]
>> [    0.000000] Initmem setup node 0 [mem
>> 0x0000000000000000-0x00000000cfffffff]
>> [    0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
>> from=0x00000000 max_addr=0x00000000
>> alloc_node_mem_map.constprop.135+0x6c/0xc8
>> [    0.000000] memblock_reserve: [0x01831400-0x032313ff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
>> [    0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
>> [    0.000000] memblock_reserve: [0x0000bc80-0x0000bdff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] MEMBLOCK configuration:
>> [    0.000000]  memory size = 0x80000000 reserved size = 0x0322f032
>> [    0.000000]  memory.cnt  = 0x3
>> [    0.000000]  memory[0x0]     [0x00000000-0x0fffffff], 0x10000000
>> bytes flags: 0x0
>> [    0.000000]  memory[0x1]     [0x20000000-0x4fffffff], 0x30000000
>> bytes flags: 0x0
>> [    0.000000]  memory[0x2]     [0x90000000-0xcfffffff], 0x40000000
>> bytes flags: 0x0
>> [    0.000000]  reserved.cnt  = 0xa
>> [    0.000000]  reserved[0x0]   [0x00001000-0x00003aa0], 0x00002aa1
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x1]   [0x00003aa4-0x0000ba64], 0x00007fc1
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x2]   [0x0000ba80-0x0000ba9f], 0x00000020
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x3]   [0x0000bb00-0x0000bb1f], 0x00000020
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x4]   [0x0000bb80-0x0000bb9f], 0x00000020
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x5]   [0x0000bc00-0x0000bc1f], 0x00000020
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x6]   [0x0000bc80-0x0000bdff], 0x00000180
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x7]   [0x0000c000-0x0000efff], 0x00003000
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x8]   [0x00010000-0x018313cf], 0x018213d0
>> bytes flags: 0x0
>> [    0.000000]  reserved[0x9]   [0x01831400-0x032313ff], 0x01a00000
>> bytes flags: 0x0
>> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
>> [    0.000000] memblock_reserve: [0x0000be00-0x0000be1d]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
>> [    0.000000] memblock_reserve: [0x0000be80-0x0000be9d]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
>> [    0.000000] memblock_reserve: [0x0000f000-0x0000ffff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
>> [    0.000000] memblock_reserve: [0x03231400-0x032323ff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
>> from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
>> [    0.000000] memblock_reserve: [0x03233000-0x0327afff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_free: [0x03245000-0x03244fff]
>> pcpu_embed_first_chunk+0x7a0/0x884
>> [    0.000000] memblock_free: [0x03257000-0x03256fff]
>> pcpu_embed_first_chunk+0x7a0/0x884
>> [    0.000000] memblock_free: [0x03269000-0x03268fff]
>> pcpu_embed_first_chunk+0x7a0/0x884
>> [    0.000000] memblock_free: [0x0327b000-0x0327afff]
>> pcpu_embed_first_chunk+0x7a0/0x884
>> [    0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
>> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
>> [    0.000000] memblock_reserve: [0x0000bf00-0x0000bf03]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
>> [    0.000000] memblock_reserve: [0x0000bf80-0x0000bf83]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
>> [    0.000000] memblock_reserve: [0x03232400-0x0323240f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
>> [    0.000000] memblock_reserve: [0x03232480-0x0323248f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
>> [    0.000000] memblock_reserve: [0x03232500-0x0323257f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
>> [    0.000000] memblock_reserve: [0x03232580-0x032325db]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
>> [    0.000000] memblock_reserve: [0x03232600-0x032328ff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
>> [    0.000000] memblock_reserve: [0x03232900-0x03232c03]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
>> [    0.000000] memblock_reserve: [0x03232c80-0x03232d3f]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] memblock_free: [0x0000f000-0x0000ffff]
>> pcpu_embed_first_chunk+0x838/0x884
>> [    0.000000] memblock_free: [0x03231400-0x032323ff]
>> pcpu_embed_first_chunk+0x850/0x884
>> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 523776
>> [    0.000000] Kernel command line: console=ttyS0,115200 earlycon
>> [    0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
>> [    0.000000] memblock_reserve: [0x0327b000-0x0329afff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
>> bytes, linear)
>> [    0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
>> [    0.000000] memblock_reserve: [0x0329b000-0x032aafff]
>> memblock_alloc_range_nid+0xf8/0x198
>> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
>> bytes, linear)
> 
>> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
>> trap_init+0x70/0x4e8
> 
> Most likely someplace here the corruption has happened. The log above
> has just reserved a memory for NMI/reset vectors:
> arch/mips/kernel/traps.c: trap_init(void): Line 2373.
> 
> But then the board_ebase_setup() pointer is dereferenced and called,
> which has been initialized with bmips_ebase_setup() earlier and which
> overwrites the ebase variable with: 0x80001000 as this is
> CPU_BMIPS5000 CPU. So any further calls of the functions like
> set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a
> corruption of the memory above 0x80001000, which as we have discovered
> belongs to fdt and unflattened device tree.
> 
>> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
>> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
>> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
>> cma-reserved, 1835008K highmem)
>> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
>> [    0.000000] rcu: Hierarchical RCU implementation.
>> [    0.000000] rcu:     RCU event tracing is enabled.
>> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
>> is 25 jiffies.
>> [    0.000000] NR_IRQS: 256
> 
>> [    0.000000] OF: Bad cell count for /rdb
>> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
>> [    0.000000] OF: of_irq_init: children remain, but no parents
> 
> So here is the first time we have got the consequence of the corruption
> popped up. Luckily it's just the "Bad cells count" error. We could have
> got much less obvious log here up to getting a crash at some place
> further...
> 
>> [    0.000000] random: get_random_bytes called from
>> start_kernel+0x444/0x654 with crng_init=0
>> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
>> wraps every 8589934590000000ns
> 
>>
>> and with your patch applied which unfortunately did not work we have the
>> following:
>>
>> [...]
> 
> So a patch like this shall workaround the corruption:
> 
> --- a/arch/mips/bmips/setup.c
> +++ b/arch/mips/bmips/setup.c
> @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
>  
>  	__dt_setup_arch(dtb);
>  
> +	memblock_reserve(0x0, 0x1000 + 0x100*64);
> +
>  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
>  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
>  					     q->compatible)) {

This patch works, thanks a lot for the troubleshooting and analysis! How
about the following, which works as well and should be more generic,
since it does not require each architecture to provide an appropriate
call to memblock_reserve():

diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index e0352958e2f7..b0a173b500e8 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -2367,10 +2367,7 @@ void __init trap_init(void)

        if (!cpu_has_mips_r2_r6) {
                ebase = CAC_BASE;
-               ebase_pa = virt_to_phys((void *)ebase);
                vec_size = 0x400;
-
-               memblock_reserve(ebase_pa, vec_size);
        } else {
                if (cpu_has_veic || cpu_has_vint)
                        vec_size = 0x200 + VECTORSPACING*64;
@@ -2410,6 +2407,14 @@ void __init trap_init(void)

        if (board_ebase_setup)
                board_ebase_setup();
+
+       /* board_ebase_setup() can change the exception base address,
+        * so reserve it only now, after any such change has been made.
+        */
+       if (!cpu_has_mips_r2_r6) {
+               ebase_pa = virt_to_phys((void *)ebase);
+               memblock_reserve(ebase_pa, vec_size);
+       }
        per_cpu_trap_init(true);
        memblock_set_bottom_up(false);
-- 
Florian


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2021-03-01  3:50             ` Florian Fainelli
@ 2021-03-01  9:22               ` Serge Semin
  -1 siblings, 0 replies; 49+ messages in thread
From: Serge Semin @ 2021-03-01  9:22 UTC (permalink / raw)
  To: Florian Fainelli, Mike Rapoport, Thomas Bogendoerfer
  Cc: Serge Semin, Roman Gushchin, Andrew Morton, linux-mm, Kamal Dasu,
	Paul Cercueil, Jiaxun Yang, iamjoonsoo.kim, riel, Michal Hocko,
	linux-kernel, kernel-team,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE

On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
> Hi Serge,
> 
> On 2/28/2021 3:08 PM, Serge Semin wrote:
> > Hi folks,
> > What you've got here seems a more complicated problem than it
> > could originally look like. Please, see my comments below.
> > 
> > (Note I've discarded some of the email logs, which of no interest
> > to the discovered problem. Please also note that I haven't got any
> > Broadcom hardware to test out a solution suggested below.)
> > 
> > On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
> >> Hi Mike,
> >>
> >> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
> >>> Hi Florian,
> >>>
> >>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
> >>>>
> > 
> >>>> [...]
> > 
> >>>>
> >>>> Hi Roman, Thomas and other linux-mips folks,
> >>>>
> >>>> Kamal and myself have been unable to boot v5.11 on MIPS since this
> >>>> commit, reverting it makes our MIPS platforms boot successfully. We do
> >>>> not see a warning like this one in the commit message, instead what
> >>>> happens appear to be a corrupted Device Tree which prevents the parsing
> >>>> of the "rdb" node and leading to the interrupt controllers not being
> >>>> registered, and the system eventually not booting.
> >>>>
> >>>> The Device Tree is built-into the kernel image and resides at
> >>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
> >>>>
> >>>> Do you have any idea what could be wrong with MIPS specifically here?
> > 
> > Most likely the problem you've discovered has been there for quite
> > some time. The patch you are referring to just caused it to be
> > triggered by extending the early allocation range. See before that
> > patch was accepted the early memory allocations had been performed
> > in the range:
> > [kernel_end, RAM_END].
> > The patch changed that, so the early allocations are done within
> > [RAM_START + PAGE_SIZE, RAM_END].
> > 
> > In normal situations it's safe to do that as long as all the critical
> > memory regions (including the memory residing a space below the
> > kernel) have been reserved. But as soon as a memory with some critical
> > structures haven't been reserved, the kernel may allocate it to be used
> > for instance for early initializations with obviously unpredictable but
> > most of the times unpleasant consequences.
> > 
> >>>
> >>> Apparently there is a memblock allocation in one of the functions called
> >>> from arch_mem_init() between plat_mem_setup() and
> >>> early_init_fdt_reserve_self().
> > 
> > Mike, alas according to the log provided by Florian that's not the reason
> > of the problem. Please, see my considerations below.
> > 
> >> [...]
> >>
> >> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
> >> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
> >> Feb 28 10:01:50 PST 2021
> >> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
> >> [    0.000000] FPU revision is: 00130001
> > 
> >> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> >> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> >> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> > 
> > Here the memory has been added to the memblock allocator.
> > 
> >> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
> >> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
> >> [    0.000000] printk: bootconsole [ns16550a0] enabled
> > 
> >> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
> >> setup_arch+0x128/0x69c
> > 
> > Here the fdt memory has been reserved. (Note it's built into the
> > kernel.)
> > 
> >> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
> >> setup_arch+0x1f8/0x69c
> > 
> > Here the kernel itself together with built-in dtb have been reserved.
> > So far so good.
> > 
> >> [    0.000000] Initrd not found or empty - disabling initrd
> > 
> >> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84
> >> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84
> >> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
> >> memblock_alloc_range_nid+0xf8/0x198
> > 
> > The log above most likely belongs to the call-chain:
> > setup_arch()
> > +-> arch_mem_init()
> >     +-> device_tree_init() - BMIPS specific method
> >         +-> unflatten_and_copy_device_tree()
> > 
> > So to speak here we've copied the fdt from the original space
> > [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
> > it to [0x00003aa4-0x0000ba4b].
> > 
> > The problem is that a bit later the next call-chain is performed:
> > setup_arch()
> > +-> plat_smp_setup()
> >     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
> >         +-> if (!board_ebase_setup)
> >                  board_ebase_setup = &bmips_ebase_setup;
> > 
> > So at the moment of the CPU traps initialization the bmips_ebase_setup()
> > method is called. What trap_init() does isn't compatible with the
> > allocation performed by the unflatten_and_copy_device_tree() method.
> > See the next comment.
> > 
> >> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84
> >> [    0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_reserve: [0x0096a000-0x00969fff]
> >> setup_arch+0x3fc/0x69c
> >> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> >> [    0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> >> [    0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> >> [    0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
> >> bytes.
> >> [    0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
> >> linesize 32 bytes
> >> [    0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
> >> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> >> [    0.000000] memblock_reserve: [0x0000c000-0x0000cfff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> >> [    0.000000] memblock_reserve: [0x0000d000-0x0000dfff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> >> [    0.000000] memblock_reserve: [0x0000e000-0x0000efff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] Zone ranges:
> >> [    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
> >> [    0.000000]   HighMem  [mem 0x0000000010000000-0x00000000cfffffff]
> >> [    0.000000] Movable zone start for each node
> >> [    0.000000] Early memory node ranges
> >> [    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
> >> [    0.000000]   node   0: [mem 0x0000000020000000-0x000000004fffffff]
> >> [    0.000000]   node   0: [mem 0x0000000090000000-0x00000000cfffffff]
> >> [    0.000000] Initmem setup node 0 [mem
> >> 0x0000000000000000-0x00000000cfffffff]
> >> [    0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
> >> from=0x00000000 max_addr=0x00000000
> >> alloc_node_mem_map.constprop.135+0x6c/0xc8
> >> [    0.000000] memblock_reserve: [0x01831400-0x032313ff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
> >> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
> >> [    0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
> >> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
> >> [    0.000000] memblock_reserve: [0x0000bc80-0x0000bdff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] MEMBLOCK configuration:
> >> [    0.000000]  memory size = 0x80000000 reserved size = 0x0322f032
> >> [    0.000000]  memory.cnt  = 0x3
> >> [    0.000000]  memory[0x0]     [0x00000000-0x0fffffff], 0x10000000
> >> bytes flags: 0x0
> >> [    0.000000]  memory[0x1]     [0x20000000-0x4fffffff], 0x30000000
> >> bytes flags: 0x0
> >> [    0.000000]  memory[0x2]     [0x90000000-0xcfffffff], 0x40000000
> >> bytes flags: 0x0
> >> [    0.000000]  reserved.cnt  = 0xa
> >> [    0.000000]  reserved[0x0]   [0x00001000-0x00003aa0], 0x00002aa1
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x1]   [0x00003aa4-0x0000ba64], 0x00007fc1
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x2]   [0x0000ba80-0x0000ba9f], 0x00000020
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x3]   [0x0000bb00-0x0000bb1f], 0x00000020
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x4]   [0x0000bb80-0x0000bb9f], 0x00000020
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x5]   [0x0000bc00-0x0000bc1f], 0x00000020
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x6]   [0x0000bc80-0x0000bdff], 0x00000180
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x7]   [0x0000c000-0x0000efff], 0x00003000
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x8]   [0x00010000-0x018313cf], 0x018213d0
> >> bytes flags: 0x0
> >> [    0.000000]  reserved[0x9]   [0x01831400-0x032313ff], 0x01a00000
> >> bytes flags: 0x0
> >> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
> >> [    0.000000] memblock_reserve: [0x0000be00-0x0000be1d]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
> >> [    0.000000] memblock_reserve: [0x0000be80-0x0000be9d]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
> >> [    0.000000] memblock_reserve: [0x0000f000-0x0000ffff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
> >> [    0.000000] memblock_reserve: [0x03231400-0x032323ff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
> >> from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
> >> [    0.000000] memblock_reserve: [0x03233000-0x0327afff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_free: [0x03245000-0x03244fff]
> >> pcpu_embed_first_chunk+0x7a0/0x884
> >> [    0.000000] memblock_free: [0x03257000-0x03256fff]
> >> pcpu_embed_first_chunk+0x7a0/0x884
> >> [    0.000000] memblock_free: [0x03269000-0x03268fff]
> >> pcpu_embed_first_chunk+0x7a0/0x884
> >> [    0.000000] memblock_free: [0x0327b000-0x0327afff]
> >> pcpu_embed_first_chunk+0x7a0/0x884
> >> [    0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
> >> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
> >> [    0.000000] memblock_reserve: [0x0000bf00-0x0000bf03]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
> >> [    0.000000] memblock_reserve: [0x0000bf80-0x0000bf83]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
> >> [    0.000000] memblock_reserve: [0x03232400-0x0323240f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
> >> [    0.000000] memblock_reserve: [0x03232480-0x0323248f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
> >> [    0.000000] memblock_reserve: [0x03232500-0x0323257f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
> >> [    0.000000] memblock_reserve: [0x03232580-0x032325db]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
> >> [    0.000000] memblock_reserve: [0x03232600-0x032328ff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
> >> [    0.000000] memblock_reserve: [0x03232900-0x03232c03]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
> >> [    0.000000] memblock_reserve: [0x03232c80-0x03232d3f]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_free: [0x0000f000-0x0000ffff]
> >> pcpu_embed_first_chunk+0x838/0x884
> >> [    0.000000] memblock_free: [0x03231400-0x032323ff]
> >> pcpu_embed_first_chunk+0x850/0x884
> >> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 523776
> >> [    0.000000] Kernel command line: console=ttyS0,115200 earlycon
> >> [    0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
> >> [    0.000000] memblock_reserve: [0x0327b000-0x0329afff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
> >> bytes, linear)
> >> [    0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
> >> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
> >> [    0.000000] memblock_reserve: [0x0329b000-0x032aafff]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
> >> bytes, linear)
> > 
> >> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
> >> trap_init+0x70/0x4e8
> > 
> > Most likely someplace here the corruption has happened. The log above
> > has just reserved a memory for NMI/reset vectors:
> > arch/mips/kernel/traps.c: trap_init(void): Line 2373.
> > 
> > But then the board_ebase_setup() pointer is dereferenced and called,
> > which has been initialized with bmips_ebase_setup() earlier and which
> > overwrites the ebase variable with: 0x80001000 as this is
> > CPU_BMIPS5000 CPU. So any further calls of the functions like
> > set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a
> > corruption of the memory above 0x80001000, which as we have discovered
> > belongs to fdt and unflattened device tree.
> > 
> >> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> >> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
> >> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
> >> cma-reserved, 1835008K highmem)
> >> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> >> [    0.000000] rcu: Hierarchical RCU implementation.
> >> [    0.000000] rcu:     RCU event tracing is enabled.
> >> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
> >> is 25 jiffies.
> >> [    0.000000] NR_IRQS: 256
> > 
> >> [    0.000000] OF: Bad cell count for /rdb
> >> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
> >> [    0.000000] OF: of_irq_init: children remain, but no parents
> > 
> > So here is the first time we have got the consequence of the corruption
> > popped up. Luckily it's just the "Bad cells count" error. We could have
> > got much less obvious log here up to getting a crash at some place
> > further...
> > 
> >> [    0.000000] random: get_random_bytes called from
> >> start_kernel+0x444/0x654 with crng_init=0
> >> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
> >> wraps every 8589934590000000ns
> > 
> >>
> >> and with your patch applied which unfortunately did not work we have the
> >> following:
> >>
> >> [...]
> > 
> > So a patch like this shall workaround the corruption:
> > 
> > --- a/arch/mips/bmips/setup.c
> > +++ b/arch/mips/bmips/setup.c
> > @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
> >  
> >  	__dt_setup_arch(dtb);
> >  
> > +	memblock_reserve(0x0, 0x1000 + 0x100*64);
> > +
> >  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
> >  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
> >  					     q->compatible)) {
> 

> This patch works, thanks a lot for the troubleshooting and analysis! How
> about the following which would be more generic and works as well and
> should be more universal since it does not require each architecture to
> provide an appropriate call to memblock_reserve():

Hm, are you sure it's working? If so, my analysis hasn't been quite
correct. My suggestion was based on the trace of memory initializations,
allocations and reservations. Here is the sequence of the most crucial
of them:
1) Memblock initialization:
   start_kernel()->setup_arch()->arch_mem_init()->plat_mem_setup()->__dt_setup_arch()
   (This is the point where I suggested placing the exception memory
    reservation.)
2) Base FDT memory reservation:
   start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_reserve_self()
3) FDT "reserved-memory" nodes parsing and corresponding memory ranges
   reservation:
   start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_scan_reserved_mem()
4) Reserve kernel itself, some critical sections like initrd and
   crash-kernel:
   start_kernel()->setup_arch()->arch_mem_init()->bootmem_init()...
5) Copy and unflatten the device tree built into the kernel
   (BMIPS-platform code):
   start_kernel()->setup_arch()->arch_mem_init()->device_tree_init()
   This is the very first time an allocation from the memblock pool
   is performed. Since we haven't reserved memory for the exception
   vectors yet, the memblock allocator is free to return that memory
   range for any other use. Needless to say, if we try to use that
   memory later without consulting memblock, we may, and in our case
   will, get into trouble.
6) Many random early memblock allocations for kernel use before
   buddy and sl*b allocators are up and running...
   Note that if by some lucky chance the allocations made in 5) didn't
   overlap the exception memory, here we have a much higher chance of
   overlapping it, with obviously fatal consequences once the same
   range is used independently by two owners.
7) Trap/exception vectors initialization and !memory reservation! for
   them:
   start_kernel()->trap_init()
   Only at this point we get to reserve the memory for the vectors.
8) Init and run buddy/sl*b allocators:
   start_kernel()->mm_init()->...mem_init()...

There are a lot of allocations done in 5) and 6) before
trap_init() is called in 7). You can see that in your log. That's why
I doubt that your patch worked well. Most likely you forgot to revert
the workaround I suggested in the previous message. Could you make
sure that it is reverted and re-test your patch? If it still works
then I might have confused something, and it's strange that my patch
worked in the first place...

Some food for thought for everyone (Thomas, Mark, please join the
discussion). What we've got here is a somewhat bigger problem. AFAICS
if bottom-up allocation is enabled (as in our case),
memblock_find_in_range_node() performs the allocation above the very
first PAGE_SIZE memory chunk (see the code of that function for
details). So we are currently on the safe side for some older MIPS
platforms. But platforms with VEIC/VINT may get into the same trouble
if they don't reserve the exception memory early enough, before the
kernel starts random allocations from memblock. So we either need to
provide a generic workaround for that or make sure each platform
reserves the vectors itself, for instance in the plat_mem_setup()
method.

-Sergey

> 
> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> index e0352958e2f7..b0a173b500e8 100644
> --- a/arch/mips/kernel/traps.c
> +++ b/arch/mips/kernel/traps.c
> @@ -2367,10 +2367,7 @@ void __init trap_init(void)
> 
>         if (!cpu_has_mips_r2_r6) {
>                 ebase = CAC_BASE;
> -               ebase_pa = virt_to_phys((void *)ebase);
>                 vec_size = 0x400;
> -
> -               memblock_reserve(ebase_pa, vec_size);
>         } else {
>                 if (cpu_has_veic || cpu_has_vint)
>                         vec_size = 0x200 + VECTORSPACING*64;
> @@ -2410,6 +2407,14 @@ void __init trap_init(void)
> 
>         if (board_ebase_setup)
>                 board_ebase_setup();
> +
> +       /* board_ebase_setup() can change the exception base address
> +        * reserve it now after changes were made.
> +        */
> +       if (!cpu_has_mips_r2_r6) {
> +               ebase_pa = virt_to_phys((void *)ebase);
> +               memblock_reserve(ebase_pa, vec_size);
> +       }
>         per_cpu_trap_init(true);
>         memblock_set_bottom_up(false);
> -- 
> Florian

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2021-03-01  3:50             ` Florian Fainelli
@ 2021-03-01  9:45               ` Mike Rapoport
  -1 siblings, 0 replies; 49+ messages in thread
From: Mike Rapoport @ 2021-03-01  9:45 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Serge Semin, Thomas Bogendoerfer, Serge Semin, Roman Gushchin,
	Andrew Morton, linux-mm, Kamal Dasu, Paul Cercueil, Jiaxun Yang,
	iamjoonsoo.kim, riel, Michal Hocko, linux-kernel, kernel-team,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE

On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
> Hi Serge,
> 
> On 2/28/2021 3:08 PM, Serge Semin wrote:
> > Hi folks,
> > What you've got here seems a more complicated problem than it
> > could originally look like. Please, see my comments below.
> > 
> > (Note I've discarded some of the email logs, which of no interest
> > to the discovered problem. Please also note that I haven't got any
> > Broadcom hardware to test out a solution suggested below.)
> > 
> > On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
> >> Hi Mike,
> >>
> >> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
> >>> Hi Florian,
> >>>
> >>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
> >>>>
> > 
> >>>> [...]
> > 
> >>>>
> >>>> Hi Roman, Thomas and other linux-mips folks,
> >>>>
> >>>> Kamal and myself have been unable to boot v5.11 on MIPS since this
> >>>> commit, reverting it makes our MIPS platforms boot successfully. We do
> >>>> not see a warning like this one in the commit message, instead what
> >>>> happens appear to be a corrupted Device Tree which prevents the parsing
> >>>> of the "rdb" node and leading to the interrupt controllers not being
> >>>> registered, and the system eventually not booting.
> >>>>
> >>>> The Device Tree is built-into the kernel image and resides at
> >>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
> >>>>
> >>>> Do you have any idea what could be wrong with MIPS specifically here?
> > 
> > Most likely the problem you've discovered has been there for quite
> > some time. The patch you are referring to just caused it to be
> > triggered by extending the early allocation range. See before that
> > patch was accepted the early memory allocations had been performed
> > in the range:
> > [kernel_end, RAM_END].
> > The patch changed that, so the early allocations are done within
> > [RAM_START + PAGE_SIZE, RAM_END].
> > 
> > In normal situations it's safe to do that as long as all the critical
> > memory regions (including the memory residing a space below the
> > kernel) have been reserved. But as soon as a memory with some critical
> > structures haven't been reserved, the kernel may allocate it to be used
> > for instance for early initializations with obviously unpredictable but
> > most of the times unpleasant consequences.
> > 
> >>>
> >>> Apparently there is a memblock allocation in one of the functions called
> >>> from arch_mem_init() between plat_mem_setup() and
> >>> early_init_fdt_reserve_self().
> > 
> > Mike, alas according to the log provided by Florian that's not the reason
> > of the problem. Please, see my considerations below.
> > 
> >> [...]
> >>
> >> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
> >> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
> >> Feb 28 10:01:50 PST 2021
> >> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
> >> [    0.000000] FPU revision is: 00130001
> > 
> >> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> >> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> >> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
> >> early_init_dt_scan_memory+0x160/0x1e0
> > 
> > Here the memory has been added to the memblock allocator.
> > 
> >> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
> >> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
> >> [    0.000000] printk: bootconsole [ns16550a0] enabled
> > 
> >> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
> >> setup_arch+0x128/0x69c
> > 
> > Here the fdt memory has been reserved. (Note it's built into the
> > kernel.)
> > 
> >> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
> >> setup_arch+0x1f8/0x69c
> > 
> > Here the kernel itself together with built-in dtb have been reserved.
> > So far so good.
> > 
> >> [    0.000000] Initrd not found or empty - disabling initrd
> > 
> >> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84
> >> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
> >> memblock_alloc_range_nid+0xf8/0x198
> >> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84
> >> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
> >> memblock_alloc_range_nid+0xf8/0x198
> > 
> > The log above most likely belongs to the call-chain:
> > setup_arch()
> > +-> arch_mem_init()
> >     +-> device_tree_init() - BMIPS specific method
> >         +-> unflatten_and_copy_device_tree()
> > 
> > So to speak here we've copied the fdt from the original space
> > [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
> > it to [0x00003aa4-0x0000ba4b].
> > 
> > The problem is that a bit later the next call-chain is performed:
> > setup_arch()
> > +-> plat_smp_setup()
> >     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
> >         +-> if (!board_ebase_setup)
> >                  board_ebase_setup = &bmips_ebase_setup;
> > 
> > So when the CPU traps are initialized, the bmips_ebase_setup()
> > method is called. What trap_init() does then isn't compatible with the
> > allocations performed by the unflatten_and_copy_device_tree() method.
> > See the next comment.
> > 
> >> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
> >> from=0x00000000 max_addr=0x00000000
> >> early_init_dt_alloc_memory_arch+0x40/0x84

...

> >> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
> >> bytes, linear)
> > 
> >> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
> >> trap_init+0x70/0x4e8
> > 
> > Most likely the corruption happens somewhere around here. The log above
> > shows memory just being reserved for the NMI/reset vectors:
> > arch/mips/kernel/traps.c: trap_init(void): Line 2373.
> > 
> > But then the board_ebase_setup() pointer is dereferenced and called.
> > It was set to bmips_ebase_setup() earlier, and that function
> > overwrites the ebase variable with 0x80001000, since this is a
> > CPU_BMIPS5000 CPU. So any further calls to functions like
> > set_handler()/set_except_vector()/set_vi_srs_handler()/etc. may
> > corrupt the memory above 0x80001000, which as we have discovered
> > belongs to the fdt and the unflattened device tree.
> > 
> >> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> >> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
> >> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
> >> cma-reserved, 1835008K highmem)
> >> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> >> [    0.000000] rcu: Hierarchical RCU implementation.
> >> [    0.000000] rcu:     RCU event tracing is enabled.
> >> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
> >> is 25 jiffies.
> >> [    0.000000] NR_IRQS: 256
> > 
> >> [    0.000000] OF: Bad cell count for /rdb
> >> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
> >> [    0.000000] OF: of_irq_init: children remain, but no parents
> > 
> > So this is the first place where the consequences of the corruption
> > show up. Luckily it's just the "Bad cell count" error. We could have
> > got a much less obvious log here, up to a crash at some later
> > point...
> > 
> >> [    0.000000] random: get_random_bytes called from
> >> start_kernel+0x444/0x654 with crng_init=0
> >> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
> >> wraps every 8589934590000000ns
> > 
> >>
> >> and with your patch applied which unfortunately did not work we have the
> >> following:
> >>
> >> [...]
> > 
> > So a patch like this should work around the corruption:
> > 
> > --- a/arch/mips/bmips/setup.c
> > +++ b/arch/mips/bmips/setup.c
> > @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
> >  
> >  	__dt_setup_arch(dtb);
> >  
> > +	memblock_reserve(0x0, 0x1000 + 0x100*64);
> > +
> >  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
> >  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
> >  					     q->compatible)) {
> 
> This patch works, thanks a lot for the troubleshooting and analysis! How
> about the following, which works as well and should be more generic,
> since it does not require each architecture to provide an appropriate
> call to memblock_reserve():
> 
> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> index e0352958e2f7..b0a173b500e8 100644
> --- a/arch/mips/kernel/traps.c
> +++ b/arch/mips/kernel/traps.c
> @@ -2367,10 +2367,7 @@ void __init trap_init(void)
> 
>         if (!cpu_has_mips_r2_r6) {
>                 ebase = CAC_BASE;
> -               ebase_pa = virt_to_phys((void *)ebase);
>                 vec_size = 0x400;
> -
> -               memblock_reserve(ebase_pa, vec_size);
>         } else {
>                 if (cpu_has_veic || cpu_has_vint)
>                         vec_size = 0x200 + VECTORSPACING*64;
> @@ -2410,6 +2407,14 @@ void __init trap_init(void)
> 
>         if (board_ebase_setup)
>                 board_ebase_setup();
> +
> +       /* board_ebase_setup() can change the exception base address
> +        * reserve it now after changes were made.
> +        */
> +       if (!cpu_has_mips_r2_r6) {
> +               ebase_pa = virt_to_phys((void *)ebase);
> +               memblock_reserve(ebase_pa, vec_size);
> +       }

With this it's still possible to have memblock allocations around ebase_pa
before it is reserved.
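
To make the ordering problem concrete, here is a minimal sketch in plain C.
It is a toy model, not kernel code: span_overlaps(), kseg0_to_phys() and
late_reserve_collides() are names made up for this illustration, and the
0x1fffffff mask assumes the usual 32-bit MIPS KSEG0 layout. It only checks
whether a vector area reserved after board_ebase_setup() would land on a
range that memblock has already handed out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model for illustration only; this is not the kernel's memblock code. */
struct span {
	uint64_t base;	/* first byte of the range */
	uint64_t size;	/* length in bytes */
};

/* True if the half-open ranges [base, base + size) of a and b intersect. */
bool span_overlaps(struct span a, struct span b)
{
	assert(a.size > 0 && b.size > 0);
	return a.base < b.base + b.size && b.base < a.base + a.size;
}

/* Assumption: 32-bit MIPS KSEG0, where the low 29 bits are the physical address. */
uint64_t kseg0_to_phys(uint32_t vaddr)
{
	return vaddr & 0x1fffffffu;
}

/*
 * Would a reservation made only after board_ebase_setup() collide with
 * memory that was already handed out to an earlier caller?
 */
bool late_reserve_collides(uint32_t ebase, uint64_t vec_size,
			   struct span earlier_alloc)
{
	struct span vectors = { kseg0_to_phys(ebase), vec_size };

	return span_overlaps(vectors, earlier_alloc);
}
```

With the values from the log, an ebase of 0x80001000 and a vec_size of 0x400
collide with the copied fdt at [0x00001000-0x00003aa0], while the old fixed
reservation at physical address 0 does not touch it.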

I think we have two options here to solve it in a more or less generic way:

* split the reservation of ebase out of trap_init() and move it earlier, to
setup_arch(). I didn't check what the board_ebase_setup() implementations do;
if they need to allocate memory, this would not work.

* add an API to memblock to set a lower limit for allocations, and then set
that limit to e.g. the kernel load address in arch_mem_init(). This may
add complexity for configurations with a relocatable kernel and KASLR.
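
The second option can be sketched with a toy bottom-up allocator in plain C.
Everything here is hypothetical: struct toy_memblock, toy_reserve(),
toy_alloc() and the min_addr field are invented for this sketch and are not
memblock APIs. The allocator does a first-fit search upwards from
max(ram_start, min_addr) and skips anything already reserved.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_RESERVED 16
#define TOY_ALLOC_FAIL   UINT64_MAX

/* Toy model for illustration only; this is not the kernel's memblock code. */
struct toy_memblock {
	uint64_t ram_start, ram_end;	/* usable RAM is [ram_start, ram_end) */
	uint64_t min_addr;		/* hypothetical lower limit for allocations */
	uint64_t res_base[TOY_MAX_RESERVED];
	uint64_t res_size[TOY_MAX_RESERVED];
	int nr_reserved;
};

/* Record [base, base + size) as reserved. */
void toy_reserve(struct toy_memblock *mb, uint64_t base, uint64_t size)
{
	assert(mb->nr_reserved < TOY_MAX_RESERVED);
	mb->res_base[mb->nr_reserved] = base;
	mb->res_size[mb->nr_reserved] = size;
	mb->nr_reserved++;
}

/* Bottom-up first fit; align must be a power of two. */
uint64_t toy_alloc(struct toy_memblock *mb, uint64_t size, uint64_t align)
{
	uint64_t start = mb->ram_start > mb->min_addr ? mb->ram_start : mb->min_addr;
	uint64_t cand;
	int i, moved;

	assert(size > 0 && align > 0 && (align & (align - 1)) == 0);
	cand = (start + align - 1) & ~(align - 1);

	/* Push the candidate past every reserved range it overlaps. */
	do {
		moved = 0;
		for (i = 0; i < mb->nr_reserved; i++) {
			uint64_t rb = mb->res_base[i];
			uint64_t re = rb + mb->res_size[i];

			if (cand < re && rb < cand + size) {
				cand = (re + align - 1) & ~(align - 1);
				moved = 1;
			}
		}
	} while (moved);

	if (cand + size > mb->ram_end)
		return TOY_ALLOC_FAIL;

	toy_reserve(mb, cand, size);
	return cand;
}
```

With min_addr at 0x1000 and only the kernel image reserved, the two
allocations from the log (10913 bytes, align 0x40, then 32680 bytes, align
0x4) land at 0x1000 and 0x3aa4, as in the log. With min_addr set to the
kernel load address 0x10000, the first allocation is pushed above the kernel
image instead, so low memory is never handed out.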

>         per_cpu_trap_init(true);
>         memblock_set_bottom_up(false);

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2021-03-01  9:45               ` Mike Rapoport
@ 2021-03-02  3:55                 ` Roman Gushchin
  -1 siblings, 0 replies; 49+ messages in thread
From: Roman Gushchin @ 2021-03-02  3:55 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Florian Fainelli, Serge Semin, Thomas Bogendoerfer, Serge Semin,
	Andrew Morton, linux-mm, Kamal Dasu, Paul Cercueil, Jiaxun Yang,
	iamjoonsoo.kim, riel, Michal Hocko, linux-kernel, kernel-team,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE

On Mon, Mar 01, 2021 at 11:45:42AM +0200, Mike Rapoport wrote:
> On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
> > Hi Serge,
> > 
> > On 2/28/2021 3:08 PM, Serge Semin wrote:
> > > Hi folks,
> > > What you've got here seems a more complicated problem than it
> > > could originally look like. Please, see my comments below.
> > > 
> > > (Note I've discarded some of the email logs, which of no interest
> > > to the discovered problem. Please also note that I haven't got any
> > > Broadcom hardware to test out a solution suggested below.)
> > > 
> > > On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
> > >> Hi Mike,
> > >>
> > >> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
> > >>> Hi Florian,
> > >>>
> > >>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
> > >>>>
> > > 
> > >>>> [...]
> > > 
> > >>>>
> > >>>> Hi Roman, Thomas and other linux-mips folks,
> > >>>>
> > >>>> Kamal and myself have been unable to boot v5.11 on MIPS since this
> > >>>> commit, reverting it makes our MIPS platforms boot successfully. We do
> > >>>> not see a warning like this one in the commit message, instead what
> > >>>> happens appear to be a corrupted Device Tree which prevents the parsing
> > >>>> of the "rdb" node and leading to the interrupt controllers not being
> > >>>> registered, and the system eventually not booting.
> > >>>>
> > >>>> The Device Tree is built-into the kernel image and resides at
> > >>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
> > >>>>
> > >>>> Do you have any idea what could be wrong with MIPS specifically here?
> > > 
> > > Most likely the problem you've discovered has been there for quite
> > > some time. The patch you are referring to just triggered it by
> > > extending the early allocation range. Before that patch was
> > > accepted, the early memory allocations were performed in the range:
> > > [kernel_end, RAM_END].
> > > The patch changed that, so the early allocations are now done within
> > > [RAM_START + PAGE_SIZE, RAM_END].
> > > 
> > > In normal situations that's safe as long as all the critical
> > > memory regions (including memory residing below the kernel) have
> > > been reserved. But as soon as memory holding some critical
> > > structures hasn't been reserved, the kernel may allocate it, for
> > > instance for early initialization, with unpredictable and
> > > usually unpleasant consequences.
> > > 
> > > 
> > >>>
> > >>> Apparently there is a memblock allocation in one of the functions called
> > >>> from arch_mem_init() between plat_mem_setup() and
> > >>> early_init_fdt_reserve_self().
> > > 
> > > Mike, alas according to the log provided by Florian that's not the reason
> > > of the problem. Please, see my considerations below.
> > > 
> > >> [...]
> > >>
> > >> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
> > >> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
> > >> Feb 28 10:01:50 PST 2021
> > >> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
> > >> [    0.000000] FPU revision is: 00130001
> > > 
> > >> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
> > >> early_init_dt_scan_memory+0x160/0x1e0
> > >> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
> > >> early_init_dt_scan_memory+0x160/0x1e0
> > >> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
> > >> early_init_dt_scan_memory+0x160/0x1e0
> > > 
> > > Here the memory has been added to the memblock allocator.
> > > 
> > >> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
> > >> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
> > >> [    0.000000] printk: bootconsole [ns16550a0] enabled
> > > 
> > >> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
> > >> setup_arch+0x128/0x69c
> > > 
> > > Here the fdt memory has been reserved. (Note it's built into the
> > > kernel.)
> > > 
> > >> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
> > >> setup_arch+0x1f8/0x69c
> > > 
> > > Here the kernel itself together with built-in dtb have been reserved.
> > > So far so good.
> > > 
> > >> [    0.000000] Initrd not found or empty - disabling initrd
> > > 
> > >> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
> > >> from=0x00000000 max_addr=0x00000000
> > >> early_init_dt_alloc_memory_arch+0x40/0x84
> > >> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
> > >> memblock_alloc_range_nid+0xf8/0x198
> > >> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
> > >> from=0x00000000 max_addr=0x00000000
> > >> early_init_dt_alloc_memory_arch+0x40/0x84
> > >> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
> > >> memblock_alloc_range_nid+0xf8/0x198
> > > 
> > > The log above most likely belongs to the call-chain:
> > > setup_arch()
> > > +-> arch_mem_init()
> > >     +-> device_tree_init() - BMIPS specific method
> > >         +-> unflatten_and_copy_device_tree()
> > > 
> > > In other words, here we've copied the fdt from its original location
> > > [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
> > > it to [0x00003aa4-0x0000ba4b].
> > > 
> > > The problem is that a bit later the following call chain is executed:
> > > setup_arch()
> > > +-> plat_smp_setup()
> > >     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
> > >         +-> if (!board_ebase_setup)
> > >                  board_ebase_setup = &bmips_ebase_setup;
> > > 
> > > So when the CPU traps are initialized, the bmips_ebase_setup()
> > > method is called. What trap_init() does then isn't compatible with the
> > > allocations performed by the unflatten_and_copy_device_tree() method.
> > > See the next comment.
> > > 
> > >> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
> > >> from=0x00000000 max_addr=0x00000000
> > >> early_init_dt_alloc_memory_arch+0x40/0x84
> 
> ...
> 
> > >> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
> > >> bytes, linear)
> > > 
> > >> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
> > >> trap_init+0x70/0x4e8
> > > 
> > > Most likely the corruption happens somewhere around here. The log above
> > > shows memory just being reserved for the NMI/reset vectors:
> > > arch/mips/kernel/traps.c: trap_init(void): Line 2373.
> > > 
> > > But then the board_ebase_setup() pointer is dereferenced and called.
> > > It was set to bmips_ebase_setup() earlier, and that function
> > > overwrites the ebase variable with 0x80001000, since this is a
> > > CPU_BMIPS5000 CPU. So any further calls to functions like
> > > set_handler()/set_except_vector()/set_vi_srs_handler()/etc. may
> > > corrupt the memory above 0x80001000, which as we have discovered
> > > belongs to the fdt and the unflattened device tree.
> > > 
> > >> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> > >> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
> > >> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
> > >> cma-reserved, 1835008K highmem)
> > >> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> > >> [    0.000000] rcu: Hierarchical RCU implementation.
> > >> [    0.000000] rcu:     RCU event tracing is enabled.
> > >> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
> > >> is 25 jiffies.
> > >> [    0.000000] NR_IRQS: 256
> > > 
> > >> [    0.000000] OF: Bad cell count for /rdb
> > >> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
> > >> [    0.000000] OF: of_irq_init: children remain, but no parents
> > > 
> > > So this is the first place where the consequences of the corruption
> > > show up. Luckily it's just the "Bad cell count" error. We could have
> > > got a much less obvious log here, up to a crash at some later
> > > point...
> > > 
> > >> [    0.000000] random: get_random_bytes called from
> > >> start_kernel+0x444/0x654 with crng_init=0
> > >> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
> > >> wraps every 8589934590000000ns
> > > 
> > >>
> > >> and with your patch applied which unfortunately did not work we have the
> > >> following:
> > >>
> > >> [...]
> > > 
> > > So a patch like this should work around the corruption:
> > > 
> > > --- a/arch/mips/bmips/setup.c
> > > +++ b/arch/mips/bmips/setup.c
> > > @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
> > >  
> > >  	__dt_setup_arch(dtb);
> > >  
> > > +	memblock_reserve(0x0, 0x1000 + 0x100*64);
> > > +
> > >  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
> > >  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
> > >  					     q->compatible)) {
> > 
> > This patch works, thanks a lot for the troubleshooting and analysis! How
> > about the following, which works as well and should be more generic,
> > since it does not require each architecture to provide an appropriate
> > call to memblock_reserve():
> > 
> > diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> > index e0352958e2f7..b0a173b500e8 100644
> > --- a/arch/mips/kernel/traps.c
> > +++ b/arch/mips/kernel/traps.c
> > @@ -2367,10 +2367,7 @@ void __init trap_init(void)
> > 
> >         if (!cpu_has_mips_r2_r6) {
> >                 ebase = CAC_BASE;
> > -               ebase_pa = virt_to_phys((void *)ebase);
> >                 vec_size = 0x400;
> > -
> > -               memblock_reserve(ebase_pa, vec_size);
> >         } else {
> >                 if (cpu_has_veic || cpu_has_vint)
> >                         vec_size = 0x200 + VECTORSPACING*64;
> > @@ -2410,6 +2407,14 @@ void __init trap_init(void)
> > 
> >         if (board_ebase_setup)
> >                 board_ebase_setup();
> > +
> > +       /* board_ebase_setup() can change the exception base address
> > +        * reserve it now after changes were made.
> > +        */
> > +       if (!cpu_has_mips_r2_r6) {
> > +               ebase_pa = virt_to_phys((void *)ebase);
> > +               memblock_reserve(ebase_pa, vec_size);
> > +       }

Hi folks!

First, I'm really sorry for breaking things and also for being silent for the
last couple of days: I was almost completely offline. Thank you for working on
this!

> 
> With this it's still possible to have memblock allocations around ebase_pa
> before it is reserved.
> 
> I think we have two options here to solve it in a more or less generic way:
> 
> * split the reservation of ebase out of trap_init() and move it earlier, to
> setup_arch(). I didn't check what the board_ebase_setup() implementations do;
> if they need to allocate memory, this would not work.

It seems that it doesn't allocate any memory, so it sounds like a good option.
But doesn't the ebase initialization depend on the memblock allocator?

I see in trap_init():
    if (!cpu_has_mips_r2_r6) {
        ...
    } else {
        ...
        ebase_pa = memblock_phys_alloc(vec_size, 1 << fls(vec_size));
        ...
        if (!IS_ENABLED(CONFIG_EVA) && !WARN_ON(ebase_pa >= 0x20000000))
            ebase = CKSEG0ADDR(ebase_pa);
        else
            ebase = (unsigned long)phys_to_virt(ebase_pa);
    }
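
For reference, here is a small worked example in plain C of what that
allocation asks for. It is only a sketch: toy_fls() mirrors the documented
contract of the kernel's fls(), vec_align() and toy_kseg0_addr() are names
made up here, and the KSEG0 mapping assumes a 32-bit MIPS layout. The sizes
used in the note below (a 4 KiB page, and 0x100 for VECTORSPACING) are
assumptions as well.

```c
#include <assert.h>
#include <stdint.h>

/* Same contract as the kernel's fls(): 1-based index of the highest set bit, 0 for 0. */
int toy_fls(uint32_t x)
{
	int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/* Alignment requested by "1 << fls(vec_size)": the next power of two strictly above vec_size. */
uint64_t vec_align(uint32_t vec_size)
{
	assert(vec_size > 0);
	return (uint64_t)1 << toy_fls(vec_size);
}

/* Assumption: 32-bit MIPS, where KSEG0 maps physical 0..0x1fffffff at 0x80000000. */
uint32_t toy_kseg0_addr(uint32_t pa)
{
	assert(pa < 0x20000000u);
	return (pa & 0x1fffffffu) | 0x80000000u;
}
```

So a vec_size of 0x1000 asks for 0x2000 alignment, and 0x200 + 0x100*64 =
0x4200 asks for 0x8000. In this branch the vector area comes out of
memblock itself, so its reservation cannot simply be hoisted to a point
before memblock is usable.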


> 
> * add an API to memblock to set a lower limit for allocations, and then set
> that limit to e.g. the kernel load address in arch_mem_init(). This may
> add complexity for configurations with a relocatable kernel and KASLR.

This option looks more like a workaround to me, but maybe it's ok too.

Thanks!

^ permalink raw reply	[flat|nested] 49+ messages in thread

> > >         +-> if (!board_ebase_setup)
> > >                  board_ebase_setup = &bmips_ebase_setup;
> > > 
> > > So at the moment of the CPU traps initialization the bmips_ebase_setup()
> > > method is called. What trap_init() does isn't compatible with the
> > > allocation performed by the unflatten_and_copy_device_tree() method.
> > > See the next comment.
> > > 
> > >> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
> > >> from=0x00000000 max_addr=0x00000000
> > >> early_init_dt_alloc_memory_arch+0x40/0x84
> 
> ...
> 
> > >> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
> > >> bytes, linear)
> > > 
> > >> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
> > >> trap_init+0x70/0x4e8
> > > 
> > > Most likely someplace here the corruption has happened. The log above
> > > has just reserved a memory for NMI/reset vectors:
> > > arch/mips/kernel/traps.c: trap_init(void): Line 2373.
> > > 
> > > But then the board_ebase_setup() pointer is dereferenced and called,
> > > which has been initialized with bmips_ebase_setup() earlier and which
> > > overwrites the ebase variable with: 0x80001000 as this is
> > > CPU_BMIPS5000 CPU. So any further calls of the functions like
> > > set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a
> > > corruption of the memory above 0x80001000, which as we have discovered
> > > belongs to fdt and unflattened device tree.
> > > 
> > >> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> > >> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
> > >> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
> > >> cma-reserved, 1835008K highmem)
> > >> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> > >> [    0.000000] rcu: Hierarchical RCU implementation.
> > >> [    0.000000] rcu:     RCU event tracing is enabled.
> > >> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
> > >> is 25 jiffies.
> > >> [    0.000000] NR_IRQS: 256
> > > 
> > >> [    0.000000] OF: Bad cell count for /rdb
> > >> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
> > >> [    0.000000] OF: of_irq_init: children remain, but no parents
> > > 
> > > So here is the first time we have got the consequence of the corruption
> > > popped up. Luckily it's just the "Bad cells count" error. We could have
> > > got much less obvious log here up to getting a crash at some place
> > > further...
> > > 
> > >> [    0.000000] random: get_random_bytes called from
> > >> start_kernel+0x444/0x654 with crng_init=0
> > >> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
> > >> wraps every 8589934590000000ns
> > > 
> > >>
> > >> and with your patch applied which unfortunately did not work we have the
> > >> following:
> > >>
> > >> [...]
> > > 
> > > So a patch like this shall workaround the corruption:
> > > 
> > > --- a/arch/mips/bmips/setup.c
> > > +++ b/arch/mips/bmips/setup.c
> > > @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
> > >  
> > >  	__dt_setup_arch(dtb);
> > >  
> > > +	memblock_reserve(0x0, 0x1000 + 0x100*64);
> > > +
> > >  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
> > >  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
> > >  					     q->compatible)) {
> > 
> > This patch works, thanks a lot for the troubleshooting and analysis! How
> > about the following which would be more generic and works as well and
> > should be more universal since it does not require each architecture to
> > provide an appropriate call to memblock_reserve():
> > 
> > diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> > index e0352958e2f7..b0a173b500e8 100644
> > --- a/arch/mips/kernel/traps.c
> > +++ b/arch/mips/kernel/traps.c
> > @@ -2367,10 +2367,7 @@ void __init trap_init(void)
> > 
> >         if (!cpu_has_mips_r2_r6) {
> >                 ebase = CAC_BASE;
> > -               ebase_pa = virt_to_phys((void *)ebase);
> >                 vec_size = 0x400;
> > -
> > -               memblock_reserve(ebase_pa, vec_size);
> >         } else {
> >                 if (cpu_has_veic || cpu_has_vint)
> >                         vec_size = 0x200 + VECTORSPACING*64;
> > @@ -2410,6 +2407,14 @@ void __init trap_init(void)
> > 
> >         if (board_ebase_setup)
> >                 board_ebase_setup();
> > +
> > +       /* board_ebase_setup() can change the exception base address
> > +        * reserve it now after changes were made.
> > +        */
> > +       if (!cpu_has_mips_r2_r6) {
> > +               ebase_pa = virt_to_phys((void *)ebase);
> > +               memblock_reserve(ebase_pa, vec_size);
> > +       }

Hi folks!

First, I'm really sorry for breaking things and also for being silent for
the last couple of days: I was almost completely offline. Thank you for
working on this!

> 
> With this it's still possible to have memblock allocations around ebase_pa
> before it is reserved.
> 
> I think we have two options here to solve it in more or less generic way:
> 
> * split the reservation of ebase from trap_init() and move it earlier to
> setup_arch(). I didn't check what the board_ebase_setup() implementations
> do; if they need to allocate memory it would not work.

It seems that it doesn't allocate any memory, so it sounds like a good option.
But doesn't the ebase initialization depend on the memblock allocator?

I see in trap_init():
    if (!cpu_has_mips_r2_r6) {
        ...
    } else {
        ...
        ebase_pa = memblock_phys_alloc(vec_size, 1 << fls(vec_size));
        ...
        if (!IS_ENABLED(CONFIG_EVA) && !WARN_ON(ebase_pa >= 0x20000000))
            ebase = CKSEG0ADDR(ebase_pa);
        else
            ebase = (unsigned long)phys_to_virt(ebase_pa);
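
To make the alignment in that memblock_phys_alloc() call concrete, here is
a small user-space sketch of the arithmetic. It is only an illustration:
toy_fls() and ebase_align() are hypothetical stand-ins (not kernel
functions), and VECTORSPACING is assumed to be 0x100, matching the 0x100*64
term in the memblock_reserve() workaround quoted above.

```c
#include <assert.h>

/* User-space stand-in for the kernel's fls(): 1-based index of the
 * highest set bit, and 0 when no bit is set. */
static unsigned int toy_fls(unsigned int x)
{
	unsigned int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* Alignment requested as "1 << fls(vec_size)": the smallest power of
 * two strictly greater than vec_size. */
static unsigned long ebase_align(unsigned int vec_size)
{
	unsigned int bit = toy_fls(vec_size);

	/* Keep the shift well defined on targets with a 32-bit long. */
	assert(bit < sizeof(unsigned long) * 8);
	return 1UL << bit;
}

/* ASSUMPTION: VECTORSPACING == 0x100, so the VEIC/VINT case gives
 * vec_size = 0x200 + 0x100 * 64 = 0x4200. */
static unsigned int toy_veic_vec_size(void)
{
	return 0x200 + 0x100 * 64;
}
```

Under that assumption ebase_align(toy_veic_vec_size()) is 0x8000, i.e. the
r2/r6 path asks memblock for a 32 KiB-aligned block, so this path cannot be
turned into a plain early memblock_reserve() of a fixed address.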


> 
> * add an API to memblock to set lower limit for allocations and then set
> the lower limit, to e.g. kernel load address in arch_mem_init(). This may
> add complexity for configurations with relocatable kernel and kaslr.

This option looks more like a workaround to me, but maybe it's ok too.
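
For the sake of discussion, a lower limit could behave roughly like the toy
model below. This is a user-space sketch and not memblock code: the
toy_memblock structure, its min_addr field and both helpers are
hypothetical names, and the allocator is a plain bottom-up first fit over a
single RAM bank.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_RESERVED 16

/* Half-open range [start, end). */
struct toy_range {
	uint64_t start, end;
};

/* Hypothetical model: one RAM bank plus a list of reserved ranges. */
struct toy_memblock {
	uint64_t ram_start, ram_end;	/* RAM bank [ram_start, ram_end) */
	uint64_t min_addr;		/* lower limit for new allocations */
	struct toy_range reserved[TOY_MAX_RESERVED];
	int nr_reserved;
};

static void toy_reserve(struct toy_memblock *mb, uint64_t start, uint64_t size)
{
	assert(mb->nr_reserved < TOY_MAX_RESERVED);
	mb->reserved[mb->nr_reserved].start = start;
	mb->reserved[mb->nr_reserved].end = start + size;
	mb->nr_reserved++;
}

/* Bottom-up first fit: return the lowest aligned address at or above
 * min_addr that overlaps no reserved range, and reserve it.
 * Returns 0 on failure, so min_addr should stay above 0. */
static uint64_t toy_alloc_bottom_up(struct toy_memblock *mb, uint64_t size,
				    uint64_t align)
{
	uint64_t base;
	int i, moved;

	assert(align && (align & (align - 1)) == 0);
	base = mb->min_addr > mb->ram_start ? mb->min_addr : mb->ram_start;
	base = (base + align - 1) & ~(align - 1);

	/* Skip past every reserved range the candidate collides with. */
	do {
		moved = 0;
		for (i = 0; i < mb->nr_reserved; i++) {
			if (base < mb->reserved[i].end &&
			    base + size > mb->reserved[i].start) {
				base = (mb->reserved[i].end + align - 1) &
				       ~(align - 1);
				moved = 1;
			}
		}
	} while (moved);

	if (base + size > mb->ram_end)
		return 0;
	toy_reserve(mb, base, size);
	return base;
}
```

With the kernel image reserved at [0x00010000-0x018313cf] and min_addr set
to 0x1000, a 10913-byte request with align=0x40 lands at 0x1000, which is
what the log shows. With min_addr raised to the kernel load address
(0x10000) the same request lands at 0x1831400 instead, away from the low
memory used by the exception vectors.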

Thanks!


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2021-03-01  9:22               ` Serge Semin
@ 2021-03-02  4:09                 ` Florian Fainelli
  -1 siblings, 0 replies; 49+ messages in thread
From: Florian Fainelli @ 2021-03-02  4:09 UTC (permalink / raw)
  To: Serge Semin, Mike Rapoport, Thomas Bogendoerfer
  Cc: Serge Semin, Roman Gushchin, Andrew Morton, linux-mm, Kamal Dasu,
	Paul Cercueil, Jiaxun Yang, iamjoonsoo.kim, riel, Michal Hocko,
	linux-kernel, kernel-team,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE



On 3/1/2021 1:22 AM, Serge Semin wrote:
> On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
>> Hi Serge,
>>
>> On 2/28/2021 3:08 PM, Serge Semin wrote:
>>> Hi folks,
>>> What you've got here seems a more complicated problem than it
>>> could originally look like. Please, see my comments below.
>>>
>>> (Note I've discarded some of the email logs, which of no interest
>>> to the discovered problem. Please also note that I haven't got any
>>> Broadcom hardware to test out a solution suggested below.)
>>>
>>> On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
>>>> Hi Mike,
>>>>
>>>> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
>>>>> Hi Florian,
>>>>>
>>>>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
>>>>>>
>>>
>>>>>> [...]
>>>
>>>>>>
>>>>>> Hi Roman, Thomas and other linux-mips folks,
>>>>>>
>>>>>> Kamal and myself have been unable to boot v5.11 on MIPS since this
>>>>>> commit, reverting it makes our MIPS platforms boot successfully. We do
>>>>>> not see a warning like this one in the commit message, instead what
>>>>>> happens appear to be a corrupted Device Tree which prevents the parsing
>>>>>> of the "rdb" node and leading to the interrupt controllers not being
>>>>>> registered, and the system eventually not booting.
>>>>>>
>>>>>> The Device Tree is built-into the kernel image and resides at
>>>>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
>>>>>>
>>>>>> Do you have any idea what could be wrong with MIPS specifically here?
>>>
>>> Most likely the problem you've discovered has been there for quite
>>> some time. The patch you are referring to just caused it to be
>>> triggered by extending the early allocation range. See before that
>>> patch was accepted the early memory allocations had been performed
>>> in the range:
>>> [kernel_end, RAM_END].
>>> The patch changed that, so the early allocations are done within
>>> [RAM_START + PAGE_SIZE, RAM_END].
>>>
>>> In normal situations it's safe to do that as long as all the critical
>>> memory regions (including the memory residing a space below the
>>> kernel) have been reserved. But as soon as a memory with some critical
>>> structures haven't been reserved, the kernel may allocate it to be used
>>> for instance for early initializations with obviously unpredictable but
>>> most of the times unpleasant consequences.
>>>
>>>>>
>>>>> Apparently there is a memblock allocation in one of the functions called
>>>>> from arch_mem_init() between plat_mem_setup() and
>>>>> early_init_fdt_reserve_self().
>>>
>>> Mike, alas according to the log provided by Florian that's not the reason
>>> of the problem. Please, see my considerations below.
>>>
>>>> [...]
>>>>
>>>> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
>>>> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
>>>> Feb 28 10:01:50 PST 2021
>>>> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
>>>> [    0.000000] FPU revision is: 00130001
>>>
>>>> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
>>>> early_init_dt_scan_memory+0x160/0x1e0
>>>> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
>>>> early_init_dt_scan_memory+0x160/0x1e0
>>>> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
>>>> early_init_dt_scan_memory+0x160/0x1e0
>>>
>>> Here the memory has been added to the memblock allocator.
>>>
>>>> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
>>>> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
>>>> [    0.000000] printk: bootconsole [ns16550a0] enabled
>>>
>>>> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
>>>> setup_arch+0x128/0x69c
>>>
>>> Here the fdt memory has been reserved. (Note it's built into the
>>> kernel.)
>>>
>>>> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
>>>> setup_arch+0x1f8/0x69c
>>>
>>> Here the kernel itself together with built-in dtb have been reserved.
>>> So far so good.
>>>
>>>> [    0.000000] Initrd not found or empty - disabling initrd
>>>
>>>> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
>>>> from=0x00000000 max_addr=0x00000000
>>>> early_init_dt_alloc_memory_arch+0x40/0x84
>>>> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
>>>> from=0x00000000 max_addr=0x00000000
>>>> early_init_dt_alloc_memory_arch+0x40/0x84
>>>> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>
>>> The log above most likely belongs to the call-chain:
>>> setup_arch()
>>> +-> arch_mem_init()
>>>     +-> device_tree_init() - BMIPS specific method
>>>         +-> unflatten_and_copy_device_tree()
>>>
>>> So to speak here we've copied the fdt from the original space
>>> [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
>>> it to [0x00003aa4-0x0000ba4b].
>>>
>>> The problem is that a bit later the next call-chain is performed:
>>> setup_arch()
>>> +-> plat_smp_setup()
>>>     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
>>>         +-> if (!board_ebase_setup)
>>>                  board_ebase_setup = &bmips_ebase_setup;
>>>
>>> So at the moment of the CPU traps initialization the bmips_ebase_setup()
>>> method is called. What trap_init() does isn't compatible with the
>>> allocation performed by the unflatten_and_copy_device_tree() method.
>>> See the next comment.
>>>
>>>> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
>>>> from=0x00000000 max_addr=0x00000000
>>>> early_init_dt_alloc_memory_arch+0x40/0x84
>>>> [    0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_reserve: [0x0096a000-0x00969fff]
>>>> setup_arch+0x3fc/0x69c
>>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>>>> [    0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>>>> [    0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>>>> [    0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
>>>> bytes.
>>>> [    0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
>>>> linesize 32 bytes
>>>> [    0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
>>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>>>> [    0.000000] memblock_reserve: [0x0000c000-0x0000cfff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>>>> [    0.000000] memblock_reserve: [0x0000d000-0x0000dfff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>>>> [    0.000000] memblock_reserve: [0x0000e000-0x0000efff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] Zone ranges:
>>>> [    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
>>>> [    0.000000]   HighMem  [mem 0x0000000010000000-0x00000000cfffffff]
>>>> [    0.000000] Movable zone start for each node
>>>> [    0.000000] Early memory node ranges
>>>> [    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
>>>> [    0.000000]   node   0: [mem 0x0000000020000000-0x000000004fffffff]
>>>> [    0.000000]   node   0: [mem 0x0000000090000000-0x00000000cfffffff]
>>>> [    0.000000] Initmem setup node 0 [mem
>>>> 0x0000000000000000-0x00000000cfffffff]
>>>> [    0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
>>>> from=0x00000000 max_addr=0x00000000
>>>> alloc_node_mem_map.constprop.135+0x6c/0xc8
>>>> [    0.000000] memblock_reserve: [0x01831400-0x032313ff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
>>>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
>>>> [    0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
>>>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
>>>> [    0.000000] memblock_reserve: [0x0000bc80-0x0000bdff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] MEMBLOCK configuration:
>>>> [    0.000000]  memory size = 0x80000000 reserved size = 0x0322f032
>>>> [    0.000000]  memory.cnt  = 0x3
>>>> [    0.000000]  memory[0x0]     [0x00000000-0x0fffffff], 0x10000000
>>>> bytes flags: 0x0
>>>> [    0.000000]  memory[0x1]     [0x20000000-0x4fffffff], 0x30000000
>>>> bytes flags: 0x0
>>>> [    0.000000]  memory[0x2]     [0x90000000-0xcfffffff], 0x40000000
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved.cnt  = 0xa
>>>> [    0.000000]  reserved[0x0]   [0x00001000-0x00003aa0], 0x00002aa1
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x1]   [0x00003aa4-0x0000ba64], 0x00007fc1
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x2]   [0x0000ba80-0x0000ba9f], 0x00000020
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x3]   [0x0000bb00-0x0000bb1f], 0x00000020
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x4]   [0x0000bb80-0x0000bb9f], 0x00000020
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x5]   [0x0000bc00-0x0000bc1f], 0x00000020
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x6]   [0x0000bc80-0x0000bdff], 0x00000180
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x7]   [0x0000c000-0x0000efff], 0x00003000
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x8]   [0x00010000-0x018313cf], 0x018213d0
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x9]   [0x01831400-0x032313ff], 0x01a00000
>>>> bytes flags: 0x0
>>>> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
>>>> [    0.000000] memblock_reserve: [0x0000be00-0x0000be1d]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
>>>> [    0.000000] memblock_reserve: [0x0000be80-0x0000be9d]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
>>>> [    0.000000] memblock_reserve: [0x0000f000-0x0000ffff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
>>>> [    0.000000] memblock_reserve: [0x03231400-0x032323ff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
>>>> from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
>>>> [    0.000000] memblock_reserve: [0x03233000-0x0327afff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_free: [0x03245000-0x03244fff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [    0.000000] memblock_free: [0x03257000-0x03256fff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [    0.000000] memblock_free: [0x03269000-0x03268fff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [    0.000000] memblock_free: [0x0327b000-0x0327afff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [    0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
>>>> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
>>>> [    0.000000] memblock_reserve: [0x0000bf00-0x0000bf03]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
>>>> [    0.000000] memblock_reserve: [0x0000bf80-0x0000bf83]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
>>>> [    0.000000] memblock_reserve: [0x03232400-0x0323240f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
>>>> [    0.000000] memblock_reserve: [0x03232480-0x0323248f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
>>>> [    0.000000] memblock_reserve: [0x03232500-0x0323257f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
>>>> [    0.000000] memblock_reserve: [0x03232580-0x032325db]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
>>>> [    0.000000] memblock_reserve: [0x03232600-0x032328ff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
>>>> [    0.000000] memblock_reserve: [0x03232900-0x03232c03]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
>>>> [    0.000000] memblock_reserve: [0x03232c80-0x03232d3f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_free: [0x0000f000-0x0000ffff]
>>>> pcpu_embed_first_chunk+0x838/0x884
>>>> [    0.000000] memblock_free: [0x03231400-0x032323ff]
>>>> pcpu_embed_first_chunk+0x850/0x884
>>>> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 523776
>>>> [    0.000000] Kernel command line: console=ttyS0,115200 earlycon
>>>> [    0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
>>>> [    0.000000] memblock_reserve: [0x0327b000-0x0329afff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
>>>> bytes, linear)
>>>> [    0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
>>>> [    0.000000] memblock_reserve: [0x0329b000-0x032aafff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
>>>> bytes, linear)
>>>
>>>> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
>>>> trap_init+0x70/0x4e8
>>>
>>> Most likely someplace here the corruption has happened. The log above
>>> has just reserved a memory for NMI/reset vectors:
>>> arch/mips/kernel/traps.c: trap_init(void): Line 2373.
>>>
>>> But then the board_ebase_setup() pointer is dereferenced and called,
>>> which has been initialized with bmips_ebase_setup() earlier and which
>>> overwrites the ebase variable with: 0x80001000 as this is
>>> CPU_BMIPS5000 CPU. So any further calls of the functions like
>>> set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a
>>> corruption of the memory above 0x80001000, which as we have discovered
>>> belongs to fdt and unflattened device tree.
>>>
>>>> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
>>>> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
>>>> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
>>>> cma-reserved, 1835008K highmem)
>>>> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
>>>> [    0.000000] rcu: Hierarchical RCU implementation.
>>>> [    0.000000] rcu:     RCU event tracing is enabled.
>>>> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
>>>> is 25 jiffies.
>>>> [    0.000000] NR_IRQS: 256
>>>
>>>> [    0.000000] OF: Bad cell count for /rdb
>>>> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
>>>> [    0.000000] OF: of_irq_init: children remain, but no parents
>>>
>>> So here is the first time we have got the consequence of the corruption
>>> popped up. Luckily it's just the "Bad cells count" error. We could have
>>> got much less obvious log here up to getting a crash at some place
>>> further...
>>>
>>>> [    0.000000] random: get_random_bytes called from
>>>> start_kernel+0x444/0x654 with crng_init=0
>>>> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
>>>> wraps every 8589934590000000ns
>>>
>>>>
>>>> and with your patch applied which unfortunately did not work we have the
>>>> following:
>>>>
>>>> [...]
>>>
>>> So a patch like this shall workaround the corruption:
>>>
>>> --- a/arch/mips/bmips/setup.c
>>> +++ b/arch/mips/bmips/setup.c
>>> @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
>>>  
>>>  	__dt_setup_arch(dtb);
>>>  
>>> +	memblock_reserve(0x0, 0x1000 + 0x100*64);
>>> +
>>>  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
>>>  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
>>>  					     q->compatible)) {
>>
> 
>> This patch works, thanks a lot for the troubleshooting and analysis! How
>> about the following which would be more generic and works as well and
>> should be more universal since it does not require each architecture to
>> provide an appropriate call to memblock_reserve():
> 
> Hm, are you sure it's working?

I was, until I noticed that I was working on top of a revert of Roman's
patch. Sorry about the brain fart here.

> If so, my analysis hasn't been quite
> correct. My suggestion was based on the memory initializations,
> allocations and reservations trace. So here is the sequence of most
> crucial of them:
> 1) Memblock initialization:
>    start_kernel()->setup_arch()->arch_mem_init()->plat_mem_setup()->__dt_setup_arch()
>    (At this point I suggested to place the exceptions memory
>     reservation.)
> 2) Base FDT memory reservation:
>    start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_reserve_self()
> 3) FDT "reserved-memory" nodes parsing and corresponding memory ranges
>    reservation:
>    start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_scan_reserved_mem()
> 4) Reserve kernel itself, some critical sections like initrd and
>    crash-kernel:
>    start_kernel()->setup_arch()->arch_mem_init()->bootmem_init()...
> 5) Copy and unflatten the built-into the kernel device tree
>    (BMIPS-platform code):
>    start_kernel()->setup_arch()->arch_mem_init()->device_tree_init()
>    This is the very first time an allocation from the memblock pool
>    is performed. Since we haven't reserved a memory for the exception
>    vectors yet, the memblock allocator is free to return that memory
>    range for any other use. Needless to say if we try to use that memory
>    later without consulting with memblock, we may and in our case
>    will get into troubles.
> 6) Many random early memblock allocations for kernel use before
>    buddy and sl*b allocators are up and running...
>    Note that even if the allocations made in 5) happened not to
>    overlap the exception memory, there are many more chances for that
>    to happen here, with obviously fatal consequences once the same
>    range is used by two independent owners.
> 7) Trap/exception vectors initialization and !memory reservation! for
>    them:
>    start_kernel()->trap_init()
>    Only at this point we get to reserve the memory for the vectors.
> 8) Init and run buddy/sl*b allocators:
>    start_kernel()->mm_init()->...mem_init()...
> 
> There are a lot of allocations done in 5) and 6) before the
> trap_init() is called in 7). You can see that in your log. That's why
> I have doubts that your patch worked well. Most likely you've
> forgotten to revert the workaround suggested by me in the previous
> message. Could you make sure that you didn't and re-test your patch
> again? If it still works then I might have confused something and it's
> strange that my patch worked in the first place...

I would like to submit a fix for 5.12-rc1 and get it backported into
5.11 so that BMIPS machines boot again; it will essentially be your
earlier proposed fix.
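
As a sanity check on why reserving [0x0, 0x1000 + 0x100*64) is enough, the
ranges from the log can be compared against the vector area in a few lines
of user-space C. This is only a sketch: the helper names are hypothetical,
the 0x100*64 extent is taken from the workaround rather than from the
hardware documentation, and the address conversion assumes the usual MIPS32
KSEG0 window at 0x80000000.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive physical range, as printed in the memblock debug log. */
struct phys_range {
	uint32_t first, last;
};

static bool ranges_overlap(struct phys_range a, struct phys_range b)
{
	return a.first <= b.last && b.first <= a.last;
}

/* KSEG0 is the unmapped cached window 0x80000000..0x9fffffff; the
 * physical address is the virtual address with the top bits cleared. */
static uint32_t kseg0_to_phys(uint32_t vaddr)
{
	assert(vaddr >= 0x80000000u && vaddr <= 0x9fffffffu);
	return vaddr & 0x1fffffffu;
}

/* ASSUMPTION: the handlers installed at ebase span 0x100 * 64 bytes,
 * which is the extent the memblock_reserve() workaround accounts for. */
static struct phys_range vector_area(uint32_t ebase)
{
	uint32_t pa = kseg0_to_phys(ebase);
	struct phys_range r = { pa, pa + 0x100 * 64 - 1 };

	return r;
}
```

For ebase = 0x80001000 this gives [0x00001000-0x00004fff], which overlaps
both the copied fdt at [0x00001000-0x00003aa0] and the unflattened tree at
[0x00003aa4-0x0000ba4b], but not the kernel image at
[0x00010000-0x018313cf].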

BMIPS is the only "legacy" MIPS platform that defines an exception base,
so while this problem may certainly exist on other platforms, I do
wonder how likely it is there.

> 
> Food for thought for everyone (Thomas, Mark, please join the
> discussion). What we've got here is a somewhat bigger problem. AFAICS
> if bottom-up allocation is enabled (it's our case) memblock_find_in_range_node()
> performs the allocation above the very first PAGE_SIZE memory chunk
> (see that method code for details). So we are currently on a safe side
> for some older MIPS platforms. But the platform with VEIC/VINT may get
> into the same troubles here if they didn't reserve exception memory
> early enough before the kernel starts random allocations from
> memblock. So we either need to provide a generic workaround for that
> or make sure each platform gets to reserve vectors itself for instance
> in the plat_mem_setup() method.
> 
> -Sergey
> 
>>
>> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
>> index e0352958e2f7..b0a173b500e8 100644
>> --- a/arch/mips/kernel/traps.c
>> +++ b/arch/mips/kernel/traps.c
>> @@ -2367,10 +2367,7 @@ void __init trap_init(void)
>>
>>         if (!cpu_has_mips_r2_r6) {
>>                 ebase = CAC_BASE;
>> -               ebase_pa = virt_to_phys((void *)ebase);
>>                 vec_size = 0x400;
>> -
>> -               memblock_reserve(ebase_pa, vec_size);
>>         } else {
>>                 if (cpu_has_veic || cpu_has_vint)
>>                         vec_size = 0x200 + VECTORSPACING*64;
>> @@ -2410,6 +2407,14 @@ void __init trap_init(void)
>>
>>         if (board_ebase_setup)
>>                 board_ebase_setup();
>> +
>> +       /* board_ebase_setup() can change the exception base address
>> +        * reserve it now after changes were made.
>> +        */
>> +       if (!cpu_has_mips_r2_r6) {
>> +               ebase_pa = virt_to_phys((void *)ebase);
>> +               memblock_reserve(ebase_pa, vec_size);
>> +       }
>>         per_cpu_trap_init(true);
>>         memblock_set_bottom_up(false);
>> -- 
>> Florian

-- 
Florian

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
@ 2021-03-02  4:09                 ` Florian Fainelli
  0 siblings, 0 replies; 49+ messages in thread
From: Florian Fainelli @ 2021-03-02  4:09 UTC (permalink / raw)
  To: Serge Semin, Mike Rapoport, Thomas Bogendoerfer
  Cc: Serge Semin, Roman Gushchin, Andrew Morton, linux-mm, Kamal Dasu,
	Paul Cercueil, Jiaxun Yang, iamjoonsoo.kim, riel, Michal Hocko,
	linux-kernel, kernel-team,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE



On 3/1/2021 1:22 AM, Serge Semin wrote:
> On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
>> Hi Serge,
>>
>> On 2/28/2021 3:08 PM, Serge Semin wrote:
>>> Hi folks,
>>> What you've got here seems a more complicated problem than it
>>> could originally look like. Please, see my comments below.
>>>
>>> (Note I've discarded some of the email logs, which of no interest
>>> to the discovered problem. Please also note that I haven't got any
>>> Broadcom hardware to test out a solution suggested below.)
>>>
>>> On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
>>>> Hi Mike,
>>>>
>>>> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
>>>>> Hi Florian,
>>>>>
>>>>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
>>>>>>
>>>
>>>>>> [...]
>>>
>>>>>>
>>>>>> Hi Roman, Thomas and other linux-mips folks,
>>>>>>
>>>>>> Kamal and myself have been unable to boot v5.11 on MIPS since this
>>>>>> commit, reverting it makes our MIPS platforms boot successfully. We do
>>>>>> not see a warning like this one in the commit message, instead what
>>>>>> happens appear to be a corrupted Device Tree which prevents the parsing
>>>>>> of the "rdb" node and leading to the interrupt controllers not being
>>>>>> registered, and the system eventually not booting.
>>>>>>
>>>>>> The Device Tree is built-into the kernel image and resides at
>>>>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
>>>>>>
>>>>>> Do you have any idea what could be wrong with MIPS specifically here?
>>>
>>> Most likely the problem you've discovered has been there for quite
>>> some time. The patch you are referring to just caused it to be
>>> triggered by extending the early allocation range. See before that
>>> patch was accepted the early memory allocations had been performed
>>> in the range:
>>> [kernel_end, RAM_END].
>>> The patch changed that, so the early allocations are done within
>>> [RAM_START + PAGE_SIZE, RAM_END].
>>>
>>> In normal situations it's safe to do that as long as all the critical
>>> memory regions (including the memory residing a space below the
>>> kernel) have been reserved. But as soon as a memory with some critical
>>> structures haven't been reserved, the kernel may allocate it to be used
>>> for instance for early initializations with obviously unpredictable but
>>> most of the times unpleasant consequences.
>>>
>>>>>
>>>>> Apparently there is a memblock allocation in one of the functions called
>>>>> from arch_mem_init() between plat_mem_setup() and
>>>>> early_init_fdt_reserve_self().
>>>
>>> Mike, alas according to the log provided by Florian that's not the reason
>>> of the problem. Please, see my considerations below.
>>>
>>>> [...]
>>>>
>>>> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
>>>> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
>>>> Feb 28 10:01:50 PST 2021
>>>> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
>>>> [    0.000000] FPU revision is: 00130001
>>>
>>>> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
>>>> early_init_dt_scan_memory+0x160/0x1e0
>>>> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
>>>> early_init_dt_scan_memory+0x160/0x1e0
>>>> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
>>>> early_init_dt_scan_memory+0x160/0x1e0
>>>
>>> Here the memory has been added to the memblock allocator.
>>>
>>>> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
>>>> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
>>>> [    0.000000] printk: bootconsole [ns16550a0] enabled
>>>
>>>> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
>>>> setup_arch+0x128/0x69c
>>>
>>> Here the fdt memory has been reserved. (Note it's built into the
>>> kernel.)
>>>
>>>> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
>>>> setup_arch+0x1f8/0x69c
>>>
>>> Here the kernel itself together with built-in dtb have been reserved.
>>> So far so good.
>>>
>>>> [    0.000000] Initrd not found or empty - disabling initrd
>>>
>>>> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
>>>> from=0x00000000 max_addr=0x00000000
>>>> early_init_dt_alloc_memory_arch+0x40/0x84
>>>> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
>>>> from=0x00000000 max_addr=0x00000000
>>>> early_init_dt_alloc_memory_arch+0x40/0x84
>>>> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>
>>> The log above most likely belongs to the call-chain:
>>> setup_arch()
>>> +-> arch_mem_init()
>>>     +-> device_tree_init() - BMIPS specific method
>>>         +-> unflatten_and_copy_device_tree()
>>>
>>> So to speak here we've copied the fdt from the original space
>>> [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
>>> it to [0x00003aa4-0x0000ba4b].
>>>
>>> The problem is that a bit later the next call-chain is performed:
>>> setup_arch()
>>> +-> plat_smp_setup()
>>>     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
>>>         +-> if (!board_ebase_setup)
>>>                  board_ebase_setup = &bmips_ebase_setup;
>>>
>>> So at the moment of the CPU traps initialization the bmips_ebase_setup()
>>> method is called. What trap_init() does isn't compatible with the
>>> allocation performed by the unflatten_and_copy_device_tree() method.
>>> See the next comment.
>>>
>>>> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
>>>> from=0x00000000 max_addr=0x00000000
>>>> early_init_dt_alloc_memory_arch+0x40/0x84
>>>> [    0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_reserve: [0x0096a000-0x00969fff]
>>>> setup_arch+0x3fc/0x69c
>>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>>>> [    0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>>>> [    0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
>>>> [    0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
>>>> bytes.
>>>> [    0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
>>>> linesize 32 bytes
>>>> [    0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
>>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>>>> [    0.000000] memblock_reserve: [0x0000c000-0x0000cfff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>>>> [    0.000000] memblock_reserve: [0x0000d000-0x0000dfff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
>>>> [    0.000000] memblock_reserve: [0x0000e000-0x0000efff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] Zone ranges:
>>>> [    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
>>>> [    0.000000]   HighMem  [mem 0x0000000010000000-0x00000000cfffffff]
>>>> [    0.000000] Movable zone start for each node
>>>> [    0.000000] Early memory node ranges
>>>> [    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
>>>> [    0.000000]   node   0: [mem 0x0000000020000000-0x000000004fffffff]
>>>> [    0.000000]   node   0: [mem 0x0000000090000000-0x00000000cfffffff]
>>>> [    0.000000] Initmem setup node 0 [mem
>>>> 0x0000000000000000-0x00000000cfffffff]
>>>> [    0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
>>>> from=0x00000000 max_addr=0x00000000
>>>> alloc_node_mem_map.constprop.135+0x6c/0xc8
>>>> [    0.000000] memblock_reserve: [0x01831400-0x032313ff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
>>>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
>>>> [    0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
>>>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
>>>> [    0.000000] memblock_reserve: [0x0000bc80-0x0000bdff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] MEMBLOCK configuration:
>>>> [    0.000000]  memory size = 0x80000000 reserved size = 0x0322f032
>>>> [    0.000000]  memory.cnt  = 0x3
>>>> [    0.000000]  memory[0x0]     [0x00000000-0x0fffffff], 0x10000000
>>>> bytes flags: 0x0
>>>> [    0.000000]  memory[0x1]     [0x20000000-0x4fffffff], 0x30000000
>>>> bytes flags: 0x0
>>>> [    0.000000]  memory[0x2]     [0x90000000-0xcfffffff], 0x40000000
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved.cnt  = 0xa
>>>> [    0.000000]  reserved[0x0]   [0x00001000-0x00003aa0], 0x00002aa1
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x1]   [0x00003aa4-0x0000ba64], 0x00007fc1
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x2]   [0x0000ba80-0x0000ba9f], 0x00000020
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x3]   [0x0000bb00-0x0000bb1f], 0x00000020
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x4]   [0x0000bb80-0x0000bb9f], 0x00000020
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x5]   [0x0000bc00-0x0000bc1f], 0x00000020
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x6]   [0x0000bc80-0x0000bdff], 0x00000180
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x7]   [0x0000c000-0x0000efff], 0x00003000
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x8]   [0x00010000-0x018313cf], 0x018213d0
>>>> bytes flags: 0x0
>>>> [    0.000000]  reserved[0x9]   [0x01831400-0x032313ff], 0x01a00000
>>>> bytes flags: 0x0
>>>> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
>>>> [    0.000000] memblock_reserve: [0x0000be00-0x0000be1d]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
>>>> [    0.000000] memblock_reserve: [0x0000be80-0x0000be9d]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
>>>> [    0.000000] memblock_reserve: [0x0000f000-0x0000ffff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
>>>> [    0.000000] memblock_reserve: [0x03231400-0x032323ff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
>>>> from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
>>>> [    0.000000] memblock_reserve: [0x03233000-0x0327afff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_free: [0x03245000-0x03244fff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [    0.000000] memblock_free: [0x03257000-0x03256fff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [    0.000000] memblock_free: [0x03269000-0x03268fff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [    0.000000] memblock_free: [0x0327b000-0x0327afff]
>>>> pcpu_embed_first_chunk+0x7a0/0x884
>>>> [    0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
>>>> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
>>>> [    0.000000] memblock_reserve: [0x0000bf00-0x0000bf03]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
>>>> [    0.000000] memblock_reserve: [0x0000bf80-0x0000bf83]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
>>>> [    0.000000] memblock_reserve: [0x03232400-0x0323240f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
>>>> [    0.000000] memblock_reserve: [0x03232480-0x0323248f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
>>>> [    0.000000] memblock_reserve: [0x03232500-0x0323257f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
>>>> [    0.000000] memblock_reserve: [0x03232580-0x032325db]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
>>>> [    0.000000] memblock_reserve: [0x03232600-0x032328ff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
>>>> [    0.000000] memblock_reserve: [0x03232900-0x03232c03]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
>>>> [    0.000000] memblock_reserve: [0x03232c80-0x03232d3f]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] memblock_free: [0x0000f000-0x0000ffff]
>>>> pcpu_embed_first_chunk+0x838/0x884
>>>> [    0.000000] memblock_free: [0x03231400-0x032323ff]
>>>> pcpu_embed_first_chunk+0x850/0x884
>>>> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 523776
>>>> [    0.000000] Kernel command line: console=ttyS0,115200 earlycon
>>>> [    0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
>>>> [    0.000000] memblock_reserve: [0x0327b000-0x0329afff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
>>>> bytes, linear)
>>>> [    0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
>>>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
>>>> [    0.000000] memblock_reserve: [0x0329b000-0x032aafff]
>>>> memblock_alloc_range_nid+0xf8/0x198
>>>> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
>>>> bytes, linear)
>>>
>>>> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
>>>> trap_init+0x70/0x4e8
>>>
>>> Most likely someplace here the corruption has happened. The log above
>>> has just reserved a memory for NMI/reset vectors:
>>> arch/mips/kernel/traps.c: trap_init(void): Line 2373.
>>>
>>> But then the board_ebase_setup() pointer is dereferenced and called,
>>> which has been initialized with bmips_ebase_setup() earlier and which
>>> overwrites the ebase variable with: 0x80001000 as this is
>>> CPU_BMIPS5000 CPU. So any further calls of the functions like
>>> set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a
>>> corruption of the memory above 0x80001000, which as we have discovered
>>> belongs to fdt and unflattened device tree.
>>>
>>>> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
>>>> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
>>>> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
>>>> cma-reserved, 1835008K highmem)
>>>> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
>>>> [    0.000000] rcu: Hierarchical RCU implementation.
>>>> [    0.000000] rcu:     RCU event tracing is enabled.
>>>> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
>>>> is 25 jiffies.
>>>> [    0.000000] NR_IRQS: 256
>>>
>>>> [    0.000000] OF: Bad cell count for /rdb
>>>> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
>>>> [    0.000000] OF: of_irq_init: children remain, but no parents
>>>
>>> So here is the first time we have got the consequence of the corruption
>>> popped up. Luckily it's just the "Bad cells count" error. We could have
>>> got much less obvious log here up to getting a crash at some place
>>> further...
>>>
>>>> [    0.000000] random: get_random_bytes called from
>>>> start_kernel+0x444/0x654 with crng_init=0
>>>> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
>>>> wraps every 8589934590000000ns
>>>
>>>>
>>>> and with your patch applied which unfortunately did not work we have the
>>>> following:
>>>>
>>>> [...]
>>>
>>> So a patch like this shall workaround the corruption:
>>>
>>> --- a/arch/mips/bmips/setup.c
>>> +++ b/arch/mips/bmips/setup.c
>>> @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
>>>  
>>>  	__dt_setup_arch(dtb);
>>>  
>>> +	memblock_reserve(0x0, 0x1000 + 0x100*64);
>>> +
>>>  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
>>>  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
>>>  					     q->compatible)) {
>>
> 
>> This patch works, thanks a lot for the troubleshooting and analysis! How
>> about the following which would be more generic and works as well and
>> should be more universal since it does not require each architecture to
>> provide an appropriate call to memblock_reserve():
> 
> Hm, are you sure it's working?

I was, until I noticed that I was working on top of a revert of Roman's
patch. Sorry about the brain fart here.

> If so, my analysis hasn't been quite
> correct. My suggestion was based on the memory initializations,
> allocations and reservations trace. So here is the sequence of most
> crucial of them:
> 1) Memblock initialization:
>    start_kernel()->setup_arch()->arch_mem_init()->plat_mem_setup()->__dt_setup_arch()
>    (At this point I suggested to place the exceptions memory
>     reservation.)
> 2) Base FDT memory reservation:
>    start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_reserve_self()
> 3) FDT "reserved-memory" nodes parsing and corresponding memory ranges
>    reservation:
>    start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_scan_reserved_mem()
> 4) Reserve kernel itself, some critical sections like initrd and
>    crash-kernel:
>    start_kernel()->setup_arch()->arch_mem_init()->bootmem_init()...
> 5) Copy and unflatten the built-into the kernel device tree
>    (BMIPS-platform code):
>    start_kernel()->setup_arch()->arch_mem_init()->device_tree_init()
>    This is the very first time an allocation from the memblock pool
>    is performed. Since we haven't reserved a memory for the exception
>    vectors yet, the memblock allocator is free to return that memory
>    range for any other use. Needless to say if we try to use that memory
>    later without consulting with memblock, we may and in our case
>    will get into troubles.
> 6) Many random early memblock allocations for kernel use before
>    buddy and sl*b allocators are up and running...
>    Note if for some fortunate reason the allocations made in 5) didn't
>    overlap the exceptions memory, here we have much more chances to
>    do that with obviously fatal consequences of the ranges independent
>    usage.
> 7) Trap/exception vectors initialization and !memory reservation! for
>    them:
>    start_kernel()->trap_init()
>    Only at this point we get to reserve the memory for the vectors.
> 8) Init and run buddy/sl*b allocators:
>    start_kernel()->mm_init()->...mem_init()...
> 
> There are a lot of allocations done in 5) and 6) before the
> trap_init() is called in 7). You can see that in your log. That's why
> I have doubts that your patch worked well. Most likely you've
> forgotten to revert the workaround suggested by me in the previous
> message. Could you make sure that you didn't and re-test your patch
> again? If it still works then I might have confused something and it's
> strange that my patch worked in the first place...

I would like to submit a fix for 5.12-rc1 and get it backported into
5.11 so that BMIPS machines boot again; it will essentially be your
earlier proposed fix.

BMIPS is the only "legacy" MIPS platform that defines an exception base,
so while this problem may certainly exist on other platforms, I do
wonder how likely it is there.
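
The ordering problem in steps 5) to 7) of the quoted sequence can be
illustrated with a toy model. This is only a sketch: struct toy_memblock,
toy_reserve() and toy_alloc() are hypothetical names, not the kernel's
memblock API, and the model assumes a single RAM bank starting at 0 with
no upper limit. It only mimics "bottom-up, first fit, starting above the
first PAGE_SIZE chunk".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE   0x1000u
#define TOY_MAX_REGIONS 16

/* Half-open range [start, end) that must not be handed out again. */
struct toy_region {
	uint32_t start;
	uint32_t end;
};

struct toy_memblock {
	struct toy_region reserved[TOY_MAX_REGIONS];
	size_t count;
};

static uint32_t toy_align_up(uint32_t addr, uint32_t align)
{
	return (addr + align - 1) / align * align;
}

/* Record a reservation; nothing checks whether the range is already in use. */
static void toy_reserve(struct toy_memblock *mb, uint32_t base, uint32_t size)
{
	assert(mb->count < TOY_MAX_REGIONS);
	mb->reserved[mb->count].start = base;
	mb->reserved[mb->count].end = base + size;
	mb->count++;
}

/*
 * Bottom-up first fit: return the lowest aligned address at or above
 * TOY_PAGE_SIZE whose [addr, addr + size) range avoids every reservation,
 * and record the new range as reserved.
 */
static uint32_t toy_alloc(struct toy_memblock *mb, uint32_t size, uint32_t align)
{
	uint32_t addr = toy_align_up(TOY_PAGE_SIZE, align);
	size_t i = 0;

	while (i < mb->count) {
		const struct toy_region *r = &mb->reserved[i];

		if (addr < r->end && r->start < addr + size) {
			/* Collision: skip past this region and rescan. */
			addr = toy_align_up(r->end, align);
			i = 0;
		} else {
			i++;
		}
	}
	toy_reserve(mb, addr, size);
	return addr;
}
```

With nothing reserved up front, toy_alloc(&mb, 10913, 0x40) followed by
toy_alloc(&mb, 32680, 0x4) returns 0x1000 and 0x3aa4, the same start
addresses as the two early_init_dt_alloc_memory_arch() reservations in
the log. If toy_reserve(&mb, 0x1000, 0x100 * 64) is called first, the
same two requests land at 0x5000 and 0x7aa4 instead, clear of the vector
area.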

> 
> A food for thoughts for everyone (Thomas, Mark, please join the
> discussion). What we've got here is a bit bigger problem. AFAICS
> if bottom-up allocation is enabled (it's our case) memblock_find_in_range_node()
> performs the allocation above the very first PAGE_SIZE memory chunk
> (see that method code for details). So we are currently on a safe side
> for some older MIPS platforms. But the platform with VEIC/VINT may get
> into the same troubles here if they didn't reserve exception memory
> early enough before the kernel starts random allocations from
> memblock. So we either need to provide a generic workaround for that
> or make sure each platform gets to reserve vectors itself for instance
> in the plat_mem_setup() method.
> 
> -Sergey
> 
>>
>> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
>> index e0352958e2f7..b0a173b500e8 100644
>> --- a/arch/mips/kernel/traps.c
>> +++ b/arch/mips/kernel/traps.c
>> @@ -2367,10 +2367,7 @@ void __init trap_init(void)
>>
>>         if (!cpu_has_mips_r2_r6) {
>>                 ebase = CAC_BASE;
>> -               ebase_pa = virt_to_phys((void *)ebase);
>>                 vec_size = 0x400;
>> -
>> -               memblock_reserve(ebase_pa, vec_size);
>>         } else {
>>                 if (cpu_has_veic || cpu_has_vint)
>>                         vec_size = 0x200 + VECTORSPACING*64;
>> @@ -2410,6 +2407,14 @@ void __init trap_init(void)
>>
>>         if (board_ebase_setup)
>>                 board_ebase_setup();
>> +
>> +       /* board_ebase_setup() can change the exception base address
>> +        * reserve it now after changes were made.
>> +        */
>> +       if (!cpu_has_mips_r2_r6) {
>> +               ebase_pa = virt_to_phys((void *)ebase);
>> +               memblock_reserve(ebase_pa, vec_size);
>> +       }
>>         per_cpu_trap_init(true);
>>         memblock_set_bottom_up(false);
>> -- 
>> Florian

-- 
Florian


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
  2021-03-01  9:22               ` Serge Semin
  (?)
  (?)
@ 2021-03-02  4:19               ` Florian Fainelli
  2021-03-02  8:09                 ` Mike Rapoport
                                   ` (3 more replies)
  -1 siblings, 4 replies; 49+ messages in thread
From: Florian Fainelli @ 2021-03-02  4:19 UTC (permalink / raw)
  To: linux-mips
  Cc: rppt, fancer.lancer, guro, akpm, paul, Florian Fainelli,
	Serge Semin, Kamal Dasu, Thomas Bogendoerfer, Yanteng Si,
	Huacai Chen, open list:BROADCOM BMIPS MIPS ARCHITECTURE,
	open list

BMIPS is one of the few platforms that do change the exception base.
After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
with kernel_end") we started seeing BMIPS boards fail to boot with the
built-in FDT being corrupted.

Before the cited commit, early allocations would be in the [kernel_end,
RAM_END] range, but after that commit they are within [RAM_START +
PAGE_SIZE, RAM_END].

The custom exception base handler that is installed by
bmips_ebase_setup() for BMIPS5000 CPUs ends up trampling on the
memory region allocated by unflatten_and_copy_device_tree(), thus
corrupting the FDT used by the kernel.

To fix this, we need to perform an early reservation of the custom
exception base that is going to be installed, and this needs to happen
at plat_mem_setup() time to ensure that unflatten_and_copy_device_tree()
finds a space that is suitable, away from reserved memory.

Huge thanks to Serge for analysing and proposing a solution to this
issue.

Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end")
Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
Thomas,

This is intended as a stop-gap solution for 5.12-rc1 and to be picked up
by the stable team for 5.11. We should probably find a safer way to avoid
these problems for 5.13.
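
For reference, the overlap described in the commit message can be
checked by hand. The sketch below is plain user-space C, not kernel
code; struct range, ranges_overlap() and bmips5000_vectors() are
hypothetical helpers introduced only for illustration. The base (0x1000)
and size (VECTORSPACING * 64) are the BMIPS5000 values reserved by this
patch, and the ranges to compare against come from the boot log posted
earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VECTORSPACING 0x100 /* same value the patch adds to asm/traps.h */

struct range {
	uint32_t start; /* first byte */
	uint32_t end;   /* last byte, inclusive, as memblock prints it */
};

/* True if two inclusive ranges share at least one byte. */
static bool ranges_overlap(struct range a, struct range b)
{
	return a.start <= b.end && b.start <= a.end;
}

/* Region reserved for BMIPS5000: base 0x1000, VECTORSPACING * 64 bytes. */
static struct range bmips5000_vectors(void)
{
	uint32_t base = 0x1000;
	uint32_t size = VECTORSPACING * 64;
	struct range r;

	assert(size > 0);
	r.start = base;
	r.end = base + size - 1;
	return r;
}
```

The reserved region works out to [0x1000-0x4fff]. ranges_overlap()
returns true for it against both the FDT copy [0x00001000-0x00003aa0]
and the unflattened tree [0x00003aa4-0x0000ba4b] from the log, and false
against the kernel image reservation [0x00010000-0x018313cf].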

 arch/mips/bmips/setup.c       | 22 ++++++++++++++++++++++
 arch/mips/include/asm/traps.h |  2 ++
 2 files changed, 24 insertions(+)

diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c
index 31bcfa4e08b9..0088bd45b892 100644
--- a/arch/mips/bmips/setup.c
+++ b/arch/mips/bmips/setup.c
@@ -149,6 +149,26 @@ void __init plat_time_init(void)
 	mips_hpt_frequency = freq;
 }
 
+static void __init bmips_ebase_reserve(void)
+{
+	phys_addr_t base, size = VECTORSPACING * 64;
+
+	switch (current_cpu_type()) {
+	default:
+	case CPU_BMIPS4350:
+		return;
+	case CPU_BMIPS3300:
+	case CPU_BMIPS4380:
+		base = 0x0400;
+		break;
+	case CPU_BMIPS5000:
+		base = 0x1000;
+		break;
+	}
+
+	memblock_reserve(base, size);
+}
+
 void __init plat_mem_setup(void)
 {
 	void *dtb;
@@ -169,6 +189,8 @@ void __init plat_mem_setup(void)
 
 	__dt_setup_arch(dtb);
 
+	bmips_ebase_reserve();
+
 	for (q = bmips_quirk_list; q->quirk_fn; q++) {
 		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
 					     q->compatible)) {
diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
index 6aa8f126a43d..0ba6bb7f9618 100644
--- a/arch/mips/include/asm/traps.h
+++ b/arch/mips/include/asm/traps.h
@@ -14,6 +14,8 @@
 #define MIPS_BE_FIXUP	1		/* return to the fixup code */
 #define MIPS_BE_FATAL	2		/* treat as an unrecoverable error */
 
+#define VECTORSPACING 0x100	/* for EI/VI mode */
+
 extern void (*board_be_init)(void);
 extern int (*board_be_handler)(struct pt_regs *regs, int is_fixup);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
  2021-03-02  4:19               ` [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption Florian Fainelli
@ 2021-03-02  8:09                 ` Mike Rapoport
  2021-03-02 13:54                 ` Serge Semin
                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 49+ messages in thread
From: Mike Rapoport @ 2021-03-02  8:09 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: linux-mips, fancer.lancer, guro, akpm, paul, Serge Semin,
	Kamal Dasu, Thomas Bogendoerfer, Yanteng Si, Huacai Chen,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE, open list

On Mon, Mar 01, 2021 at 08:19:38PM -0800, Florian Fainelli wrote:
> BMIPS is one of the few platforms that do change the exception base.
> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> with kernel_end") we started seeing BMIPS boards fail to boot with the
> built-in FDT being corrupted.
> 
> Before the cited commit, early allocations would be in the [kernel_end,
> RAM_END] range, but after commit they would be within [RAM_START +
> PAGE_SIZE, RAM_END].
> 
> The custom exception base handler that is installed by
> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
> memory region allocated by unflatten_and_copy_device_tree() thus
> corrupting the FDT used by the kernel.
> 
> To fix this, we need to perform an early reservation of the custom
> exception that is going to be installed and this needs to happen at
> plat_mem_setup() time to ensure that unflatten_and_copy_device_tree()
> finds a space that is suitable, away from reserved memory.
> 
> Huge thanks to Serge for analysing and proposing a solution to this
> issue.
> 
> Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end")
> Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Acked-by: Mike Rapoport <rppt@linux.ibm.com>

> ---
> Thomas,
> 
> This is intended as a stop-gap solution for 5.12-rc1 and to be picked up
> by the stable team for 5.11. We should find a safer way to avoid these
> problems for 5.13 maybe.
> 
>  arch/mips/bmips/setup.c       | 22 ++++++++++++++++++++++
>  arch/mips/include/asm/traps.h |  2 ++
>  2 files changed, 24 insertions(+)
> 
> diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c
> index 31bcfa4e08b9..0088bd45b892 100644
> --- a/arch/mips/bmips/setup.c
> +++ b/arch/mips/bmips/setup.c
> @@ -149,6 +149,26 @@ void __init plat_time_init(void)
>  	mips_hpt_frequency = freq;
>  }
>  
> +static void __init bmips_ebase_reserve(void)
> +{
> +	phys_addr_t base, size = VECTORSPACING * 64;
> +
> +	switch (current_cpu_type()) {
> +	default:
> +	case CPU_BMIPS4350:
> +		return;
> +	case CPU_BMIPS3300:
> +	case CPU_BMIPS4380:
> +		base = 0x0400;
> +		break;
> +	case CPU_BMIPS5000:
> +		base = 0x1000;
> +		break;
> +	}
> +
> +	memblock_reserve(base, size);
> +}
> +
>  void __init plat_mem_setup(void)
>  {
>  	void *dtb;
> @@ -169,6 +189,8 @@ void __init plat_mem_setup(void)
>  
>  	__dt_setup_arch(dtb);
>  
> +	bmips_ebase_reserve();
> +
>  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
>  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
>  					     q->compatible)) {
> diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
> index 6aa8f126a43d..0ba6bb7f9618 100644
> --- a/arch/mips/include/asm/traps.h
> +++ b/arch/mips/include/asm/traps.h
> @@ -14,6 +14,8 @@
>  #define MIPS_BE_FIXUP	1		/* return to the fixup code */
>  #define MIPS_BE_FATAL	2		/* treat as an unrecoverable error */
>  
> +#define VECTORSPACING 0x100	/* for EI/VI mode */
> +
>  extern void (*board_be_init)(void);
>  extern int (*board_be_handler)(struct pt_regs *regs, int is_fixup);
>  
> -- 
> 2.25.1
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2021-03-02  3:55                 ` Roman Gushchin
  (?)
@ 2021-03-02 13:08                 ` Serge Semin
  -1 siblings, 0 replies; 49+ messages in thread
From: Serge Semin @ 2021-03-02 13:08 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Serge Semin, Mike Rapoport, Florian Fainelli,
	Thomas Bogendoerfer, Andrew Morton, linux-mm, Kamal Dasu,
	Paul Cercueil, Jiaxun Yang, iamjoonsoo.kim, riel, Michal Hocko,
	linux-kernel, kernel-team,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE

On Mon, Mar 01, 2021 at 07:55:21PM -0800, Roman Gushchin wrote:
> On Mon, Mar 01, 2021 at 11:45:42AM +0200, Mike Rapoport wrote:
> > On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
> > > Hi Serge,
> > > 
> > > On 2/28/2021 3:08 PM, Serge Semin wrote:
> > > > Hi folks,
> > > > What you've got here seems a more complicated problem than it
> > > > could originally look like. Please, see my comments below.
> > > > 
> > > > > (Note I've discarded some of the email logs, which are of no interest
> > > > > to the discovered problem. Please also note that I haven't got any
> > > > > Broadcom hardware to test out the solution suggested below.)
> > > > 
> > > > On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
> > > >> Hi Mike,
> > > >>
> > > >> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
> > > >>> Hi Florian,
> > > >>>
> > > >>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
> > > >>>>
> > > > 
> > > >>>> [...]
> > > > 
> > > >>>>
> > > >>>> Hi Roman, Thomas and other linux-mips folks,
> > > >>>>
> > > >>>> Kamal and myself have been unable to boot v5.11 on MIPS since this
> > > >>>> commit, reverting it makes our MIPS platforms boot successfully. We do
> > > > >>>> not see a warning like this one in the commit message; instead what
> > > > >>>> happens appears to be a corrupted Device Tree, which prevents the parsing
> > > > >>>> of the "rdb" node and leads to the interrupt controllers not being
> > > > >>>> registered, and the system eventually not booting.
> > > >>>>
> > > >>>> The Device Tree is built-into the kernel image and resides at
> > > >>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
> > > >>>>
> > > >>>> Do you have any idea what could be wrong with MIPS specifically here?
> > > > 
> > > > Most likely the problem you've discovered has been there for quite
> > > > some time. The patch you are referring to just caused it to be
> > > > triggered by extending the early allocation range. See before that
> > > > patch was accepted the early memory allocations had been performed
> > > > in the range:
> > > > [kernel_end, RAM_END].
> > > > The patch changed that, so the early allocations are done within
> > > > [RAM_START + PAGE_SIZE, RAM_END].
> > > > 
> > > > > In normal situations it's safe to do that as long as all the critical
> > > > > memory regions (including the memory residing below the kernel) have
> > > > > been reserved. But as soon as memory holding some critical structures
> > > > > hasn't been reserved, the kernel may allocate it, for instance for
> > > > > early initializations, with unpredictable but most of the time
> > > > > unpleasant consequences.
> > > > 
> > > >>>
> > > >>> Apparently there is a memblock allocation in one of the functions called
> > > >>> from arch_mem_init() between plat_mem_setup() and
> > > >>> early_init_fdt_reserve_self().
> > > > 
> > > > Mike, alas according to the log provided by Florian that's not the reason
> > > > of the problem. Please, see my considerations below.
> > > > 
> > > >> [...]
> > > >>
> > > >> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
> > > >> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
> > > >> Feb 28 10:01:50 PST 2021
> > > >> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
> > > >> [    0.000000] FPU revision is: 00130001
> > > > 
> > > >> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
> > > >> early_init_dt_scan_memory+0x160/0x1e0
> > > >> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
> > > >> early_init_dt_scan_memory+0x160/0x1e0
> > > >> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
> > > >> early_init_dt_scan_memory+0x160/0x1e0
> > > > 
> > > > Here the memory has been added to the memblock allocator.
> > > > 
> > > >> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
> > > >> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
> > > >> [    0.000000] printk: bootconsole [ns16550a0] enabled
> > > > 
> > > >> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
> > > >> setup_arch+0x128/0x69c
> > > > 
> > > > Here the fdt memory has been reserved. (Note it's built into the
> > > > kernel.)
> > > > 
> > > >> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
> > > >> setup_arch+0x1f8/0x69c
> > > > 
> > > > Here the kernel itself together with built-in dtb have been reserved.
> > > > So far so good.
> > > > 
> > > >> [    0.000000] Initrd not found or empty - disabling initrd
> > > > 
> > > >> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
> > > >> from=0x00000000 max_addr=0x00000000
> > > >> early_init_dt_alloc_memory_arch+0x40/0x84
> > > >> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
> > > >> memblock_alloc_range_nid+0xf8/0x198
> > > >> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
> > > >> from=0x00000000 max_addr=0x00000000
> > > >> early_init_dt_alloc_memory_arch+0x40/0x84
> > > >> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
> > > >> memblock_alloc_range_nid+0xf8/0x198
> > > > 
> > > > The log above most likely belongs to the call-chain:
> > > > setup_arch()
> > > > +-> arch_mem_init()
> > > >     +-> device_tree_init() - BMIPS specific method
> > > >         +-> unflatten_and_copy_device_tree()
> > > > 
> > > > So to speak here we've copied the fdt from the original space
> > > > [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
> > > > it to [0x00003aa4-0x0000ba4b].
> > > > 
> > > > The problem is that a bit later the next call-chain is performed:
> > > > setup_arch()
> > > > +-> plat_smp_setup()
> > > >     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
> > > >         +-> if (!board_ebase_setup)
> > > >                  board_ebase_setup = &bmips_ebase_setup;
> > > > 
> > > > So at the moment of the CPU traps initialization the bmips_ebase_setup()
> > > > method is called. What trap_init() does isn't compatible with the
> > > > allocation performed by the unflatten_and_copy_device_tree() method.
> > > > See the next comment.
> > > > 
> > > >> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
> > > >> from=0x00000000 max_addr=0x00000000
> > > >> early_init_dt_alloc_memory_arch+0x40/0x84
> > 
> > ...
> > 
> > > >> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
> > > >> bytes, linear)
> > > > 
> > > >> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
> > > >> trap_init+0x70/0x4e8
> > > > 
> > > > Most likely someplace here the corruption has happened. The log above
> > > > has just reserved a memory for NMI/reset vectors:
> > > > arch/mips/kernel/traps.c: trap_init(void): Line 2373.
> > > > 
> > > > But then the board_ebase_setup() pointer is dereferenced and called,
> > > > which has been initialized with bmips_ebase_setup() earlier and which
> > > > overwrites the ebase variable with: 0x80001000 as this is
> > > > CPU_BMIPS5000 CPU. So any further calls of the functions like
> > > > set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a
> > > > corruption of the memory above 0x80001000, which as we have discovered
> > > > belongs to fdt and unflattened device tree.
> > > > 
> > > >> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> > > >> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
> > > >> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
> > > >> cma-reserved, 1835008K highmem)
> > > >> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> > > >> [    0.000000] rcu: Hierarchical RCU implementation.
> > > >> [    0.000000] rcu:     RCU event tracing is enabled.
> > > >> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
> > > >> is 25 jiffies.
> > > >> [    0.000000] NR_IRQS: 256
> > > > 
> > > >> [    0.000000] OF: Bad cell count for /rdb
> > > >> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
> > > >> [    0.000000] OF: of_irq_init: children remain, but no parents
> > > > 
> > > > > So here is the first time the consequence of the corruption has
> > > > > popped up. Luckily it's just the "Bad cell count" error. We could have
> > > > > got a much less obvious log here, up to a crash at some later
> > > > > point...
> > > > 
> > > >> [    0.000000] random: get_random_bytes called from
> > > >> start_kernel+0x444/0x654 with crng_init=0
> > > >> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
> > > >> wraps every 8589934590000000ns
> > > > 
> > > >>
> > > >> and with your patch applied which unfortunately did not work we have the
> > > >> following:
> > > >>
> > > >> [...]
> > > > 
> > > > > So a patch like this should work around the corruption:
> > > > 
> > > > --- a/arch/mips/bmips/setup.c
> > > > +++ b/arch/mips/bmips/setup.c
> > > > @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
> > > >  
> > > >  	__dt_setup_arch(dtb);
> > > >  
> > > > +	memblock_reserve(0x0, 0x1000 + 0x100*64);
> > > > +
> > > >  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
> > > >  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
> > > >  					     q->compatible)) {
> > > 
> > > This patch works, thanks a lot for the troubleshooting and analysis! How
> > > about the following which would be more generic and works as well and
> > > should be more universal since it does not require each architecture to
> > > provide an appropriate call to memblock_reserve():
> > > 
> > > diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> > > index e0352958e2f7..b0a173b500e8 100644
> > > --- a/arch/mips/kernel/traps.c
> > > +++ b/arch/mips/kernel/traps.c
> > > @@ -2367,10 +2367,7 @@ void __init trap_init(void)
> > > 
> > >         if (!cpu_has_mips_r2_r6) {
> > >                 ebase = CAC_BASE;
> > > -               ebase_pa = virt_to_phys((void *)ebase);
> > >                 vec_size = 0x400;
> > > -
> > > -               memblock_reserve(ebase_pa, vec_size);
> > >         } else {
> > >                 if (cpu_has_veic || cpu_has_vint)
> > >                         vec_size = 0x200 + VECTORSPACING*64;
> > > @@ -2410,6 +2407,14 @@ void __init trap_init(void)
> > > 
> > >         if (board_ebase_setup)
> > >                 board_ebase_setup();
> > > +
> > > +       /* board_ebase_setup() can change the exception base address
> > > +        * reserve it now after changes were made.
> > > +        */
> > > +       if (!cpu_has_mips_r2_r6) {
> > > +               ebase_pa = virt_to_phys((void *)ebase);
> > > +               memblock_reserve(ebase_pa, vec_size);
> > > +       }
> 
> Hi folks!
> 
> First, I'm really sorry for breaking things and also being silent for last
> couple of days: I was almost completely offline. Thank you for working on
> this!
> 
> > 
> > With this it's still possible to have memblock allocations around ebase_pa
> > before it is reserved.
> > 
> > I think we have two options here to solve it in more or less generic way:
> > 
> > * split the reservation of ebase from trap_init() and move it earlier to
> > setup_arch(). I didn't check what the board_ebase_setup() callbacks do; if
> > they need to allocate memory it would not work.
> 

> It seems that it doesn't allocate any memory, so it sounds like a good option.
> But doesn't the ebase initialization depend on the memblock allocator?
> 
> I see in trap_init():
>     if (!cpu_has_mips_r2_r6) {
>         ...
>     } else {
>         ...
> 	ebase_pa = memblock_phys_alloc(vec_size, 1 << fls(vec_size));
> 	...
> 	if (!IS_ENABLED(CONFIG_EVA) && !WARN_ON(ebase_pa >= 0x20000000))
> 	    ebase = CKSEG0ADDR(ebase_pa);
>         else
>             ebase = (unsigned long)phys_to_virt(ebase_pa);

Yep, this seems like the best option for now. Of course we need to
reserve the memory only if the system needs it, as in the
!cpu_has_mips_r2_r6 case. In addition, a custom ebase value must be
taken into account. The latter is the hardest part to achieve: ebase is
a global variable, so we need to thoroughly scan all the MIPS platforms
which update it and make sure that is done before the reservation is
performed.

> 
> 
> > 
> > * add an API to memblock to set lower limit for allocations and then set
> > the lower limit, to e.g. kernel load address in arch_mem_init(). This may
> > add complexity for configurations with relocatable kernel and kaslr.
> 
> This option looks more like a workaround to me, but maybe it's ok too.

Agreed. The first one is better.

-Sergey

> 
> Thanks!


* Re: [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end
  2021-03-02  4:09                 ` Florian Fainelli
  (?)
@ 2021-03-02 13:26                 ` Serge Semin
  -1 siblings, 0 replies; 49+ messages in thread
From: Serge Semin @ 2021-03-02 13:26 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Serge Semin, Mike Rapoport, Thomas Bogendoerfer, Roman Gushchin,
	Andrew Morton, linux-mm, Kamal Dasu, Paul Cercueil, Jiaxun Yang,
	iamjoonsoo.kim, riel, Michal Hocko, linux-kernel, kernel-team,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE

On Mon, Mar 01, 2021 at 08:09:52PM -0800, Florian Fainelli wrote:
> 
> 
> On 3/1/2021 1:22 AM, Serge Semin wrote:
> > On Sun, Feb 28, 2021 at 07:50:45PM -0800, Florian Fainelli wrote:
> >> Hi Serge,
> >>
> >> On 2/28/2021 3:08 PM, Serge Semin wrote:
> >>> Hi folks,
> >>> What you've got here seems a more complicated problem than it
> >>> could originally look like. Please, see my comments below.
> >>>
> >>> (Note I've discarded some of the email logs, which are of no interest
> >>> to the discovered problem. Please also note that I haven't got any
> >>> Broadcom hardware to test out the solution suggested below.)
> >>>
> >>> On Sun, Feb 28, 2021 at 10:19:51AM -0800, Florian Fainelli wrote:
> >>>> Hi Mike,
> >>>>
> >>>> On 2/28/2021 1:00 AM, Mike Rapoport wrote:
> >>>>> Hi Florian,
> >>>>>
> >>>>> On Sat, Feb 27, 2021 at 08:18:47PM -0800, Florian Fainelli wrote:
> >>>>>>
> >>>
> >>>>>> [...]
> >>>
> >>>>>>
> >>>>>> Hi Roman, Thomas and other linux-mips folks,
> >>>>>>
> >>>>>> Kamal and myself have been unable to boot v5.11 on MIPS since this
> >>>>>> commit, reverting it makes our MIPS platforms boot successfully. We do
> >>>>>> not see a warning like this one in the commit message; instead what
> >>>>>> happens appears to be a corrupted Device Tree, which prevents the parsing
> >>>>>> of the "rdb" node and leads to the interrupt controllers not being
> >>>>>> registered, and the system eventually not booting.
> >>>>>>
> >>>>>> The Device Tree is built-into the kernel image and resides at
> >>>>>> arch/mips/boot/dts/brcm/bcm97435svmb.dts.
> >>>>>>
> >>>>>> Do you have any idea what could be wrong with MIPS specifically here?
> >>>
> >>> Most likely the problem you've discovered has been there for quite
> >>> some time. The patch you are referring to just caused it to be
> >>> triggered by extending the early allocation range. See before that
> >>> patch was accepted the early memory allocations had been performed
> >>> in the range:
> >>> [kernel_end, RAM_END].
> >>> The patch changed that, so the early allocations are done within
> >>> [RAM_START + PAGE_SIZE, RAM_END].
> >>>
> >>> In normal situations it's safe to do that as long as all the critical
> >>> memory regions (including the memory residing below the kernel) have
> >>> been reserved. But as soon as memory holding some critical structures
> >>> hasn't been reserved, the kernel may allocate it, for instance for
> >>> early initializations, with unpredictable but most of the time
> >>> unpleasant consequences.
> >>>
> >>>>>
> >>>>> Apparently there is a memblock allocation in one of the functions called
> >>>>> from arch_mem_init() between plat_mem_setup() and
> >>>>> early_init_fdt_reserve_self().
> >>>
> >>> Mike, alas according to the log provided by Florian that's not the reason
> >>> of the problem. Please, see my considerations below.
> >>>
> >>>> [...]
> >>>>
> >>>> [    0.000000] Linux version 5.11.0-g5695e5161974 (florian@localhost)
> >>>> (mipsel-linux-gcc (GCC) 8.3.0, GNU ld (GNU Binutils) 2.32) #84 SMP Sun
> >>>> Feb 28 10:01:50 PST 2021
> >>>> [    0.000000] CPU0 revision is: 00025b00 (Broadcom BMIPS5200)
> >>>> [    0.000000] FPU revision is: 00130001
> >>>
> >>>> [    0.000000] memblock_add: [0x00000000-0x0fffffff]
> >>>> early_init_dt_scan_memory+0x160/0x1e0
> >>>> [    0.000000] memblock_add: [0x20000000-0x4fffffff]
> >>>> early_init_dt_scan_memory+0x160/0x1e0
> >>>> [    0.000000] memblock_add: [0x90000000-0xcfffffff]
> >>>> early_init_dt_scan_memory+0x160/0x1e0
> >>>
> >>> Here the memory has been added to the memblock allocator.
> >>>
> >>>> [    0.000000] MIPS: machine is Broadcom BCM97435SVMB
> >>>> [    0.000000] earlycon: ns16550a0 at MMIO32 0x10406b00 (options '')
> >>>> [    0.000000] printk: bootconsole [ns16550a0] enabled
> >>>
> >>>> [    0.000000] memblock_reserve: [0x00aa7600-0x00aaa0a0]
> >>>> setup_arch+0x128/0x69c
> >>>
> >>> Here the fdt memory has been reserved. (Note it's built into the
> >>> kernel.)
> >>>
> >>>> [    0.000000] memblock_reserve: [0x00010000-0x018313cf]
> >>>> setup_arch+0x1f8/0x69c
> >>>
> >>> Here the kernel itself together with built-in dtb have been reserved.
> >>> So far so good.
> >>>
> >>>> [    0.000000] Initrd not found or empty - disabling initrd
> >>>
> >>>> [    0.000000] memblock_alloc_try_nid: 10913 bytes align=0x40 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000
> >>>> early_init_dt_alloc_memory_arch+0x40/0x84
> >>>> [    0.000000] memblock_reserve: [0x00001000-0x00003aa0]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 32680 bytes align=0x4 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000
> >>>> early_init_dt_alloc_memory_arch+0x40/0x84
> >>>> [    0.000000] memblock_reserve: [0x00003aa4-0x0000ba4b]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>
> >>> The log above most likely belongs to the call-chain:
> >>> setup_arch()
> >>> +-> arch_mem_init()
> >>>     +-> device_tree_init() - BMIPS specific method
> >>>         +-> unflatten_and_copy_device_tree()
> >>>
> >>> So to speak here we've copied the fdt from the original space
> >>> [0x00aa7600-0x00aaa0a0] into [0x00001000-0x00003aa0] and unflattened
> >>> it to [0x00003aa4-0x0000ba4b].
> >>>
> >>> The problem is that a bit later the next call-chain is performed:
> >>> setup_arch()
> >>> +-> plat_smp_setup()
> >>>     +-> mp_ops->smp_setup(); - registered by prom_init()->register_bmips_smp_ops();
> >>>         +-> if (!board_ebase_setup)
> >>>                  board_ebase_setup = &bmips_ebase_setup;
> >>>
> >>> So at the moment of the CPU traps initialization the bmips_ebase_setup()
> >>> method is called. What trap_init() does isn't compatible with the
> >>> allocation performed by the unflatten_and_copy_device_tree() method.
> >>> See the next comment.
> >>>
> >>>> [    0.000000] memblock_alloc_try_nid: 25 bytes align=0x4 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000
> >>>> early_init_dt_alloc_memory_arch+0x40/0x84
> >>>> [    0.000000] memblock_reserve: [0x0000ba4c-0x0000ba64]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_reserve: [0x0096a000-0x00969fff]
> >>>> setup_arch+0x3fc/0x69c
> >>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> >>>> [    0.000000] memblock_reserve: [0x0000ba80-0x0000ba9f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> >>>> [    0.000000] memblock_reserve: [0x0000bb00-0x0000bb1f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 setup_arch+0x4e0/0x69c
> >>>> [    0.000000] memblock_reserve: [0x0000bb80-0x0000bb9f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 64
> >>>> bytes.
> >>>> [    0.000000] Primary data cache 32kB, 4-way, VIPT, no aliases,
> >>>> linesize 32 bytes
> >>>> [    0.000000] MIPS secondary cache 512kB, 8-way, linesize 128 bytes.
> >>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> >>>> [    0.000000] memblock_reserve: [0x0000c000-0x0000cfff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> >>>> [    0.000000] memblock_reserve: [0x0000d000-0x0000dfff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >>>> from=0x00000000 max_addr=0xffffffff fixrange_init+0x90/0xf4
> >>>> [    0.000000] memblock_reserve: [0x0000e000-0x0000efff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] Zone ranges:
> >>>> [    0.000000]   Normal   [mem 0x0000000000000000-0x000000000fffffff]
> >>>> [    0.000000]   HighMem  [mem 0x0000000010000000-0x00000000cfffffff]
> >>>> [    0.000000] Movable zone start for each node
> >>>> [    0.000000] Early memory node ranges
> >>>> [    0.000000]   node   0: [mem 0x0000000000000000-0x000000000fffffff]
> >>>> [    0.000000]   node   0: [mem 0x0000000020000000-0x000000004fffffff]
> >>>> [    0.000000]   node   0: [mem 0x0000000090000000-0x00000000cfffffff]
> >>>> [    0.000000] Initmem setup node 0 [mem
> >>>> 0x0000000000000000-0x00000000cfffffff]
> >>>> [    0.000000] memblock_alloc_try_nid: 27262976 bytes align=0x80 nid=0
> >>>> from=0x00000000 max_addr=0x00000000
> >>>> alloc_node_mem_map.constprop.135+0x6c/0xc8
> >>>> [    0.000000] memblock_reserve: [0x01831400-0x032313ff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 32 bytes align=0x80 nid=0
> >>>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
> >>>> [    0.000000] memblock_reserve: [0x0000bc00-0x0000bc1f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 384 bytes align=0x80 nid=0
> >>>> from=0x00000000 max_addr=0x00000000 setup_usemap+0x64/0x98
> >>>> [    0.000000] memblock_reserve: [0x0000bc80-0x0000bdff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] MEMBLOCK configuration:
> >>>> [    0.000000]  memory size = 0x80000000 reserved size = 0x0322f032
> >>>> [    0.000000]  memory.cnt  = 0x3
> >>>> [    0.000000]  memory[0x0]     [0x00000000-0x0fffffff], 0x10000000
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  memory[0x1]     [0x20000000-0x4fffffff], 0x30000000
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  memory[0x2]     [0x90000000-0xcfffffff], 0x40000000
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  reserved.cnt  = 0xa
> >>>> [    0.000000]  reserved[0x0]   [0x00001000-0x00003aa0], 0x00002aa1
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  reserved[0x1]   [0x00003aa4-0x0000ba64], 0x00007fc1
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  reserved[0x2]   [0x0000ba80-0x0000ba9f], 0x00000020
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  reserved[0x3]   [0x0000bb00-0x0000bb1f], 0x00000020
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  reserved[0x4]   [0x0000bb80-0x0000bb9f], 0x00000020
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  reserved[0x5]   [0x0000bc00-0x0000bc1f], 0x00000020
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  reserved[0x6]   [0x0000bc80-0x0000bdff], 0x00000180
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  reserved[0x7]   [0x0000c000-0x0000efff], 0x00003000
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  reserved[0x8]   [0x00010000-0x018313cf], 0x018213d0
> >>>> bytes flags: 0x0
> >>>> [    0.000000]  reserved[0x9]   [0x01831400-0x032313ff], 0x01a00000
> >>>> bytes flags: 0x0
> >>>> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 start_kernel+0x12c/0x654
> >>>> [    0.000000] memblock_reserve: [0x0000be00-0x0000be1d]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 30 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 start_kernel+0x150/0x654
> >>>> [    0.000000] memblock_reserve: [0x0000be80-0x0000be9d]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x3b0/0x884
> >>>> [    0.000000] memblock_reserve: [0x0000f000-0x0000ffff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_embed_first_chunk+0x5a4/0x884
> >>>> [    0.000000] memblock_reserve: [0x03231400-0x032323ff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 294912 bytes align=0x1000 nid=-1
> >>>> from=0x01000000 max_addr=0x00000000 pcpu_dfl_fc_alloc+0x24/0x30
> >>>> [    0.000000] memblock_reserve: [0x03233000-0x0327afff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_free: [0x03245000-0x03244fff]
> >>>> pcpu_embed_first_chunk+0x7a0/0x884
> >>>> [    0.000000] memblock_free: [0x03257000-0x03256fff]
> >>>> pcpu_embed_first_chunk+0x7a0/0x884
> >>>> [    0.000000] memblock_free: [0x03269000-0x03268fff]
> >>>> pcpu_embed_first_chunk+0x7a0/0x884
> >>>> [    0.000000] memblock_free: [0x0327b000-0x0327afff]
> >>>> pcpu_embed_first_chunk+0x7a0/0x884
> >>>> [    0.000000] percpu: Embedded 18 pages/cpu s50704 r0 d23024 u73728
> >>>> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x178/0x6ec
> >>>> [    0.000000] memblock_reserve: [0x0000bf00-0x0000bf03]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 4 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1a8/0x6ec
> >>>> [    0.000000] memblock_reserve: [0x0000bf80-0x0000bf83]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x1dc/0x6ec
> >>>> [    0.000000] memblock_reserve: [0x03232400-0x0323240f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 16 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x20c/0x6ec
> >>>> [    0.000000] memblock_reserve: [0x03232480-0x0323248f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 128 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_setup_first_chunk+0x558/0x6ec
> >>>> [    0.000000] memblock_reserve: [0x03232500-0x0323257f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 92 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x8c/0x294
> >>>> [    0.000000] memblock_reserve: [0x03232580-0x032325db]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 768 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0xe0/0x294
> >>>> [    0.000000] memblock_reserve: [0x03232600-0x032328ff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 772 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x124/0x294
> >>>> [    0.000000] memblock_reserve: [0x03232900-0x03232c03]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_alloc_try_nid: 192 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 pcpu_alloc_first_chunk+0x158/0x294
> >>>> [    0.000000] memblock_reserve: [0x03232c80-0x03232d3f]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] memblock_free: [0x0000f000-0x0000ffff]
> >>>> pcpu_embed_first_chunk+0x838/0x884
> >>>> [    0.000000] memblock_free: [0x03231400-0x032323ff]
> >>>> pcpu_embed_first_chunk+0x850/0x884
> >>>> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 523776
> >>>> [    0.000000] Kernel command line: console=ttyS0,115200 earlycon
> >>>> [    0.000000] memblock_alloc_try_nid: 131072 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
> >>>> [    0.000000] memblock_reserve: [0x0327b000-0x0329afff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072
> >>>> bytes, linear)
> >>>> [    0.000000] memblock_alloc_try_nid: 65536 bytes align=0x80 nid=-1
> >>>> from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1f8/0x33c
> >>>> [    0.000000] memblock_reserve: [0x0329b000-0x032aafff]
> >>>> memblock_alloc_range_nid+0xf8/0x198
> >>>> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536
> >>>> bytes, linear)
> >>>
> >>>> [    0.000000] memblock_reserve: [0x00000000-0x000003ff]
> >>>> trap_init+0x70/0x4e8
> >>>
> >>> Most likely someplace here the corruption has happened. The log above
> >>> has just reserved a memory for NMI/reset vectors:
> >>> arch/mips/kernel/traps.c: trap_init(void): Line 2373.
> >>>
> >>> But then the board_ebase_setup() pointer is dereferenced and called,
> >>> which has been initialized with bmips_ebase_setup() earlier and which
> >>> overwrites the ebase variable with: 0x80001000 as this is
> >>> CPU_BMIPS5000 CPU. So any further calls of the functions like
> >>> set_handler()/set_except_vector()/set_vi_srs_handler()/etc may cause a
> >>> corruption of the memory above 0x80001000, which as we have discovered
> >>> belongs to fdt and unflattened device tree.
> >>>
> >>>> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> >>>> [    0.000000] Memory: 2045268K/2097152K available (8226K kernel code,
> >>>> 1070K rwdata, 1336K rodata, 13808K init, 260K bss, 51884K reserved, 0K
> >>>> cma-reserved, 1835008K highmem)
> >>>> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> >>>> [    0.000000] rcu: Hierarchical RCU implementation.
> >>>> [    0.000000] rcu:     RCU event tracing is enabled.
> >>>> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay
> >>>> is 25 jiffies.
> >>>> [    0.000000] NR_IRQS: 256
> >>>
> >>>> [    0.000000] OF: Bad cell count for /rdb
> >>>> [    0.000000] irq_bcm7038_l1: failed to remap intc L1 registers
> >>>> [    0.000000] OF: of_irq_init: children remain, but no parents
> >>>
> >>> So here is the first time the consequence of the corruption has
> >>> popped up. Luckily it's just the "Bad cell count" error. We could have
> >>> got a much less obvious log here, up to a crash at some later
> >>> point...
> >>>
> >>>> [    0.000000] random: get_random_bytes called from
> >>>> start_kernel+0x444/0x654 with crng_init=0
> >>>> [    0.000000] sched_clock: 32 bits at 250 Hz, resolution 4000000ns,
> >>>> wraps every 8589934590000000ns
> >>>
> >>>>
> >>>> and with your patch applied which unfortunately did not work we have the
> >>>> following:
> >>>>
> >>>> [...]
> >>>
> >>> So a patch like this should work around the corruption:
> >>>
> >>> --- a/arch/mips/bmips/setup.c
> >>> +++ b/arch/mips/bmips/setup.c
> >>> @@ -174,6 +174,8 @@ void __init plat_mem_setup(void)
> >>>  
> >>>  	__dt_setup_arch(dtb);
> >>>  
> >>> +	memblock_reserve(0x0, 0x1000 + 0x100*64);
> >>> +
> >>>  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
> >>>  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
> >>>  					     q->compatible)) {
> >>
> > 
> >> This patch works, thanks a lot for the troubleshooting and analysis! How
> >> about the following which would be more generic and works as well and
> >> should be more universal since it does not require each architecture to
> >> provide an appropriate call to memblock_reserve():
> > 
> > Hm, are you sure it's working?
> 
> I was until I noticed that I was working on top of a revert of Roman's
> patch sorry about the brain fart here.
> 
> > If so, my analysis hasn't been quite
> > correct. My suggestion was based on the memory initializations,
> > allocations and reservations trace. So here is the sequence of most
> > crucial of them:
> > 1) Memblock initialization:
> >    start_kernel()->setup_arch()->arch_mem_init()->plat_mem_setup()->__dt_setup_arch()
> >    (At this point I suggested to place the exceptions memory
> >     reservation.)
> > 2) Base FDT memory reservation:
> >    start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_reserve_self()
> > 3) FDT "reserved-memory" nodes parsing and corresponding memory ranges
> >    reservation:
> >    start_kernel()->setup_arch()->arch_mem_init()->early_init_fdt_scan_reserved_mem()
> > 4) Reserve kernel itself, some critical sections like initrd and
> >    crash-kernel:
> >    start_kernel()->setup_arch()->arch_mem_init()->bootmem_init()...
> > 5) Copy and unflatten the device tree built into the kernel
> >    (BMIPS-platform code):
> >    start_kernel()->setup_arch()->arch_mem_init()->device_tree_init()
> >    This is the very first time an allocation from the memblock pool
> >    is performed. Since we haven't reserved memory for the exception
> >    vectors yet, the memblock allocator is free to return that memory
> >    range for any other use. Needless to say, if we try to use that memory
> >    later without consulting memblock, we may (and in our case will)
> >    get into trouble.
> > 6) Many random early memblock allocations for kernel use before
> >    buddy and sl*b allocators are up and running...
> >    Note that if for some fortunate reason the allocations made in 5)
> >    didn't overlap the exception memory, here we have a much higher
> >    chance of doing so, with obviously fatal consequences of the same
> >    range being used independently twice.
> > 7) Trap/exception vectors initialization and !memory reservation! for
> >    them:
> >    start_kernel()->trap_init()
> >    Only at this point we get to reserve the memory for the vectors.
> > 8) Init and run buddy/sl*b allocators:
> >    start_kernel()->mm_init()->...mem_init()...
> > 
> > There are a lot of allocations done in 5) and 6) before the
> > trap_init() is called in 7). You can see that in your log. That's why
> > I have doubts that your patch worked well. Most likely you've
> > forgotten to revert the workaround suggested by me in the previous
> > message. Could you make sure that you didn't and re-test your patch
> > again? If it still works then I might have confused something and it's
> > strange that my patch worked in the first place...
> 
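
The ordering problem in steps 5) to 7) above can be sketched with a toy
model. This is only an illustration, not the kernel memblock code: every
toy_* name, the 1 MiB RAM size and the 0x8000 "FDT copy" size are
assumptions made up for the example. The vector range (base 0x1000,
size 0x100 * 64) matches the BMIPS5000 case discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model; none of this is the real memblock API. */
#define TOY_PAGE_SIZE   0x1000u    /* assumed 4 KiB page */
#define TOY_RAM_END     0x100000u  /* assumed 1 MiB of RAM starting at 0 */
#define TOY_MAX_REGIONS 8

struct toy_region {
	uint32_t base;
	uint32_t size;
};

static struct toy_region toy_reserved[TOY_MAX_REGIONS];
static int toy_nr_reserved;

static bool toy_overlaps(uint32_t a, uint32_t a_size,
			 uint32_t b, uint32_t b_size)
{
	return a < b + b_size && b < a + a_size;
}

/* Mark [base, base + size) as unavailable to later allocations. */
static void toy_reserve(uint32_t base, uint32_t size)
{
	assert(size != 0 && toy_nr_reserved < TOY_MAX_REGIONS);
	toy_reserved[toy_nr_reserved].base = base;
	toy_reserved[toy_nr_reserved].size = size;
	toy_nr_reserved++;
}

/* Bottom-up first fit above the first page; returns 0 if nothing fits. */
static uint32_t toy_alloc_bottom_up(uint32_t size)
{
	uint32_t base = TOY_PAGE_SIZE;
	int i = 0;

	while (i < toy_nr_reserved) {
		if (toy_overlaps(base, size, toy_reserved[i].base,
				 toy_reserved[i].size)) {
			/* Skip past the conflicting region and rescan. */
			base = toy_reserved[i].base + toy_reserved[i].size;
			i = 0;
		} else {
			i++;
		}
	}
	if (base + size > TOY_RAM_END)
		return 0;
	toy_reserve(base, size);
	return base;
}

/*
 * Returns true when the early "FDT copy" allocation lands on the vector
 * range. With reserve_early == false the vectors are not yet known to
 * the allocator when the FDT copy is allocated (the late reservation
 * would come afterwards and could not undo the overlap).
 */
static bool toy_fdt_hits_vectors(bool reserve_early)
{
	const uint32_t vec_base = 0x1000u, vec_size = 0x100u * 64u;
	const uint32_t fdt_size = 0x8000u;
	uint32_t fdt;

	toy_nr_reserved = 0;
	if (reserve_early)
		toy_reserve(vec_base, vec_size);
	fdt = toy_alloc_bottom_up(fdt_size);
	return toy_overlaps(fdt, fdt_size, vec_base, vec_size);
}
```

In this model toy_fdt_hits_vectors(false) reports a collision, while
toy_fdt_hits_vectors(true) does not, because the allocator skips the
range that was reserved before the first allocation.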

> I would like to submit a fix for 5.12-rc1 and get it back ported into
> 5.11 so we have BMIPS machines boot again, that will be essentially your
> earlier proposed fix.
> 
> BMIPS is the only "legacy" MIPS platform that defines an exception base,
> so while this problem may certainly exist with other platforms, I do
> wonder how likely it is there, though?

Hm, at least we can be sure that the problem exists for every platform
that satisfies the !cpu_has_mips_r2_r6 condition and has VEIC/VINT
capability. Such platforms may run past the first PAGE_SIZE of memory
when initializing the exception table, thus corrupting memory that may
already have been allocated for something else. In my case the problem
doesn't manifest itself because the CPU is MIPS32r5.
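
As a rough check of the sizes involved, here is a small sketch that
reuses the expressions from the trap_init() hunk quoted in this thread
(0x200 + VECTORSPACING*64 and 1 << fls(size)). The toy_* names and the
4 KiB page size are assumptions for the example, and toy_fls() is a
local stand-in for the kernel's fls().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_VECTORSPACING 0x100u   /* same value as VECTORSPACING above */
#define TOY_PAGE_SIZE     0x1000u  /* assumed 4 KiB page */

/* Size of the vector area, following the expression in trap_init(). */
static uint32_t toy_vec_size(bool has_veic_or_vint)
{
	if (has_veic_or_vint)
		return 0x200u + TOY_VECTORSPACING * 64u;
	return TOY_PAGE_SIZE;
}

/* 1-based index of the most significant set bit, 0 for x == 0. */
static unsigned int toy_fls(uint32_t x)
{
	unsigned int bit = 0;

	while (x) {
		bit++;
		x >>= 1;
	}
	return bit;
}

/* Alignment requested for the vector area: 1 << fls(size). */
static uint32_t toy_vec_align(uint32_t size)
{
	unsigned int bit = toy_fls(size);

	assert(bit < 32);	/* keep the shift well defined */
	return 1u << bit;
}

/* True when the vector area does not fit into the first page. */
static bool toy_exceeds_first_page(bool has_veic_or_vint)
{
	return toy_vec_size(has_veic_or_vint) > TOY_PAGE_SIZE;
}
```

With VEIC/VINT the area is 0x200 + 0x4000 = 0x4200 bytes, which is more
than four 4 KiB pages, so leaving only the first page alone would not
protect it.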

-Sergey

> 
> > 
> > Food for thought for everyone (Thomas, Mark, please join the
> > discussion). What we've got here is a somewhat bigger problem. AFAICS
> > if bottom-up allocation is enabled (as in our case) memblock_find_in_range_node()
> > performs the allocation above the very first PAGE_SIZE memory chunk
> > (see that function's code for details). So we are currently on the safe
> > side for some older MIPS platforms. But platforms with VEIC/VINT may get
> > into the same trouble if they don't reserve the exception memory
> > early enough, before the kernel starts random allocations from
> > memblock. So we either need to provide a generic workaround for that
> > or make sure each platform reserves the vectors itself, for instance
> > in the plat_mem_setup() method.
> > 
> > -Sergey
> > 
> >>
> >> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> >> index e0352958e2f7..b0a173b500e8 100644
> >> --- a/arch/mips/kernel/traps.c
> >> +++ b/arch/mips/kernel/traps.c
> >> @@ -2367,10 +2367,7 @@ void __init trap_init(void)
> >>
> >>         if (!cpu_has_mips_r2_r6) {
> >>                 ebase = CAC_BASE;
> >> -               ebase_pa = virt_to_phys((void *)ebase);
> >>                 vec_size = 0x400;
> >> -
> >> -               memblock_reserve(ebase_pa, vec_size);
> >>         } else {
> >>                 if (cpu_has_veic || cpu_has_vint)
> >>                         vec_size = 0x200 + VECTORSPACING*64;
> >> @@ -2410,6 +2407,14 @@ void __init trap_init(void)
> >>
> >>         if (board_ebase_setup)
> >>                 board_ebase_setup();
> >> +
> >> +       /* board_ebase_setup() can change the exception base address
> >> +        * reserve it now after changes were made.
> >> +        */
> >> +       if (!cpu_has_mips_r2_r6) {
> >> +               ebase_pa = virt_to_phys((void *)ebase);
> >> +               memblock_reserve(ebase_pa, vec_size);
> >> +       }
> >>         per_cpu_trap_init(true);
> >>         memblock_set_bottom_up(false);
> >> -- 
> >> Florian
> 
> -- 
> Florian


* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
  2021-03-02  4:19               ` [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption Florian Fainelli
  2021-03-02  8:09                 ` Mike Rapoport
@ 2021-03-02 13:54                 ` Serge Semin
  2021-03-02 19:04                 ` Roman Gushchin
  2021-03-02 23:54                 ` Thomas Bogendoerfer
  3 siblings, 0 replies; 49+ messages in thread
From: Serge Semin @ 2021-03-02 13:54 UTC (permalink / raw)
  To: Florian Fainelli, Thomas Bogendoerfer
  Cc: Serge Semin, Mike Rapoport, linux-mips, guro, akpm, paul,
	Kamal Dasu, Yanteng Si, Huacai Chen,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE, open list

On Mon, Mar 01, 2021 at 08:19:38PM -0800, Florian Fainelli wrote:
> BMIPS is one of the few platforms that do change the exception base.
> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> with kernel_end") we started seeing BMIPS boards fail to boot with the
> built-in FDT being corrupted.
> 
> Before the cited commit, early allocations would be in the [kernel_end,
> RAM_END] range, but after commit they would be within [RAM_START +
> PAGE_SIZE, RAM_END].
> 
> The custom exception base handler that is installed by
> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
> memory region allocated by unflatten_and_copy_device_tree() thus
> corrupting the FDT used by the kernel.
> 
> To fix this, we need to perform an early reservation of the custom
> exception that is going to be installed and this needs to happen at
> plat_mem_setup() time to ensure that unflatten_and_copy_device_tree()
> finds a space that is suitable, away from reserved memory.
> 
> Huge thanks to Serge for analysing and proposing a solution to this
> issue.
> 
> Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end")

> Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>

I'd change the order of these two tags... 

> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---

> Thomas,
> 
> This is intended as a stop-gap solution for 5.12-rc1 and to be picked up
> by the stable team for 5.11. We should find a safer way to avoid these
> problems for 5.13 maybe.

Thomas, could you join the discussion? If we had a more clever
solution to reserve the exceptions table for each possibly affected
platform this patch could have been omitted.

> 
>  arch/mips/bmips/setup.c       | 22 ++++++++++++++++++++++
>  arch/mips/include/asm/traps.h |  2 ++
>  2 files changed, 24 insertions(+)
> 
> diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c
> index 31bcfa4e08b9..0088bd45b892 100644
> --- a/arch/mips/bmips/setup.c
> +++ b/arch/mips/bmips/setup.c
> @@ -149,6 +149,26 @@ void __init plat_time_init(void)
>  	mips_hpt_frequency = freq;
>  }
>  
> +static void __init bmips_ebase_reserve(void)
> +{
> +	phys_addr_t base, size = VECTORSPACING * 64;
> +
> +	switch (current_cpu_type()) {
> +	default:
> +	case CPU_BMIPS4350:
> +		return;
> +	case CPU_BMIPS3300:
> +	case CPU_BMIPS4380:
> +		base = 0x0400;
> +		break;
> +	case CPU_BMIPS5000:
> +		base = 0x1000;
> +		break;
> +	}
> +
> +	memblock_reserve(base, size);
> +}
> +
>  void __init plat_mem_setup(void)
>  {
>  	void *dtb;
> @@ -169,6 +189,8 @@ void __init plat_mem_setup(void)
>  
>  	__dt_setup_arch(dtb);
>  
> +	bmips_ebase_reserve();
> +
>  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
>  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
>  					     q->compatible)) {
> diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
> index 6aa8f126a43d..0ba6bb7f9618 100644
> --- a/arch/mips/include/asm/traps.h
> +++ b/arch/mips/include/asm/traps.h
> @@ -14,6 +14,8 @@
>  #define MIPS_BE_FIXUP	1		/* return to the fixup code */
>  #define MIPS_BE_FATAL	2		/* treat as an unrecoverable error */
>  

> +#define VECTORSPACING 0x100	/* for EI/VI mode */

What about the same macro declared in arch/mips/kernel/traps.c? I'd suggest
removing it from there and explicitly including this header file in
arch/mips/bmips/setup.c.

-Sergey

> +
>  extern void (*board_be_init)(void);
>  extern int (*board_be_handler)(struct pt_regs *regs, int is_fixup);
>  
> -- 
> 2.25.1
> 


* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
  2021-03-02  4:19               ` [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption Florian Fainelli
  2021-03-02  8:09                 ` Mike Rapoport
  2021-03-02 13:54                 ` Serge Semin
@ 2021-03-02 19:04                 ` Roman Gushchin
  2021-03-02 23:54                 ` Thomas Bogendoerfer
  3 siblings, 0 replies; 49+ messages in thread
From: Roman Gushchin @ 2021-03-02 19:04 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: linux-mips, rppt, fancer.lancer, akpm, paul, Serge Semin,
	Kamal Dasu, Thomas Bogendoerfer, Yanteng Si, Huacai Chen,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE, open list

On Mon, Mar 01, 2021 at 08:19:38PM -0800, Florian Fainelli wrote:
> BMIPS is one of the few platforms that do change the exception base.
> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> with kernel_end") we started seeing BMIPS boards fail to boot with the
> built-in FDT being corrupted.
> 
> Before the cited commit, early allocations would be in the [kernel_end,
> RAM_END] range, but after commit they would be within [RAM_START +
> PAGE_SIZE, RAM_END].
> 
> The custom exception base handler that is installed by
> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
> memory region allocated by unflatten_and_copy_device_tree() thus
> corrupting the FDT used by the kernel.
> 
> To fix this, we need to perform an early reservation of the custom
> exception that is going to be installed and this needs to happen at
> plat_mem_setup() time to ensure that unflatten_and_copy_device_tree()
> finds a space that is suitable, away from reserved memory.
> 
> Huge thanks to Serge for analysing and proposing a solution to this
> issue.
> 
> Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end")
> Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Acked-by: Roman Gushchin <guro@fb.com>

Thank you!

> ---
> Thomas,
> 
> This is intended as a stop-gap solution for 5.12-rc1 and to be picked up
> by the stable team for 5.11. We should find a safer way to avoid these
> problems for 5.13 maybe.
> 
>  arch/mips/bmips/setup.c       | 22 ++++++++++++++++++++++
>  arch/mips/include/asm/traps.h |  2 ++
>  2 files changed, 24 insertions(+)
> 
> diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c
> index 31bcfa4e08b9..0088bd45b892 100644
> --- a/arch/mips/bmips/setup.c
> +++ b/arch/mips/bmips/setup.c
> @@ -149,6 +149,26 @@ void __init plat_time_init(void)
>  	mips_hpt_frequency = freq;
>  }
>  
> +static void __init bmips_ebase_reserve(void)
> +{
> +	phys_addr_t base, size = VECTORSPACING * 64;
> +
> +	switch (current_cpu_type()) {
> +	default:
> +	case CPU_BMIPS4350:
> +		return;
> +	case CPU_BMIPS3300:
> +	case CPU_BMIPS4380:
> +		base = 0x0400;
> +		break;
> +	case CPU_BMIPS5000:
> +		base = 0x1000;
> +		break;
> +	}
> +
> +	memblock_reserve(base, size);
> +}
> +
>  void __init plat_mem_setup(void)
>  {
>  	void *dtb;
> @@ -169,6 +189,8 @@ void __init plat_mem_setup(void)
>  
>  	__dt_setup_arch(dtb);
>  
> +	bmips_ebase_reserve();
> +
>  	for (q = bmips_quirk_list; q->quirk_fn; q++) {
>  		if (of_flat_dt_is_compatible(of_get_flat_dt_root(),
>  					     q->compatible)) {
> diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
> index 6aa8f126a43d..0ba6bb7f9618 100644
> --- a/arch/mips/include/asm/traps.h
> +++ b/arch/mips/include/asm/traps.h
> @@ -14,6 +14,8 @@
>  #define MIPS_BE_FIXUP	1		/* return to the fixup code */
>  #define MIPS_BE_FATAL	2		/* treat as an unrecoverable error */
>  
> +#define VECTORSPACING 0x100	/* for EI/VI mode */
> +
>  extern void (*board_be_init)(void);
>  extern int (*board_be_handler)(struct pt_regs *regs, int is_fixup);
>  
> -- 
> 2.25.1
> 


* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
  2021-03-02  4:19               ` [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption Florian Fainelli
                                   ` (2 preceding siblings ...)
  2021-03-02 19:04                 ` Roman Gushchin
@ 2021-03-02 23:54                 ` Thomas Bogendoerfer
  2021-03-03  1:30                   ` Florian Fainelli
  3 siblings, 1 reply; 49+ messages in thread
From: Thomas Bogendoerfer @ 2021-03-02 23:54 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: linux-mips, rppt, fancer.lancer, guro, akpm, paul, Serge Semin,
	Kamal Dasu, Yanteng Si, Huacai Chen,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE, open list

On Mon, Mar 01, 2021 at 08:19:38PM -0800, Florian Fainelli wrote:
> BMIPS is one of the few platforms that do change the exception base.
> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> with kernel_end") we started seeing BMIPS boards fail to boot with the
> built-in FDT being corrupted.
> 
> Before the cited commit, early allocations would be in the [kernel_end,
> RAM_END] range, but after commit they would be within [RAM_START +
> PAGE_SIZE, RAM_END].
> 
> The custom exception base handler that is installed by
> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
> memory region allocated by unflatten_and_copy_device_tree() thus
> corrupting the FDT used by the kernel.
> 
> To fix this, we need to perform an early reservation of the custom
> exception that is going to be installed and this needs to happen at
> plat_mem_setup() time to ensure that unflatten_and_copy_device_tree()
> finds a space that is suitable, away from reserved memory.
> 
> Huge thanks to Serge for analysing and proposing a solution to this
> issue.
> 
> Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end")
> Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
> Thomas,
> 
> This is intended as a stop-gap solution for 5.12-rc1 and to be picked up
> by the stable team for 5.11. We should find a safer way to avoid these
> problems for 5.13 maybe.

Let's try to make it in one go. How about reserving the vector space in
cpu_probe(), if it's known there, and leaving the rest to trap_init()?

The patch below got a quick test on IP22 (real hardware) and Malta (QEMU).
I'm not sure I got all the BMIPS parts right, so please check/test.
BTW, do we really need to EXPORT_SYMBOL ebase?

Thomas.


diff --git a/arch/mips/include/asm/setup.h b/arch/mips/include/asm/setup.h
index bb36a400203d..3ef62c23c34f 100644
--- a/arch/mips/include/asm/setup.h
+++ b/arch/mips/include/asm/setup.h
@@ -23,7 +23,7 @@ typedef void (*vi_handler_t)(void);
 extern void *set_vi_handler(int n, vi_handler_t addr);
 
 extern void *set_except_vector(int n, void *addr);
-extern unsigned long ebase;
+extern unsigned long ebase, ebase_size;
 extern unsigned int hwrena;
 extern void per_cpu_trap_init(bool);
 extern void cpu_cache_init(void);
diff --git a/arch/mips/include/asm/traps.h b/arch/mips/include/asm/traps.h
index 6aa8f126a43d..f7d59831aae3 100644
--- a/arch/mips/include/asm/traps.h
+++ b/arch/mips/include/asm/traps.h
@@ -26,6 +26,8 @@ extern void (*board_cache_error_setup)(void);
 extern int register_nmi_notifier(struct notifier_block *nb);
 extern char except_vec_nmi[];
 
+#define VECTORSPACING 0x100	/* for EI/VI mode */
+
 #define nmi_notifier(fn, pri)						\
 ({									\
 	static struct notifier_block fn##_nb = {			\
diff --git a/arch/mips/kernel/cpu-probe.c b/arch/mips/kernel/cpu-probe.c
index 9a89637b4ecf..eef1a4e304da 100644
--- a/arch/mips/kernel/cpu-probe.c
+++ b/arch/mips/kernel/cpu-probe.c
@@ -13,6 +13,7 @@
 #include <linux/smp.h>
 #include <linux/stddef.h>
 #include <linux/export.h>
+#include <linux/memblock.h>
 
 #include <asm/bugs.h>
 #include <asm/cpu.h>
@@ -25,7 +26,9 @@
 #include <asm/watch.h>
 #include <asm/elf.h>
 #include <asm/pgtable-bits.h>
+#include <asm/setup.h>
 #include <asm/spram.h>
+#include <asm/traps.h>
 #include <linux/uaccess.h>
 
 #include "fpu-probe.h"
@@ -1628,6 +1631,8 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu)
 		c->cputype = CPU_BMIPS3300;
 		__cpu_name[cpu] = "Broadcom BMIPS3300";
 		set_elf_platform(cpu, "bmips3300");
+		ebase = 0x80000400;
+		ebase_size = VECTORSPACING * 64;
 		break;
 	case PRID_IMP_BMIPS43XX: {
 		int rev = c->processor_id & PRID_REV_MASK;
@@ -1638,6 +1643,8 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu)
 			__cpu_name[cpu] = "Broadcom BMIPS4380";
 			set_elf_platform(cpu, "bmips4380");
 			c->options |= MIPS_CPU_RIXI;
+			ebase = 0x80000400;
+			ebase_size = VECTORSPACING * 64;
 		} else {
 			c->cputype = CPU_BMIPS4350;
 			__cpu_name[cpu] = "Broadcom BMIPS4350";
@@ -1654,6 +1661,8 @@ static inline void cpu_probe_broadcom(struct cpuinfo_mips *c, unsigned int cpu)
 			__cpu_name[cpu] = "Broadcom BMIPS5000";
 		set_elf_platform(cpu, "bmips5000");
 		c->options |= MIPS_CPU_ULRI | MIPS_CPU_RIXI;
+		ebase = 0x80001000;
+		ebase_size = VECTORSPACING * 64;
 		break;
 	}
 }
@@ -2133,6 +2142,13 @@ void cpu_probe(void)
 	if (cpu == 0)
 		__ua_limit = ~((1ull << cpu_vmbits) - 1);
 #endif
+
+	if (ebase_size == 0 && !cpu_has_mips_r2_r6) {
+		ebase = CAC_BASE;
+		ebase_size = 0x400;
+	}
+	if (ebase_size)
+		memblock_reserve(__pa((void *)ebase), ebase_size);
 }
 
 void cpu_report(void)
diff --git a/arch/mips/kernel/smp-bmips.c b/arch/mips/kernel/smp-bmips.c
index b6ef5f7312cf..ad3f2282a65a 100644
--- a/arch/mips/kernel/smp-bmips.c
+++ b/arch/mips/kernel/smp-bmips.c
@@ -528,10 +528,6 @@ static void bmips_set_reset_vec(int cpu, u32 val)
 
 void bmips_ebase_setup(void)
 {
-	unsigned long new_ebase = ebase;
-
-	BUG_ON(ebase != CKSEG0);
-
 	switch (current_cpu_type()) {
 	case CPU_BMIPS4350:
 		/*
@@ -554,7 +550,6 @@ void bmips_ebase_setup(void)
 		 * 0x8000_0000: reset/NMI (initially in kseg1)
 		 * 0x8000_0400: normal vectors
 		 */
-		new_ebase = 0x80000400;
 		bmips_set_reset_vec(0, RESET_FROM_KSEG0);
 		break;
 	case CPU_BMIPS5000:
@@ -562,16 +557,14 @@ void bmips_ebase_setup(void)
 		 * 0x8000_0000: reset/NMI (initially in kseg1)
 		 * 0x8000_1000: normal vectors
 		 */
-		new_ebase = 0x80001000;
 		bmips_set_reset_vec(0, RESET_FROM_KSEG0);
-		write_c0_ebase(new_ebase);
+		write_c0_ebase(ebase);
 		break;
 	default:
 		return;
 	}
 
 	board_nmi_handler_setup = &bmips_nmi_handler_setup;
-	ebase = new_ebase;
 }
 
 asmlinkage void __weak plat_wired_tlb_setup(void)
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index e0352958e2f7..21ba9d04683e 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -2009,10 +2009,10 @@ void __noreturn nmi_exception_handler(struct pt_regs *regs)
 	nmi_exit();
 }
 
-#define VECTORSPACING 0x100	/* for EI/VI mode */
-
 unsigned long ebase;
 EXPORT_SYMBOL_GPL(ebase);
+unsigned long ebase_size;
+EXPORT_SYMBOL_GPL(ebase_size);
 unsigned long exception_handlers[32];
 unsigned long vi_handlers[64];
 
@@ -2360,27 +2360,22 @@ void __init trap_init(void)
 	extern char except_vec3_generic;
 	extern char except_vec4;
 	extern char except_vec3_r4000;
-	unsigned long i, vec_size;
 	phys_addr_t ebase_pa;
+	unsigned long i;
 
 	check_wait();
 
-	if (!cpu_has_mips_r2_r6) {
-		ebase = CAC_BASE;
-		ebase_pa = virt_to_phys((void *)ebase);
-		vec_size = 0x400;
-
-		memblock_reserve(ebase_pa, vec_size);
-	} else {
+	if (cpu_has_mips_r2_r6) {
 		if (cpu_has_veic || cpu_has_vint)
-			vec_size = 0x200 + VECTORSPACING*64;
+			ebase_size = 0x200 + VECTORSPACING*64;
 		else
-			vec_size = PAGE_SIZE;
+			ebase_size = PAGE_SIZE;
 
-		ebase_pa = memblock_phys_alloc(vec_size, 1 << fls(vec_size));
+		ebase_pa = memblock_phys_alloc(ebase_size,
+					       1 << fls(ebase_size));
 		if (!ebase_pa)
 			panic("%s: Failed to allocate %lu bytes align=0x%x\n",
-			      __func__, vec_size, 1 << fls(vec_size));
+			      __func__, ebase_size, 1 << fls(ebase_size));
 
 		/*
 		 * Try to ensure ebase resides in KSeg0 if possible.
@@ -2534,7 +2529,7 @@ void __init trap_init(void)
 	else
 		set_handler(0x080, &except_vec3_generic, 0x80);
 
-	local_flush_icache_range(ebase, ebase + vec_size);
+	local_flush_icache_range(ebase, ebase + ebase_size);
 
 	sort_extable(__start___dbe_table, __stop___dbe_table);
 



-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]


* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
  2021-03-02 23:54                 ` Thomas Bogendoerfer
@ 2021-03-03  1:30                   ` Florian Fainelli
  2021-03-03  9:41                     ` Thomas Bogendoerfer
  0 siblings, 1 reply; 49+ messages in thread
From: Florian Fainelli @ 2021-03-03  1:30 UTC (permalink / raw)
  To: Thomas Bogendoerfer
  Cc: linux-mips, rppt, fancer.lancer, guro, akpm, paul, Serge Semin,
	Kamal Dasu, Yanteng Si, Huacai Chen,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE, open list



On 3/2/2021 3:54 PM, Thomas Bogendoerfer wrote:
> On Mon, Mar 01, 2021 at 08:19:38PM -0800, Florian Fainelli wrote:
>> BMIPS is one of the few platforms that do change the exception base.
>> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
>> with kernel_end") we started seeing BMIPS boards fail to boot with the
>> built-in FDT being corrupted.
>>
>> Before the cited commit, early allocations would be in the [kernel_end,
>> RAM_END] range, but after commit they would be within [RAM_START +
>> PAGE_SIZE, RAM_END].
>>
>> The custom exception base handler that is installed by
>> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
>> memory region allocated by unflatten_and_copy_device_tree() thus
>> corrupting the FDT used by the kernel.
>>
>> To fix this, we need to perform an early reservation of the custom
>> exception that is going to be installed and this needs to happen at
>> plat_mem_setup() time to ensure that unflatten_and_copy_device_tree()
>> finds a space that is suitable, away from reserved memory.
>>
>> Huge thanks to Serge for analysing and proposing a solution to this
>> issue.
>>
>> Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end")
>> Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
>> Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>
>> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
>> ---
>> Thomas,
>>
>> This is intended as a stop-gap solution for 5.12-rc1 and to be picked up
>> by the stable team for 5.11. We should find a safer way to avoid these
>> problems for 5.13 maybe.
> 
> Let's try to make it in one go. How about reserving the vector space in
> cpu_probe(), if it's known there, and leaving the rest to trap_init()?
> 
> The patch below got a quick test on IP22 (real hardware) and Malta (QEMU).
> I'm not sure I got all the BMIPS parts right, so please check/test.

Works for me here:

Tested-by: Florian Fainelli <f.fainelli@gmail.com>

Thanks!

> BTW, do we really need to EXPORT_SYMBOL ebase?

It seems like MIPS KVM support can be built as a module which is why
ebase was exported to modules with
878edf014e29de38c49153aba20273fbc9ae31af ("MIPS: KVM: Restore host EBase
from ebase variable")?
-- 
Florian


* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
  2021-03-03  1:30                   ` Florian Fainelli
@ 2021-03-03  9:41                     ` Thomas Bogendoerfer
  2021-03-03 17:45                       ` Maciej W. Rozycki
  0 siblings, 1 reply; 49+ messages in thread
From: Thomas Bogendoerfer @ 2021-03-03  9:41 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: linux-mips, rppt, fancer.lancer, guro, akpm, paul, Serge Semin,
	Kamal Dasu, Yanteng Si, Huacai Chen,
	open list:BROADCOM BMIPS MIPS ARCHITECTURE, open list

On Tue, Mar 02, 2021 at 05:30:18PM -0800, Florian Fainelli wrote:
> 
> 
> On 3/2/2021 3:54 PM, Thomas Bogendoerfer wrote:
> > On Mon, Mar 01, 2021 at 08:19:38PM -0800, Florian Fainelli wrote:
> >> BMIPS is one of the few platforms that do change the exception base.
> >> After commit 2dcb39645441 ("memblock: do not start bottom-up allocations
> >> with kernel_end") we started seeing BMIPS boards fail to boot with the
> >> built-in FDT being corrupted.
> >>
> >> Before the cited commit, early allocations would be in the [kernel_end,
> >> RAM_END] range, but after commit they would be within [RAM_START +
> >> PAGE_SIZE, RAM_END].
> >>
> >> The custom exception base handler that is installed by
> >> bmips_ebase_setup() done for BMIPS5000 CPUs ends-up trampling on the
> >> memory region allocated by unflatten_and_copy_device_tree() thus
> >> corrupting the FDT used by the kernel.
> >>
> >> To fix this, we need to perform an early reservation of the custom
> >> exception that is going to be installed and this needs to happen at
> >> plat_mem_setup() time to ensure that unflatten_and_copy_device_tree()
> >> finds a space that is suitable, away from reserved memory.
> >>
> >> Huge thanks to Serge for analysing and proposing a solution to this
> >> issue.
> >>
> >> Fixes: 2dcb39645441 ("memblock: do not start bottom-up allocations with kernel_end")
> >> Debugged-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>
> >> Reported-by: Kamal Dasu <kdasu.kdev@gmail.com>
> >> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> >> ---
> >> Thomas,
> >>
> >> This is intended as a stop-gap solution for 5.12-rc1 and to be picked up
> >> by the stable team for 5.11. We should find a safer way to avoid these
> >> problems for 5.13 maybe.
> > 
> > Let's try to make it in one go. How about reserving the vector space in
> > cpu_probe(), if it's known there, and leaving the rest to trap_init()?
> > 
> > The patch below got a quick test on IP22 (real hardware) and Malta (QEMU).
> > I'm not sure I got all the BMIPS parts right, so please check/test.
> 
> Works for me here:

Perfect, I only forgot about R3k... I'll submit a formal patch
later today.

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]


* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
  2021-03-03  9:41                     ` Thomas Bogendoerfer
@ 2021-03-03 17:45                       ` Maciej W. Rozycki
  2021-03-03 18:15                         ` Thomas Bogendoerfer
  0 siblings, 1 reply; 49+ messages in thread
From: Maciej W. Rozycki @ 2021-03-03 17:45 UTC (permalink / raw)
  To: Thomas Bogendoerfer
  Cc: Florian Fainelli, linux-mips, rppt, fancer.lancer, guro,
	Andrew Morton, paul, Serge Semin, Kamal Dasu, Yanteng Si,
	Huacai Chen, open list:BROADCOM BMIPS MIPS ARCHITECTURE,
	open list

On Wed, 3 Mar 2021, Thomas Bogendoerfer wrote:

> Perfect, I only forgot about R3k... I'll submit a formal patch
> later today.

 What's up with the R3k (the usual trigger for me) here?

  Maciej


* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
  2021-03-03 17:45                       ` Maciej W. Rozycki
@ 2021-03-03 18:15                         ` Thomas Bogendoerfer
  2021-03-03 21:50                           ` Maciej W. Rozycki
  0 siblings, 1 reply; 49+ messages in thread
From: Thomas Bogendoerfer @ 2021-03-03 18:15 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Florian Fainelli, linux-mips, rppt, fancer.lancer, guro,
	Andrew Morton, paul, Serge Semin, Kamal Dasu, Yanteng Si,
	Huacai Chen, open list:BROADCOM BMIPS MIPS ARCHITECTURE,
	open list

On Wed, Mar 03, 2021 at 06:45:52PM +0100, Maciej W. Rozycki wrote:
> On Wed, 3 Mar 2021, Thomas Bogendoerfer wrote:
> 
> > Perfect, I only forgot about R3k... I'll submit a formal patch
> > later today.
> 
>  What's up with the R3k (the usual trigger for me) here?

I've moved the r3k cpu_probe() to its own file, and when moving the ebase
reservation to cpu_probe() I need to add it there as well. So it's just
a mechanical step that I've missed.

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]


* Re: [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption
  2021-03-03 18:15                         ` Thomas Bogendoerfer
@ 2021-03-03 21:50                           ` Maciej W. Rozycki
  0 siblings, 0 replies; 49+ messages in thread
From: Maciej W. Rozycki @ 2021-03-03 21:50 UTC (permalink / raw)
  To: Thomas Bogendoerfer
  Cc: Florian Fainelli, linux-mips, rppt, fancer.lancer, guro,
	Andrew Morton, paul, Serge Semin, Kamal Dasu, Yanteng Si,
	Huacai Chen, open list:BROADCOM BMIPS MIPS ARCHITECTURE,
	open list

On Wed, 3 Mar 2021, Thomas Bogendoerfer wrote:

> >  What's up with the R3k (the usual trigger for me) here?
> 
> I've moved the r3k cpu_probe() to its own file, and when moving the ebase
> reservation to cpu_probe() I need to add it there as well. So it's just
> a mechanical step that I've missed.

 Ah, right, I didn't notice the split.  Thanks for taking care of it!

  Maciej


* [tip: x86/boot] x86/setup: Consolidate early memory reservations
  2020-12-17 20:12 ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Roman Gushchin
                     ` (2 preceding siblings ...)
  2021-02-28  4:18   ` Florian Fainelli
@ 2021-03-23 18:19   ` tip-bot2 for Mike Rapoport
  3 siblings, 0 replies; 49+ messages in thread
From: tip-bot2 for Mike Rapoport @ 2021-03-23 18:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Mike Rapoport, Borislav Petkov, Baoquan He, David Hildenbrand,
	x86, linux-kernel

The following commit has been merged into the x86/boot branch of tip:

Commit-ID:     a799c2bd29d19c565f37fa038b31a0a1d44d0e4d
Gitweb:        https://git.kernel.org/tip/a799c2bd29d19c565f37fa038b31a0a1d44d0e4d
Author:        Mike Rapoport <rppt@linux.ibm.com>
AuthorDate:    Tue, 02 Mar 2021 12:04:05 +02:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Tue, 23 Mar 2021 17:13:17 +01:00

x86/setup: Consolidate early memory reservations

The early reservations of memory areas used by the firmware, bootloader,
kernel text and data are spread over setup_arch(). Moreover, some of them
happen *after* memblock allocations, e.g. trim_platform_memory_ranges() and
trim_low_memory_range() are called after reserve_real_mode() that allocates
memory.

There was no corruption of these memory regions because memblock always
allocates memory either from the end of memory (in top-down mode) or above
the kernel image (in bottom-up mode). However, the bottom-up mode is going
to be updated to span the entire memory [1] to avoid limitations caused by
KASLR.

Consolidate early memory reservations in a dedicated function to improve
robustness against future changes. Having the early reservations in one
place also makes it clearer what memory must be reserved before memblock
allocations are allowed.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Link: [1] https://lore.kernel.org/lkml/20201217201214.3414100-2-guro@fb.com
Link: https://lkml.kernel.org/r/20210302100406.22059-2-rppt@kernel.org
---
 arch/x86/kernel/setup.c | 92 +++++++++++++++++++---------------------
 1 file changed, 44 insertions(+), 48 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index d883176..3e3c603 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -645,18 +645,6 @@ static void __init trim_snb_memory(void)
 	}
 }
 
-/*
- * Here we put platform-specific memory range workarounds, i.e.
- * memory known to be corrupt or otherwise in need to be reserved on
- * specific platforms.
- *
- * If this gets used more widely it could use a real dispatch mechanism.
- */
-static void __init trim_platform_memory_ranges(void)
-{
-	trim_snb_memory();
-}
-
 static void __init trim_bios_range(void)
 {
 	/*
@@ -729,7 +717,38 @@ static void __init trim_low_memory_range(void)
 {
 	memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
 }
-	
+
+static void __init early_reserve_memory(void)
+{
+	/*
+	 * Reserve the memory occupied by the kernel between _text and
+	 * __end_of_kernel_reserve symbols. Any kernel sections after the
+	 * __end_of_kernel_reserve symbol must be explicitly reserved with a
+	 * separate memblock_reserve() or they will be discarded.
+	 */
+	memblock_reserve(__pa_symbol(_text),
+			 (unsigned long)__end_of_kernel_reserve - (unsigned long)_text);
+
+	/*
+	 * Make sure page 0 is always reserved because on systems with
+	 * L1TF its contents can be leaked to user processes.
+	 */
+	memblock_reserve(0, PAGE_SIZE);
+
+	early_reserve_initrd();
+
+	if (efi_enabled(EFI_BOOT))
+		efi_memblock_x86_reserve_range();
+
+	memblock_x86_reserve_range_setup_data();
+
+	reserve_ibft_region();
+	reserve_bios_regions();
+
+	trim_snb_memory();
+	trim_low_memory_range();
+}
+
 /*
  * Dump out kernel offset information on panic.
  */
@@ -764,29 +783,6 @@ dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
 
 void __init setup_arch(char **cmdline_p)
 {
-	/*
-	 * Reserve the memory occupied by the kernel between _text and
-	 * __end_of_kernel_reserve symbols. Any kernel sections after the
-	 * __end_of_kernel_reserve symbol must be explicitly reserved with a
-	 * separate memblock_reserve() or they will be discarded.
-	 */
-	memblock_reserve(__pa_symbol(_text),
-			 (unsigned long)__end_of_kernel_reserve - (unsigned long)_text);
-
-	/*
-	 * Make sure page 0 is always reserved because on systems with
-	 * L1TF its contents can be leaked to user processes.
-	 */
-	memblock_reserve(0, PAGE_SIZE);
-
-	early_reserve_initrd();
-
-	/*
-	 * At this point everything still needed from the boot loader
-	 * or BIOS or kernel text should be early reserved or marked not
-	 * RAM in e820. All other memory is free game.
-	 */
-
 #ifdef CONFIG_X86_32
 	memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
 
@@ -910,8 +906,18 @@ void __init setup_arch(char **cmdline_p)
 
 	parse_early_param();
 
-	if (efi_enabled(EFI_BOOT))
-		efi_memblock_x86_reserve_range();
+	/*
+	 * Do some memory reservations *before* memory is added to
+	 * memblock, so memblock allocations won't overwrite it.
+	 * Do it after early param, so we could get (unlikely) panic from
+	 * serial.
+	 *
+	 * After this point everything still needed from the boot loader or
+	 * firmware or kernel text should be early reserved or marked not
+	 * RAM in e820. All other memory is free game.
+	 */
+	early_reserve_memory();
+
 #ifdef CONFIG_MEMORY_HOTPLUG
 	/*
 	 * Memory used by the kernel cannot be hot-removed because Linux
@@ -938,9 +944,6 @@ void __init setup_arch(char **cmdline_p)
 
 	x86_report_nx();
 
-	/* after early param, so could get panic from serial */
-	memblock_x86_reserve_range_setup_data();
-
 	if (acpi_mps_check()) {
 #ifdef CONFIG_X86_LOCAL_APIC
 		disable_apic = 1;
@@ -1032,8 +1035,6 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	find_smp_config();
 
-	reserve_ibft_region();
-
 	early_alloc_pgt_buf();
 
 	/*
@@ -1054,8 +1055,6 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	sev_setup_arch();
 
-	reserve_bios_regions();
-
 	efi_fake_memmap();
 	efi_find_mirror();
 	efi_esrt_init();
@@ -1081,9 +1080,6 @@ void __init setup_arch(char **cmdline_p)
 
 	reserve_real_mode();
 
-	trim_platform_memory_ranges();
-	trim_low_memory_range();
-
 	init_mem_mapping();
 
 	idt_setup_early_pf();
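
The ordering problem described in the commit message above can be
illustrated with a toy model. This is a minimal sketch, not the kernel's
memblock implementation: every toy_* name below is hypothetical, and the
allocator is reduced to "hand out the lowest range that does not overlap
a recorded reservation", which approximates a bottom-up mode that spans
the entire memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_REGIONS 16
#define TOY_ALLOC_FAIL  (~0UL)

/* Hypothetical toy types; not the kernel's struct memblock. */
struct toy_region {
	unsigned long base;
	unsigned long size;
};

struct toy_memblock {
	unsigned long start;	/* usable memory is [start, end) */
	unsigned long end;
	struct toy_region reserved[TOY_MAX_REGIONS];
	size_t nr_reserved;
};

static void toy_init(struct toy_memblock *mb, unsigned long start,
		     unsigned long end)
{
	mb->start = start;
	mb->end = end;
	mb->nr_reserved = 0;
}

/* Return the first reserved region overlapping [base, base + size), or NULL. */
static const struct toy_region *toy_find_overlap(const struct toy_memblock *mb,
						 unsigned long base,
						 unsigned long size)
{
	for (size_t i = 0; i < mb->nr_reserved; i++) {
		const struct toy_region *r = &mb->reserved[i];

		if (base < r->base + r->size && r->base < base + size)
			return r;
	}
	return NULL;
}

/* Record a reservation; returns 0 on success, -1 if the table is full. */
static int toy_reserve(struct toy_memblock *mb, unsigned long base,
		       unsigned long size)
{
	if (mb->nr_reserved == TOY_MAX_REGIONS)
		return -1;
	mb->reserved[mb->nr_reserved].base = base;
	mb->reserved[mb->nr_reserved].size = size;
	mb->nr_reserved++;
	return 0;
}

/*
 * Bottom-up allocation spanning the whole range: hand out the lowest
 * address range that does not overlap an existing reservation, and
 * record it as reserved.
 */
static unsigned long toy_alloc_bottom_up(struct toy_memblock *mb,
					 unsigned long size)
{
	unsigned long base = mb->start;

	if (size == 0)
		return TOY_ALLOC_FAIL;

	while (base < mb->end && size <= mb->end - base) {
		const struct toy_region *r = toy_find_overlap(mb, base, size);

		if (!r)
			return toy_reserve(mb, base, size) ? TOY_ALLOC_FAIL : base;
		base = r->base + r->size;	/* skip past the reserved region */
	}
	return TOY_ALLOC_FAIL;
}
```

In this model, calling toy_alloc_bottom_up() on a range starting at 0
before anything has been reserved returns address 0, the very range a
later toy_reserve(mb, 0, ...) was meant to protect. Calling toy_reserve()
first makes the same allocation return the first address past the
reservation, which is the ordering that early_reserve_memory() is meant
to guarantee.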


Thread overview: 49+ messages
2020-12-17 20:12 [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up Roman Gushchin
2020-12-17 20:12 ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Roman Gushchin
2020-12-19 14:52   ` Wonhyuk Yang
2020-12-19 14:52     ` Wonhyuk Yang
2020-12-19 17:05     ` Roman Gushchin
2020-12-20  6:49   ` Mike Rapoport
2021-01-22  4:37     ` Thiago Jung Bauermann
2021-01-22  4:37       ` Thiago Jung Bauermann
2021-01-24  2:09       ` Andrew Morton
2021-01-24  2:09         ` Andrew Morton
2021-01-24  7:34         ` Mike Rapoport
2021-01-24  7:34           ` Mike Rapoport
2021-01-26  0:30           ` Thiago Jung Bauermann
2021-01-26  0:30             ` Thiago Jung Bauermann
2021-02-08 23:58           ` Thiago Jung Bauermann
2021-02-08 23:58             ` Thiago Jung Bauermann
2021-02-28  4:18   ` Florian Fainelli
2021-02-28  9:00     ` Mike Rapoport
2021-02-28 18:19       ` Florian Fainelli
2021-02-28 23:08         ` Serge Semin
2021-03-01  3:50           ` Florian Fainelli
2021-03-01  3:50             ` Florian Fainelli
2021-03-01  9:22             ` Serge Semin
2021-03-01  9:22               ` Serge Semin
2021-03-02  4:09               ` Florian Fainelli
2021-03-02  4:09                 ` Florian Fainelli
2021-03-02 13:26                 ` Serge Semin
2021-03-02  4:19               ` [PATCH] MIPS: BMIPS: Reserve exception base to prevent corruption Florian Fainelli
2021-03-02  8:09                 ` Mike Rapoport
2021-03-02 13:54                 ` Serge Semin
2021-03-02 19:04                 ` Roman Gushchin
2021-03-02 23:54                 ` Thomas Bogendoerfer
2021-03-03  1:30                   ` Florian Fainelli
2021-03-03  9:41                     ` Thomas Bogendoerfer
2021-03-03 17:45                       ` Maciej W. Rozycki
2021-03-03 18:15                         ` Thomas Bogendoerfer
2021-03-03 21:50                           ` Maciej W. Rozycki
2021-03-01  9:45             ` [PATCH v2 2/2] memblock: do not start bottom-up allocations with kernel_end Mike Rapoport
2021-03-01  9:45               ` Mike Rapoport
2021-03-02  3:55               ` Roman Gushchin
2021-03-02  3:55                 ` Roman Gushchin
2021-03-02 13:08                 ` Serge Semin
2021-03-23 18:19   ` [tip: x86/boot] x86/setup: Consolidate early memory reservations tip-bot2 for Mike Rapoport
2020-12-20  6:48 ` [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up Mike Rapoport
2020-12-21 17:05   ` Roman Gushchin
2020-12-23  4:06     ` Andrew Morton
2020-12-23 16:35       ` Roman Gushchin
2020-12-23 22:10         ` Mike Rapoport
2020-12-28 19:36           ` Roman Gushchin
